WO2024017363A1 - Action recognition method and apparatus, and storage medium, sensor and vehicle - Google Patents


Publication number
WO2024017363A1
WO2024017363A1 · PCT/CN2023/108545 (CN2023108545W)
Authority
WO
WIPO (PCT)
Prior art keywords
data
image
action recognition
noise
type
Application number
PCT/CN2023/108545
Other languages
French (fr)
Chinese (zh)
Inventor
牛寅
陈枭雄
岑冠男
Original Assignee
联合汽车电子有限公司
Priority date
Filing date
Publication date
Application filed by 联合汽车电子有限公司 filed Critical 联合汽车电子有限公司
Publication of WO2024017363A1 publication Critical patent/WO2024017363A1/en


Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V 40/00 Recognition of biometric, human-related or animal-related patterns in image or video data
    • G06V 40/20 Movements or behaviour, e.g. gesture recognition
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 3/00 Computing arrangements based on biological models
    • G06N 3/02 Neural networks
    • G06N 3/04 Architecture, e.g. interconnection topology
    • G06N 3/049 Temporal neural networks, e.g. delay elements, oscillating neurons or pulsed inputs
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 3/00 Computing arrangements based on biological models
    • G06N 3/02 Neural networks
    • G06N 3/08 Learning methods
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V 10/00 Arrangements for image or video recognition or understanding
    • G06V 10/20 Image preprocessing
    • G06V 10/30 Noise filtering
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V 10/00 Arrangements for image or video recognition or understanding
    • G06V 10/40 Extraction of image or video features
    • G06V 10/62 Extraction of image or video features relating to a temporal dimension, e.g. time-based feature extraction; Pattern tracking
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V 10/00 Arrangements for image or video recognition or understanding
    • G06V 10/70 Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V 10/764 Arrangements for image or video recognition or understanding using pattern recognition or machine learning using classification, e.g. of video objects
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V 10/00 Arrangements for image or video recognition or understanding
    • G06V 10/70 Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V 10/77 Processing image or video features in feature spaces; using data integration or data reduction, e.g. principal component analysis [PCA] or independent component analysis [ICA] or self-organising maps [SOM]; Blind source separation
    • G06V 10/7715 Feature extraction, e.g. by transforming the feature space, e.g. multi-dimensional scaling [MDS]; Mappings, e.g. subspace methods
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V 10/00 Arrangements for image or video recognition or understanding
    • G06V 10/70 Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V 10/82 Arrangements for image or video recognition or understanding using pattern recognition or machine learning using neural networks

Definitions

  • the invention belongs to the technical field of smart vehicles, and in particular relates to an action recognition method, device, storage medium, sensor and vehicle.
  • the kick-activated automatic tailgate system is an application of smart-vehicle perception control.
  • the system automatically recognizes a human kicking motion through sensors installed around the vehicle body and controls the automatic opening of the tailgate.
  • the core of the system is the sensor used to identify the human kicking motion.
  • in the prior art, a capacitive electric-field emitting electrode is installed at the rear bottom of the vehicle.
  • when a human leg approaches, a capacitance forms between the leg and the electrode; the capacitance value changes with the distance between the leg and the electrode, so the kicking motion of the human body can be recognized.
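The capacitive prior art above can be illustrated with a toy parallel-plate model; the geometry values below are invented for the example, and real sensors use calibrated thresholds rather than this idealized formula.

```python
# Illustrative only: parallel-plate approximation of a capacitive kick sensor.
EPS0 = 8.854e-12  # vacuum permittivity, F/m

def capacitance(area_m2: float, distance_m: float) -> float:
    """Parallel-plate capacitance C = eps0 * A / d."""
    return EPS0 * area_m2 / distance_m

# Capacitance rises as the leg approaches the electrode, so a kick shows up
# as a characteristic rise-and-fall in the measured value.
far = capacitance(0.01, 0.20)   # leg 20 cm away
near = capacitance(0.01, 0.05)  # leg 5 cm away
assert near > far
```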
  • in the prior art, ultra-wideband (UWB, Ultra Wide Band) radar may also be used.
  • UWB radar can simultaneously measure the distance, speed, angle and other information of objects, yielding richer features; moreover, if the UWB anchor points at the left or right rear of the vehicle are reused, costs can be cut further.
  • embodiments of the present invention disclose an action recognition method, device, storage medium, sensor and vehicle; the action recognition method identifies valid data in the relevant signals through artificial-intelligence methods and can effectively filter out interference signals and white noise.
  • the method includes a first data collection step and a fourth action recognition step. The first data collection step scans and acquires a first original signal, converts it into two-dimensional array form, and forms a first image; this first image is a distance-time image.
  • the fourth action recognition step identifies the valid data through a first-type feature processing step, a second-type feature processing step and a third-category determination output step. The first-type feature processing step extracts the first-type feature data in the time dimension of the first image; the second-type feature processing step extracts the second-type feature data in the distance and speed dimensions of the first image; the third-category determination output step synthesizes the first-type feature data and the second-type feature data to obtain an action recognition data set, then classifies the action recognition data set to obtain third recognition result data. The action recognition data set is divided into at least three categories: a first valid data set, a second noise data set and a third interference data set.
  • its first data acquisition step periodically acquires a first original signal, which may be a fixed-length I/Q complex signal; its first image is formed by arranging the first original signal sequence into a two-dimensional MxN array, where M and N are natural numbers. The third recognition result data includes switch quantities or signals used to trigger the relevant mechanisms.
  • the method may also include a second preprocessing step and a third intermediate processing step. The second preprocessing step includes a noise reduction processing step, and the third intermediate processing step includes a fast Fourier transform (FFT) step and/or a short-time Fourier transform (STFT) step; the STFT step adds a window function before the FFT, and the window function can be a Hanning window. The first image is denoised to obtain a second denoised image, which can replace the first image in the fourth action recognition step, improving the recognition process. The third intermediate processing step obtains the signal-to-noise ratio (SNR) of the second denoised image and forms a second image used to extract the second-type feature data.
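The windowed STFT described above (a Hanning window applied before each FFT) can be sketched in a few lines; the frame and hop lengths are illustrative assumptions, not values from the patent.

```python
import numpy as np

# Minimal STFT sketch: slide a Hanning-windowed frame over the signal
# and take the FFT of each frame.
def stft(signal: np.ndarray, frame_len: int = 64, hop: int = 32) -> np.ndarray:
    window = np.hanning(frame_len)
    n_frames = 1 + (len(signal) - frame_len) // hop
    frames = [signal[i * hop : i * hop + frame_len] * window
              for i in range(n_frames)]
    return np.fft.fft(np.stack(frames), axis=-1)  # shape (n_frames, frame_len)

x = np.exp(2j * np.pi * 0.25 * np.arange(256))    # toy I/Q tone at 0.25 cycles/sample
spec = stft(x)
assert spec.shape == (7, 64)                      # the tone peaks in bin 16 (= 0.25 * 64)
```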
  • its first type of feature data can be a first feature vector with a length of N
  • its second type of feature data can be a second feature vector with a length of L
  • its action recognition data set can be a third feature vector with a length of (N+L)
  • the pulse repetition interval (PRI, Pulse Repetition Interval) of the first original signal can be a fixed value, and the first original signal can be the echo of an ultra-wideband UWB radar.
  • the operating frequency of the radar can be set between 6.4 GHz and 8 GHz, i.e. its wavelength can be set between 3.75 cm and 4.69 cm.
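The quoted band edges and wavelength range are consistent with the relation wavelength = c / f, which can be checked directly:

```python
# Sanity check of the quoted band edges via wavelength = c / f.
C = 299_792_458  # speed of light, m/s

def wavelength_cm(freq_hz: float) -> float:
    return C / freq_hz * 100

lo, hi = wavelength_cm(8.0e9), wavelength_cm(6.4e9)
# 8 GHz -> ~3.75 cm and 6.4 GHz -> ~4.68 cm, matching the 3.75-4.69 cm range.
```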
  • the method may also include an anti-shake output step and a model training step. The anti-shake output step obtains the third recognition result data R consecutive times to confirm the validity of the data, avoiding malfunctions caused by interference signals and improving the reliability and robustness of the system; R is a natural number greater than or equal to 2.
  • if the third recognition result data obtained R consecutive times are all first valid data, the first original signal is determined to be a valid signal.
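The R-consecutive confirmation can be sketched as a small debounce gate; R = 3 and the label strings below are assumptions made for the demonstration.

```python
from collections import deque

# Sketch of the anti-shake step: only declare the signal valid after the
# classifier has returned "valid" R times in a row (R >= 2).
class AntiShake:
    def __init__(self, r: int = 3):
        assert r >= 2
        self.history = deque(maxlen=r)

    def update(self, label: str) -> bool:
        """Feed one recognition result; True once R consecutive 'valid' results seen."""
        self.history.append(label)
        return (len(self.history) == self.history.maxlen
                and all(x == "valid" for x in self.history))

gate = AntiShake(r=3)
results = [gate.update(x) for x in ["valid", "noise", "valid", "valid", "valid"]]
assert results == [False, False, False, False, True]  # fires only after 3 in a row
```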
  • the model training step may include a noise reduction model training step, a recognition model training step and/or related model training steps; the noise reduction model training step serves the noise reduction processing step. The training sample X is normalized to obtain the normalized sample X-Normal, random white noise is superimposed on X-Normal to obtain the noise sample X-Noise, and the pair <X-Normal, X-Noise> forms a training sample pair; X-Noise is input to the autoencoder to obtain the decoded output Y, and the encoding and decoding parameters are iteratively optimized until the loss function reaches the target value. In the forward inference stage, the first image can likewise be normalized and the normalized result input into the autoencoder.
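The training-pair construction described above (normalize the sample, then superimpose random white noise) might look like the following sketch; the noise level sigma is an assumed hyperparameter, not a value from the patent.

```python
import numpy as np

# Build a <X-Normal, X-Noise> pair for denoising-autoencoder training.
rng = np.random.default_rng(0)

def make_pair(x: np.ndarray, sigma: float = 0.05):
    x_normal = (x - x.min()) / (x.max() - x.min() + 1e-12)  # X-Normal in [0, 1]
    x_noise = x_normal + rng.normal(0.0, sigma, x.shape)     # X-Noise
    return x_normal, x_noise

x = rng.uniform(-1, 1, (8, 16))          # toy M x N distance-time image
x_normal, x_noise = make_pair(x)
assert x_normal.min() >= 0 and x_normal.max() <= 1
assert x_normal.shape == x_noise.shape
```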
  • the synthetic minority oversampling technique (SMOTE, Synthetic Minority Oversampling Technique) can be used to amplify the training samples and balance the sample data.
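The core interpolation step of SMOTE can be hand-rolled for illustration; a real pipeline would normally rely on a library implementation such as imbalanced-learn rather than this sketch.

```python
import numpy as np

# One SMOTE synthesis: interpolate between a minority sample and one of
# its nearest minority-class neighbours.
def smote_one(minority: np.ndarray, idx: int, rng: np.random.Generator) -> np.ndarray:
    dists = np.linalg.norm(minority - minority[idx], axis=1)
    dists[idx] = np.inf                      # exclude the point itself
    neighbour = minority[int(dists.argmin())]
    lam = rng.uniform()                      # interpolation factor in [0, 1)
    return minority[idx] + lam * (neighbour - minority[idx])

rng = np.random.default_rng(1)
minority = rng.normal(size=(5, 3))           # 5 minority samples, 3 features
synth = smote_one(minority, 0, rng)
assert synth.shape == (3,)                   # synthetic sample in feature space
```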
  • the recognition model training step can perform normalization on the first image, or on the second denoised image and the second image, respectively; the normalized samples are then input into the action recognition model to obtain the predicted probability distribution P. Focal Loss is used as the loss function to calculate the error Loss between the predicted value and the real label, and the gradient descent method iteratively optimizes the model parameters until the loss drops into the preset accuracy range.
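Focal Loss for a single sample can be sketched as follows; the alpha and gamma values are the commonly used defaults, not values taken from the patent.

```python
import numpy as np

# Focal Loss for one sample: FL(p_t) = -alpha * (1 - p_t)**gamma * log(p_t).
# It down-weights easy, confident predictions relative to plain cross-entropy.
def focal_loss(p: np.ndarray, target: int, alpha: float = 0.25, gamma: float = 2.0) -> float:
    pt = float(p[target])                    # predicted probability of the true class
    return -alpha * (1.0 - pt) ** gamma * np.log(pt + 1e-12)

p = np.array([0.7, 0.2, 0.1])               # a predicted distribution P over 3 classes
# A confident correct prediction is penalized far less than a wrong one.
assert focal_loss(p, 0) < focal_loss(p, 2)
```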
  • an embodiment of the present invention also discloses an action recognition device, which includes a first data acquisition module and a third action recognition module. The first data acquisition module scans and acquires a first original signal, converts it into two-dimensional array form, and forms a first image; the first image is a distance-time image.
  • its third action recognition module may include a first-type feature processing module, a second-type feature processing module and a third-category judgment output unit.
  • the first-type feature processing module extracts the first-type feature data in the time dimension of the first image.
  • the second-type feature processing module extracts the second-type feature data in the distance and speed dimensions of the first image.
  • the third-category judgment output unit synthesizes the first-type feature data and the second-type feature data to obtain the action recognition data set.
  • its third-category judgment output unit classifies the action recognition data set and obtains the third recognition result data.
  • the action recognition data set is divided into at least a first valid data set, a second noise data set and a third interference data set.
  • the first data acquisition module periodically acquires the first original signal, which may be a fixed-length I/Q complex signal; the first image is formed by arranging the first original signal sequence into a two-dimensional MxN array, where M and N are natural numbers. The third recognition result data includes the switch quantity or signal used to trigger the relevant mechanism.
  • the device may also include a second data processing module that improves feature recognition through data preprocessing. The second data processing module includes a noise reduction processing module and an intermediate processing module. The intermediate processing module can perform the fast Fourier transform (FFT) and/or the short-time Fourier transform (STFT); the STFT adds a window function before the FFT, and the window function can be a Hanning window. The noise reduction processing module processes the first image to obtain the second denoised image, which can replace the first image in the processing of the third action recognition module. The intermediate processing module obtains the signal-to-noise ratio (SNR) of the second denoised image and forms a second image used to extract the second-type feature data.
  • its first type of feature data is a first feature vector of length N
  • its second type of feature data is a second feature vector of length L
  • its action recognition data set is a third feature vector of length (N+L); a 1x1 convolution is performed on the third feature vector to obtain the activation values of the various data categories in the action recognition data set, and the activation values are normalized to obtain the probability distribution over the category values; the category with the highest probability corresponds to the third recognition result data.
  • the convolution kernel can optionally be 3, and the normalization can be implemented by softmax.
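Since a 1x1 convolution over a feature vector amounts to a linear map producing one activation per category, the decision head can be sketched as follows; the class count, vector length and random weights are assumptions for the demonstration.

```python
import numpy as np

# Sketch of the decision head: a 1x1 convolution over the (N+L) feature
# vector is a linear map to one activation per category; softmax then
# turns the activations into the class probability distribution.
def classify(features: np.ndarray, weights: np.ndarray) -> int:
    z = weights @ features                   # 1x1 conv == matrix multiply here
    p = np.exp(z - z.max())
    p /= p.sum()                             # softmax normalization
    return int(p.argmax())                   # index of the winning category

rng = np.random.default_rng(2)
features = rng.normal(size=12)               # toy (N+L)-length third feature vector
weights = rng.normal(size=(3, 12))           # 3 classes: valid / noise / interference
label = classify(features, weights)
assert label in (0, 1, 2)
```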
  • the pulse repetition interval PRI of the first original signal can be a fixed value, and the first original signal can be the echo of an ultra-wideband UWB radar.
  • the working frequency of the radar is between 6.4 GHz and 8 GHz, i.e. the wavelength is between 3.75 cm and 4.69 cm.
  • the device may also include a fourth control output module, which acquires the third recognition result data R consecutive times; if the third recognition result data acquired all R times are first valid data, the first original signal is determined to be a valid signal.
  • its second data processing module may also include a noise reduction model training module, a recognition model training module and/or related model training modules; the noise reduction model training module is used to optimize the noise reduction processing. The noise reduction model training module normalizes the training sample X to obtain the normalized sample X-Normal, superimposes random white noise on X-Normal to obtain the noise sample X-Noise, and constructs the training sample pair <X-Normal, X-Noise>; X-Noise is input to the autoencoder and the encoding and decoding parameters are iteratively optimized until the loss function Loss reaches the target value. In the forward inference stage, the first image can also be normalized and the normalized result input into the autoencoder.
  • the second data processing module can use the synthetic minority oversampling technique SMOTE to amplify the training samples and balance the sample data. In addition, the recognition model training module can normalize the first image, or the second denoised image and the second image, respectively; the normalized samples are then input into the action recognition model to obtain the predicted probability distribution P, Focal Loss is used as the loss function to calculate the error Loss between the predicted value and the real label, and the gradient descent method is used to iteratively optimize the model parameters.
  • embodiments of the present invention also disclose related products using the above methods and devices, including computer storage media, sensors and vehicles.
  • the storage medium includes a storage medium body for storing a computer program; when the computer program is executed by the microprocessor, the above action recognition method can be implemented.
  • the sensor and vehicle include any of the above devices and storage media, and can likewise recognize the relevant feature data and respond to the relevant actions; the specific process will not be described again.
  • the embodiment of the present invention uses UWB radar to identify human kicking movements and combines deep learning technology with the UWB radar to solve the misrecognition and misoperation problems of existing recognition technology, which can further improve the performance of the related systems. In addition, a kick sensor implemented with UWB radar is lower in cost, and combining UWB radar signals with deep learning models also improves recognition performance.
  • Figure 1 is a schematic flow diagram 1 of an embodiment of the method of the present invention.
  • Figure 2 is a schematic diagram of data collection according to the method and product embodiment of the present invention.
  • Figure 3 is a schematic structural diagram of the noise reduction auto-encoding and decoding method and product embodiment of the present invention.
  • Figure 4 is a schematic diagram of the action recognition structure of the method and product embodiment of the present invention.
  • Figure 5 is a schematic diagram of the action recognition process according to the method embodiment of the present invention.
  • Figure 6 is a schematic diagram of the model training process according to the method embodiment of the present invention.
  • Figure 7 is a schematic flow diagram 2 of the method embodiment of the present invention.
  • Figure 8 is a schematic structural diagram of an embodiment of the device of the present invention.
  • Figure 9 is a schematic structural diagram of an action recognition module according to an embodiment of the device of the present invention.
  • Figure 10 is a schematic structural diagram of the model training module of the device embodiment of the present invention.
  • Figure 11 is a schematic structural diagram of an embodiment of the product of the present invention.
  • Figure 12 is a schematic diagram 2 of the composition and structure of an embodiment of the product of the present invention.
  • Figure 13 is a schematic diagram 3 of the composition and structure of an embodiment of the product of the present invention.
  • Figure 14 is a schematic diagram 4 of the composition and structure of an embodiment of the product of the present invention.
  • the first type of feature processing step involves data extraction in the time dimension
  • the second type of feature processing step involves data extraction on distance and speed dimensions
  • M is a natural number, M is greater than or equal to 2;
  • 1005 - feature map, i.e. the N output quantities obtained through the LSTM in the embodiment
  • the action recognition method shown in Figures 1, 2, 4 and 5 includes a first data collection step 100 and a fourth action recognition step 400; the first data collection step 100 scans and acquires the first original signal 001, converts it into two-dimensional array form and forms a first image 011, and the first image 011 is a distance-time image.
  • its fourth action recognition step 400 includes a first-type feature processing step 410, a second-type feature processing step 420 and a third-category judgment output step 430. As shown in Figure 4, the first-type feature processing step 410 extracts the first-type feature data 1111 in the time dimension of the first image 011, and the second-type feature processing step 420 extracts the second-type feature data 2222 in the distance and speed dimensions of the first image 011; the third-category determination output step 430 synthesizes the first-type feature data 1111 and the second-type feature data 2222 to obtain the action recognition data set.
  • the third recognition result data 3333 is obtained by classifying the action recognition data set, which is divided into at least three categories: a first valid data set, a second noise data set and a third interference data set.
  • the first data acquisition step 100 periodically acquires the first original signal 001, which is a fixed-length I/Q complex signal; the first image 011 is formed by arranging the first original signal 001 in sequence into a two-dimensional MxN array, where M and N are natural numbers. In addition, the third recognition result data 3333 in Figure 4 includes switch quantities or signals used to trigger the related mechanisms.
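The arrangement of fixed-length echoes into the MxN distance-time image can be sketched as follows; the sizes and random echoes are purely illustrative (fast time maps to distance, slow time to elapsed time).

```python
import numpy as np

# Stack M consecutive fixed-length I/Q echoes (one per pulse) row-wise
# into an M x N array: the distance-time image (first image).
M, N = 6, 32                                 # assumed frame counts for the demo
rng = np.random.default_rng(3)
echoes = [rng.normal(size=N) + 1j * rng.normal(size=N) for _ in range(M)]
img_dt = np.stack(echoes)                    # shape (M, N), complex I/Q samples
assert img_dt.shape == (M, N) and img_dt.dtype == np.complex128
```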
  • the action recognition method also includes a second preprocessing step 200 and a third intermediate processing step 300.
  • the second preprocessing step 200 may include a noise reduction processing step, and the third intermediate processing step 300 may include a fast Fourier transform (FFT) step and/or a short-time Fourier transform (STFT) step; the STFT adds a window function before the FFT, and the window function can be a Hanning window. The second denoised image is obtained by denoising the first image 011.
  • the second denoised image 111 can replace the first image 011 in the fourth action recognition step 400; the third intermediate processing step 300 obtains the signal-to-noise ratio SNR of the second denoised image 111 and forms the second image 222 used to extract the second-type feature data 2222.
  • the first type of feature data 1111 can be a first feature vector with a length of N
  • the second type of feature data 2222 can be a second feature vector with a length of L
  • the action recognition data set is a third feature vector of length (N+L).
  • the convolution kernel can be 3; the normalization can be performed using the softmax method.
  • the pulse repetition interval PRI of the first original signal 001 is fixed.
  • the first original signal 001 is the echo of an ultra-wideband UWB radar.
  • the operating frequency of the radar can be selected between 6.4 GHz and 8 GHz, i.e. the wavelength can be between 3.75 cm and 4.69 cm.
  • the method may also include an anti-shake output step 500 and a model training step 600. The anti-shake output step 500 ensures the reliability of the recognition by obtaining the third recognition result data 3333 R consecutive times, where R is a natural number greater than or equal to 2; if the third recognition result data 3333 obtained all R times are first valid data, the first original signal 001 is determined to be a valid signal.
  • the model training step 600 includes a noise reduction model training step 602, a recognition model training step 604 and/or related model training steps 60M, where M is a natural number not less than 2. The noise reduction model training step 602 is used for the noise reduction processing step: it normalizes the training sample X to obtain the normalized sample X-Normal and superimposes random white noise on X-Normal to obtain the noise sample X-Noise; the normalized sample X-Normal and the noise sample X-Noise then form the training sample pair <X-Normal, X-Noise>, and the encoding and decoding parameters are iteratively optimized until the loss function Loss reaches the target value. In the forward inference stage, the first image 011 can also be normalized and the normalized result input into the autoencoder.
  • the synthetic minority class oversampling method SMOTE can be used to amplify the training samples.
  • the recognition model training step 604 can normalize the first image 011, or the second denoised image 111 and the second image 222, respectively; the normalized samples are then input into the action recognition model to obtain the predicted probability distribution P.
  • Focal Loss is then used as the loss function to calculate the error Loss between the predicted value and the real label, and the gradient descent method is used to iteratively optimize the model parameters until the Loss drops into the preset accuracy range.
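The "iterate gradient descent until the loss drops into a preset accuracy range" loop can be illustrated on a toy least-squares problem; this is not the patent's model, just the stopping-criterion pattern.

```python
import numpy as np

# Gradient descent with a loss-threshold stopping rule on a 1-D
# least-squares fit: loss = mean((w*x - y)^2).
def fit(xs, ys, lr=0.1, tol=1e-6, max_iter=10_000):
    w = 0.0
    for _ in range(max_iter):
        grad = 2 * np.mean((w * xs - ys) * xs)
        w -= lr * grad
        loss = np.mean((w * xs - ys) ** 2)
        if loss < tol:                       # preset accuracy range reached
            break
    return w, loss

xs = np.array([1.0, 2.0, 3.0])
ys = 2.0 * xs                                # true slope is 2
w, loss = fit(xs, ys)
assert abs(w - 2.0) < 1e-2 and loss < 1e-6
```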
  • the embodiment of the present invention also discloses an action recognition device 700, which includes a first data acquisition module 710 and a third action recognition module 730; the first data acquisition module 710 scans and acquires the first original signal 001 and converts it into two-dimensional array form to form a first image 011.
  • the first image 011 is a distance-time image.
  • its third action recognition module 730 includes a first-type feature processing module 771, a second-type feature processing module 772 and a third-category judgment output unit 773. The first-type feature processing module 771 extracts the first-type feature data 1111 in the time dimension of the first image 011, and the second-type feature processing module 772 extracts the second-type feature data 2222 in the distance and speed dimensions of the first image 011; the third-category determination output unit 773 synthesizes the first-type feature data 1111 and the second-type feature data 2222 to obtain an action recognition data set, then classifies the action recognition data set to obtain the third recognition result data 3333. The action recognition data set is divided into at least three categories: a first valid data set, a second noise data set and a third interference data set.
  • the first data acquisition module 710 periodically acquires the first original signal 001, which is a fixed-length I/Q complex signal; the first image 011 is formed by arranging the first original signal 001 in sequence into a two-dimensional MxN array, where M and N are natural numbers. The third recognition result data 3333 includes switch quantities or signals used to trigger related mechanisms.
  • the action recognition device also includes a second data processing module 720. As shown in Figure 10, the second data processing module 720 includes a noise reduction processing module and an intermediate processing module. The intermediate processing module performs the fast Fourier transform (FFT) and/or the short-time Fourier transform (STFT); the STFT adds a window function before the FFT, and the window function can be a Hanning window. The noise reduction processing module processes the first image 011 to obtain the second denoised image 111, which replaces the first image 011 in the processing of the third action recognition module 730. The intermediate processing module obtains the signal-to-noise ratio SNR of the second denoised image 111 and forms the second image 222 used to extract the second-type feature data 2222.
  • the first type of feature data 1111 is a first feature vector with a length of N
  • the second type of feature data 2222 is a second feature vector with a length of L
  • the action recognition data set is a third feature vector of length (N+L). A 1x1 convolution with the preset convolution kernel is performed on the third feature vector to obtain the activation values of the various data categories in the action recognition data set; the activation values are normalized to obtain the probability distribution over the category values, and the category with the highest probability corresponds to the third recognition result data 3333. The normalization can be implemented with softmax.
  • the pulse repetition interval PRI of the first original signal 001 is fixed.
  • the first original signal 001 may be the echo of an ultra-wideband UWB radar.
  • the operating frequency of the radar is between 6.4 GHz and 8 GHz, i.e. the wavelength is between 3.75 cm and 4.69 cm.
  • the device 700 may also include a fourth control output module 740. The fourth control output module 740 obtains the third recognition result data 3333 R consecutive times, where R is a natural number greater than or equal to 2; if the third recognition result data 3333 obtained all R times are first valid data, the first original signal 001 is determined to be a valid signal. In addition, the second data processing module 720 may also include a noise reduction model training module 721, a recognition model training module 722 and/or related model training modules 72M. The noise reduction model training module 721 is used to optimize the noise reduction processing: it normalizes the training sample X to obtain the normalized sample X-Normal, superimposes random white noise on X-Normal to obtain the noise sample X-Noise, constructs the training sample pair <X-Normal, X-Noise> from the normalized sample X-Normal and the noise sample X-Noise, and inputs X-Noise to the autoencoder, iteratively optimizing the encoding and decoding parameters until the loss function reaches the target value.
  • in the forward inference stage, the first image 011 can also be normalized and the normalized result input into the autoencoder; in addition, the second data processing module 720 can use the synthetic minority oversampling technique SMOTE to amplify the training samples.
  • the recognition model training module 722 can normalize the first image 011, or the second denoised image 111 and the second image 222, respectively; the normalized samples are then input into the action recognition model to obtain the predictions.
  • the computer storage medium 903 shown in Figures 11 to 14 includes a storage medium body for storing a computer program; when the computer program is executed by a microprocessor, any action recognition method disclosed in the present invention can be implemented.
  • the sensor 905 shown in Figure 14 can use any action recognition device and/or storage medium disclosed in the present invention; similarly, vehicles using the devices or storage media disclosed in the present invention also naturally fall within the protection scope of the invention.
  • embodiments of the present invention can use NXP's UWB radar chip operating between 6.4 GHz and 8 GHz, corresponding to a wavelength range of 3.75 cm to 4.69 cm.
  • the UWB radar periodically emits narrow pulse signals with a fixed PRI. If there is a target within the detection range, the received echo will carry target information.
  • this device embodiment consists of a UWB radar data acquisition module, a data processing module, an action recognition module and an output module. The UWB radar data acquisition module is responsible for collecting the original I/Q signals received by the UWB radar; the data processing module is responsible for preprocessing the original I/Q signals, such as noise reduction and FFT; and the action recognition module extracts the characteristics of human kicking movements from the preprocessed UWB radar signal to determine whether a human kicking movement occurred.
  • the output module is responsible for synthesizing multiple recognition results and outputting control signals.
  • the UWB radar data acquisition module receives an original I/Q signal of a fixed length each time, and arranges the original signal sequence into an MxN two-dimensional array, which can also be expressed as a two-dimensional image, recorded as the range-time image Img_DT; further, Img_DT is input to the data processing module and denoised through the autoencoder to obtain the denoised image Img_Denoise; further, STFT processing is performed on Img_Denoise along the slow-time dimension, and the signal-to-noise ratio is calculated for the processed image.
  • the distance-velocity heat map Img_DVH is thereby obtained; Img_Denoise and Img_DVH are then input into the action recognition module at the same time to obtain the recognition result for the human kicking action; in addition, if the action recognition module recognizes the human kicking action multiple times, the output module outputs the control signal for opening the tailgate.
  • the UWB radar acquisition module is implemented by dynamically maintaining a buffer queue: because of the buffer queue, the writing speed and reading speed of the data may differ, so a frame is read only once the data accumulated in the queue reaches the length that needs to be read; in addition, an overlap area is set between two consecutive reads, which reduces the probability that a continuous action is split across 2 frames of data.
  • the overlap can be in [0, 1); a typical value is 0.5.
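The buffer-queue reading scheme above can be sketched as follows. The frame length of 8 samples and the class/method names are illustrative assumptions; only the read-when-enough-data and 50 % overlap behaviour follows the description.

```python
from collections import deque

import numpy as np

class RadarBuffer:
    """Sketch of the dynamically maintained buffer queue: samples are appended
    as they arrive; a frame of frame_len samples is read only once enough data
    has accumulated, and consecutive reads overlap by `overlap` in [0, 1)."""

    def __init__(self, frame_len, overlap=0.5):
        assert 0.0 <= overlap < 1.0
        self.frame_len = frame_len
        # Number of samples consumed per read; the rest is kept as the overlap.
        self.step = max(1, int(frame_len * (1.0 - overlap)))
        self.queue = deque()

    def write(self, samples):
        self.queue.extend(samples)

    def read_frame(self):
        # Read only when the accumulated length reaches the required length.
        if len(self.queue) < self.frame_len:
            return None
        frame = np.array([self.queue[i] for i in range(self.frame_len)])
        for _ in range(self.step):  # discard only `step` samples, keep overlap
            self.queue.popleft()
        return frame

buf = RadarBuffer(frame_len=8, overlap=0.5)
buf.write(range(12))
f1 = buf.read_frame()  # samples 0..7
f2 = buf.read_frame()  # samples 4..11 -- 50 % overlap with f1
```

A kick that straddles the boundary between two reads thus still appears whole in one of the overlapping frames.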
  • an autoencoder can be used to denoise the image Img_DT: the autoencoder can adopt an Encoder-Decoder structure, as shown in Figure 3; the Encoder encodes the input image Img_DT through stacked convolution (Convolution), activation (Activation) and max pooling (MaxPooling) layers to extract effective image features and reduce the feature dimension; the Decoder takes the encoded low-dimensional features and restores the image content through stacked convolution (Convolution), activation (Activation) and upsampling (Upsample) layers; the output of the Decoder is the denoised image Img_Denoise, whose size is consistent with the input size of the Encoder.
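The Encoder-Decoder size relationship can be illustrated at shape level with plain numpy (convolution and activation layers are omitted; the 16x32 image size and two-stage pooling are assumptions, not the embodiment's actual network):

```python
import numpy as np

def maxpool2x2(x):
    """2x2 max pooling: the downsampling step of the Encoder."""
    m, n = x.shape
    return x.reshape(m // 2, 2, n // 2, 2).max(axis=(1, 3))

def upsample2x2(x):
    """Nearest-neighbour upsampling: the Decoder step restoring resolution."""
    return x.repeat(2, axis=0).repeat(2, axis=1)

# Illustrative MxN range-time image (dimensions are assumptions).
Img_DT = np.arange(16 * 32, dtype=float).reshape(16, 32)

# Encoder: two pooling stages reduce the feature dimension, 16x32 -> 4x8.
code = maxpool2x2(maxpool2x2(Img_DT))

# Decoder: two upsampling stages restore the original size, 4x8 -> 16x32,
# so Img_Denoise matches the Encoder's input size, as the structure requires.
Img_Denoise = upsample2x2(upsample2x2(code))
```

This is why the Decoder output can directly replace the input image in the downstream recognition steps: the two have identical dimensions.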
  • the training process is as described above: the training pair <X-Normal, X-Noise> is constructed and the MSE loss is iteratively minimized until it reaches the target value.
  • a direct FFT will cause spectral leakage; to reduce the leakage, the short-time Fourier transform (STFT) can be used: the signal is first multiplied by a window function (such as a Hanning window), then the FFT (Fast Fourier Transform) is applied.
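The window-then-FFT step can be sketched as follows; the window length, hop size, and the toy sinusoid are illustrative assumptions.

```python
import numpy as np

def stft_column(signal, win_len, hop):
    """Short-time Fourier transform of a 1-D slow-time signal: each segment is
    multiplied by a Hanning window before the FFT to reduce spectral leakage."""
    window = np.hanning(win_len)
    frames = []
    for start in range(0, len(signal) - win_len + 1, hop):
        segment = signal[start:start + win_len] * window
        frames.append(np.fft.fft(segment))
    return np.array(frames)  # shape: (num_frames, win_len)

# Toy slow-time signal: a 5-cycle sinusoid over 64 samples.
t = np.arange(64)
sig = np.sin(2 * np.pi * 5 * t / 64)
spec = stft_column(sig, win_len=32, hop=16)
```

Applying this along the slow-time dimension of Img_Denoise, column by column, would yield the distance-velocity representation from which Img_DVH is built.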
  • STFT short-time Fourier transform
  • the signal-to-noise ratio SNR = 20·log10(A_signal / A_noise), i.e. the standard decibel ratio of signal amplitude to noise amplitude.
  • the action recognition module is an end-to-end recognition model combining a convolutional neural network CNN (Convolutional Neural Network) and a recurrent neural network RNN (Recurrent Neural Network); as shown in Figure 4, the model has two branches: branch 1 mainly extracts features in the time dimension, and branch 2 mainly extracts features in the distance and speed dimensions.
  • CNN Convolutional Neural Network
  • RNN Recurrent Neural Network
  • the input of branch 1 is Img_Denoise.
  • this branch uses 1D-CNN+LSTM, i.e. a one-dimensional convolution followed by a long short-term memory network LSTM (Long Short Term Memory), to extract object features: each column of Img_Denoise is taken as an object and a one-dimensional convolution is performed, where the convolution kernel size is 1xk (k is typically 3) and the number of convolution kernels is C; the feature after the 1D-CNN is an MxNxC tensor; the tensor is accumulated along the C direction to obtain an MxN feature map CNN_F; CNN_F is split by columns into N vectors of length M, and the N M-dimensional vectors are input into the LSTM in turn (the LSTM is a type of RNN), yielding N outputs; that is, the feature F_branch1 of branch 1 is a feature vector of length N.
  • LSTM Long Short Term Memory
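The branch-1 feature construction up to the LSTM can be sketched in numpy as follows. The sizes M=16, N=8, C=4, k=3 and the random weights are illustrative assumptions; the LSTM itself is omitted, only the tensor shapes it would consume are shown.

```python
import numpy as np

rng = np.random.default_rng(1)
M, N, C, k = 16, 8, 4, 3  # M range bins, N time steps, C kernels of size 1x3

Img_Denoise = rng.normal(size=(M, N))
kernels = rng.normal(size=(C, k))

# 1-D convolution of each column with each 1xk kernel ('same' padding),
# producing the MxNxC feature tensor described above.
feat = np.zeros((M, N, C))
for c in range(C):
    for n in range(N):
        feat[:, n, c] = np.convolve(Img_Denoise[:, n], kernels[c], mode="same")

# Accumulate along the C direction to obtain the MxN feature map CNN_F.
CNN_F = feat.sum(axis=2)

# Split CNN_F by columns into N vectors of length M; feeding these to an LSTM
# in turn would yield the N outputs forming the length-N feature F_branch1.
columns = [CNN_F[:, n] for n in range(N)]
```

Each of the N column vectors corresponds to one slow-time step, which is what lets the LSTM model the temporal evolution of the kick.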
  • branch 2 uses a typical 2D-CNN to extract object features; it is internally stacked from a series of convolutional layers, batch normalization layers, activation layers, pooling layers, etc.; after the 2D-CNN, the feature F_branch2 of branch 2 is a feature vector of length L.
  • F_branch1 and F_branch2 are merged to obtain a feature vector of length (N+L); a 1x1 convolution with 3 convolution kernels is performed on this feature vector to obtain the activation value of each category; finally, the activation values are normalized through softmax to obtain the probability distribution over the categories, and the category with the highest probability is the output of the model.
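The fusion-and-classification head can be sketched as follows. Branch lengths N=8, L=6 and the random weights are illustrative assumptions; the point is that a 1x1 convolution with 3 kernels over a length-(N+L) vector reduces to a 3x(N+L) linear map followed by softmax.

```python
import numpy as np

rng = np.random.default_rng(2)
N, L = 8, 6  # illustrative branch feature lengths

F_branch1 = rng.normal(size=N)
F_branch2 = rng.normal(size=L)

# Merge the two branch features into one vector of length N + L.
fused = np.concatenate([F_branch1, F_branch2])

# 1x1 convolution with 3 kernels == a 3 x (N+L) linear map; random stand-ins.
W = rng.normal(size=(3, N + L))
activations = W @ fused

# Softmax normalizes the per-class activations into a probability distribution.
probs = np.exp(activations - activations.max())
probs /= probs.sum()

# 0: background noise, 1: human kick, 2: other interference
predicted_class = int(np.argmax(probs))
```

The argmax over the three-class distribution is the model output that the downstream anti-shake logic then confirms over R consecutive frames.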
  • the method of the present invention sets the number of categories to 3.
  • One category is background noise
  • one category is human body kicks
  • the other category is others (such as a cat crawling under the car, strong wind, etc.), i.e. interference that is easily confused with a human kicking motion, unified as one class.
  • the training process includes: normalizing the training samples Img_Denoise and Img_DVH respectively; inputting the normalized sample pairs into the model shown in Figure 4 to obtain the predicted probability distribution P; using Focal Loss as the loss function to calculate the error Loss between the predicted value and the real label; then using gradient descent to iteratively optimize the model parameters until the Loss no longer decreases.
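The Focal Loss computation for a single sample can be sketched as follows; the gamma and alpha defaults follow the original focal-loss formulation and are assumptions here, as the patent does not specify them.

```python
import numpy as np

def focal_loss(p, y, gamma=2.0, alpha=0.25):
    """Focal loss for one sample: p is the predicted probability distribution,
    y the true class index. Down-weights easy, well-classified samples."""
    pt = p[y]
    return float(-alpha * (1.0 - pt) ** gamma * np.log(pt))

p = np.array([0.1, 0.8, 0.1])    # predicted distribution over the 3 classes
loss_correct = focal_loss(p, 1)  # well-classified sample -> small loss
loss_wrong = focal_loss(p, 0)    # hard/misclassified sample -> larger loss
```

Because well-classified samples contribute little loss, training focuses on hard cases such as interference that resembles a kick, which suits the imbalanced three-class setup described above.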
  • the synthetic minority oversampling technique SMOTE Synthetic Minority Oversampling Technique
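The SMOTE amplification mentioned above can be sketched with a minimal k=1 nearest-neighbour variant; the sample count, feature dimension, and helper name are illustrative assumptions.

```python
import numpy as np

def smote_sample(minority, rng):
    """Generate one synthetic minority sample by interpolating between a
    random minority sample and its nearest minority-class neighbour."""
    i = rng.integers(len(minority))
    x = minority[i]
    d = np.linalg.norm(minority - x, axis=1)
    d[i] = np.inf                  # exclude the sample itself
    neighbour = minority[np.argmin(d)]
    lam = rng.uniform()            # interpolation factor in [0, 1)
    return x + lam * (neighbour - x)

rng = np.random.default_rng(3)
minority = rng.normal(size=(5, 4))  # 5 minority-class samples, 4 features each
synthetic = smote_sample(minority, rng)
```

Synthetic kick samples generated this way would balance the training set against the far more common background-noise class.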
  • variants of the denoising autoencoder shown in Figure 3, as long as they use a CNN as the Encoder or Decoder, should fall within the scope of this solution;
  • in the action recognition model shown in Figure 4, the LSTM in branch 1 can also be replaced with other RNN models, such as the gated recurrent unit GRU (Gate Recurrent Unit), etc.;
  • as for the 2D-CNN model in branch 2, as long as it is stacked from one or more of a series of convolutional layers, batch normalization layers, activation layers, pooling layers and fully connected layers, no matter how they are combined, it should fall within the scope of the embodiments of the present invention.

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Multimedia (AREA)
  • Health & Medical Sciences (AREA)
  • General Health & Medical Sciences (AREA)
  • Evolutionary Computation (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Software Systems (AREA)
  • Artificial Intelligence (AREA)
  • Computing Systems (AREA)
  • Medical Informatics (AREA)
  • Databases & Information Systems (AREA)
  • Biomedical Technology (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Biophysics (AREA)
  • Computational Linguistics (AREA)
  • Data Mining & Analysis (AREA)
  • Molecular Biology (AREA)
  • General Engineering & Computer Science (AREA)
  • Mathematical Physics (AREA)
  • Human Computer Interaction (AREA)
  • Social Psychology (AREA)
  • Psychiatry (AREA)
  • Radar Systems Or Details Thereof (AREA)

Abstract

An action recognition method and apparatus, and a storage medium, a sensor and a vehicle. An ultra wide band (UWB) radar is used to recognize a kicking action of a human body, and a deep learning technique is combined with the UWB radar to solve the problems of mis-recognition and mis-operation in a recognition technique, such that the performance of a related system can be further improved.

Description

An action recognition method, device, storage medium, sensor and vehicle

Technical Field

The invention belongs to the technical field of smart vehicles, and in particular relates to an action recognition method, device, storage medium, sensor and vehicle.

Background Art
The kick-activated automatic tailgate system is an implementation of smart vehicles in the field of perception and control; the system automatically recognizes the human kicking motion through sensors installed around the vehicle body and controls the automatic opening of the tailgate; the core of the system is the sensor used to recognize the human kicking motion.
Currently, the mainstream method on the market uses capacitive sensors: capacitive electric-field emitting electrodes are installed under the rear bumper, one each at the rear and bottom of the vehicle; when a human leg appears in the detection area, a capacitance is formed between the leg and the electrode; the capacitance value changes with the distance between the leg and the electrode, from which the human kicking motion can be recognized.
With the intelligent development of automobiles, radar has gradually become an indispensable part of the automobile perception system; among radars, ultra-wideband UWB (Ultra Wide Band) radar has great development potential due to its large bandwidth, strong penetration and low cost.
Compared with capacitive sensors, which only measure changes in capacitance, UWB radar can simultaneously measure the distance, speed, angle and other information of an object, obtaining richer features; at the same time, if the UWB anchor at the left rear or right rear of the vehicle is reused, the cost can be further reduced.
Summary of the Invention

Embodiments of the present invention disclose an action recognition method, device, storage medium, sensor and vehicle; the action recognition method identifies valid data in the relevant signals through artificial intelligence methods, and can effectively filter interference signals and white noise.
Specifically, the method includes a first data collection step and a fourth action recognition step; the first data collection step scans and acquires a first original signal, converts the first original signal into a two-dimensional array and forms a first image, the first image being a distance-time image.
Further, the fourth action recognition step identifies the valid data through a first-type feature processing step, a second-type feature processing step, a third-category determination output step and other related steps; the first-type feature processing step extracts first-type feature data in the time dimension of the first image; the second-type feature processing step extracts second-type feature data in the distance and speed dimensions of the first image; the third-category determination output step synthesizes the first-type feature data and the second-type feature data to obtain an action recognition data set; the third-category determination output step further includes: classifying and identifying the action recognition data set to obtain third recognition result data; the action recognition data set is divided into at least three categories: a first valid data set, a second noise data set and a third interference data set.
Specifically, the first data collection step periodically acquires the first original signal, which may be a fixed-length I/Q complex signal; the first image is arranged from the first original signal sequence, and its two-dimensional array has the form MxN, where M and N are natural numbers; the third recognition result data includes switch quantities or signals used to trigger the relevant mechanisms.
Further, the method may also include a second preprocessing step and a third intermediate processing step; the second preprocessing step includes a noise reduction processing step, and the third intermediate processing step includes a fast Fourier transform FFT step and/or a short-time Fourier transform STFT step; the STFT step adds a window function before the FFT, which can be a Hanning window; the first image is denoised to obtain a second denoised image; the second denoised image can replace the first image in the fourth action recognition step, thereby improving the effect of the recognition process; the third intermediate processing step obtains the signal-to-noise ratio SNR of the second denoised image and forms a second image for the extraction of the second-type feature data.
Specifically, the first-type feature data can be a first feature vector of length N, the second-type feature data can be a second feature vector of length L, and the action recognition data set is then a third feature vector of length (N+L); a 1x1 convolution with a preset convolution kernel is performed on the third feature vector to obtain the activation values of the various data classes of the action recognition data set, and the activation values are then normalized to obtain the probability distribution over the data classes of the action recognition data set; the class with the highest probability corresponds to the third recognition result data; the normalization process can use the softmax method.
Further, the pulse repetition interval PRI (Pulse Repetition Interval) of the first original signal can adopt a fixed value; the first original signal can adopt the echo of an ultra-wideband UWB radar, whose operating frequency can be set between 6.4 GHz and 8 GHz, or whose wavelength can be set between 3.75 cm and 4.69 cm.
Further, the method may also include an anti-shake output step and a model training step; the anti-shake output step confirms the validity of the data by acquiring the third recognition result data R consecutive times, avoiding malfunctions caused by interference signals and improving the reliability and robustness of the system; R is a natural number greater than or equal to 2.

If the third recognition result data can be acquired R consecutive times and all of them are the first valid data, the first original signal is determined to be a valid signal.
Further, the model training step may include a noise reduction model training step, a recognition model training step and/or a related model training step; the noise reduction model training step serves the noise reduction processing step: the training sample X is normalized to obtain the normalized sample X-Normal, and random white noise is superimposed on X-Normal to obtain the noise sample X-Noise; the training pair <X-Normal, X-Noise> is then constructed from the normalized sample X-Normal and the noise sample X-Noise; X-Noise is input to the autoencoder to obtain the decoded output Y; the loss function Loss = MSE(Y, X-Normal) is obtained and the encoding and decoding parameters are iteratively optimized until the loss function Loss reaches the target value; in the forward inference stage, the first image can also be normalized and its normalized result input into the autoencoder.
Specifically, the synthetic minority oversampling technique SMOTE (Synthetic Minority Oversampling Technique) can be used to amplify the training samples; the recognition model training step can also normalize the first image (or the second denoised image) and the second image respectively; the normalized samples are then input into the action recognition model to obtain the predicted probability distribution P; Focal Loss is used as the loss function to calculate the error Loss between the predicted value and the real label; and gradient descent is used to iteratively optimize the model parameters until the Loss falls within the preset accuracy range.
In addition, an embodiment of the present invention also discloses an action recognition device, including a first data acquisition module and a third action recognition module; the first data acquisition module scans and acquires a first original signal, converts the first original signal into a two-dimensional array and forms a first image, the first image being a distance-time image.
Further, the third action recognition module may include a first-type feature processing module, a second-type feature processing module and a third-category determination output unit; the first-type feature processing module extracts first-type feature data in the time dimension of the first image; the second-type feature processing module extracts second-type feature data in the distance and speed dimensions of the first image; the third-category determination output unit synthesizes the first-type feature data and the second-type feature data to obtain an action recognition data set; the third-category determination output unit classifies and identifies the action recognition data set to obtain third recognition result data; the action recognition data set is divided into at least three categories: a first valid data set, a second noise data set and a third interference data set.
Specifically, the first data acquisition module periodically acquires the first original signal, which may be a fixed-length I/Q complex signal; the first image is arranged from the first original signal sequence, and its two-dimensional array has the form MxN, where M and N are natural numbers; the third recognition result data includes switch quantities or signals used to trigger the relevant mechanisms.
Further, the device may also include a second data processing module, which improves the effect of feature recognition through data preprocessing; the second data processing module includes a noise reduction processing module and an intermediate processing module; the intermediate processing module can perform the fast Fourier transform FFT and/or the short-time Fourier transform STFT; the STFT adds a window function before the FFT, which can be a Hanning window function; the noise reduction processing module processes the first image to obtain a second denoised image; the second denoised image can replace the first image in the processing of the third action recognition module; the intermediate processing module obtains the signal-to-noise ratio SNR of the second denoised image and forms a second image for the extraction of the second-type feature data.
Specifically, the first-type feature data is a first feature vector of length N, the second-type feature data is a second feature vector of length L, and the action recognition data set is a third feature vector of length (N+L); a 1x1 convolution with a preset convolution kernel is then performed on the third feature vector to obtain the activation values of the various data classes of the action recognition data set, and the activation values are normalized to obtain the probability distribution over the data classes of the action recognition data set; the class with the highest probability corresponds to the third recognition result data; optionally, the number of convolution kernels is 3, and the normalization can be implemented with softmax.
Further, the pulse repetition interval PRI of the first original signal can adopt a fixed value; the first original signal can adopt the echo of an ultra-wideband UWB radar whose operating frequency is between 6.4 GHz and 8 GHz or whose wavelength is in the range 3.75 cm to 4.69 cm.
Further, the device may also include a fourth control output module; after the third recognition result data has been acquired R consecutive times, if the third recognition result data acquired in all R times are the first valid data, the first original signal is determined to be a valid signal.
Further, the second data processing module may also include a noise reduction model training module, a recognition model training module and/or a related model training module; the noise reduction model training module is used to optimize the noise reduction processing: it normalizes the training sample X to obtain the normalized sample X-Normal, and superimposes random white noise on X-Normal to obtain the noise sample X-Noise; the training pair <X-Normal, X-Noise> is then constructed from the normalized sample X-Normal and the noise sample X-Noise; X-Noise is input to the autoencoder to obtain the decoded output Y; the noise reduction model training module obtains the loss function Loss = MSE(Y, X-Normal) and iteratively optimizes the encoding and decoding parameters until the loss function Loss reaches the target value; in the forward inference stage, the first image can also be normalized and its normalized result input into the autoencoder; the second data processing module can use the synthetic minority oversampling technique SMOTE to amplify the training samples in order to balance the sample data; in addition, the recognition model training module can normalize the first image (or the second denoised image) and the second image respectively; the normalized samples are then input into the action recognition model to obtain the predicted probability distribution P, and Focal Loss is used as the loss function to calculate the error Loss between the predicted value and the real label; gradient descent is then used to iteratively optimize the model parameters until the Loss falls within the preset accuracy range.
Further, embodiments of the present invention also disclose related products adopting the above method and device, including a computer storage medium, a sensor and a vehicle; the storage medium includes a storage medium body for storing a computer program; when the computer program is executed by a microprocessor, the above action recognition method can be implemented.
Similarly, the sensor and the vehicle include any of the above devices or storage media, and can likewise recognize the relevant feature data and respond to the relevant actions; the specific process is not repeated here.
Embodiments of the present invention use UWB radar to recognize the human kicking motion, and combine deep learning technology with UWB radar to solve the misrecognition and misoperation problems existing in recognition technology, which can further improve the performance of the related systems; in addition, a kick sensor implemented with UWB radar is lower in cost, and combining UWB radar signals with deep learning models also improves the recognition performance.
It should be noted that terms such as "first" and "second" used herein merely describe the components of the technical solution and do not constitute a limitation of the technical solution, nor can they be interpreted as an indication or implication of the importance of the corresponding component; a component qualified by "first", "second" or a similar term means that the corresponding technical solution contains at least one such component.
Brief Description of the Drawings

In order to explain the technical solution of the present invention more clearly and facilitate a further understanding of its technical effects, features and purposes, the present invention is described in detail below with reference to the accompanying drawings; the drawings constitute a necessary part of the specification and, together with the embodiments, serve to illustrate the technical solution of the present invention, but do not constitute a limitation of the present invention.

The same reference numerals in the drawings represent the same parts, specifically:
Figure 1 is a first schematic flowchart of a method embodiment of the present invention;

Figure 2 is a schematic diagram of data collection in the method and product embodiments of the present invention;

Figure 3 is a schematic diagram of the denoising encoder-decoder structure in the method and product embodiments of the present invention;

Figure 4 is a schematic diagram of the action recognition structure in the method and product embodiments of the present invention;

Figure 5 is a schematic diagram of the action recognition flow of the method embodiment of the present invention;

Figure 6 is a schematic diagram of the model training flow of the method embodiment of the present invention;

Figure 7 is a second schematic flowchart of a method embodiment of the present invention;

Figure 8 is a schematic structural diagram of a device embodiment of the present invention;

Figure 9 is a schematic structural diagram of the action recognition module of the device embodiment of the present invention;

Figure 10 is a schematic structural diagram of the model training module of the device embodiment of the present invention;

Figure 11 is a first schematic structural diagram of a product embodiment of the present invention;

Figure 12 is a second schematic structural diagram of a product embodiment of the present invention;

Figure 13 is a third schematic structural diagram of a product embodiment of the present invention;

Figure 14 is a fourth schematic structural diagram of a product embodiment of the present invention.
In the drawings:

001 - first original signal, i.e. the I/Q signal of the embodiment;

011 - first distance-time image (two-dimensional array) one;

012 - overlap between signals;

022 - first distance-time image (two-dimensional array) two;

100 - first data collection step;

111 - first denoised image (two-dimensional array);

200 - second preprocessing step, including noise reduction in the embodiment;

300 - third intermediate processing step, in which the distance-velocity image is constructed in the embodiment;

400 - fourth action recognition step, distinguishing white noise, body movements and interference actions in the embodiment;

410 - first-type feature processing step, concerning data extraction in the time dimension;

420 - second-type feature processing step, concerning data extraction in the distance and speed dimensions;

430 - third-category determination output step;

500 - fifth anti-shake output step;

600 - sixth model training step;

602 - noise reduction model training step;

604 - recognition model training step;

60M - related model training step, where M is a natural number greater than or equal to 2;

700 - action recognition device;

710 - first data acquisition module;

720 - second data processing module;

721 - noise reduction model training module;

722 - recognition model training module;

72M - related model training module;

730 - third action recognition module;

740 - fourth control output module;

771 - first-type feature processing module;

772 - second-type feature processing module;

773 - third-category determination output module;

810 - eighth denoising encoder-decoder structure;

811 - encoder;

812 - decoder;

900 - vehicle;

901 - controller;

902 - action recognition device;

903 - computer storage medium;

905 - sensor;

1001 - one-dimensional convolutional neural network process, i.e. 1D-CNN;

1003 - feature tensor, i.e. CNN-F;

1005 - feature map, i.e. the N outputs obtained via the LSTM in the embodiment;

1111 - first feature data (vector);

2001 - two-dimensional convolutional neural network feature extraction process, i.e. the L outputs obtained via the 2D-CNN in the embodiment;

2222 - second feature data (vector);

3000 - feature data (vector) merging;

3333 - third recognition result data.
Detailed Description of the Embodiments
The present invention is described in further detail below with reference to the accompanying drawings and embodiments. The specific embodiments described serve only to explain the technical solution of the present invention and do not limit it. Moreover, the parts set forth in the embodiments or drawings are merely illustrations of the relevant parts of the present invention, not its entirety.
The action recognition method shown in Figures 1, 2, 4, and 5 includes a first data acquisition step 100 and a fourth action recognition step 400. The first data acquisition step 100 scans for and acquires a first original signal 001, converts the first original signal 001 into a two-dimensional array, and forms a first image 011, where the first image 011 is a range-time image.
As shown in Figure 5, the fourth action recognition step 400 includes a first-type feature processing step 410, a second-type feature processing step 420, and a third category-decision output step 430. As shown in Figure 4, the first-type feature processing step 410 extracts first-type feature data 1111 of the first image 011 in the time dimension, and the second-type feature processing step 420 extracts second-type feature data 2222 of the first image 011 in the range and velocity dimensions; the third category-decision output step 430 merges the first-type feature data 1111 with the second-type feature data 2222 to obtain an action recognition data set.
Further, as shown in Figure 4, the action recognition data set is classified to obtain third recognition result data 3333, where the action recognition data set is divided into at least three categories: a first valid data set, a second noise data set, and a third interference data set.
Specifically, as shown in Figure 2, the first data acquisition step 100 periodically acquires the first original signal 001, which is a fixed-length I/Q complex signal. The first image 011 is formed by arranging the first original signal 001 in sequence into a two-dimensional MxN array, where M and N are natural numbers. In addition, the third recognition result data 3333 of Figure 4 includes a switching quantity or signal used to trigger an associated mechanism.
Further, as shown in Figure 7, the action recognition method also includes a second preprocessing step 200 and a third intermediate processing step 300. The second preprocessing step 200 may include a noise-reduction step, and the third intermediate processing step 300 may include a fast Fourier transform (FFT) step and/or a short-time Fourier transform (STFT) step, where the STFT applies a window function, such as a Hanning window, before the FFT. Noise-reducing the first image 011 yields a second, denoised image 111, which may replace the first image 011 in the fourth action recognition step 400. The third intermediate processing step 300 obtains the signal-to-noise ratio (SNR) of the second denoised image 111 and forms a second image 222 used for extracting the second-type feature data 2222.
As shown in Figure 4, the first-type feature data 1111 may be a first feature vector of length N and the second-type feature data 2222 a second feature vector of length L, so that the action recognition data set is a third feature vector of length (N+L).
Further, a 1x1 convolution with preset kernels is applied to the third feature vector to obtain activation values for each data category of the action recognition data set; normalizing these activation values yields the probability distribution over the categories, and the category with the highest probability corresponds to the third recognition result data 3333.
Specifically, the number of convolution kernels may be 3, and the normalization may be performed with the softmax function.
Further, the pulse repetition interval (PRI) of the first original signal 001 is fixed, and the first original signal 001 is the echo of an ultra-wideband (UWB) radar whose operating frequency may be chosen between 6.4 GHz and 8 GHz, corresponding to wavelengths between 3.75 cm and 4.69 cm.
In addition, as shown in Figure 7, the method may also include a debounce output step 500 and a model training step 600. The debounce output step 500 ensures recognition reliability by acquiring the third recognition result data 3333 R consecutive times, where R is a natural number greater than or equal to 2; only if the third recognition result data 3333 is first valid data in all R acquisitions is the first original signal 001 judged to be a valid signal.
As shown in Figure 6, the model training step 600 includes a noise-reduction model training step 602, a recognition model training step 604, and/or further model training steps 60M, where M is a natural number not less than 2. The noise-reduction model training step 602 serves the noise-reduction step: it normalizes a training sample X to obtain a normalized sample X-Normal, superimposes random white noise on X-Normal to obtain a noise sample X-Noise, and constructs the training pair <X-Normal, X-Noise>. X-Noise is fed into an autoencoder to obtain a decoded output Y, the loss function Loss = MSE(Y, X-Normal) is computed, and the encoding and decoding parameters are iteratively optimized until the loss reaches a target value. In the forward inference stage, the first image 011 may likewise be normalized and the normalized result fed into the autoencoder.
Further, to keep the training samples balanced, the Synthetic Minority Oversampling Technique (SMOTE) may be used to augment them.
In addition, as shown in Figure 6, the recognition model training step 604 may normalize the first image 011 (or the second denoised image 111) and the second image 222 separately, feed the normalized samples into the action recognition model to obtain a predicted probability distribution P, use Focal Loss as the loss function to compute the error between prediction and ground-truth label, and iteratively optimize the model parameters by gradient descent until the loss falls within a preset accuracy range.
As shown in Figure 8, an embodiment of the present invention also discloses an action recognition apparatus 700, including a first data acquisition module 710 and a third action recognition module 730. The first data acquisition module 710 scans for and acquires the first original signal 001, converts it into a two-dimensional array, and forms the first image 011, a range-time image.
As shown in Figure 9, the third action recognition module 730 includes a first-type feature processing module 771, a second-type feature processing module 772, and a third category-decision output unit 773. The first-type feature processing module 771 extracts the first-type feature data 1111 of the first image 011 in the time dimension, and the second-type feature processing module 772 extracts the second-type feature data 2222 of the first image 011 in the range and velocity dimensions. The third category-decision output unit 773 merges the first-type feature data 1111 with the second-type feature data 2222 into the action recognition data set, then classifies that data set to obtain the third recognition result data 3333, where the data set is divided into at least three categories: a first valid data set, a second noise data set, and a third interference data set.
Further, as shown in Figure 2, the first data acquisition module 710 periodically acquires the first original signal 001, which is a fixed-length I/Q complex signal. The first image 011 is formed by arranging the first original signal 001 in sequence into a two-dimensional MxN array, where M and N are natural numbers. The third recognition result data 3333 includes a switching quantity or signal used to trigger an associated mechanism.
Further, as shown in Figure 8, the action recognition apparatus also includes a second data processing module 720. As shown in Figure 10, the second data processing module 720 includes a noise-reduction module and an intermediate processing module. The intermediate processing module performs the FFT and/or STFT, where the STFT applies a window function, such as a Hanning window, before the FFT. The noise-reduction module processes the first image 011 to obtain the second denoised image 111, which replaces the first image 011 in the processing of the third action recognition module 730. The intermediate processing module obtains the SNR of the second denoised image 111 and forms the second image 222 used for extracting the second-type feature data 2222.
As shown in Figure 4, the first-type feature data 1111 is a first feature vector of length N, the second-type feature data 2222 is a second feature vector of length L, and the action recognition data set is a third feature vector of length (N+L). A 1x1 convolution with 3 preset kernels is applied to the third feature vector to obtain activation values for each data category; normalizing the activation values, for example with softmax, yields the probability distribution over the categories, and the category with the highest probability corresponds to the third recognition result data 3333.
Further, the PRI of the first original signal 001 is fixed, and the first original signal 001 may be the echo of a UWB radar whose operating frequency lies between 6.4 GHz and 8 GHz, corresponding to wavelengths between 3.75 cm and 4.69 cm.
As shown in Figure 8, the apparatus 700 may also include a fourth control output module 740, which acquires the third recognition result data 3333 R consecutive times, R being a natural number greater than or equal to 2; only if the third recognition result data 3333 is first valid data in all R acquisitions is the first original signal 001 judged to be a valid signal. In addition, the second data processing module 720 may also include a noise-reduction model training module 721, a recognition model training module 722, and/or further model training modules 72M. The noise-reduction model training module 721 optimizes the noise-reduction processing: it normalizes a training sample X to obtain the normalized sample X-Normal, superimposes random white noise on X-Normal to obtain the noise sample X-Noise, constructs the training pair <X-Normal, X-Noise>, feeds X-Noise into the autoencoder to obtain the decoded output Y, computes the loss Loss = MSE(Y, X-Normal), and iteratively optimizes the encoding and decoding parameters until the loss reaches the target value.
In the forward inference stage, the first image 011 may likewise be normalized and the normalized result fed into the autoencoder. In addition, the second data processing module 720 may use SMOTE to augment the training samples.
Further, the recognition model training module 722 may normalize the first image 011 (or the second denoised image 111) and the second image 222 separately, feed the normalized samples into the action recognition model to obtain the predicted probability distribution P, use Focal Loss to compute the error between prediction and ground-truth label, and iteratively optimize the model parameters by gradient descent until the loss falls within a preset accuracy range.
The computer storage medium 903 of Figures 11 to 14 includes a storage medium body for storing a computer program; when executed by a microprocessor, the computer program implements any action recognition method disclosed herein.
In addition, the sensor 905 shown in Figure 14 may employ any action recognition apparatus and/or storage medium disclosed herein; likewise, a vehicle employing a disclosed apparatus or storage medium naturally falls within the scope of protection of the present invention.
Specifically, an embodiment of the present invention may use an NXP UWB radar chip operating between 6.4 GHz and 8 GHz, with wavelengths ranging from 3.75 cm to 4.69 cm.
The UWB radar periodically transmits narrow pulse signals at a fixed PRI; if a target is present within the detection range, the received echo carries target information.
The apparatus embodiment consists of a UWB radar data acquisition module, a data processing module, an action recognition module, and an output module. The UWB radar data acquisition module collects the raw I/Q signals received by the UWB radar; the data processing module preprocesses the raw I/Q signals (noise reduction, FFT, and so on); the action recognition module extracts human-kick features from the UWB radar signal and judges whether a human kick occurred; and the output module combines multiple recognition results and outputs the control signal.
Specifically, the method may include the following processes, as shown in Figure 1:
The UWB radar data acquisition module receives a fixed-length raw I/Q signal each time and arranges the raw signal sequence into an MxN two-dimensional array, which can also be represented as a two-dimensional image, denoted the range-time image Img_DT. Img_DT is then fed into the data processing module, where an autoencoder denoises it to yield the denoised image Img_Denoise. Next, STFT processing is applied to Img_Denoise along the slow-time dimension and the SNR of the processed image is computed, yielding the range-velocity heat map Img_DVH. Img_Denoise and Img_DVH are then fed together into the action recognition module, which outputs the recognition result for the human kick. If the action recognition module recognizes the kick multiple times, the output module outputs the control signal to open the tailgate.
As shown in Figure 2, the UWB radar acquisition module is implemented by dynamically maintaining a buffer queue. Because of the buffer, the data write rate and read rate need not match: a read is performed only once the data accumulated in the queue exceeds the required read length. In addition, an overlap region is kept between two consecutive reads, which lowers the probability that a single continuous action is split across two frames of data; the overlap ratio may be chosen in [0, 1), with 0.5 as a typical value.
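The buffered read with overlap can be sketched as follows. This is a minimal pure-Python illustration (integers stand in for I/Q samples; `frame_len`, `make_reader`, and the closure structure are illustrative, not part of the disclosure):

```python
import collections

def make_reader(frame_len, overlap=0.5):
    """Buffered reader: emit a frame only once enough samples have
    accumulated, and keep an overlap between consecutive frames so a
    continuous action is less likely to be split across two frames."""
    buf = collections.deque()
    step = int(frame_len * (1 - overlap))  # samples consumed per frame

    def write(samples):
        buf.extend(samples)

    def read():
        if len(buf) < frame_len:
            return None                    # not enough data yet
        frame = list(buf)[:frame_len]
        for _ in range(step):              # retain frame_len - step samples
            buf.popleft()
        return frame

    return write, read

write, read = make_reader(frame_len=4, overlap=0.5)
write([0, 1, 2])
assert read() is None                      # only 3 samples buffered
write([3, 4, 5])
f1 = read()                                # [0, 1, 2, 3]
f2 = read()                                # [2, 3, 4, 5] -- overlaps f1 by 2
```

With overlap 0.5, each frame shares half its samples with the previous one, matching the typical value given above.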
Further, after data is read from the queue it is arranged into an MxN two-dimensional array, where M corresponds to the fast-time dimension and N to the slow-time dimension; that is, each column is the sampled signal of a single pulse (a one-dimensional range profile), and there are N groups of pulse samples in total. Each element of the array is a complex signal I + Q*j; taking its modulus, A = SQRT(I^2 + Q^2), gives the corresponding pixel value of the image Img_DT.
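The array construction above can be sketched with toy values (the echo values here are fabricated; only the shape and the modulus formula follow the description):

```python
# Turn a list of N pulse returns (each with M complex I/Q samples) into the
# M x N magnitude image Img_DT: columns are one-dimensional range profiles.
M, N = 3, 4
pulses = [[complex(m + 1, n) for m in range(M)] for n in range(N)]  # fake echoes

# Img_DT[m][n] = |I + Q*j| of sample m in pulse n, i.e. sqrt(I^2 + Q^2)
img_dt = [[abs(pulses[n][m]) for n in range(N)] for m in range(M)]

assert img_dt[0][0] == abs(complex(1, 0))  # sqrt(1^2 + 0^2) = 1.0
```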
Specifically, an autoencoder may be used to denoise the image Img_DT. The autoencoder may adopt the encoder-decoder structure shown in Figure 3: the encoder encodes the input image Img_DT, extracting effective image features and reducing the feature dimension through stacked convolution, activation, and max-pooling layers; the decoder restores the image content from the encoded low-dimensional features through stacked convolution, activation, and upsampling layers. The decoder's output is the denoised image Img_Denoise, whose size matches the encoder's input size.
Further, the autoencoder must be trained before use. The training process includes:
Normalize the training sample X, i.e., X_norm = (X - mean(X)) / std(X). Superimpose random white noise on X_norm, i.e., add to each pixel a random number drawn from the N(0, 1) distribution, to obtain the noise training sample X_noise, which together with X_norm forms the training pair <X_norm, X_noise>. Feed X_noise into the autoencoder of Figure 3 to obtain the decoder output Y, compute the loss Loss = MSE(Y, X_norm), and iteratively optimize the encoder and decoder parameters by gradient descent until the loss can no longer be reduced. In the forward inference stage, the input image Img_DT must likewise be normalized before being fed into the autoencoder.
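The sample-pair construction described above can be sketched as follows; the autoencoder itself is omitted, and the image is treated as a flat list of pixels for brevity:

```python
import math
import random

def make_training_pair(x, rng):
    """Build <X_norm, X_noise>: normalize a (flattened) training sample,
    then superimpose N(0, 1) white noise on each pixel."""
    mean = sum(x) / len(x)
    std = math.sqrt(sum((v - mean) ** 2 for v in x) / len(x))
    x_norm = [(v - mean) / std for v in x]
    x_noise = [v + rng.gauss(0.0, 1.0) for v in x_norm]
    return x_norm, x_noise

rng = random.Random(0)                      # seeded for reproducibility
x_norm, x_noise = make_training_pair([1.0, 2.0, 3.0, 4.0], rng)
# X_norm has zero mean after (X - mean(X)) / std(X)
assert abs(sum(x_norm)) < 1e-9
```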
Specifically, because the image has a limited width, applying the FFT directly causes frequency leakage. To reduce it, the short-time Fourier transform (STFT) can be used: the signal is first multiplied by a window function (such as a Hanning window) and the fast Fourier transform (FFT) is then applied.
The signal-to-noise ratio is SNR = 20*log(|STFT(Img_Denoise)|); evaluating this formula yields the range-velocity heat map Img_DVH.
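The window-then-transform-then-log pipeline for one slow-time row can be sketched as below. A plain O(n^2) DFT stands in for the FFT, and the small epsilon guarding log(0) is an implementation detail of this sketch, not of the disclosure:

```python
import cmath
import math

def hanning(n):
    # Hanning window: w[k] = 0.5 * (1 - cos(2*pi*k / (n - 1)))
    return [0.5 * (1 - math.cos(2 * math.pi * k / (n - 1))) for k in range(n)]

def dft(x):
    # Plain DFT standing in for the FFT (same result, slower)
    n = len(x)
    return [sum(x[k] * cmath.exp(-2j * cmath.pi * f * k / n) for k in range(n))
            for f in range(n)]

def snr_row(row):
    """One slow-time row of Img_Denoise -> one row of Img_DVH:
    window, transform, then SNR = 20*log10(|STFT|)."""
    w = hanning(len(row))
    spec = dft([v * wk for v, wk in zip(row, w)])
    return [20 * math.log10(abs(s) + 1e-12) for s in spec]  # eps avoids log(0)

row = [math.sin(2 * math.pi * 2 * k / 8) for k in range(8)]  # 2-cycle tone
heat = snr_row(row)
# The windowed spectrum peaks at the tone's frequency bin (bin 2)
assert max(heat[:5]) == heat[2]
```

Windowing trades a slightly wider main lobe for far lower side lobes, which is exactly the leakage reduction motivated above.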
Further, the action recognition module is an end-to-end recognition model that combines a convolutional neural network (CNN) with a recurrent neural network (RNN). As shown in Figure 4, the model has two branches: branch 1 mainly extracts time-dimension features, and branch 2 mainly extracts range- and velocity-dimension features.
Specifically, the input to branch 1 is Img_Denoise. This branch extracts object features with 1D-CNN + LSTM, i.e., a one-dimensional convolution followed by a long short-term memory (LSTM) network. Taking a column of Img_Denoise as the object, a one-dimensional convolution is performed with kernels of size 1xk (k typically 3), C kernels in total; the feature after the 1D-CNN is an MxNxC tensor. Accumulating this tensor along the C direction gives an MxN feature map CNN_F. CNN_F is split by columns into N vectors of length M, and these N M-dimensional vectors are fed in sequence into the LSTM (a type of RNN), producing N outputs; that is, branch 1's F_branch1 is a feature vector of length N.
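The 1D-CNN and channel-accumulation stages of branch 1 can be sketched as follows. The kernel weights, the tiny M/N/C sizes, and the zero-padded "same" convolution are all assumptions for illustration; the LSTM stage is omitted:

```python
# M=4 range bins, N=3 slow-time columns, C=2 kernels of size 1x3 (k=3).
M, N, C, k = 4, 3, 2, 3
img = [[float(m + n) for n in range(N)] for m in range(M)]   # toy Img_Denoise
kernels = [[0.25, 0.5, 0.25], [-1.0, 0.0, 1.0]]              # made-up weights

def conv_col(col, ker):
    # 1D 'same' convolution of one column (zero-padded at both ends)
    pad = [0.0] + col + [0.0]
    return [sum(ker[j] * pad[i + j] for j in range(k)) for i in range(len(col))]

# MxNxC tensor: one filtered copy of each column per kernel (indexed [c][n][m])
tensor = [[conv_col([img[m][n] for m in range(M)], ker) for n in range(N)]
          for ker in kernels]

# Accumulate along the C direction to get the MxN feature map CNN_F
cnn_f = [[sum(tensor[c][n][m] for c in range(C)) for n in range(N)]
         for m in range(M)]

# Each of the N columns of CNN_F would then be fed into the LSTM in turn
columns = [[cnn_f[m][n] for m in range(M)] for n in range(N)]
assert len(columns) == N and len(columns[0]) == M
```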
Similarly, the input to branch 2 is Img_DVH. This branch extracts object features with a typical 2D-CNN built internally from a stack of convolution, batch normalization, activation, and pooling layers; after the 2D-CNN, branch 2's F_branch2 is a feature vector of length L.
Further, as shown in Figure 4, after the two branches have each computed their feature vectors, F_branch1 and F_branch2 are merged into a feature vector of length (N+L). A 1x1 convolution with 3 kernels is applied to this vector to obtain the activation value of each category; finally, softmax normalizes the activation values into a probability distribution over the categories, and the category with the highest probability is the model's output.
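The classification head can be sketched with made-up numbers (the feature values and the three 1x1-convolution kernels below are fabricated for illustration; a 1x1 convolution over a vector reduces to one dot product per class):

```python
import math

f_branch1 = [0.2, 0.8]            # pretend N = 2
f_branch2 = [0.5, -0.1, 0.4]      # pretend L = 3
features = f_branch1 + f_branch2  # merged vector of length N + L

weights = [                       # 3 hypothetical 1x1-conv kernels, no bias
    [0.1, 0.2, 0.3, 0.1, 0.0],    # class 0: background noise
    [0.4, 0.9, 0.2, 0.0, 0.7],    # class 1: human kick
    [0.2, 0.1, 0.1, 0.5, 0.1],    # class 2: other interference
]
activations = [sum(w * f for w, f in zip(row, features)) for row in weights]

exps = [math.exp(a) for a in activations]
probs = [e / sum(exps) for e in exps]        # softmax normalization
predicted = max(range(3), key=lambda i: probs[i])

assert abs(sum(probs) - 1.0) < 1e-9          # a valid probability distribution
```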
The method of the present invention sets the number of categories to 3: one for background noise, one for human kick actions, and one for everything else (e.g., a cat crawling under the car, the effect of strong wind); that is, disturbances easily confused with a human kick are grouped into a single class.
Specifically, the model must be trained before use. The training process includes: normalizing the training samples Img_Denoise and Img_DVH separately; feeding the normalized sample pair into the model of Figure 4 to obtain the predicted probability distribution P; using Focal Loss as the loss function to compute the error between the prediction and the ground-truth label; and iteratively optimizing the model parameters by gradient descent until the loss can no longer be reduced.
Optionally, an imbalance among the training samples degrades the model; to address this, the Synthetic Minority Oversampling Technique (SMOTE) can be used to augment the training samples.
Further, to improve system robustness, a signal debounce strategy is added: the class signal must be a kick action for R consecutive frames (e.g., R = 2) before the tailgate control signal is output.
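The debounce strategy can be sketched as a small state machine (class label 1 standing for "kick" is an assumption of this sketch):

```python
def make_debouncer(r):
    """Output the tailgate trigger only after r consecutive 'kick'
    classifications (class label 1 here)."""
    streak = [0]
    def update(label):
        streak[0] = streak[0] + 1 if label == 1 else 0
        return streak[0] >= r
    return update

update = make_debouncer(r=2)
results = [update(c) for c in [1, 0, 1, 1, 1]]  # per-frame class labels
# The trigger fires only once two kicks in a row have been seen
assert results == [False, False, False, True, True]
```

Any single misclassification resets the streak, so isolated false positives never open the tailgate.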
In addition, any denoising autoencoder as in Figure 3 that uses a CNN as the encoder or decoder falls within the scope of this solution. In the action recognition model of Figure 4, the LSTM in branch 1 may be replaced by another RNN model, such as a gated recurrent unit (GRU); and the 2D-CNN model in branch 2, as long as it is stacked from one or more of convolution, batch normalization, activation, pooling, and fully connected layers, in whatever combination, falls within the scope of the embodiments of the present invention.
It should be noted that the above embodiments serve only to illustrate the technical solution of the present invention more clearly. Those skilled in the art will understand that the implementation of the present invention is not limited to the above; obvious changes, replacements, or substitutions based on the above do not exceed the scope of the technical solution of the present invention, and other implementations made without departing from the concept of the present invention also fall within its scope.

Claims (17)

  1. An action recognition method, characterized by comprising: a first data acquisition step (100) and a fourth action recognition step (400); wherein,
    the first data acquisition step (100) scans for and acquires a first original signal (001), converts the first original signal (001) into a two-dimensional array, and forms a first image (011), the first image (011) being a range-time image;
    the fourth action recognition step (400) comprises: a first-type feature processing step (410), a second-type feature processing step (420), and a third category-decision output step (430); wherein,
    the first-type feature processing step (410) extracts first-type feature data (1111) of the first image (011) in the time dimension; the second-type feature processing step (420) extracts second-type feature data (2222) of the first image (011) in the range and velocity dimensions; and the third category-decision output step (430) merges the first-type feature data (1111) with the second-type feature data (2222) to obtain an action recognition data set;
    wherein the third category-decision output step (430) further comprises: classifying the action recognition data set to obtain third recognition result data (3333); wherein the action recognition data set is divided into at least three categories: a first valid data set, a second noise data set, and a third interference data set.
  2. The action recognition method of claim 1, wherein:
    the first data acquisition step (100) periodically acquires the first original signal (001), the first original signal (001) being an I/Q complex signal of fixed length;
    the first image (011) is formed by arranging the first original signals (001) in sequence, the two-dimensional array being of M×N form, where M and N are natural numbers;
    the third recognition result data (3333) comprises a switching value or signal for triggering a related mechanism.
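A minimal sketch of the M×N distance-time image construction of claim 2, with the periodic I/Q acquisition replaced by simulated complex frames (frame count and length here are arbitrary assumptions):

```python
import numpy as np

def build_distance_time_image(frames):
    """frames: iterable of equal-length complex I/Q vectors, one per sweep."""
    raw = np.vstack(list(frames))   # stack sweeps row-wise: M x N complex array
    return np.abs(raw)              # magnitude -> distance-time image

# simulated acquisition: M sweeps of N range bins each
rng = np.random.default_rng(0)
M, N = 8, 16
frames = [rng.standard_normal(N) + 1j * rng.standard_normal(N) for _ in range(M)]
image = build_distance_time_image(frames)
```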
  3. The action recognition method of claim 2, further comprising: a second preprocessing step (200) and a third intermediate processing step (300); wherein
    the second preprocessing step (200) comprises a noise reduction step, and the third intermediate processing step (300) comprises a fast Fourier transform (FFT) step and/or a short-time Fourier transform (STFT) step; wherein the STFT applies a window function before the FFT, the window function comprising a Hanning window;
    noise reduction is applied to the first image (011) to obtain a second noise-reduced image (111); the second noise-reduced image (111) replaces the first image (011) in the fourth action recognition step (400);
    the third intermediate processing step (300) obtains the signal-to-noise ratio (SNR) of the second noise-reduced image (111) and forms a second image (222) for the extraction of the second-type feature data (2222).
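The STFT of claim 3 — a Hanning window applied before a per-frame FFT — can be sketched as follows (frame length and hop are illustrative choices, not values from the claims):

```python
import numpy as np

def stft(x, win_len, hop):
    """Short-time Fourier transform: Hanning window, then FFT of each frame."""
    win = np.hanning(win_len)
    n_frames = 1 + (len(x) - win_len) // hop
    return np.stack([np.fft.rfft(x[i * hop : i * hop + win_len] * win)
                     for i in range(n_frames)])

n = np.arange(64)
x = np.cos(2 * np.pi * 4 * n / 16)   # test tone: 4 cycles per 16-sample window
spec = stft(x, win_len=16, hop=8)    # 7 frames x 9 one-sided frequency bins
```

The windowing suppresses spectral leakage between frames; the tone's energy concentrates in bin 4 of each frame.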
  4. The action recognition method of claim 3, wherein:
    the first-type feature data (1111) is a first feature vector of length N, the second-type feature data (2222) is a second feature vector of length L, and the action recognition data set is a third feature vector of length (N+L);
    a 1×1 convolution with a preset convolution kernel is applied to the third feature vector to obtain activation values for each category of the action recognition data set; the activation values are normalized to obtain a probability distribution over the categories of the action recognition data set, wherein the category with the highest probability corresponds to the third recognition result data (3333).
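Applied to a 1-D fused feature vector, the 1×1 convolution of claim 4 reduces to one dot product per class; normalizing the resulting activations (softmax, per claim 5) yields the claimed probability distribution. The weights below are illustrative, not trained values:

```python
import numpy as np

def classify(fused, w, b):
    logits = w @ fused + b              # 1x1 conv over a vector == per-class dot product
    e = np.exp(logits - logits.max())   # numerically stable softmax
    p = e / e.sum()
    return int(np.argmax(p)), p         # index of the most probable class

w = np.array([[1.0, 0.0],               # 3 classes x 2 fused features (toy shapes)
              [0.0, 1.0],
              [0.5, 0.5]])
idx, p = classify(np.array([2.0, 0.0]), w, np.zeros(3))
```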
  5. The action recognition method of claim 4, wherein:
    the number of convolution kernels is 3, and the normalization is performed by the softmax method.
  6. The action recognition method of any one of claims 3 to 5, wherein:
    the pulse repetition interval (PRI) of the first original signal (001) is fixed, the first original signal (001) is an echo of an ultra-wideband (UWB) radar, and the operating frequency of the radar is between 6.4 GHz and 8 GHz, or the wavelength of the radar wave is in the range of 3.75 cm to 4.69 cm.
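The stated band and wavelength range are consistent under λ = c / f with the customary approximation c ≈ 3.0 × 10⁸ m/s:

```python
C = 3.0e8  # approximate speed of light, m/s

def wavelength_cm(freq_hz):
    # lambda = c / f, converted from metres to centimetres
    return 100.0 * C / freq_hz

lam_short = wavelength_cm(8.0e9)   # upper band edge -> shortest wavelength, 3.75 cm
lam_long = wavelength_cm(6.4e9)    # lower band edge -> longest wavelength, ~4.69 cm
```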
  7. The action recognition method of any one of claims 1 to 5, further comprising:
    an anti-shake output step (500) and a model training step (600); wherein
    the anti-shake output step (500) acquires the third recognition result data (3333) R consecutive times, R being a natural number greater than or equal to 2; if the third recognition result data (3333) acquired in all R times belongs to the first valid data, the first original signal (001) is determined to be a valid signal;
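The anti-shake rule — accept the signal only after R consecutive "valid" results — can be sketched with a fixed-length window (R and the label strings are illustrative):

```python
from collections import deque

class Debouncer:
    """Accept only after R consecutive 'valid' classification results."""
    def __init__(self, r=3):
        self.r = r
        self.recent = deque(maxlen=r)   # keeps only the last R labels

    def push(self, label):
        self.recent.append(label)
        return (len(self.recent) == self.r and
                all(x == "valid" for x in self.recent))

d = Debouncer(r=3)
results = [d.push(x) for x in ["valid", "noise", "valid", "valid", "valid"]]
```

A single "noise" result resets the run, so only the final push returns True.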
    the model training step (600) comprises a noise-reduction model training step (602), a recognition model training step (604) and/or a related model training step (60M); wherein the noise-reduction model training step (602) serves the noise reduction step; the noise-reduction model training step (602) normalizes a training sample X to obtain a normalized sample X-Normal, superimposes random white noise on X-Normal to obtain a noise sample X-Noise, constructs a training sample pair <X-Normal, X-Noise> from the normalized sample X-Normal and the noise sample X-Noise, and inputs X-Noise to an autoencoder to obtain a decoded output Y;
    a loss function Loss = MSE(Y, X-Normal) is obtained, and the encoding and decoding parameters are iteratively optimized until the loss function Loss reaches a target value; wherein, in the forward inference stage, the first image (011) is likewise normalized and the normalized result is input to the autoencoder.
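The training-pair construction and MSE objective above can be sketched as follows; since the claim does not fix the autoencoder architecture, the model is replaced here by an identity placeholder:

```python
import numpy as np

def make_pair(x, sigma, rng):
    # normalize X to [0, 1], then superimpose random white noise
    x_normal = (x - x.min()) / (x.max() - x.min() + 1e-12)
    x_noise = x_normal + rng.normal(0.0, sigma, size=x.shape)
    return x_normal, x_noise    # the training pair <X-Normal, X-Noise>

def mse(y, target):
    # Loss = MSE(Y, X-Normal)
    return float(np.mean((y - target) ** 2))

rng = np.random.default_rng(42)
x_normal, x_noise = make_pair(np.arange(10.0), sigma=0.05, rng=rng)
autoencoder = lambda z: z       # stand-in for the trained encoder/decoder
loss = mse(autoencoder(x_noise), x_normal)
```

In training, this loss would be minimized over the encoder/decoder parameters until it reaches the target value.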
  8. The action recognition method of claim 7, wherein:
    the training samples are augmented using the synthetic minority oversampling technique (SMOTE);
    the recognition model training step (604) normalizes the first image (011) or the second noise-reduced image (111), and the second image (222), respectively; the normalized samples are input into an action recognition model to obtain a predicted probability distribution P; Focal Loss is used as the loss function to compute the error Loss between the predicted values and the true labels; the model parameters are then iteratively optimized by gradient descent until the Loss falls within a preset accuracy range.
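Focal Loss, as named in claim 8, is commonly written FL(p_t) = −(1 − p_t)^γ · log(p_t); γ = 2 is a conventional default and an assumption here, since the claim does not fix it. The modulating factor down-weights easy, well-classified examples relative to hard ones:

```python
import numpy as np

def focal_loss(probs, label, gamma=2.0):
    """Focal loss for one sample: probs is the predicted distribution P,
    label is the index of the true class."""
    p_t = probs[label]
    return float(-((1.0 - p_t) ** gamma) * np.log(p_t))

easy = focal_loss(np.array([0.9, 0.05, 0.05]), 0)   # confident, correct
hard = focal_loss(np.array([0.4, 0.3, 0.3]), 0)     # uncertain, correct
```

The hard example contributes far more to the gradient than the easy one, which is why Focal Loss suits the imbalanced valid/noise/interference data set.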
  9. An action recognition apparatus, comprising:
    a first data acquisition module (710) and a third action recognition module (730); wherein
    the first data acquisition module (710) scans and acquires the first original signal (001), converts the first original signal (001) into a two-dimensional array and forms a first image (011), the first image (011) being a distance-time image;
    the third action recognition module (730) comprises: a first-type feature processing module (771), a second-type feature processing module (772) and a third category-determination output unit (773); wherein
    the first-type feature processing module (771) extracts first-type feature data (1111) of the time dimension of the first image (011); the second-type feature processing module (772) extracts second-type feature data (2222) of the distance and velocity dimensions of the first image (011); the third category-determination output unit (773) combines the first-type feature data (1111) with the second-type feature data (2222) to obtain an action recognition data set;
    the third category-determination output unit (773) classifies the action recognition data set to obtain third recognition result data (3333); wherein the action recognition data set is divided into at least three categories: a first valid data set, a second noise data set and a third interference data set.
  10. The action recognition apparatus of claim 9, wherein:
    the first data acquisition module (710) periodically acquires the first original signal (001), the first original signal (001) being an I/Q complex signal of fixed length;
    the first image (011) is formed by arranging the first original signals (001) in sequence, the two-dimensional array being of M×N form, where M and N are natural numbers;
    the third recognition result data (3333) comprises a switching value or signal for triggering a related mechanism.
  11. The action recognition apparatus of claim 10, further comprising: a second data processing module (720); wherein
    the second data processing module (720) comprises a noise reduction module and an intermediate processing module; the intermediate processing module performs a fast Fourier transform (FFT) and/or a short-time Fourier transform (STFT); wherein the STFT applies a window function before the FFT, the window function comprising a Hanning window;
    the noise reduction module processes the first image (011) to obtain a second noise-reduced image (111); the second noise-reduced image (111) replaces the first image (011) in the processing of the third action recognition module (730);
    the intermediate processing module obtains the signal-to-noise ratio (SNR) of the second noise-reduced image (111) and forms a second image (222) for the extraction of the second-type feature data (2222).
  12. The action recognition apparatus of claim 11, wherein:
    the first-type feature data (1111) is a first feature vector of length N, the second-type feature data (2222) is a second feature vector of length L, and the action recognition data set is a third feature vector of length (N+L);
    a 1×1 convolution with a preset convolution kernel is applied to the third feature vector to obtain activation values for each category of the action recognition data set; the activation values are normalized to obtain a probability distribution over the categories of the action recognition data set, wherein the category with the highest probability corresponds to the third recognition result data (3333), the number of convolution kernels comprises the natural number 3, and the normalization method comprises softmax.
  13. The action recognition apparatus of any one of claims 9 to 12, wherein:
    the pulse repetition interval (PRI) of the first original signal (001) is fixed, the first original signal (001) is an echo of an ultra-wideband (UWB) radar, and the operating frequency of the radar is between 6.4 GHz and 8 GHz, or the wavelength of the radar wave is in the range of 3.75 cm to 4.69 cm.
  14. The action recognition apparatus of any one of claims 9 to 12, further comprising:
    a fourth control output module (740); wherein
    the fourth control output module (740) acquires the third recognition result data (3333) R consecutive times, R being a natural number greater than or equal to 2; if the third recognition result data (3333) acquired in all R times belongs to the first valid data, the first original signal (001) is determined to be a valid signal;
    the second data processing module (720) further comprises a noise-reduction model training module (721), a recognition model training module (722) and/or a related model training module (72M); wherein the noise-reduction model training module (721) is used to optimize the noise reduction processing; the noise-reduction model training module (721) normalizes a training sample X to obtain a normalized sample X-Normal, superimposes random white noise on X-Normal to obtain a noise sample X-Noise, constructs a training sample pair <X-Normal, X-Noise> from the normalized sample X-Normal and the noise sample X-Noise, and inputs X-Noise to an autoencoder to obtain a decoded output Y;
    the noise-reduction model training module (721) obtains a loss function Loss = MSE(Y, X-Normal) and iteratively optimizes the encoding and decoding parameters until the loss function Loss reaches a target value; wherein, in the forward inference stage, the first image (011) is likewise normalized and the normalized result is input to the autoencoder; the second data processing module (720) augments the training samples using the synthetic minority oversampling technique (SMOTE);
    the recognition model training module (722) normalizes the first image (011) or the second noise-reduced image (111), and the second image (222), respectively; the normalized samples are input into an action recognition model to obtain a predicted probability distribution P, and Focal Loss is used as the loss function to compute the error Loss between the predicted values and the true labels; the model parameters are then iteratively optimized by gradient descent until the Loss falls within a preset accuracy range.
  15. A computer storage medium, comprising:
    a storage medium body for storing a computer program;
    wherein the computer program, when executed by a microprocessor, implements the action recognition method of any one of claims 1 to 8.
  16. A sensor, comprising:
    the action recognition apparatus (902) of any one of claims 9 to 14;
    and/or the storage medium (903) of claim 15.
  17. A vehicle, comprising:
    the action recognition apparatus (902) of any one of claims 9 to 14;
    and/or the storage medium (903) of claim 15;
    and/or the sensor (905) of claim 16.
PCT/CN2023/108545 2022-07-21 2023-07-21 Action recognition method and apparatus, and storage medium, sensor and vehicle WO2024017363A1 (en)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN202210857053.XA CN115273229A (en) 2022-07-21 2022-07-21 Action recognition method and device, storage medium, sensor and vehicle
CN202210857053.X 2022-07-21

Publications (1)

Publication Number Publication Date
WO2024017363A1 (en)

Family ID: 83767185

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2023/108545 WO2024017363A1 (en) 2022-07-21 2023-07-21 Action recognition method and apparatus, and storage medium, sensor and vehicle

Country Status (2)

CN (1) CN115273229A (en)
WO (1) WO2024017363A1 (en)

Families Citing this family (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN115273229A (en) * 2022-07-21 2022-11-01 联合汽车电子有限公司 Action recognition method and device, storage medium, sensor and vehicle

Citations (8)

US20150009062A1 (en) * 2013-07-02 2015-01-08 Brose Fahrzeugteile Gmbh & Co. Kommanditgesellschaft, Hallstadt Object detection device for a vehicle and vehicle having the object detection device
CN108229404A (en) * 2018-01-09 2018-06-29 东南大学 A kind of radar echo signal target identification method based on deep learning
CN111105068A (en) * 2019-11-01 2020-05-05 复旦大学 Numerical value mode correction method based on sequence regression learning
CN111505632A (en) * 2020-06-08 2020-08-07 北京富奥星电子技术有限公司 Ultra-wideband radar action attitude identification method based on power spectrum and Doppler characteristics
CN112389370A (en) * 2019-08-15 2021-02-23 大众汽车股份公司 Method for operating a door or a compartment door in a vehicle, authentication element and vehicle
CN113850204A (en) * 2021-09-28 2021-12-28 太原理工大学 Human body action recognition method based on deep learning and ultra-wideband radar
EP3978949A2 (en) * 2020-10-02 2022-04-06 Origin Wireless, Inc. System and method for wireless motion monitoring
CN115273229A (en) * 2022-07-21 2022-11-01 联合汽车电子有限公司 Action recognition method and device, storage medium, sensor and vehicle


Non-Patent Citations (2)

DING, WEN et al.: "Radar-Based Human Activity Recognition Using Hybrid Neural Network Model With Multidomain Fusion", IEEE Transactions on Aerospace and Electronic Systems, vol. 57, no. 5, 24 March 2021, XP011882194, DOI: 10.1109/TAES.2021.3068436 *
LI, XINYU; HE, YUAN; JING, XIAOJUN: "A survey of deep learning-based human activity recognition in radar", Remote Sensing (MDPI), vol. 11, no. 9, 30 April 2019, pages 1068, XP009527370, ISSN: 2072-4292, http://www.mdpi.com/2072-4292/11/9/1068 *

Also Published As

Publication number Publication date
CN115273229A (en) 2022-11-01


Legal Events

Date Code Title Description
121 EP: the EPO has been informed by WIPO that EP was designated in this application (Ref document number: 23842425; Country of ref document: EP; Kind code of ref document: A1)