CN116580460B

CN116580460B - End-to-end neural network human body behavior recognition method and device based on millimeter wave radar

Info

Publication number: CN116580460B
Application number: CN202310836773.2A
Authority: CN
Inventors: 徐刚; 杜昊泽; 林佳璇; 张慧; 洪伟; 郭坤鹏; 周振超; 冯友怀
Original assignee: Southeast University; Nanjing Hawkeye Electronic Technology Co Ltd
Current assignee: Southeast University; Nanjing Hawkeye Electronic Technology Co Ltd
Priority date: 2023-07-10
Filing date: 2023-07-10
Publication date: 2023-10-24
Anticipated expiration: 2043-07-10
Also published as: CN116580460A

Abstract

The application discloses an end-to-end neural network method and device for human behavior recognition based on millimeter wave radar. The method comprises: acquiring original echo data of human behavior collected by the radar; passing the original echo data through a complex time-frequency transformation network to obtain target time-frequency information, wherein the complex time-frequency transformation network comprises at least one complex fully connected layer; and taking the target time-frequency information as the input of a classification network to obtain the human behavior classification result output by the classification network. The application replaces the short-time Fourier transform (STFT) with the complex time-frequency transformation network, which transforms the target signal into the time-frequency domain; two-dimensional convolution layers and a bidirectional long short-term memory (Bi-LSTM) network then extract the feature information of the target time-frequency information, thereby classifying and recognizing human behavior.

Description

End-to-end neural network human body behavior recognition method and device based on millimeter wave radar

Technical Field

The application relates to the technical field of machine learning, in particular to an end-to-end neural network human behavior recognition method and device based on millimeter wave radar.

Background

With the continuing development of Internet of Things technology, human behavior recognition based on various sensors (optical cameras, infrared, radar, Wi-Fi and the like) is increasingly applied in scenarios such as smart homes, security inspection and health monitoring of the elderly. Current human behavior recognition based on millimeter wave radar mainly preprocesses the original radar echo data in one of two ways. The first uses the short-time Fourier transform (STFT) to represent the signal in the time-frequency domain and takes the modulus of the result to obtain a time-Doppler map (TDM) of the target. The second uses the range-Doppler transform, applying a range-dimension fast Fourier transform (FFT) and then a Doppler-dimension FFT to the original echo data to obtain a range-Doppler map (RDM) of the target. Either way, the radar signal classification problem is converted into an image classification problem. A two-dimensional deep convolutional neural network (2D-CNN) then extracts the feature information of the target TDM or RDM, completing the classification of the different human behaviors.
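The range-Doppler preprocessing described above can be sketched as follows. This is a minimal NumPy sketch on a synthetic single-target echo; the function name `range_doppler_map`, the array sizes and the bin indices are illustrative assumptions and do not come from the source.

```python
import numpy as np

def range_doppler_map(frame: np.ndarray) -> np.ndarray:
    """Compute an RDM from one frame of raw echo data.

    frame: complex array of shape (n_fast, n_slow), i.e. samples within a
    pulse by number of pulses.
    """
    # Range-dimension FFT along fast time (axis 0).
    range_fft = np.fft.fft(frame, axis=0)
    # Doppler-dimension FFT along slow time (axis 1), zero Doppler centered.
    rd = np.fft.fftshift(np.fft.fft(range_fft, axis=1), axes=1)
    # Taking the modulus yields the image-like map and discards the phase.
    return np.abs(rd)

# Synthetic echo of one target (all values hypothetical).
n_fast, n_slow = 64, 32
range_bin, doppler_bin = 10, 5
n = np.arange(n_fast)[:, None]
k = np.arange(n_slow)[None, :]
frame = np.exp(2j * np.pi * (range_bin * n / n_fast + doppler_bin * k / n_slow))

rdm = range_doppler_map(frame)
peak = tuple(int(i) for i in np.unravel_index(np.argmax(rdm), rdm.shape))
print(peak)
```

The final `np.abs` call is the step at which the real and imaginary parts of the signal are merged into a single magnitude.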

The current mainstream approach of using a neural network to extract features and classify targets is not end-to-end: it still relies on manual preprocessing of the original data, including the STFT and the range-Doppler transform, so the processing flow is long and computationally inefficient. The classification performance of a neural network that takes the target TDM or RDM as its input samples is limited by the STFT or the range-Doppler transform, so the network cannot learn the feature information of the target to the fullest extent. Moreover, TDM and RDM are obtained by taking the modulus of the signal, which discards the difference between the real and imaginary parts of the target signal; since existing neural networks are mostly real-valued, the phase variation of the target during motion is not well exploited.

Disclosure of Invention

The application provides an end-to-end neural network method and device for human behavior recognition based on millimeter wave radar, which can effectively solve the problem of low classification accuracy that arises when current neural network approaches preprocess radar data with the fast Fourier transform and thereby ignore the difference between the real and imaginary parts of the target signal.

According to an aspect of the present application, there is provided an end-to-end neural network human behavior recognition method based on millimeter wave radar, the method comprising: acquiring original echo data of human behavior collected by the radar; passing the original echo data through a complex time-frequency transformation network to obtain target time-frequency information, wherein the complex time-frequency transformation network comprises at least one complex fully connected layer; and taking the target time-frequency information as the input of a classification network to obtain the human behavior classification result output by the classification network.

Further, acquiring the original echo data of human behavior collected by the radar includes: performing sliding-window processing on the original echo data to obtain a plurality of window data.

Further, the obtaining the target time-frequency information by the complex time-frequency transformation network according to the original echo data includes:

determining the complex time-frequency transformation network, wherein the complex time-frequency transformation network expression is as follows:

[expression not reproduced in this text], wherein the quantities appearing in the expression are: the two sampling-point indices; the angular frequency; the length of the input signal; the output of the complex fully connected layer; the target echo signal, i.e. the input of the complex fully connected layer; the rectangular window function; and the weight coefficient of the complex fully connected layer. The bias coefficient of the complex fully connected layer is set to 0.
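The quantities listed above are those of a windowed discrete transform. As a hedged reading only, with the symbol names m, n, \omega, N, Y, x, w and W assumed here because the source's own symbols are not reproduced, an expression consistent with that list would be:

```latex
Y(m,\omega) = \sum_{n=0}^{N-1} x(n)\, w(n-m)\, W(n,\omega)
```

Under this reading, fixing W(n,\omega) = e^{-j\omega n} gives the STFT with a rectangular window w, which is the transform the network is said to replace; in the network, W is instead a trainable complex weight.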

Further, the complex time-frequency transformation network comprises two complex full-connection layers, and the two complex full-connection layers in the complex time-frequency transformation network respectively process the distance and Doppler dimensions of the original echo data.

Further, the step of using the target time-frequency information as an input of a classification network to obtain a classification result of human behavior output by the classification network includes: extracting characteristics of the target time-frequency information through the classification network to obtain characteristic information; and outputting the classification result according to the characteristic information.

Further, the classification network includes a convolution block, a bidirectional long short-term memory (Bi-LSTM) network, and a fully connected layer.

Further, the convolution block includes a two-dimensional convolution layer, a batch normalization layer, a linear activation layer, and a max pooling layer.

Further, the fully connected layer comprises a fully connected layer and a logistic regression (Softmax) layer.

Further, the original echo data comprises first dimension data and second dimension data, wherein fast time sampling obtains the first dimension data and slow time sampling obtains the second dimension data.

Further, the radar includes a first radar and a second radar, the first radar facing the front of the human body and emitting radar waves, and the second radar facing the back of the human body and emitting radar waves; acquiring the original echo data of human behavior collected by the radar further includes: performing sliding-window processing on the original echo data of the first radar and the original echo data of the second radar to obtain a plurality of window data.

According to another aspect of the application, there is provided an end-to-end neural network human behavior recognition device based on millimeter wave radar, the device comprising: a data acquisition unit for acquiring original echo data of human behavior collected by the radar; a data conversion unit for passing the original echo data through a complex time-frequency transformation network to obtain target time-frequency information, wherein the complex time-frequency transformation network comprises at least one complex fully connected layer; and a data classification unit for taking the target time-frequency information as the input of a classification network to obtain the human behavior classification result output by the classification network.

The beneficial effect of the application is that the complex time-frequency transformation network replaces the STFT and transforms the target signal into the time-frequency domain; two-dimensional convolution layers and a bidirectional long short-term memory network then process the result, extracting the feature information of the target time-frequency information and thereby classifying and recognizing human behavior. Compared with the STFT, the complex time-frequency transformation network has two complex fully connected layers that process the distance and Doppler dimensions separately, so that the resulting target time-frequency information lends itself better to feature extraction, which improves the accuracy of the classification result.

Drawings

The technical solution and other advantageous effects of the present application will be made apparent by the following detailed description of the specific embodiments of the present application with reference to the accompanying drawings.

Fig. 1 is a flowchart of steps of an end-to-end neural network human behavior recognition method based on millimeter wave radar according to an embodiment of the present application.

Fig. 2 is a schematic diagram of a single radar-collected human behavior action provided by an embodiment of the present application.

Fig. 3 is a schematic diagram of human behavior action collected by two radars according to an embodiment of the present application.

Fig. 4 is a training set loss function curve according to an embodiment of the present application.

Fig. 5 is a verification set accuracy curve provided by an embodiment of the present application.

Fig. 6 is a schematic structural diagram of an end-to-end neural network human behavior recognition device based on millimeter wave radar according to an embodiment of the present application.

Fig. 7 is a schematic structural diagram of an electronic device according to an embodiment of the present application.

Detailed Description

The technical solutions in the embodiments of the present application will be clearly and completely described below with reference to the accompanying drawings in the embodiments of the present application. It will be apparent that the described embodiments are only some, but not all, embodiments of the application. All other embodiments, which can be made by those skilled in the art based on the embodiments of the application without making any inventive effort, are intended to fall within the scope of the application.

In the description of the present application, it should be noted that, unless explicitly specified and limited otherwise, the terms "mounted," "connected," and "connected" are to be construed broadly, and may be either fixedly connected, detachably connected, or integrally connected, for example; can be mechanically connected, electrically connected or can be communicated with each other; can be directly connected or indirectly connected through an intermediate medium, and can be communicated with the inside of two elements or the interaction relationship of the two elements. The specific meaning of the above terms in the present application can be understood by those of ordinary skill in the art according to the specific circumstances.

As shown in fig. 1, a flowchart of steps of a method for identifying human body behaviors based on an end-to-end neural network of a millimeter wave radar according to an embodiment of the present application includes:

Step S110: acquiring the original echo data of human behavior collected by the radar.

Illustratively, acquiring the original echo data of human behavior collected by the radar includes: performing sliding-window processing on the original echo data to obtain a plurality of window data. The original echo data comprises first-dimension data and second-dimension data, wherein the first-dimension data is obtained by fast-time sampling and the second-dimension data by slow-time sampling.

Specifically, the present embodiment detects a human body (i.e., a target mentioned below) by using a frequency-modulated continuous wave millimeter wave radar, and the waveform parameters of the radar are configured as shown in table 1:

Table 1: Radar waveform parameter configuration

This embodiment classifies and recognizes five human behaviors: walking, running, boxing, squatting and stepping in place, as illustrated in fig. 2. The original echo data of the target received by the radar is a two-dimensional matrix, in which the first dimension is fast-time sampling (i.e., time within a pulse) and the second dimension is slow-time sampling (i.e., time between pulses). The radar observes the human body for a certain duration; in this embodiment the data are divided into time periods of 3 s, and the window data of each period are input to the neural network for subsequent processing.
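The sliding-window segmentation of one time period can be sketched as below. This is only an illustration: the function name `sliding_windows`, the window length, the hop size and the matrix dimensions are hypothetical, since the source states the 3 s period but not the window parameters.

```python
import numpy as np

def sliding_windows(echo: np.ndarray, win_len: int, hop: int) -> np.ndarray:
    """Split echo data along slow time into overlapping windows.

    echo: complex array of shape (n_fast, n_slow).
    Returns an array of shape (n_windows, n_fast, win_len).
    """
    n_fast, n_slow = echo.shape
    if win_len > n_slow:
        raise ValueError("window is longer than the slow-time axis")
    starts = range(0, n_slow - win_len + 1, hop)
    return np.stack([echo[:, s:s + win_len] for s in starts])

# Hypothetical sizes: 64 fast-time samples, 300 pulses in one 3 s period.
rng = np.random.default_rng(0)
echo = rng.standard_normal((64, 300)) + 1j * rng.standard_normal((64, 300))

# Windows of 60 pulses with 50 % overlap (assumed values).
windows = sliding_windows(echo, win_len=60, hop=30)
print(windows.shape)
```

Each window keeps the full fast-time axis, so the later processing can still act on both dimensions of the data.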

Step S120: passing the original echo data through a complex time-frequency transformation network to obtain target time-frequency information, wherein the complex time-frequency transformation network comprises at least one complex fully connected layer.

Illustratively, obtaining the target time-frequency information from the original echo data by means of the complex time-frequency transformation network, which comprises at least one complex fully connected layer, includes the following:

the complex time-frequency transformation network expression is:

The complex time-frequency transformation network comprises two complex fully connected layers, which process the distance and doppler dimensions of the original echo data, respectively.

It should be noted that "at least one" as used herein with a list of items means that different combinations of one or more of the listed items may be used and only one of each item in the list may be required. For example, "at least one of item a, item B, and item C" may include, but is not limited to, item a or item a and item B. The example may also include item a, item B, and item C, or item B and item C. In other examples, "at least one" may be, for example, but not limited to, two items a, one item B and ten items C, four items B and seven items C, or some other suitable combination.

Specifically, in the network training initialization stage, the complex full-connection layer weight coefficients are initialized as follows:

First, sliding-window processing is performed on the original echo data collected by the radar, so that the time-frequency information of the target in different time segments is represented at finer granularity. The signals in each time window of a single time period (3 s) are then fed into the complex time-frequency transformation network, which consists of two complex fully connected layers that process the distance and Doppler dimensions of the original echo data respectively; the results are then concatenated in time order to obtain the output of the complex time-frequency transformation network.
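The processing just described can be sketched with two complex fully connected layers applied to each window. The initialization expression is not reproduced in this text, so the sketch assumes, purely for illustration, that the weights start as a discrete Fourier transform kernel; that choice is motivated only by the statement that the network replaces the STFT. The names `ComplexDense`, `dft_matrix` and `time_frequency_net` are hypothetical.

```python
import numpy as np

def dft_matrix(n: int) -> np.ndarray:
    """n x n DFT kernel: entry (k, i) equals exp(-2j*pi*k*i/n)."""
    idx = np.arange(n)
    return np.exp(-2j * np.pi * np.outer(idx, idx) / n)

class ComplexDense:
    """Complex fully connected layer y = W x + b acting on the last axis."""

    def __init__(self, n: int):
        self.W = dft_matrix(n)               # ASSUMPTION: DFT initialization
        self.b = np.zeros(n, dtype=complex)  # bias coefficient set to 0

    def __call__(self, x: np.ndarray) -> np.ndarray:
        return x @ self.W.T + self.b

def time_frequency_net(windows, range_layer, doppler_layer):
    """Apply both layers to every window and keep the windows in time order."""
    outputs = []
    for win in windows:                       # win: (n_fast, n_slow)
        after_range = range_layer(win.T).T    # layer 1: distance (fast time)
        outputs.append(doppler_layer(after_range))  # layer 2: Doppler (slow time)
    return np.stack(outputs)

rng = np.random.default_rng(0)
n_windows, n_fast, n_slow = 4, 16, 8          # hypothetical sizes
shape = (n_windows, n_fast, n_slow)
windows = rng.standard_normal(shape) + 1j * rng.standard_normal(shape)

tf_info = time_frequency_net(windows, ComplexDense(n_fast), ComplexDense(n_slow))
print(tf_info.shape)
```

With this assumed initialization the untrained network reproduces a two-dimensional FFT of each window; training would then be free to move the weights away from the Fourier kernel.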

The output of the complex time-frequency transformation network is used as the input of the downstream classification network. Gradients are back-propagated according to the classification result, and the weight coefficients of the complex fully connected layers are adjusted dynamically, so that the complex time-frequency transformation network converts the target time-domain signal, as far as possible, into target time-frequency information from which the downstream classification network can extract features more easily. This realizes end-to-end training of the neural network.

Step S130: taking the target time-frequency information as the input of a classification network to obtain the human behavior classification result output by the classification network.

Illustratively, this includes: extracting features from the target time-frequency information through the classification network to obtain feature information; and outputting the classification result according to the feature information.

Illustratively, the classification network includes a convolution block, a bidirectional long short-term memory (Bi-LSTM) network and a fully connected layer. The convolution block comprises a two-dimensional convolution layer, a batch normalization layer, a linear activation layer and a max pooling layer. The fully connected layer comprises a fully connected layer and a logistic regression (Softmax) layer.
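The layer sequence of one convolution block and the final Softmax can be illustrated with a simplified single-channel NumPy sketch. Several points are assumptions: the activation is taken to be a rectified linear unit, batch normalization is reduced to normalizing one feature map without learned scale or shift, the input is a real-valued placeholder map, and the Bi-LSTM stage is omitted. Kernel size, map size and all function names are hypothetical.

```python
import numpy as np

def conv2d_valid(x: np.ndarray, kernel: np.ndarray) -> np.ndarray:
    """Single-channel 'valid' 2-D cross-correlation, as CNN layers compute it."""
    kh, kw = kernel.shape
    out = np.empty((x.shape[0] - kh + 1, x.shape[1] - kw + 1))
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            out[i, j] = np.sum(x[i:i + kh, j:j + kw] * kernel)
    return out

def batch_norm(x: np.ndarray, eps: float = 1e-5) -> np.ndarray:
    """Simplified normalization of one map (no learned parameters)."""
    return (x - x.mean()) / np.sqrt(x.var() + eps)

def relu(x: np.ndarray) -> np.ndarray:
    """ASSUMPTION: the activation layer is a rectified linear unit."""
    return np.maximum(x, 0.0)

def max_pool2x2(x: np.ndarray) -> np.ndarray:
    """Non-overlapping 2 x 2 max pooling; odd trailing rows/columns dropped."""
    h, w = x.shape[0] // 2 * 2, x.shape[1] // 2 * 2
    return x[:h, :w].reshape(h // 2, 2, w // 2, 2).max(axis=(1, 3))

def conv_block(x: np.ndarray, kernel: np.ndarray) -> np.ndarray:
    """Convolution, normalization, activation, pooling, in that order."""
    return max_pool2x2(relu(batch_norm(conv2d_valid(x, kernel))))

def softmax(z: np.ndarray) -> np.ndarray:
    """Numerically stable Softmax over class scores."""
    e = np.exp(z - z.max())
    return e / e.sum()

rng = np.random.default_rng(0)
tf_map = np.abs(rng.standard_normal((34, 34)))  # placeholder input map
kernel = rng.standard_normal((3, 3))            # hypothetical 3 x 3 kernel

features = conv_block(tf_map, kernel)
probs = softmax(rng.standard_normal(5))         # five behavior classes
print(features.shape, probs.sum())
```

In a trained network the class scores passed to `softmax` would come from the fully connected layer rather than from random numbers.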

It should be understood that, although the steps in the flowchart of fig. 1 are shown in the sequence indicated by the arrows, they are not necessarily performed in that sequence. Unless explicitly stated herein, the order of execution of the steps is not strictly limited, and the steps may be executed in other orders. Moreover, at least some of the steps in fig. 1 may include multiple sub-steps or stages, which are not necessarily performed at the same time but may be performed at different times; nor are these sub-steps or stages necessarily performed in sequence, as they may be performed in turn or alternately with other steps or with at least a portion of the sub-steps or stages of other steps.

Experiments were carried out following the algorithm flow: the proposed neural network was trained on original radar echo data, and its accuracy and generalization performance were evaluated. The resulting training-set loss function curve and verification-set accuracy curve are shown in fig. 4 and fig. 5. The classification recognition accuracy is about 95%, which shows that the neural network has good feature extraction and generalization capability on the radar human behavior data set.

The above embodiment uses a single radar to collect data on human behavior. In some embodiments, two radars may be used instead; see fig. 3, in which the first radar is denoted radar A and the second radar is denoted radar B. The first radar faces the front of the human body and emits radar waves, and the second radar faces the back of the human body and emits radar waves; that is, the first radar is placed in front of the human body and the second radar behind it.

The original echo data received by the two radars (namely the first radar and the second radar) are fed into the network as two channels of data for processing. Compared with the single-radar detection mode in fig. 2, two radars provide echo data of the human target from different viewing angles, from which richer micro-motion feature information can be extracted, providing a data basis for improving the recognition accuracy of the neural network.

Unlike the single-radar embodiment, when two radars are used the classification network consists of two channels, whose inputs are respectively the outputs of the complex time-frequency transformation network for the radar A echo data and for the radar B echo data. The two channels have the same network topology: each consists of three convolution blocks, a bidirectional long short-term memory (Bi-LSTM) network and a fully connected layer. The features of the two channels are then combined by a concatenation layer and a fully connected layer, and finally a logistic regression (Softmax) layer gives the classification result.

As shown in fig. 6, the end-to-end neural network human behavior recognition device based on millimeter wave radar according to an embodiment of the application includes a data acquisition unit 10, a data conversion unit 20 and a data classification unit 30.

The data acquisition unit 10 is used for acquiring the original echo data of human behavior collected by the radar. Illustratively, acquiring the original echo data of human behavior collected by the radar includes: performing sliding-window processing on the original echo data to obtain a plurality of window data. The original echo data comprises first-dimension data and second-dimension data, wherein the first-dimension data is obtained by fast-time sampling and the second-dimension data by slow-time sampling.

The data conversion unit 20 is configured to pass the original echo data through a complex time-frequency transformation network to obtain target time-frequency information, wherein the complex time-frequency transformation network includes at least one complex fully connected layer. Illustratively, obtaining the target time-frequency information from the original echo data by means of the complex time-frequency transformation network includes the following:

the complex time-frequency transformation network expression is:

First, sliding-window processing is performed on the original echo data collected by the radar, so that the time-frequency information of the target in different time segments is represented at finer granularity. The signals in each time window of a single time period (3 s) are then fed into the complex time-frequency transformation network, which consists of two complex fully connected layers that process the distance and Doppler dimensions of the original echo data respectively; the results are then concatenated in time order to obtain the output of the complex time-frequency transformation network.

The data classification unit 30 is configured to take the target time-frequency information as an input of a classification network, and obtain a classification result of human behavior output by the classification network. Illustratively, the obtaining the human behavior classification result output by the classification network by taking the target time-frequency information as the input of the classification network includes: extracting characteristics of the target time-frequency information through the classification network to obtain characteristic information; and outputting the classification result according to the characteristic information.

Fig. 7 shows a schematic structural diagram of an electronic device according to the present application. Specifically:

The electronic device may include a processor 401 with one or more processing cores, a memory 402 comprising one or more computer-readable storage media, a power supply 403, an input unit 404 and other components. It will be appreciated by those skilled in the art that the device structure shown in fig. 7 does not limit the device, and that the electronic device may include more or fewer components than shown, combine certain components, or use a different arrangement of components. Wherein:

The processor 401 is the control center of the device. It connects the various parts of the entire device using various interfaces and lines, and performs the functions of the device and processes data by running or executing software programs and/or unit modules stored in the memory 402 and calling data stored in the memory 402, thereby monitoring the electronic device as a whole. Optionally, the processor 401 may include one or more processing cores; the processor 401 may be a central processing unit (CPU), or another general-purpose processor, a digital signal processor (DSP), an application-specific integrated circuit (ASIC), a field-programmable gate array (FPGA) or other programmable logic device, a discrete gate or transistor logic device, a discrete hardware component, or the like. A general-purpose processor may be a microprocessor or any conventional processor. Preferably, the processor 401 may integrate an application processor, which primarily handles the operating system, user interfaces, application programs and the like, with a modem processor, which primarily handles wireless communications. It will be appreciated that the modem processor may also not be integrated into the processor 401.

The memory 402 may be used to store software programs and modules, and the processor 401 executes various functional applications and data processing by executing the software programs and modules stored in the memory 402. The memory 402 may mainly include a storage program area and a storage data area, wherein the storage program area may store an operating system, application programs required for at least one function, and the like; the storage data area may store data created according to the use of the electronic device, etc. In addition, memory 402 may include high-speed random access memory, and may also include non-volatile memory, such as at least one magnetic disk storage device, flash memory device, or other volatile solid-state storage device. Accordingly, the memory 402 may also include a memory controller to provide the processor 401 with access to the memory 402.

The electronic device may further comprise a power supply 403 for supplying power to the respective components, and preferably the power supply 403 may be logically connected to the processor 401 by a power management system, so that functions of charge, discharge, power consumption management and the like are managed by the power management system. The power supply 403 may also include one or more of any of a direct current or alternating current power supply, a recharging system, a power failure detection circuit, a power converter or inverter, a power status indicator, and the like.

The electronic device may further comprise an input unit 404 and an output unit 405, the input unit 404 being operable to receive input digital or character information and to generate keyboard, mouse, joystick, optical or trackball signal inputs in connection with user settings and function control.

Although not shown, the electronic device may further include a display unit or the like, which is not described herein. Specifically, in the present application, the processor 401 in the electronic device loads executable files corresponding to the processes of one or more application programs into the memory 402 according to the following instructions, and the processor 401 executes the application programs stored in the memory 402, so as to implement various functions, as follows:

acquiring original echo data of human behaviors acquired by the radar;

performing a complex time-frequency transformation network according to the original echo data to obtain target time-frequency information, wherein the complex time-frequency transformation network comprises at least one complex full-connection layer;

and taking the target time-frequency information as input of a classification network to obtain a human behavior classification result output by the classification network.

Those of ordinary skill in the art will appreciate that all or a portion of the steps of the various methods described above may be performed by instructions, or by instructions controlling associated hardware, and that the instructions may be stored on a computer-readable storage medium and loaded and executed by the processor 401.

To this end, the present application provides a computer-readable storage medium, which may include a read-only memory (ROM), a random access memory (RAM), a magnetic disk, an optical disk, and the like. Computer instructions are stored on the storage medium and are loaded by the processor 401 to perform the steps of any of the millimeter wave radar based end-to-end neural network human behavior recognition methods provided by the present application. For example, the computer instructions, when executed by the processor 401, perform the following functions:

acquiring original echo data of human behaviors acquired by the radar;

The description of each of the foregoing embodiments has its own emphasis; for portions not described in detail in one embodiment, reference may be made to the detailed descriptions of the other embodiments, which are not repeated here.

In the implementation, each unit or structure may be implemented as an independent entity, or may be implemented as the same entity or several entities in any combination, and the implementation of each unit or structure may be referred to the foregoing embodiments and will not be repeated herein.

In summary, although the present application has been described in terms of the preferred embodiments, the preferred embodiments are not limited to the above embodiments, and various modifications and changes can be made by one skilled in the art without departing from the spirit and scope of the application, and the scope of the application is defined by the appended claims.

Claims

1. The end-to-end neural network human behavior recognition method based on the millimeter wave radar is characterized by comprising the following steps of:

acquiring original echo data of human body behaviors acquired by the radar, wherein the data acquisition is carried out on the human body behaviors by adopting a first radar and a second radar;

taking the target time-frequency information as the input of a classification network to obtain a human behavior classification result output by the classification network, wherein the classification network consists of two channels whose inputs are respectively the outputs of the complex time-frequency transformation network for the first radar echo data and for the second radar echo data, the two channels have the same network topology, and each channel consists of three convolution blocks, a bidirectional long short-term memory network and a fully connected layer;

the step of obtaining the target time-frequency information by the complex time-frequency conversion network according to the original echo data comprises the following steps:

[expression not reproduced in this text],

wherein the quantities appearing in the expression are: the two sampling-point indices; the angular frequency; the length of the input signal; the output of the complex fully connected layer; the target echo signal, i.e. the input of the complex fully connected layer; the rectangular window function; and the weight coefficient of the complex fully connected layer; and the bias coefficient of the complex fully connected layer is set to 0.

2. the method for recognizing human body behaviors based on the end-to-end neural network of the millimeter wave radar according to claim 1, wherein the acquiring the original echo data of the human body behaviors acquired by the radar comprises:

performing sliding-window processing on the original echo data to obtain a plurality of windows of data.
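Sliding-window processing of this kind can be sketched as follows. The window length and hop size are hypothetical parameters chosen for illustration; the claim does not specify them.

```python
def sliding_windows(samples, window_len, hop):
    """Split a sequence into overlapping windows of window_len, stepping by hop.

    A trailing remainder shorter than window_len is dropped (an assumption;
    padding it would be an equally valid choice).
    """
    if window_len <= 0 or hop <= 0:
        raise ValueError("window_len and hop must be positive")
    last_start = len(samples) - window_len
    return [samples[s:s + window_len] for s in range(0, last_start + 1, hop)]
```

For example, a 10-sample record with `window_len=4` and `hop=2` yields four windows starting at samples 0, 2, 4 and 6.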

3. The millimeter wave radar-based end-to-end neural network human behavior recognition method according to claim 1, wherein the complex time-frequency transformation network comprises two complex fully connected layers, and the two complex fully connected layers process the range dimension and the Doppler dimension of the original echo data, respectively.
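One way to picture two complex fully connected layers acting on the two dimensions is the sketch below: the first layer multiplies every chirp (fast-time axis) by a weight matrix, and the second multiplies every resulting column (slow-time axis) by another. Initialising both matrices with discrete Fourier kernels is an assumption made here so that the result can be checked against a known range-Doppler map; the helper names and the `echo[slow][fast]` layout are likewise illustrative.

```python
import cmath

def dft_matrix(n):
    """n x n Fourier kernel, used here as an assumed weight initialisation."""
    return [[cmath.exp(-2j * cmath.pi * k * m / n) for m in range(n)]
            for k in range(n)]

def matmul(a, b):
    """Plain matrix product of two nested-list matrices."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]

def transpose(a):
    return [list(col) for col in zip(*a)]

def two_layer_transform(echo, w_range, w_doppler):
    """Apply two bias-free complex layers to echo[slow_time][fast_time].

    Layer 1 acts on the fast-time axis of every chirp (range dimension).
    Layer 2 acts on the slow-time axis of every range bin (Doppler dimension).
    Returns a matrix indexed as out[doppler_bin][range_bin].
    """
    after_range = transpose(matmul(w_range, transpose(echo)))  # [slow][range]
    return matmul(w_doppler, after_range)                      # [doppler][range]
```

Feeding in a synthetic echo with a single tone on range bin 1 and Doppler bin 2 should concentrate all the energy in `out[2][1]`.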

4. The millimeter wave radar-based end-to-end neural network human behavior recognition method according to claim 1, wherein the step of using the target time-frequency information as an input of a classification network to obtain a human behavior classification result output by the classification network comprises the steps of:

extracting features from the target time-frequency information through the classification network to obtain feature information;

and outputting the classification result according to the feature information.

5. The millimeter wave radar-based end-to-end neural network human behavior recognition method of claim 1, wherein the classification network comprises a convolution block, a bidirectional long short-term memory network and a fully connected layer.

6. The millimeter wave radar-based end-to-end neural network human behavior recognition method of claim 5, wherein the convolution block comprises a two-dimensional convolution layer, a batch normalization layer, a linear activation layer and a maximum pooling layer.

7. The millimeter wave radar-based end-to-end neural network human behavior recognition method of claim 5, wherein the fully connected layer comprises a fully connected layer and a logistic regression layer.

8. The millimeter wave radar-based end-to-end neural network human behavior recognition method of claim 1, wherein the raw echo data comprises first dimension data and second dimension data, wherein fast time sampling obtains the first dimension data and slow time sampling obtains the second dimension data.
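The two dimensions can be illustrated by arranging a flat stream of samples into a matrix whose rows are chirps (slow time) and whose columns are samples within a chirp (fast time). The function name, the row/column layout and the `samples_per_chirp` parameter are assumptions for this sketch and are not specified in the claim.

```python
def to_fast_slow_matrix(adc_samples, samples_per_chirp):
    """Arrange a flat sample stream as matrix[slow_time][fast_time].

    Each row is one chirp; moving along a row is fast time, moving down
    a column is slow time.
    """
    if samples_per_chirp <= 0 or len(adc_samples) % samples_per_chirp != 0:
        raise ValueError("stream length must be a multiple of samples_per_chirp")
    return [adc_samples[i:i + samples_per_chirp]
            for i in range(0, len(adc_samples), samples_per_chirp)]
```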

9. The millimeter wave radar-based end-to-end neural network human behavior recognition method of claim 1, wherein the radar comprises a first radar and a second radar, the first radar faces the front of the human body and emits radar waves, the second radar faces the back of the human body and emits radar waves, and the acquiring the original echo data of the human behavior acquired by the radar further comprises:

performing sliding-window processing on the original echo data of the first radar and the original echo data of the second radar to obtain a plurality of windows of data.

10. An end-to-end neural network human behavior recognition device based on millimeter wave radar, characterized by comprising:

a data acquisition unit, configured to acquire original echo data of human behavior collected by the radar, wherein a first radar and a second radar are used to collect data on the human behavior;

a data conversion unit, configured to pass the original echo data through a complex time-frequency transformation network to obtain target time-frequency information, wherein the complex time-frequency transformation network comprises at least one complex fully connected layer;

a data classification unit, configured to use the target time-frequency information as the input of a classification network to obtain a human behavior classification result output by the classification network, wherein the classification network consists of two channels whose inputs are the output data of the complex time-frequency transformation network for the first radar echo data and the second radar echo data, the two channels have the same network topology, and each channel consists of three convolution blocks, a bidirectional long short-term memory network and a fully connected layer;

wherein the complex time-frequency transformation network is expressed as:

$$X(n,\omega) = \sum_{m=0}^{N-1} x(m)\, w(n-m)\, e^{-j\omega m},$$