CN115859078A

CN115859078A - Millimeter wave radar fall detection method based on improved Transformer

Info

Publication number: CN115859078A
Application number: CN202211407979.5A
Authority: CN
Inventors: 包志强; 艾婷; 高帆; 王积军; 赵雨欣
Original assignee: Xian University of Posts and Telecommunications
Current assignee: Xian University of Posts and Telecommunications
Priority date: 2022-11-10
Filing date: 2022-11-10
Publication date: 2023-03-28

Abstract

The invention relates to a millimeter wave radar fall detection method based on an improved Transformer, which comprises the following steps: setting millimeter wave radar parameters; collecting falling, walking, sitting and static action postures made by experimenters to form an original human body posture data set; preprocessing the data set to obtain human body posture distance-Doppler frequency spectra and constructing a human body posture database from them; building a deep learning network model comprising feature pre-extraction, feature extraction and classification output; inputting the training set and the test set of the human body posture database into the deep learning network model to train and test it; and inputting human body posture data into the trained model to realize fall detection. The method combines a channel attention mechanism with a Transformer structure to extract features from the collected millimeter wave radar signals, which further enhances the feature extraction capability of the model and can greatly improve the accuracy of fall detection.

Description

Millimeter wave radar fall detection method based on improved Transformer

Technical Field

The invention belongs to the technical field of fall detection, and particularly relates to a millimeter wave radar fall detection method based on an improved Transformer.

Background

At present, there are various methods for fall detection, which are mainly classified into wearable devices and non-contact devices. A wearable device collects the motion state and vital sign data of a monitored person through sensors, and digital features that a computer can identify are then extracted from the data to judge whether the monitored person has fallen. For example, the patent (CN 201710322157), a fall detection method based on a convolutional neural network and mobile phone sensor data, uses a neural network to train on and classify the data of the three-axis sensor built into a smart phone and can achieve a good detection effect, but it is poorly suited to elderly people, who may forget to carry the smart phone with them. Non-contact devices include cameras, WIFI, radar sensors and the like. However, image-based detection carries the risk of invading personal privacy, is unsuitable for private places such as washrooms and bedrooms, needs sufficient light and is unsuitable for night scenes; an intelligent monitoring system based on WIFI channel state information can establish a fall detection model from the phase and amplitude differences of the channel state information, but this approach is easily interfered with by other external signals and the model generalizes poorly.

Compared with the two modes based on image detection and WIFI channel state information, a radar sensor used for fall detection has the advantages of high measurement accuracy, no interference with the daily activities of users during measurement, and full protection of user privacy. However, in the prior art the radar sensor is used to acquire micro-Doppler features and a CNN is used to train on and recognize patient actions, so that the network structure is simple, the extraction and utilization of signal features are limited, and the accuracy of the model needs to be improved.

Disclosure of Invention

Aiming at the defects of the prior art, the invention provides a millimeter wave radar fall detection method based on an improved Transformer, which combines a channel attention mechanism with a Transformer structure to extract features from the collected millimeter wave radar signals, further enhances the feature extraction capability of the model, and can greatly improve the accuracy of fall detection.

In order to achieve the purpose, the technical scheme of the invention is as follows:

A millimeter wave radar fall detection method based on an improved Transformer comprises the following steps:

1) Setting millimeter wave radar parameters for acquiring action attitude data of experimenters;

2) Acquiring N groups of data of four action postures of falling, walking, sitting and standing by experimenters through a millimeter wave radar, and storing the N groups of data as bin files to form an original human body posture data set; the N groups of data are data intermediate frequency signals obtained by performing frequency mixing processing on millimeter wave radar transmitting signals and receiving signals;

3) Preprocessing bin files in the original human body posture data set to obtain a human body posture distance-Doppler frequency spectrum, storing the human body posture distance-Doppler frequency spectrum as an npy file, and constructing the npy file into a human body posture database; the human body posture database comprises a training set and a testing set;

4) Building a human body posture deep learning network model by using Python programming based on the TensorFlow framework; the human posture deep learning network model comprises a human posture feature pre-extraction model, a human posture feature extraction model and a human posture classification output model;

5) Respectively inputting a training set and a test set in the human body posture database constructed in the step 3) into the constructed human body posture deep learning network model, and training and testing the deep learning network model;

6) The trained and tested human posture deep learning network model receives human posture data of a person to be detected, acquired by the millimeter wave radar, calculates the probabilities that the input data represent falling, walking, sitting and standing according to the parameters learned during training, and outputs the category with the maximum probability as the prediction result; whether the person has fallen is judged from the prediction result, thereby realizing fall detection.
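The decision rule of step 6) can be illustrated with a minimal sketch. The label order in `CLASSES` and the helper name `decide` are assumptions made for illustration; the source only states that the category with the maximum probability is taken as the prediction.

```python
import numpy as np

# Label order is an assumption; the source does not fix the index of each class.
CLASSES = ("fall", "walk", "sit", "motionless")

def decide(probabilities):
    """Return (predicted label, fall flag) from a length-4 probability vector."""
    p = np.asarray(probabilities, dtype=float)
    label = CLASSES[int(np.argmax(p))]   # category with the maximum probability
    return label, label == "fall"

label, is_fall = decide([0.91, 0.04, 0.03, 0.02])
```

A caller would pass the Softmax output of the classification model to `decide` and raise an alarm when the returned flag is true.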

Preferably, in the step 1), the millimeter wave radar adopts a single-transmitting single-receiving antenna, and the starting frequency of the millimeter wave radar is set to 77GHz; the slope of the sweep frequency is set to be 25MHz/us; the duration of each Chirp is set to 50us; the period for collecting each frame of data is set to 50us.

Preferably, in the step 2), the total number of the N groups of data is 2000 groups, wherein groups 1-500 are fall data, groups 501-1000 are walking data, groups 1001-1500 are sitting data, and groups 1501-2000 are static data.

Preferably, the step 3) specifically comprises the following steps:

31) Subjecting the bin file to one-dimensional FFT processing to obtain a distance spectrum of the data intermediate frequency signal, and filtering out static targets by an average phase subtraction method to generate a distance spectrum of the dynamic target;

32) Carrying out two-dimensional FFT processing on the distance spectrum of the dynamic target to obtain a human body posture distance-Doppler spectrum; the frame number of the human body posture distance-Doppler frequency spectrum is 60, the distance dimension is 60, and the speed dimension is 126;

33) Saving all the human body posture distance-Doppler frequency spectra as npy files whose file names are labeled fall, walk, sit and motionless respectively, randomly dividing the spectra of each label at a ratio of 4:1 into a training set and a testing set, and constructing the training set and the testing set into a human body posture database; where fall represents fall data, walk represents walking data, sit represents sitting data, and motionless represents static data.

Preferably, in the step 33), the training set contains 400 groups of human body posture distance-Doppler spectra for each of the labels fall, walk, sit and motionless, 1600 groups in total; the test set contains 100 groups for each of the labels fall, walk, sit and motionless, 400 groups in total.

Preferably, in the step 4), when the human body posture feature pre-extraction model is built, a pooling operation is first performed on the input human body posture distance-Doppler spectrum data to reduce its size; each frame of the data is then treated as a Patch and mapped through a full connection layer, and position information is added to each mapped frame to obtain a feature map.

Preferably, in the step 4), the human body posture feature extraction model is a pyramid structure composed of four Radar Transformer Block modules of different sizes; when the model is built, the input feature map is processed in sequence by the four Radar Transformer Block modules, an average pooling layer is added at the end of each Radar Transformer Block module, and the dimension of the output feature map changes stage by stage;

each Radar Transformer Block module internally comprises a channel attention mechanism and the Encoder structure of a Transformer; in each Radar Transformer Block module, the channel attention mechanism calculates a weight for each channel of the input feature map, so that feature maps containing important information are emphasized and some irrelevant feature maps are ignored; normalization is then carried out in the Encoder structure, and the normalized data is input into a Multi-head Attention Mechanism (MHA) for processing.

Preferably, in the step 4), the human posture classification output model is an MLP neural network built from full connection layers and a Softmax layer; there are three full connection layers and one Softmax layer; the numbers of nodes of the three full connection layers are 128, 64 and 4 in sequence;

the first two full connection layers adopt a Gaussian Error Linear Unit (GELU) activation function, whose expression is as follows:

$$\mathrm{GELU}(x) = x\,\Phi(x) = \frac{x}{2}\left[1 + \operatorname{erf}\left(\frac{x}{\sqrt{2}}\right)\right]$$

wherein $\Phi(x)$ is the cumulative distribution function of the standard normal distribution;

the third layer adopts a Softmax activation function, which normalizes the output vector of the full connection layer so that its elements sum to 1 and represent the final prediction probabilities of the network for the four actions of falling, walking, sitting and standing;

the expression of the Softmax activation function is as follows:

$$\mathrm{Softmax}(z_i) = \frac{e^{z_i}}{\sum_{c=1}^{C} e^{z_c}}$$

wherein $z_i$ is the output value of the ith node, and $C$ is the number of output nodes, i.e. the number of classification categories.

Preferably, in the step 5), when the deep learning network model is trained, the 1600 groups of data in the training set are input into the deep learning network model for iterative training, the output prediction results and the true category labels are fed into a loss function to record a loss value, optimization is performed with a stochastic gradient descent (SGD) optimization algorithm, the network parameters are updated, and a group of optimal network model parameters is stored.

Preferably, in the step 5), the deep learning network model is tested, 400 groups of data in the test set are input into the deep learning network model for a fall detection test, a loss value and a test accuracy are recorded, and a model confusion matrix is output to observe a prediction result.

The invention has the technical effects and advantages that:

The millimeter wave radar fall detection method based on the improved Transformer combines the fall detection algorithm with the characteristics of millimeter wave radar signals. First, in the channel attention mechanism, global average pooling is applied to the input feature map to compress the spatial information and obtain channel information, the attention distribution between channels is captured through two full connection layers, and the input features are weighted by this attention distribution to give the output feature data. Then the feature data is normalized in the Encoder structure of the Transformer and input into a Multi-head Attention Mechanism (MHA) for processing. In this way the time sequence information and the spatial information of the signal can be accurately extracted, the feature extraction capability of the human body posture deep learning network model is further enhanced, and the tendency of the Transformer structure to over-fit on small samples is mitigated. The fall detection accuracy on the test set reaches 96.75%, which is a great improvement in fall detection accuracy.

Drawings

FIG. 1 is a schematic flow chart of a fall detection method of a millimeter wave radar based on an improved Transformer according to the present invention;

FIG. 2 is a schematic diagram of the format of raw body pose data collected by millimeter wave radar according to the present invention;

FIG. 3 is a diagram illustrating the result of filtering static clutter from the range spectrum when performing one-dimensional FFT on bin files according to the present invention;

FIG. 4 is a human body pose range-Doppler spectrum graph obtained by performing two-dimensional FFT processing on the range-Doppler spectrum according to the present invention;

FIG. 5 is a schematic overall structure diagram of a human posture deep learning network model built by the invention;

FIG. 6 is a flow chart of a channel attention mechanism in the human pose feature extraction model of the present invention;

FIG. 7 is a schematic structural diagram of an MHA of a multi-head attention mechanism in the Encoder structure of the human body posture feature extraction model according to the present invention;

FIG. 8 is a schematic diagram of the classification output structure of the human posture classification output model of the present invention;

FIG. 9 is a graph of the variation in loss value for the training model of the deep learning network model of the present invention;

FIG. 10 is a graph of the accuracy change of the training model of the deep learning network model of the present invention;

FIG. 11 is a fall detection confusion matrix table of the invention.

Detailed Description

The present invention will be described in further detail with reference to the following examples, which are given in conjunction with the accompanying drawings.

Referring to fig. 1, a method for fall detection of a millimeter wave radar based on an improved Transformer includes the following steps:

1) Setting millimeter wave radar parameters for acquiring action attitude data of experimenters; specifically, the millimeter wave radar adopts a single-transmitting single-receiving antenna, and the initial frequency of the millimeter wave radar is set to 77GHz; the slope of the sweep frequency is set to 25MHz/us; the duration of each Chirp is set to 50us; the period for collecting each frame of data is set to 50us.

2) Acquiring N groups of data of four action postures of falling, walking, sitting and standing by experimenters through a millimeter wave radar, and storing the N groups of data as bin files to form an original human body posture data set; the four kinds of action posture data are data intermediate frequency signals obtained by performing frequency mixing processing on millimeter wave radar transmitting signals and receiving signals. Specifically, the total amount of the four kinds of action posture data collected by the millimeter wave radar is 2000 groups, wherein groups 1-500 are falling data, groups 501-1000 are walking data, groups 1001-1500 are sitting data and groups 1501-2000 are standing data.

3) Preprocessing bin files in the original human body posture data set to obtain a human body posture distance-Doppler frequency spectrum, storing the human body posture distance-Doppler frequency spectrum as an npy file, and constructing the npy file into a human body posture database; the body posture database includes a training set and a test set. The method specifically comprises the following steps.

31) One-dimensional FFT processing is carried out on the bin file to obtain the distance spectrum of the data intermediate frequency signal, and static targets are filtered out by an average phase subtraction method to generate the distance spectrum of the dynamic target.

32) Two-dimensional FFT processing is carried out on the distance spectrum of the dynamic target to obtain a human body posture distance-Doppler spectrum; the frame number of the human body posture distance-Doppler frequency spectrum is 60, the distance dimension is 60, and the speed dimension is 126.

The training set contains 400 groups of human body posture distance-Doppler spectra for each of the labels fall, walk, sit and motionless, 1600 groups in total; the test set contains 100 groups for each of the labels fall, walk, sit and motionless, 400 groups in total.

4) Building a human body posture deep learning network model by using Python programming based on the TensorFlow framework; the human posture deep learning network model comprises a human posture feature pre-extraction model, a human posture feature extraction model and a human posture classification output model.

Specifically, the construction of the human body posture feature pre-extraction model includes firstly performing pooling operation on input data of a human body posture distance-Doppler frequency spectrum, reducing the size of the data, then outputting each frame of data contained in the data as a Patch through full-connection layer mapping, and adding position information to each frame of data output through mapping to obtain a feature map.

Specifically, the human body posture feature extraction model is a pyramid structure formed by four Radar Transformer Block modules of different sizes; when the model is built, the input feature map is processed in sequence by the four Radar Transformer Block modules, an average pooling layer is added at the end of each Radar Transformer Block module, and the dimension of the output feature map changes stage by stage.

Each Radar Transformer Block module internally comprises a channel attention mechanism and the Encoder structure of a Transformer; in each Radar Transformer Block module, the channel attention mechanism calculates the weight of each channel of the input feature map, so that feature maps containing important information are emphasized and some inconsequential feature maps are ignored; normalization is then carried out in the Encoder structure, and the normalized data is input into a Multi-head Attention Mechanism (MHA) for processing.

Specifically, the human posture classification output model is an MLP neural network built from full connection layers and a Softmax layer; there are three full connection layers and one Softmax layer; the numbers of nodes of the three full connection layers are 128, 64 and 4 in sequence.

The expression of the Softmax activation function is as follows:

$$\mathrm{Softmax}(z_i) = \frac{e^{z_i}}{\sum_{c=1}^{C} e^{z_c}}$$

wherein $z_i$ is the output value of the ith node, and $C$ is the number of output nodes, i.e. the number of classification categories.

5) Respectively inputting the training set and the test set in the human body posture database constructed in the step 3) into the constructed human body posture deep learning network model, and training and testing the deep learning network model.

Specifically, when the deep learning network model is trained, the 1600 groups of data in the training set are input into the deep learning network model for iterative training, the output prediction results and the true category labels are fed into a loss function to record a loss value, optimization is performed with a stochastic gradient descent (SGD) optimization algorithm, the network parameters are updated, and a group of optimal network model parameters is stored.

Specifically, in the test of the deep learning network model, 400 groups of data in the test set are input into the deep learning network model to carry out the falling detection test, the loss value and the test accuracy are recorded, and a model confusion matrix is output to observe the prediction result.

In the embodiment, the fall detection algorithm is combined with the characteristics of millimeter wave radar signals, and channel attention is combined with the Encoder structure of the Transformer to extract the time sequence information and the spatial information of the signals. This further enhances the feature extraction capability of the model and mitigates the tendency of the Transformer structure to over-fit on small samples. The fall detection accuracy on the test set reaches 96.75%, which is a great improvement in fall detection accuracy.

Examples

Step 1) setting millimeter wave radar parameters for acquiring action attitude data of experimenters; the millimeter wave radar adopts a single-transmitting single-receiving antenna, and the initial frequency of the millimeter wave radar is set to 77GHz; the slope of the sweep frequency is set to 25MHz/us; the duration of each Chirp is set to 50us; the period for collecting each frame of data is set to 50us.

In specific implementation, after the millimeter wave radar passes through the setting of the parameters, the sampling rate of the millimeter wave radar can reach 4000ksps, 256 times of ADC sampling can be carried out in each Chirp, and under the condition that the period for acquiring each frame of data is 50us, 60 frames of data can be acquired in each group of data, wherein each frame comprises 200 Chirps.
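The quantities that follow from these settings can be checked with a short calculation. This is a sketch using only the stated values; the range resolution uses the standard FMCW relation c/(2B) and assumes, optimistically, that the whole 50us sweep contributes to the sampled bandwidth, which the source does not state.

```python
C_LIGHT = 299_792_458.0            # speed of light in m/s

slope_hz_per_s = 25e6 / 1e-6       # 25 MHz/us expressed in Hz/s
chirp_duration_s = 50e-6           # 50 us
samples_per_chirp = 256            # ADC samples in each Chirp
chirps_per_frame = 200
frames_per_group = 60

# Bandwidth swept during one Chirp and the corresponding ideal range resolution.
swept_bandwidth_hz = slope_hz_per_s * chirp_duration_s
range_resolution_m = C_LIGHT / (2 * swept_bandwidth_hz)

# Number of ADC samples stored for one group of posture data.
samples_per_group = frames_per_group * chirps_per_frame * samples_per_chirp
```

Under these assumptions the sweep covers about 1.25 GHz, the ideal range resolution is about 0.12 m, and one group holds 3,072,000 samples.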

In specific implementation, the millimeter wave radar development board for collecting human body posture data adopts an AWR1642 millimeter wave radar development board and a DCA1000 data collection development board of TI company.

Step 2) acquiring N groups of data of four actions of falling, walking, sitting and standing by experimenters through a millimeter wave radar, and storing the N groups of data into bin files to form an original human posture data set; and the N groups of data are data intermediate frequency signals obtained by performing frequency mixing on millimeter wave radar transmitting signals and receiving signals. The format of N sets of raw body pose data is shown in fig. 2.

In specific implementation, when the action attitude data of experimenters is collected, in order to collect data more conveniently and accurately, the millimeter wave radar is fixed at a position 1 m away from the ground, and the experimenters make action attitudes within a range of 1-4 m away from the millimeter wave radar; in order to better fit the diversity and complexity of human body actions in an actual scene, experimenters sequentially make four action postures of falling, walking, sitting and standing; the falling comprises four action modes of falling forwards, falling backwards, falling leftwards and falling rightwards; the walking comprises four action modes of far to near, near to far, left to right and right to left, and the millimeter wave radar is used for collecting data.

During specific implementation, in order to improve the accuracy of fall detection, the action posture data of experimenters with different sexes, different heights and different weights can be collected during data collection.

In specific implementation, the total number of the N groups of data is 2000 groups, wherein groups 1-500 are fall data, groups 501-1000 are walking data, groups 1001-1500 are sitting data, and groups 1501-2000 are static data.

In specific implementation, the preprocessing of step 3) comprises the following steps:

31) Performing one-dimensional FFT processing on the bin file to obtain the distance spectrum of the data intermediate frequency signal, and filtering out static targets by an average phase subtraction method to generate the distance spectrum of the dynamic target;

In specific implementation, the phase of a static target does not change over time, so the mean obtained from the vector sum of its samples is close to the samples themselves and the amplitude becomes very small after the mean is subtracted. The phase of a moving target changes because of the Doppler effect, so its samples largely cancel in the vector sum, the mean is small, and subtracting it has little effect on the amplitude. The amplitude of the moving target is thereby highlighted, the signal-to-noise ratio of the signal is greatly improved, and the static clutter is filtered out; the processing result is shown in fig. 3.

32) Carrying out two-dimensional FFT processing on the distance spectrum of the dynamic target to obtain the human body posture distance-Doppler frequency spectrum, as shown in fig. 4; the frame number of the human body posture distance-Doppler frequency spectrum is 60, the distance dimension is 60, and the speed dimension is 126.
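The two preprocessing steps above can be sketched with NumPy on synthetic data. This is an illustration under assumptions: the mean subtraction is interpreted here as removing, for each range bin, the mean complex value across the Chirps of a frame; the function name `range_doppler` and the synthetic targets are hypothetical; and the cropping to 60 distance bins and 126 speed bins is not shown because the source does not describe it.

```python
import numpy as np

def range_doppler(frame):
    """frame: complex array of shape (n_chirps, n_samples) holding IF samples."""
    # One-dimensional FFT over the samples of each Chirp gives the distance spectrum.
    spectrum = np.fft.fft(frame, axis=1)
    # Subtract the mean complex value across Chirps for every range bin;
    # static targets have a constant phase and are cancelled by this step.
    spectrum = spectrum - spectrum.mean(axis=0, keepdims=True)
    # Second FFT across Chirps resolves velocity; fftshift centres zero Doppler.
    rd = np.fft.fftshift(np.fft.fft(spectrum, axis=0), axes=0)
    return np.abs(rd)

n_chirps, n_samples = 200, 256
t = np.arange(n_samples)
k = np.arange(n_chirps)[:, None]
# Synthetic static target in range bin 20 and moving target in range bin 40.
static = np.exp(2j * np.pi * 20 * t / n_samples) * np.ones((n_chirps, 1))
moving = 0.5 * np.exp(2j * np.pi * (40 * t / n_samples + 25 * k / n_chirps))

rd = range_doppler(static + moving)
peak = np.unravel_index(np.argmax(rd), rd.shape)
```

In this toy case the stronger static target disappears from the map and the remaining peak sits at the moving target's range bin, away from the zero-Doppler row.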

In specific implementation, the training set contains 400 groups of human body posture distance-Doppler spectra for each of the labels fall, walk, sit and motionless, 1600 groups in total; the test set contains 100 groups for each of the labels fall, walk, sit and motionless, 400 groups in total.
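A random 4:1 division per label that yields these counts could look like the following sketch. The helper `split_indices` and the use of a fixed seed per label are assumptions for illustration; the source only states the ratio and that the division is random.

```python
import numpy as np

def split_indices(n_per_class=500, train_ratio=0.8, seed=0):
    """Randomly split the indices 0..n_per_class-1 into train and test parts."""
    rng = np.random.default_rng(seed)
    order = rng.permutation(n_per_class)
    n_train = int(round(n_per_class * train_ratio))   # 4:1 ratio gives 400 of 500
    return order[:n_train], order[n_train:]

labels = ("fall", "walk", "sit", "motionless")
splits = {name: split_indices(seed=i) for i, name in enumerate(labels)}
n_train = sum(len(train) for train, _ in splits.values())
n_test = sum(len(test) for _, test in splits.values())
```

Splitting each label separately keeps the classes balanced in both sets, which matches the stated 400 and 100 groups per label.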

In specific implementation, a human body posture feature pre-extraction model is built, firstly, pooling operation is carried out on input data of a human body posture distance-Doppler frequency spectrum, the size of the data is reduced, then each frame of data contained in the data is used as a Patch to be output through full connection layer mapping, and position information is added to each frame of data output through mapping to obtain a feature map.

In specific implementation, in order to realize efficient feature extraction, a max pooling operation is performed on the input data, which has 60 frames, a distance dimension of 60 and a speed dimension of 126, through a pooling layer with a filter size of 2x2 and a stride of 2. This reduces the size of the input data: the pooled output has 60 frames, a distance dimension of 30 and a speed dimension of 63. The pooling operation removes redundant information while keeping important information, and greatly reduces the number of parameters of the network model.
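The shape change produced by this pooling step can be reproduced with a small NumPy sketch. The function name `max_pool_2x2` and the random input are illustrative only; a framework pooling layer would normally be used in the actual model.

```python
import numpy as np

def max_pool_2x2(x):
    """2x2 max pooling with stride 2 over the last two axes of (frames, range, velocity)."""
    frames, n_range, n_velocity = x.shape
    # Split each pooled axis into (blocks, 2) and take the maximum inside every block.
    blocks = x.reshape(frames, n_range // 2, 2, n_velocity // 2, 2)
    return blocks.max(axis=(2, 4))

spectrum = np.random.default_rng(0).random((60, 60, 126))
pooled = max_pool_2x2(spectrum)
```

Each frame shrinks from 60x126 to 30x63 values, i.e. to 1890 values per frame.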

In specific implementation, the data output after mapping through the full connection layer has a size of 60x1890, i.e. each of the 60 frames is mapped to a vector of length 1890. In order to provide input data for the subsequent human posture classification output model and to retain the time sequence information of each frame, a classification vector with the dimension of 1x1890 is appended to the mapped data, and a position vector with the dimension of 61x1890 is then added, finally giving a feature map with the dimension of 61x1890, as shown in fig. 5.
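The dimensions quoted here can be traced in a minimal NumPy sketch. It assumes that the classification vector is placed in front of the 60 frame vectors and that the position vector is added element-wise; the weights `W`, `cls_token` and `pos_embed` are random or zero placeholders standing in for parameters that would be learned.

```python
import numpy as np

rng = np.random.default_rng(0)
frames, n_range, n_velocity = 60, 30, 63
dim = n_range * n_velocity                       # 30 * 63 = 1890 values per frame

pooled = rng.standard_normal((frames, n_range, n_velocity))
patches = pooled.reshape(frames, dim)            # one Patch per frame: (60, 1890)

W = rng.standard_normal((dim, dim)) * 0.01       # placeholder full connection weights
b = np.zeros(dim)
cls_token = np.zeros((1, dim))                   # placeholder classification vector
pos_embed = rng.standard_normal((frames + 1, dim)) * 0.01   # placeholder positions

tokens = patches @ W + b                         # mapped frames: (60, 1890)
tokens = np.concatenate([cls_token, tokens], axis=0)        # (61, 1890)
feature_map = tokens + pos_embed                 # position information added
```

The extra row is what turns the 60x1890 mapped data into the 61x1890 feature map.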

During construction, the input feature map is processed in sequence by four Radar Transformer Block modules of different sizes, an average pooling layer is added at the end of each Radar Transformer Block module, and the dimension of the output feature map changes stage by stage.

In the first step, the channel attention mechanism first applies Global Average Pooling (GAP) to the input feature map to compress the spatial information and obtain channel information; the attention distribution between channels is then captured through two full connection layers, whose activation functions are a ReLU function and a Sigmoid function respectively; finally, the input features are weighted by the obtained attention distribution to give the output feature data, as shown in fig. 6.
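A minimal NumPy sketch of such a channel attention step is given below. Several points are assumptions rather than details from the source: the channel axis is taken to be the last axis of a (tokens, channels) array, the hidden width `reduced` is arbitrary, and the weighting is implemented as an element-wise rescaling of each channel, as in squeeze-and-excitation style blocks.

```python
import numpy as np

def channel_attention(x, w1, b1, w2, b2):
    """x: (tokens, channels). Returns the reweighted features and the channel weights."""
    squeezed = x.mean(axis=0)                               # global average pooling
    hidden = np.maximum(0.0, squeezed @ w1 + b1)            # first layer, ReLU
    weights = 1.0 / (1.0 + np.exp(-(hidden @ w2 + b2)))     # second layer, Sigmoid
    return x * weights, weights                             # rescale every channel

rng = np.random.default_rng(0)
tokens, channels, reduced = 61, 32, 8                       # illustrative sizes only
x = rng.standard_normal((tokens, channels))
w1 = rng.standard_normal((channels, reduced)) * 0.1
w2 = rng.standard_normal((reduced, channels)) * 0.1
out, weights = channel_attention(x, w1, np.zeros(reduced), w2, np.zeros(channels))
```

Because the Sigmoid output lies between 0 and 1, channels with weights near 1 pass through almost unchanged while channels with weights near 0 are suppressed.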

In the second step, the Encoder structure normalizes the feature data output by the channel attention mechanism and inputs it into a Multi-head Attention Mechanism (MHA) for processing. The structure of the MHA is shown in FIG. 7, where h represents the number of heads; in this model h is 4. After processing by the MHA, the output of the first Radar Transformer Block module enters the average pooling layer at the end of that module for pooling, which changes the dimension of the feature map. The other three Radar Transformer Block modules are processed and pooled in the same way. The dimensions of the four pooling outputs are 2048, 1024, 512 and 256 in sequence, so that the human body posture feature extraction model has a pyramid structure.
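The normalization followed by multi-head attention with h = 4 can be sketched in NumPy as follows. This shows only scaled dot-product self-attention on small illustrative sizes; the residual connections, feed-forward part and pooling of a full block are omitted, and the projection matrices are random placeholders for learned parameters.

```python
import numpy as np

def layer_norm(x, eps=1e-6):
    mu = x.mean(axis=-1, keepdims=True)
    var = x.var(axis=-1, keepdims=True)
    return (x - mu) / np.sqrt(var + eps)

def softmax(z, axis=-1):
    e = np.exp(z - z.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def multi_head_attention(x, wq, wk, wv, wo, heads=4):
    """Self-attention over x of shape (n_tokens, d_model) with `heads` heads."""
    n, d = x.shape
    dh = d // heads
    def split(t):                                   # (n, d) -> (heads, n, dh)
        return t.reshape(n, heads, dh).transpose(1, 0, 2)
    q, k, v = split(x @ wq), split(x @ wk), split(x @ wv)
    attn = softmax(q @ k.transpose(0, 2, 1) / np.sqrt(dh))   # (heads, n, n)
    merged = (attn @ v).transpose(1, 0, 2).reshape(n, d)     # concatenate the heads
    return merged @ wo, attn

rng = np.random.default_rng(0)
n_tokens, d_model = 61, 32                          # illustrative sizes only
x = rng.standard_normal((n_tokens, d_model))
wq, wk, wv, wo = (rng.standard_normal((d_model, d_model)) * 0.1 for _ in range(4))
out, attn = multi_head_attention(layer_norm(x), wq, wk, wv, wo, heads=4)
```

Each head attends over all 61 tokens, so every frame can draw on information from every other frame in the sequence.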

In specific implementation, the human posture classification output model is an MLP neural network built from full connection layers and a Softmax layer; there are three full connection layers and one Softmax layer; the numbers of nodes of the three full connection layers are 128, 64 and 4 in sequence, as shown in fig. 8.

In specific implementation, the first two full connection layers adopt the Gaussian Error Linear Unit (GELU) activation function, whose expression is as follows:

$$\mathrm{GELU}(x) = x\,\Phi(x) = \frac{x}{2}\left[1 + \operatorname{erf}\left(\frac{x}{\sqrt{2}}\right)\right]$$

wherein $\Phi(x)$ is the cumulative distribution function of the standard normal distribution.

In specific implementation, the third layer adopts a Softmax activation function, which normalizes the output vector of the full connection layer so that its elements sum to 1 and represent the final prediction probabilities of the network for the four actions of falling, walking, sitting and standing;

the expression of the Softmax activation function is as follows:

$$\mathrm{Softmax}(z_i) = \frac{e^{z_i}}{\sum_{c=1}^{C} e^{z_c}}$$

wherein $z_i$ is the output value of the ith node, and $C$ is the number of output nodes, i.e. the number of classification categories.
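The classification head can be sketched in NumPy as below, with layer widths 128, 64 and 4. The input width `in_dim` of 256 is an assumption taken from the last pyramid stage, the weights are random placeholders, and GELU is written in its exact error-function form.

```python
import math
import numpy as np

def gelu(x):
    """Exact GELU: x * Phi(x), with Phi the standard normal CDF."""
    erf = np.vectorize(math.erf)
    return 0.5 * x * (1.0 + erf(x / math.sqrt(2.0)))

def softmax(z):
    e = np.exp(z - z.max())          # subtract the maximum for numerical stability
    return e / e.sum()

def classify(features, params):
    (w1, b1), (w2, b2), (w3, b3) = params
    h = gelu(features @ w1 + b1)     # first full connection layer, 128 nodes
    h = gelu(h @ w2 + b2)            # second full connection layer, 64 nodes
    return softmax(h @ w3 + b3)      # third layer, 4 nodes with Softmax

rng = np.random.default_rng(0)
in_dim = 256                         # assumed input width
sizes = [in_dim, 128, 64, 4]
params = [(rng.standard_normal((a, b)) * 0.05, np.zeros(b))
          for a, b in zip(sizes[:-1], sizes[1:])]
probs = classify(rng.standard_normal(in_dim), params)
```

The four entries of `probs` are non-negative and sum to 1, so they can be read directly as class probabilities.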

In specific implementation, when the deep learning network model is trained, the 1600 groups of data in the training set are input into the deep learning network model for iterative training, the output prediction results and the true category labels are fed into a loss function to record a loss value, a stochastic gradient descent (SGD) optimization algorithm is used for optimization, the network parameters are updated, and a group of optimal network model parameters is stored.

In specific implementation, as shown in fig. 9 and 10, the model is trained for 50 iterations with a learning rate of 0.001, a momentum of 0.9 and a batch size of 64.
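How these hyperparameters enter the parameter update can be shown with a toy NumPy sketch. It uses one common formulation of SGD with momentum and a simple quadratic objective in place of the network loss; reading the 50 training iterations as 50 passes over the 1600 training groups is an assumption, as are the names `sgd_momentum_step` and `target`.

```python
import numpy as np

def sgd_momentum_step(w, velocity, grad, lr=0.001, momentum=0.9):
    """One SGD-with-momentum update: the velocity accumulates past gradients."""
    velocity = momentum * velocity - lr * grad
    return w + velocity, velocity

# Toy objective 0.5 * ||w - target||^2 stands in for the network loss.
target = np.array([1.0, -2.0])
w = np.zeros(2)
v = np.zeros(2)

steps_per_epoch = -(-1600 // 64)     # ceil(1600 / 64) batches per pass
losses = []
for _ in range(50 * steps_per_epoch):
    grad = w - target                # gradient of the toy objective
    w, v = sgd_momentum_step(w, v, grad)
    losses.append(0.5 * float(np.sum((w - target) ** 2)))
```

With a batch size of 64 the 1600 training groups give 25 updates per pass, so 50 passes correspond to 1250 parameter updates in this sketch.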

In the specific implementation, the deep learning network model is tested, 400 groups of data in the test set are input into the deep learning network model for fall detection testing, loss values and test accuracy are recorded, and a model confusion matrix is output to observe a prediction result.

In specific implementation, the 400 groups of data in the test set are input into the stored network model for the fall detection test, the loss value and the test accuracy are recorded, and a model confusion matrix is output to compare the prediction results with the actual results. Each column of the confusion matrix represents a predicted category, and the total of each column is the number of data groups predicted as that category; each row represents the true category of the data, and the total of each row is the number of data groups belonging to that category. As can be seen from the first row of the fall detection confusion matrix table of fig. 11, 98 of the 100 groups of fall data are correctly predicted, 1 is predicted as walk, and 1 is predicted as sit; as can be seen from the first column, 98 groups of fall data are correctly predicted, while 2 groups of walk data and 3 groups of sit data are mispredicted as fall. The prediction accuracy rates of the fall, walk, sit and motionless categories are therefore 98%, 98%, 92% and 99% respectively, and the total prediction accuracy rate is 96.75%.
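The reported figures can be cross-checked with a few lines of arithmetic. The per-class correct counts used below (98, 98, 92 and 99 out of 100) are an inference: they are the assignment consistent with the 96.75% overall accuracy over 400 groups and with the misprediction counts quoted for the first column. The full confusion matrix is not reproduced in the text, so only the first column is used for the precision figure.

```python
per_class_total = 100
# Correct predictions per true category (inferred, see the note above).
correct = {"fall": 98, "walk": 98, "sit": 92, "motionless": 99}

overall_accuracy = sum(correct.values()) / (per_class_total * len(correct))

# First column of the confusion matrix: groups that were predicted as fall.
predicted_fall = {"fall": 98, "walk": 2, "sit": 3}
fall_recall = correct["fall"] / per_class_total
fall_precision = predicted_fall["fall"] / sum(predicted_fall.values())
```

The 387 correct predictions out of 400 give the stated 96.75%, and the first column implies that about 95% of the groups flagged as falls are real falls.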

The foregoing is only a preferred embodiment of the present invention, and it should be noted that, for those skilled in the art, various changes and modifications can be made without departing from the inventive concept of the present invention, and these changes and modifications are all within the scope of the present invention.

Claims

1. A millimeter wave radar fall detection method based on an improved Transformer, characterized by comprising the following steps:

2) Acquiring, through a millimeter wave radar, N groups of data of four action postures (falling, walking, sitting and standing) performed by experimenters, and storing the N groups of data as bin files to form an original human body posture data set; the N groups of data are intermediate frequency signals obtained by mixing the transmitted and received signals of the millimeter wave radar;

2. The improved Transformer-based millimeter wave radar fall detection method according to claim 1, wherein: in the step 1), the millimeter wave radar uses a single-transmit, single-receive antenna configuration; the start frequency of the millimeter wave radar is set to 77 GHz; the frequency sweep slope is set to 25 MHz/μs; the duration of each Chirp is set to 50 μs; and the period for collecting each frame of data is set to 50 μs.

3. The improved Transformer-based millimeter wave radar fall detection method according to claim 1, wherein: in the step 2), the N groups of data total 2000 groups, wherein groups 1-500 are fall data, groups 501-1000 are walking data, groups 1001-1500 are sitting data, and groups 1501-2000 are static data.

4. The method for detecting the fall of the millimeter wave radar based on the improved Transformer as claimed in claim 3, wherein: the step 3) specifically comprises the following steps:

33) All the human body posture distance-Doppler spectra are saved as npy files whose file names are labeled fall, walk, sit and motionless, and are randomly divided at a ratio of 4:1 into a training set and a test set, which together constitute the human body posture database; where fall represents fall data, walk represents walking data, sit represents sitting data, and motionless represents static data.

5. The improved Transformer-based millimeter wave radar fall detection method according to claim 4, wherein: in the step 33), the training set contains 400 groups of human body posture distance-Doppler spectra for each of the npy file name labels fall, walk, sit and motionless, 1600 groups in total; the test set contains 100 groups of human body posture distance-Doppler spectra for each of the npy file name labels fall, walk, sit and motionless, 400 groups in total.

6. The improved Transformer-based millimeter wave radar fall detection method according to claim 4, wherein: in the step 41), a human body posture feature pre-extraction model is built, in which the input human body posture distance-Doppler spectrum data is first pooled to reduce its size, each frame of the data is then treated as a Patch and mapped through a fully connected layer, and position information is added to each mapped frame to obtain a feature map.

7. The improved Transformer-based millimeter wave radar fall detection method according to claim 4, wherein: in the step 41), the human body posture feature extraction model is a pyramid structure formed by four Radar Transformer Block modules of different sizes; when the pyramid structure is built, the input feature map is processed by the four Radar Transformer Block modules in sequence, an average pooling layer is added at the end of each Radar Transformer Block module, and the dimensions of the output feature map change at each stage;

8. The improved Transformer-based millimeter wave radar fall detection method according to claim 4, wherein: in the step 41), the human body posture classification output model is a multilayer perceptron (MLP) neural network built from fully connected layers and a Softmax layer; there are three fully connected layers and one Softmax layer; the numbers of nodes of the three fully connected layers are 128, 64 and 4 in sequence;

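the expression of the Softmax activation function is as follows:

$\mathrm{Softmax}(z_i) = \dfrac{e^{z_i}}{\sum_{j=1}^{4} e^{z_j}}$, where $z_i$ denotes the $i$-th output of the last fully connected layer.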

9. The method for detecting the fall of the millimeter wave radar based on the improved Transformer as claimed in claim 5, wherein: in the step 5), the deep learning network model is trained by inputting the 1600 groups of data in the training set into the deep learning network model for iterative training, feeding the output predictions and the true category labels into a loss function and recording the loss value, updating the network parameters with a stochastic gradient descent (SGD) optimization algorithm, and saving the best-performing set of network model parameters.

10. The improved Transformer-based millimeter wave radar fall detection method according to claim 5, wherein: in the step 5), the deep learning network model is tested, 400 groups of data in the test set are input into the deep learning network model for fall detection testing, loss values and test accuracy are recorded, and a model confusion matrix is output to observe prediction results.
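As an illustration of the 4:1 split in claims 3 to 5, the sketch below labels the 2000 groups by index and divides each category at random into 400 training and 100 test groups. Splitting per category, the fixed seed and all function names are assumptions made for this sketch; the claims state only the ratio and the resulting set sizes.

```python
import random

CLASSES = ["fall", "walk", "sit", "motionless"]
GROUPS_PER_CLASS = 500  # 2000 groups in total, per claim 3


def label_for_group(group_number):
    """Map a 1-based group number (1..2000) to its category name."""
    return CLASSES[(group_number - 1) // GROUPS_PER_CLASS]


def split_dataset(train_parts=4, test_parts=1, seed=0):
    """Randomly split every category into training and test group numbers."""
    rng = random.Random(seed)  # fixed seed only for reproducibility
    n_train = GROUPS_PER_CLASS * train_parts // (train_parts + test_parts)
    train, test = [], []
    for class_index in range(len(CLASSES)):
        start = class_index * GROUPS_PER_CLASS + 1
        groups = list(range(start, start + GROUPS_PER_CLASS))
        rng.shuffle(groups)
        train.extend(groups[:n_train])
        test.extend(groups[n_train:])
    return train, test


train_groups, test_groups = split_dataset()
```

Splitting within each category keeps the 400/100 balance stated in claim 5; a single random split over all 2000 groups would reach that balance only approximately.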