CN116524537A - Human body posture recognition method based on CNN and LSTM combination - Google Patents
- Publication number
- CN116524537A (application CN202310465077.5A)
- Authority
- CN
- China
- Prior art keywords
- time
- distance
- channel
- cnn
- image
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V40/00—Recognition of biometric, human-related or animal-related patterns in image or video data
- G06V40/10—Human or animal bodies, e.g. vehicle occupants or pedestrians; Body parts, e.g. hands
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/04—Architecture, e.g. interconnection topology
- G06N3/044—Recurrent networks, e.g. Hopfield networks
- G06N3/0442—Recurrent networks, e.g. Hopfield networks characterised by memory or gating, e.g. long short-term memory [LSTM] or gated recurrent units [GRU]
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/04—Architecture, e.g. interconnection topology
- G06N3/045—Combinations of networks
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/04—Architecture, e.g. interconnection topology
- G06N3/0464—Convolutional networks [CNN, ConvNet]
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/08—Learning methods
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V10/00—Arrangements for image or video recognition or understanding
- G06V10/40—Extraction of image or video features
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V10/00—Arrangements for image or video recognition or understanding
- G06V10/70—Arrangements for image or video recognition or understanding using pattern recognition or machine learning
- G06V10/764—Arrangements for image or video recognition or understanding using pattern recognition or machine learning using classification, e.g. of video objects
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V10/00—Arrangements for image or video recognition or understanding
- G06V10/70—Arrangements for image or video recognition or understanding using pattern recognition or machine learning
- G06V10/77—Processing image or video features in feature spaces; using data integration or data reduction, e.g. principal component analysis [PCA] or independent component analysis [ICA] or self-organising maps [SOM]; Blind source separation
- G06V10/80—Fusion, i.e. combining data from various sources at the sensor level, preprocessing level, feature extraction level or classification level
- G06V10/806—Fusion, i.e. combining data from various sources at the sensor level, preprocessing level, feature extraction level or classification level of extracted features
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V10/00—Arrangements for image or video recognition or understanding
- G06V10/70—Arrangements for image or video recognition or understanding using pattern recognition or machine learning
- G06V10/82—Arrangements for image or video recognition or understanding using pattern recognition or machine learning using neural networks
-
- Y—GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
- Y02—TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
- Y02T—CLIMATE CHANGE MITIGATION TECHNOLOGIES RELATED TO TRANSPORTATION
- Y02T10/00—Road transport of goods or passengers
- Y02T10/10—Internal combustion engine [ICE] based vehicles
- Y02T10/40—Engine management systems
Abstract
The invention discloses a human body posture recognition method based on the combination of CNN and LSTM. First, the intermediate-frequency signal samples required for training and testing are collected. Second, a distance-dimension Fourier transform is applied to the intermediate-frequency signal data to obtain a time-distance image; the target's range-bin data are summed along the distance dimension to obtain a one-dimensional range spectrum peak, a short-time Fourier transform of which yields a time-frequency image; each image is labeled with its category. A three-channel deep learning neural network model is then established, with each channel combining a convolutional neural network (CNN) and a long short-term memory (LSTM) network: the first and second channels take the time-frequency image as input and extract features with convolution kernels of different sizes in the convolution layers, while the third channel takes the time-distance image as input. The data are fed into the model for training. By combining millimeter wave radar with CNN and LSTM networks, fusing multiple types of feature images, and fully exploiting temporal feature information, the method improves the accuracy of human body posture recognition.
Description
Technical Field
The invention relates to a human body posture recognition method based on the combination of CNN and LSTM, and in particular to a target posture recognition method combining millimeter wave radar, a convolutional neural network (CNN) and a long short-term memory network (LSTM).
Background
In a modern society of rapid technological development, target detection and motion recognition and classification have become important research directions, with applications such as monitoring and protecting elderly people with limited mobility, analyzing and judging road conditions in autonomous driving, and detecting criminals in counter-terrorism operations. Existing target recognition and action classification technologies fall mainly into three categories: first, camera-based methods that recognize actions from the captured video frames; second, wearable sensor devices worn on the body to identify human actions; and third, non-contact sensors such as radar, vision sensors and infrared sensors. Beyond the privacy intrusion inherent in camera monitoring, transmitting the data stream to terminal devices also risks privacy leakage, while wearable devices cause many inconveniences in daily life. Radar is non-wearable, unaffected by illumination and atmospheric conditions, and capable of through-wall detection without privacy concerns, so it has gradually become a popular choice for indoor monitoring and recognition.
In addition, while traditional machine learning with manually extracted features is simple and effective for simple, specific tasks, deep learning, which developed from traditional neural networks, offers strong learning ability, good portability and high coverage: given a large amount of data, it can achieve recognition accuracy that even exceeds human performance, and the same network structure can be trained on different data. Deep learning has been widely applied in fields such as speech recognition, machine translation, image recognition and autonomous driving. Camera-based target detection generally relies on deep learning methods built on convolutional neural networks: the multi-layer network captures image detail, segments and extracts the target person in the image, and, after training and testing on a data set to obtain parameters of high accuracy, the resulting neural network can classify any image fed into it.
With the development of signal processing technology, the behavior of a monitored target can be determined by analyzing the electromagnetic waves received by the radar antenna. In a detection system, background noise produced by reflections from static objects can interfere with identifying the target. After removing this background noise from the echo data received by the radar, a distance-dimension Fourier transform and a short-time Fourier transform provide range and Doppler frequency analysis of the time-varying signal, overcoming the difficulty of obtaining range and frequency directly from the time-domain signal. Because the signal frequencies produced by a human body differ between actions, the resulting time-frequency and time-distance diagrams differ as well, and the different actions can be detected by a deep learning method.
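The background-noise removal mentioned above is not specified in detail in this document. A common approach, sketched below under that assumption, subtracts the slow-time mean from each fast-time sample: stationary reflectors contribute a near-constant term across chirps and cancel, while moving targets, whose phase varies chirp to chirp, survive.

```python
import numpy as np

def remove_static_clutter(if_matrix):
    """Suppress returns from stationary objects by subtracting, per fast-time
    sample, the mean over slow time (chirps). if_matrix has shape
    (fast_time, slow_time)."""
    return if_matrix - if_matrix.mean(axis=1, keepdims=True)

# Toy example: a constant "wall" return plus a phase-varying "target" return.
m = np.arange(256)[:, None]          # fast-time samples
n = np.arange(100)[None, :]          # slow-time (chirp) index
wall   = np.cos(2 * np.pi * 0.05 * m) * np.ones_like(n, dtype=float)
target = np.cos(2 * np.pi * 0.10 * m + 0.3 * n)
cleaned = remove_static_clutter(wall + target)
```

Applied to the wall component alone, the subtraction cancels it exactly; applied to the mixture, the moving-target component is largely preserved.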
Disclosure of Invention
In view of these problems, the human body posture recognition method based on the combination of CNN and LSTM aims to solve the problem of low recognition accuracy while protecting privacy, and provides a method that uses millimeter wave radar to recognize human body action postures for the elderly or related groups.
The aim of the invention can be achieved by adopting the following technical scheme, which comprises the following steps:
Step 1. Obtain the intermediate-frequency signals required for training and testing through a millimeter wave radar; the intermediate-frequency signal is produced by mixing the transmitted signal with the echo signal received by a receiving antenna.
In the method, the millimeter wave radar is placed within the movement range of the identified object. After the electromagnetic wave signal is emitted through the transmitting antenna, it is reflected by the monitored target and the background environment and received through the receiving antenna, realizing non-contact monitoring. During detection, signals must be acquired from several different targets, and each target must perform several different postures.
Step 2. Perform a distance-dimension Fourier transform on the intermediate-frequency signal to obtain a time-distance image, and sum the target's range-bin data along the distance dimension to obtain a one-dimensional range spectrum peak; apply a short-time Fourier transform to the one-dimensional range spectrum peak to obtain a time-frequency image, and label both images with their corresponding categories.
Step 3. Establish an improved three-channel deep learning neural network model that combines CNN and LSTM networks in all three channels: the first and second channels take the time-frequency image as input and extract features with convolution kernels of different sizes in the convolution layers, while the third channel takes the time-distance image as input. Input the data into the model as required, train it, obtain the optimal model parameters, and save them.
In step 1, the intermediate-frequency signal required for training and testing is obtained by the millimeter wave radar by mixing the transmitted signal with the echo signal received by the receiving antenna, as follows:
step 1.1, a linear frequency modulation continuous wave signal, also called chirp signal, transmitted by a millimeter wave radar transmitting antenna at the time t is:
wherein A is tx To transmit signal amplitude, f c Is carrier frequency, B is bandwidth, T c Is a sweep frequency period.
Step 1.2. The echo signal received by the receiving antenna is the transmitted signal delayed by a time $\tau_d$:

$$s_{rx}(t)=A_{rx}\cos\!\left(2\pi f_c (t-\tau_d)+\pi\frac{B}{T_c}(t-\tau_d)^2\right)$$

The delay time $\tau_d$ can be expressed as:

$$\tau_d=\frac{2d}{c}$$

where $d$ is the distance between the target and the radar and $c$ is the speed of light.
The echo signal and the transmitted signal are then mixed to obtain the intermediate-frequency signal, where mixing refers to conjugate multiplication of the echo signal with the transmitted signal. The intermediate-frequency signal has the form:

$$s_{IF}(t)=A_{IF}\cos\!\left(2\pi f_b t+\phi_b\right)$$

where $A_{IF}$ is the amplitude of the intermediate-frequency signal, $f_b=\frac{2Bd}{cT_c}$ is its frequency, and $\phi_b$ is its phase.
In practice, signal processing is usually done in the digital domain, which requires sampling the signal. For a multi-cycle chirp signal, $f_b$ and $\phi_b$ vary with the time interval between chirps. Assume the sampling rate of the millimeter wave radar system is $f_s$; the discrete sampled form of the intermediate-frequency signal is then:

$$s_{IF}(m,n)=A_{IF}\cos\!\left(2\pi f_b\frac{m}{f_s}+\phi_b(n)\right)$$

where $m$ is the fast-time sampling index, characterizing the distance-dimension information of the signal, and $n$ is the slow-time sampling index, characterizing the Doppler information of the signal.
In step 2, the distance-dimension Fourier transform is performed on the intermediate-frequency signal to obtain a time-distance image, and the target's range-bin data are summed along the distance dimension to obtain a one-dimensional range spectrum peak; a short-time Fourier transform of the one-dimensional range spectrum peak yields a time-frequency image, and both images are labeled with their corresponding categories, as follows:
step 2.1, performing distance dimension Fourier transform on the sampled intermediate frequency signal to obtain a time-distance image:
wherein,,representing fourier transform of the fast time dimension, and k represents the distance dimension sampling point after fourier transform of the fast time dimension.
Step 2.2. Sum the target's range-bin data to obtain the one-dimensional range spectrum peak, and apply a short-time Fourier transform to obtain the time-frequency image:

$$D(p,l)=\mathrm{STFT}\!\left\{\sum_{k=k_0}^{k_1}R(k,n)\right\}$$

where $\mathrm{STFT}$ denotes the short-time Fourier transform, $p$ is the time-dimension sampling index and $l$ the Doppler-dimension sampling index after the transform, and $k_0$ and $k_1$ are the starting and ending range bins spanned by the motion track of the target to be detected.
Step 2.3. Label the time-distance and time-frequency images corresponding to each action with the appropriate labels, and store them in separate folders by class.
In step 3, an improved three-channel deep learning neural network model is established, and the postures are classified by combining CNN and LSTM, as follows:
and 3.1, constructing a three-channel network, and taking the time-distance image generated in the step 2 as the input of the first channel and the second channel and the time-distance image as the input of the third channel.
Step 3.2. In the first and third channels, a convolution kernel of size a×a is first applied to the image in the convolution layer for feature extraction, with zero padding used to preserve the image edge information; an average pooling layer then reduces the amount of parameter computation, and the feature map is converted into sequence data and fed into the LSTM network. In the second channel, features are extracted in the convolution layer with a convolution kernel of size b×b, where b is larger than a: because the time-frequency images of different posture signals have clearly different peaks, different kernel sizes extract features at different levels of granularity.
Step 3.3. Fuse the three channels' feature maps using the concatenate() method in the Keras library, apply the ReLU function as the nonlinear operation on the feature maps, and classify through a fully connected layer with the softmax function. The softmax function converts a set of numbers into a probability distribution, mapping each raw value to a probability between 0 and 1:

$$\mathrm{softmax}(x_i)=\frac{e^{x_i}}{\sum_{j=1}^{n}e^{x_j}}$$

where $x_i$ is the raw value to be converted, $n$ is the number of categories, and $j$ indexes the categories.
Step 3.4. Compile the model, using the categorical cross-entropy function as the loss function to compute the error between the true labels and the predicted labels and obtain the model accuracy. The categorical cross-entropy function is:

$$L=-\sum_{i=1}^{n}y_i\log\hat{y}_i$$

where $y_i$ is the true (one-hot) label for category $i$ and $\hat{y}_i$ is the predicted probability.
further, compared with the prior art, the human body posture recognition method based on the combination of CNN and LSTM has the following advantages that:
1) The data set of the human body gesture recognition method based on the combination of millimeter wave radar and deep learning provided by the invention comes from a plurality of targets, and has higher diversity and universality;
2) The human body gesture recognition method based on the combination of millimeter wave radar and deep learning can effectively remove low-frequency clutter generated by static objects in the environment;
3) The human body gesture recognition method based on the millimeter wave radar and the deep learning combination can effectively extract the characteristics in the time-frequency image and the time-distance image obtained after the signals are transformed;
4) The human body posture recognition method based on the combination of millimeter wave radar and deep learning provided by the invention can be used for recognizing the human body posture with higher accuracy by fusing multiple aspects of information.
Drawings
FIG. 1 is a flow chart of the method of the present disclosure;
FIG. 2 is a deep learning neural network diagram of the present invention;
FIGS. 3.1 (a) and 3.1 (b) are time-frequency images and time-distance images, respectively, of a basketball playing position;
fig. 3.2 (a) and 3.2 (b) are respectively a time-frequency image and a time-distance image at the time of boxing gesture;
fig. 3.3 (a), 3.3 (b) are time-frequency images and time-distance images, respectively, at the dancing position;
fig. 3.4 (a), 3.4 (b) are time-frequency images and time-distance images, respectively, at jump-in-place;
fig. 3.5 (a), 3.5 (b) are time-frequency images and time-distance images, respectively, of a running;
fig. 4 (a) and fig. 4 (b) are training set and validation set results, respectively.
Detailed Description
In order to make the objects, embodiments and advantages of the present invention more comprehensible, they are described in more detail below with reference to the accompanying drawings and examples. It should be understood that the specific embodiments described herein are for purposes of illustration only and are not intended to limit the scope of the invention.
Referring to fig. 1-3.5 (b), the invention provides a human body posture recognition method based on combination of CNN and LSTM, comprising the steps of:
step 1, obtaining intermediate frequency signals required by training and testing through a millimeter wave radar, and obtaining the intermediate frequency signals after mixing the transmitting signals with echo signals received by an receiving antenna.
In this embodiment, five actions are designed: running, boxing, playing basketball, dancing and jumping in place. The millimeter wave radar is placed 1.5 meters from the identified object; after the electromagnetic wave signal is emitted through the transmitting antenna, it is reflected by the monitored target and the background environment and received through the receiving antenna, realizing non-contact monitoring.
The linear frequency-modulated continuous wave (chirp) signal transmitted by the millimeter wave radar transmitting antenna at time t is:

$$s_{tx}(t)=A_{tx}\cos\!\left(2\pi f_c t+\pi\frac{B}{T_c}t^2\right),\quad 0\le t\le T_c$$

where $A_{tx}$ is the transmitted signal amplitude, $f_c$ is the carrier frequency, $B$ is the bandwidth, and $T_c$ is the sweep period. In this embodiment the carrier frequency is $f_c = 77$ GHz, the bandwidth is $B = 1.6$ GHz, and the sweep period is $T_c = 40.96$ μs.
The echo signal received by the receiving antenna is the transmitted signal delayed by a time $\tau_d$:

$$s_{rx}(t)=A_{rx}\cos\!\left(2\pi f_c (t-\tau_d)+\pi\frac{B}{T_c}(t-\tau_d)^2\right)$$

The delay time $\tau_d$ can be expressed as:

$$\tau_d=\frac{2d}{c}$$

where $d$ is the distance between the target and the radar and $c$ is the speed of light.
The echo signal and the transmitted signal are then mixed to obtain the intermediate-frequency signal, where mixing refers to conjugate multiplication of the echo signal with the transmitted signal. The intermediate-frequency signal has the form:

$$s_{IF}(t)=A_{IF}\cos\!\left(2\pi f_b t+\phi_b\right)$$

where $A_{IF}$ is the amplitude of the intermediate-frequency signal, $f_b=\frac{2Bd}{cT_c}$ is its frequency, and $\phi_b$ is its phase.
In practice, signal processing is usually done in the digital domain, which requires sampling the signal. For a multi-cycle chirp signal, $f_b$ and $\phi_b$ vary with the time interval between chirps. In this embodiment the sampling rate of the millimeter wave radar system is $f_s = 6.25$ MHz, and the discrete sampled form of the intermediate-frequency signal is:

$$s_{IF}(m,n)=A_{IF}\cos\!\left(2\pi f_b\frac{m}{f_s}+\phi_b(n)\right)$$

where $m$ is the fast-time sampling index, characterizing the distance-dimension information of the signal, and $n$ is the slow-time sampling index, characterizing the Doppler information of the signal. In this embodiment there are 256 fast-time samples and 100 slow-time samples.
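Using the embodiment's parameters ($B = 1.6$ GHz, $T_c = 40.96$ μs, $f_s = 6.25$ MHz, 256 fast-time and 100 slow-time samples), the discrete intermediate-frequency signal can be sketched numerically. The beat-frequency relation $f_b = 2Bd/(cT_c)$ is the standard FMCW result; the per-chirp Doppler phase step `fd` is an illustrative simplification, not from this document.

```python
import numpy as np

# Radar parameters from the embodiment.
B = 1.6e9        # sweep bandwidth (Hz)
Tc = 40.96e-6    # sweep period (s)
fs = 6.25e6      # ADC sampling rate (Hz)
c = 3e8          # speed of light (m/s)
M, N = 256, 100  # fast-time / slow-time sample counts

def if_signal(d, fd=0.0, A=1.0):
    """Discrete intermediate-frequency signal s_IF(m, n) for a point target
    at range d (meters). fd is a per-chirp phase step modeling Doppler."""
    fb = 2 * B * d / (c * Tc)            # beat frequency (Hz)
    m = np.arange(M)[:, None]            # fast-time index
    n = np.arange(N)[None, :]            # slow-time index
    return A * np.cos(2 * np.pi * fb * m / fs + fd * n)

sig = if_signal(d=1.5)   # a target 1.5 m away, as in the embodiment
```

For d = 1.5 m these parameters give a beat frequency of exactly 390.625 kHz, which lands on range bin 16 of a 256-point fast-time FFT.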
Step 2. Perform a distance-dimension Fourier transform on the intermediate-frequency signal to obtain a time-distance image, and sum the target's range-bin data along the distance dimension to obtain a one-dimensional range spectrum peak; apply a short-time Fourier transform to the one-dimensional range spectrum peak to obtain a time-frequency image, and label both images with their corresponding categories.
Step 2.1. Perform the distance-dimension Fourier transform on the sampled intermediate-frequency signal to obtain the time-distance image:

$$R(k,n)=\mathcal{F}_m\{s_{IF}(m,n)\}$$

where $\mathcal{F}_m$ denotes the Fourier transform along the fast-time dimension and $k$ is the distance-dimension sampling index after that transform.
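A minimal sketch of this range-dimension transform: each chirp (column) is FFT'd along fast time, so rows of the result are range bins $k$ and columns are slow-time frames $n$. The synthetic input below is an illustrative single tone, not measured radar data.

```python
import numpy as np

def time_distance_map(if_matrix):
    """Range-dimension FFT: transform along fast time (axis 0). The magnitude
    of the result is the time-distance image R(k, n)."""
    return np.abs(np.fft.fft(if_matrix, axis=0))

# Illustrative IF matrix: a tone at 16/256 of the sampling rate, so its
# energy should concentrate in range bin 16 of a 256-point FFT.
m = np.arange(256)[:, None]   # fast-time index
n = np.arange(100)[None, :]   # slow-time index
sig = np.cos(2 * np.pi * (16 / 256) * m + 0.0 * n)
R = time_distance_map(sig)
```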
Step 2.2. Sum the target's range-bin data to obtain the one-dimensional range spectrum peak, and apply a short-time Fourier transform to obtain the time-frequency image:

$$D(p,l)=\mathrm{STFT}\!\left\{\sum_{k=k_0}^{k_1}R(k,n)\right\}$$

where $\mathrm{STFT}$ denotes the short-time Fourier transform, $p$ is the time-dimension sampling index and $l$ the Doppler-dimension sampling index after the transform, and $k_0$ and $k_1$ are the starting and ending range bins spanned by the motion track of the target to be detected.
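The sum-then-STFT step can be sketched as below, implementing the short-time Fourier transform as a sliding Hann-windowed FFT over slow time. The window and hop lengths are illustrative choices; the document does not specify them.

```python
import numpy as np

def micro_doppler_stft(range_profiles, k0, k1, win=32, hop=8):
    """Time-frequency image from a time-distance map.

    range_profiles: complex range-FFT output, shape (range_bins, slow_time).
    Range bins k0..k1 (the target's track) are summed into a one-dimensional
    slow-time signal; a sliding windowed FFT then yields D with rows = Doppler
    bins l and columns = time frames p."""
    x = range_profiles[k0:k1 + 1].sum(axis=0)
    window = np.hanning(win)
    frames = []
    for start in range(0, x.size - win + 1, hop):
        frames.append(np.fft.fft(x[start:start + win] * window))
    return np.abs(np.array(frames)).T

# Synthetic input: one range bin carries a complex tone at normalized
# frequency 0.25, i.e. Doppler bin 8 of a 32-point window.
profiles = np.zeros((64, 256), dtype=complex)
profiles[10] = np.exp(2j * np.pi * 0.25 * np.arange(256))
D = micro_doppler_stft(profiles, k0=8, k1=12)
```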
Step 2.3. Label the time-distance and time-frequency images corresponding to each action with the appropriate labels, and store them in separate folders by class.
Step 3. Establish an improved three-channel deep learning neural network model that combines CNN and LSTM networks in all three channels: the first and second channels take the time-frequency image as input and extract features with convolution kernels of different sizes in the convolution layers, while the third channel takes the time-distance image as input. Input the data into the model as required, train it, obtain the optimal model parameters, and save them.
In step 3, an improved three-channel deep learning neural network model is established, and the postures are classified by combining CNN and LSTM, as follows:
and 3.1, constructing a three-channel network, and taking the time-distance image generated in the step 2 as the input of the first channel and the second channel and the time-distance image as the input of the third channel.
Step 3.2. In the first and third channels, a convolution kernel of size a×a is first applied to the image in the convolution layer for feature extraction, with zero padding used to preserve the image edge information; an average pooling layer then reduces the amount of parameter computation, and the feature map is converted into sequence data and fed into the LSTM network. In the second channel, features are extracted in the convolution layer with a convolution kernel of size b×b, where b is larger than a: because the time-frequency images of different posture signals have clearly different peaks, different kernel sizes extract features at different levels of granularity.
Step 3.3. Fuse the three channels' feature maps using the concatenate() method in the Keras library, apply the ReLU function as the nonlinear operation on the feature maps, and classify through a fully connected layer with the softmax function. The softmax function converts a set of numbers into a probability distribution, mapping each raw value to a probability between 0 and 1:

$$\mathrm{softmax}(x_i)=\frac{e^{x_i}}{\sum_{j=1}^{n}e^{x_j}}$$

where $x_i$ is the raw value to be converted, $n$ is the number of categories, and $j$ indexes the categories.
Step 3.4. Compile the model, using the categorical cross-entropy function as the loss function to compute the error between the true labels and the predicted labels and obtain the model accuracy. The categorical cross-entropy function is:

$$L=-\sum_{i=1}^{n}y_i\log\hat{y}_i$$

where $y_i$ is the true (one-hot) label for category $i$ and $\hat{y}_i$ is the predicted probability.
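As a numeric illustration of the softmax classification in step 3.3 and the categorical cross-entropy loss in step 3.4, with illustrative scores standing in for the fused-feature logits of the five posture classes:

```python
import numpy as np

def softmax(x):
    """Map raw scores to a probability distribution (max-shifted for stability)."""
    e = np.exp(x - np.max(x))
    return e / e.sum()

def categorical_cross_entropy(y_true, y_pred, eps=1e-12):
    """Loss between a one-hot true label and predicted class probabilities."""
    return -float(np.sum(y_true * np.log(y_pred + eps)))

scores = np.array([2.0, 1.0, 0.1, 0.5, -1.0])   # illustrative logits, 5 classes
probs = softmax(scores)                          # sums to 1, each in (0, 1)
loss = categorical_cross_entropy(np.array([1, 0, 0, 0, 0]), probs)
```

The loss equals $-\log$ of the probability assigned to the true class, so a confident correct prediction drives it toward zero.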
in this embodiment, a convolution kernel with a size of 3×3 may be adopted to perform feature extraction in the first channel, and a convolution kernel with a size of 5×5 may be adopted to extract features in the second channel, where the specific structure of the neural network is shown in fig. 2. The data set is used for 600 pieces, wherein each type of image is 100 pieces, and is divided into 480 pieces of training set and 120 pieces of verification set according to the proportion. The training runs were 20 runs, the number of samples was 5, and examples of time-frequency images and time-distance images for five different poses are shown in fig. 3. The network training results and test results are shown in fig. 4.
Table 1 shows the network structure in a specific embodiment
Table 2 shows the results of comparing different types of methods:

| Method name | Classification accuracy |
| --- | --- |
| Single-class feature input, CNN | 91.24% |
| Single-class feature input, LSTM | 89.9% |
| The method of the invention | 93.94% |
In conclusion, the method uses the reflected signals received by the radar: after obtaining time-frequency and time-distance images through Fourier transforms, it recognizes and classifies human body postures with the combined deep learning neural network, effectively fusing features and exploiting temporal information, and thereby improving the accuracy of human body posture recognition by millimeter wave radar.
The foregoing is illustrative of the methods and structures of the present invention. Modifications and substitutions of the specific embodiments described herein that are apparent to those of ordinary skill in the art, and that do not depart from the invention, remain within the scope of the appended claims.
Claims (8)
1. The human body posture recognition method based on the combination of CNN and LSTM is characterized by comprising the following steps:
collecting radar reflection signals of different targets, wherein each target performs a plurality of different postures;
obtaining a time-distance image and a time-frequency image from the reflected signals, and labeling the time-distance images and time-frequency images corresponding to the various postures with corresponding labels;
establishing a deep learning neural network model comprising a feature extraction layer, a feature fusion layer and a full-connection layer, wherein the feature extraction layer comprises three channels, each channel comprising a CNN network and an LSTM network, the inputs of the first channel and the second channel being the time-frequency image, the input of the third channel being the time-distance image, the feature maps output by the channels being input into the feature fusion layer for feature fusion, and the classification result being output at the full-connection layer;
training the deep learning neural network model;
and inputting the image to be identified into a trained deep learning neural network model, and outputting the corresponding human body posture type.
2. The human body posture recognition method based on the combination of CNN and LSTM according to claim 1, wherein the reflected signal is an intermediate frequency signal obtained by mixing a transmitting signal with an echo signal received by a receiving antenna.
3. The human body posture recognition method based on the combination of CNN and LSTM according to claim 2, wherein the discrete sampling form of the intermediate frequency signal is:

$$s_{IF}(m,n) = A_{IF}\exp\!\left(j\left(2\pi f_b \frac{m}{f_s} + \varphi(n)\right)\right)$$

wherein $A_{IF}$ is the amplitude of the intermediate frequency signal, $f_b$ is the frequency of the intermediate frequency signal, $\varphi(n)$ is the phase of the intermediate frequency signal, m is the fast-time sampling point, representing the distance dimension information of the signal, n is the slow-time sampling point, representing the Doppler information of the signal, and $f_s$ is the radar system sampling rate.
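As an illustrative sketch (all parameter values here are invented, not from the patent), the discrete intermediate-frequency signal model of claim 3 can be simulated in NumPy, and a fast-time FFT recovers the beat frequency as a range peak:

```python
import numpy as np

# Illustrative FMCW parameters (made up for demonstration)
f_s = 1e6        # fast-time sampling rate, Hz
f_b = 50e3       # beat (intermediate) frequency, Hz
A_IF = 1.0       # IF signal amplitude
M, N = 256, 64   # fast-time samples per chirp, number of chirps

m = np.arange(M)                 # fast-time index (distance dimension)
n = np.arange(N)                 # slow-time index (Doppler dimension)
f_d = 100.0                      # example Doppler shift, Hz
T_c = 1e-3                       # chirp repetition interval, s
phi = 2 * np.pi * f_d * n * T_c  # slow-time phase progression phi(n)

# s_IF(m, n) = A_IF * exp(j * (2*pi*f_b*m/f_s + phi(n)))
s_if = A_IF * np.exp(1j * (2 * np.pi * f_b * m[:, None] / f_s + phi[None, :]))

# Fast-time (distance dimension) FFT: the beat frequency appears as a peak
range_profile = np.fft.fft(s_if, axis=0)
peak_bin = np.abs(range_profile[:, 0]).argmax()
# f_b maps to bin f_b/f_s * M = 50e3/1e6 * 256 = 12.8, so the peak
# lands in the nearest bin
```

The phase term's progression across chirps (slow time) is what carries the micro-Doppler information exploited by the time-frequency image.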
4. The human body posture recognition method based on the combination of CNN and LSTM according to claim 1, wherein the reflected signal is subjected to distance dimension Fourier transform to obtain a time-distance image.
5. The human body posture recognition method based on the combination of CNN and LSTM according to claim 1, wherein the distance unit data of the target is summed along the distance dimension to obtain a one-dimensional distance spectrum peak; and carrying out short-time Fourier transform on the one-dimensional distance spectrum peak to obtain a time-frequency image.
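A hedged NumPy/SciPy sketch of the processing in claims 4 and 5; the data-cube shape, sampling rate, and STFT window length are illustrative assumptions, and random data stands in for real radar returns:

```python
import numpy as np
from scipy import signal

# Assumed raw data cube: fast-time samples x slow-time chirps (invented shape)
M, N = 256, 512
rng = np.random.default_rng(0)
raw = rng.standard_normal((M, N)) + 1j * rng.standard_normal((M, N))

# Claim 4: distance-dimension Fourier transform over fast time
# gives the time-distance image
time_distance = np.fft.fft(raw, axis=0)   # shape: (range bins, slow time)

# Claim 5: sum the target's distance-unit data along the distance
# dimension to obtain a one-dimensional slow-time sequence ...
one_dim = time_distance.sum(axis=0)       # shape: (slow time,)

# ... then apply a short-time Fourier transform to obtain the
# time-frequency (micro-Doppler) image
f, t, Zxx = signal.stft(one_dim, fs=1000.0, nperseg=64,
                        return_onesided=False)
time_frequency = np.abs(Zxx)              # spectrogram magnitude
```

In practice the summation would be restricted to the range cells actually occupied by the target rather than the full distance axis.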
6. The human body posture recognition method based on the combination of CNN and LSTM according to claim 1, wherein the convolution layers of the first channel and the second channel use convolution kernels of different sizes.
7. The human body posture recognition method based on the combination of CNN and LSTM according to claim 1, wherein the convolution layers of the first channel and the third channel use a convolution kernel of size a×a, and the convolution layer of the second channel uses a convolution kernel of size b×b, where b > a.
8. The human body posture recognition method based on the combination of CNN and LSTM according to claim 1, wherein the feature maps of the three channels are fused using the concatenate() method in the keras library, a nonlinear operation is applied to the fused features using the ReLU function, and classification is then performed through the full-connection layer using the softmax function.
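A minimal sketch of the fusion step in claim 8, applying Keras layers directly to per-channel feature vectors (the vector length of 64 and hidden width of 32 are invented for the example):

```python
import tensorflow as tf
from tensorflow.keras import layers

# Three per-channel feature vectors (batch of 1, assumed length 64 each),
# standing in for the LSTM outputs of the three channels
a = tf.random.normal((1, 64))
b = tf.random.normal((1, 64))
c = tf.random.normal((1, 64))

fused = layers.Concatenate()([a, b, c])               # -> shape (1, 192)
hidden = layers.Dense(32, activation="relu")(fused)   # ReLU nonlinearity
probs = layers.Dense(5, activation="softmax")(hidden) # five posture classes
# softmax output forms a probability distribution over the classes
```

Concatenation along the feature axis keeps each channel's descriptor intact, leaving the dense layers to learn how to weight the three feature streams.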
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202310465077.5A CN116524537A (en) | 2023-04-26 | 2023-04-26 | Human body posture recognition method based on CNN and LSTM combination |
Publications (1)
Publication Number | Publication Date |
---|---|
CN116524537A true CN116524537A (en) | 2023-08-01 |
Family
ID=87397031
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202310465077.5A Pending CN116524537A (en) | 2023-04-26 | 2023-04-26 | Human body posture recognition method based on CNN and LSTM combination |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN116524537A (en) |
Cited By (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN117310646A * | 2023-11-27 | 2023-12-29 | Nanchang University | Lightweight human body posture recognition method and system based on indoor millimeter wave radar |
CN117310646B * | 2023-11-27 | 2024-03-22 | Nanchang University | Lightweight human body posture recognition method and system based on indoor millimeter wave radar |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
Angelov et al. | Practical classification of different moving targets using automotive radar and deep neural networks | |
CN107862705B (en) | Unmanned aerial vehicle small target detection method based on motion characteristics and deep learning characteristics | |
CN101223456B (en) | Computer implemented method for identifying a moving object by using a statistical classifier | |
CN111399642B (en) | Gesture recognition method and device, mobile terminal and storage medium | |
CN107358250B (en) | Body gait recognition methods and system based on the fusion of two waveband radar micro-doppler | |
Yang et al. | Dense people counting using IR-UWB radar with a hybrid feature extraction method | |
Liu et al. | Deep learning and recognition of radar jamming based on CNN | |
Ahmed et al. | Radar-based air-writing gesture recognition using a novel multistream CNN approach | |
CN111461037B (en) | End-to-end gesture recognition method based on FMCW radar | |
CN113837131B (en) | Multi-scale feature fusion gesture recognition method based on FMCW millimeter wave radar | |
US20230039196A1 (en) | Small unmanned aerial systems detection and classification using multi-modal deep neural networks | |
WO2023029390A1 (en) | Millimeter wave radar-based gesture detection and recognition method | |
CN111175718A (en) | Time-frequency domain combined ground radar automatic target identification method and system | |
CN113640768B (en) | Low-resolution radar target identification method based on wavelet transformation | |
CN116524537A (en) | Human body posture recognition method based on CNN and LSTM combination | |
Hendy et al. | Deep learning approaches for air-writing using single UWB radar | |
CN109061632A (en) | A kind of unmanned plane recognition methods | |
CN114581958A (en) | Static human body posture estimation method based on CSI signal arrival angle estimation | |
CN113901878A (en) | CNN + RNN algorithm-based three-dimensional ground penetrating radar image underground pipeline identification method | |
CN117452496A (en) | Target detection and identification method for seismoacoustic sensor | |
Martinez et al. | Deep learning-based segmentation for the extraction of micro-doppler signatures | |
Tang et al. | SAR deception jamming target recognition based on the shadow feature | |
CN115600101B (en) | Priori knowledge-based unmanned aerial vehicle signal intelligent detection method and apparatus | |
CN115061094B (en) | Radar target recognition method based on neural network and SVM | |
Zhu et al. | A New Deception Jamming Signal Detection Technique Based on YOLOv7 |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||