CN113420610A - Human body gesture recognition method based on fusion of millimeter waves and laser radar, electronic device and storage medium - Google Patents


Info

Publication number
CN113420610A
CN113420610A (application CN202110602532.2A)
Authority
CN
China
Prior art keywords
layer
laser radar
gesture
domain
neural network
Prior art date
Legal status
Pending
Application number
CN202110602532.2A
Other languages
Chinese (zh)
Inventor
石绍应
李冠章
唐少林
吴杰伟
冯勤群
王亮
周志伟
周立和
吴尚鸿
杨杰
蒋贤烨
Current Assignee
Hunan Senying Zhizao Technology Co ltd
Original Assignee
Hunan Senying Zhizao Technology Co ltd
Priority date
Filing date
Publication date
Application filed by Hunan Senying Zhizao Technology Co ltd filed Critical Hunan Senying Zhizao Technology Co ltd
Priority claimed from application CN202110602532.2A
Publication of CN113420610A
Legal status: Pending

Classifications

    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06F: ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00: Pattern recognition
    • G06F18/20: Analysing
    • G06F18/25: Fusion techniques

Landscapes

  • Engineering & Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • Theoretical Computer Science (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Artificial Intelligence (AREA)
  • Evolutionary Biology (AREA)
  • Evolutionary Computation (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Radar Systems Or Details Thereof (AREA)

Abstract

The invention relates to a human body gesture recognition method based on the fusion of a millimeter-wave radar and a laser radar, an electronic device and a storage medium. The method comprises the following steps: when the millimeter-wave radar detects the initial action of an operator's gesture, controlling it to send a trigger signal to the laser radar; controlling the laser radar, after it receives the trigger signal, to start collecting depth images of the operator at work; and feeding the depth images into a trained neural network for human body gesture recognition, then detecting and outputting the recognition result. By using millimeter-wave triggering and laser-radar collection to obtain data, the invention avoids the wasted data resources and performance degradation caused by keeping the laser radar running continuously, is unaffected by illumination, temperature and climate, and offers strong environmental adaptability, a high recognition rate and good real-time performance.

Description

Human body gesture recognition method based on fusion of millimeter waves and laser radar, electronic device and storage medium
Technical Field
The invention belongs to the field of gesture recognition, and particularly relates to a human body gesture recognition method based on the fusion of millimeter waves and a laser radar, electronic equipment and a computer readable storage medium.
Background
A railway locomotive crew member must use various gesture actions to assist driving. Standard gestures are an important guarantee of the crew member's concentration and of safe locomotive operation, while incorrect gestures can indicate a poor driving state in which safety accidents are more likely. Effective technical means are therefore needed to recognize and normalize the driving gestures of railway locomotive operators.
At present, human gesture recognition follows two main schemes: millimeter-wave radar spectrogram recognition and visible-light image recognition. Millimeter-wave spectrogram recognition is an active scheme that works under all illumination conditions, but the information obtainable from a radar echo spectrogram is limited, making finer driving gestures difficult to recognize. Visible-light images combined with artificial-intelligence recognition can achieve a higher recognition rate, but overall performance is easily disturbed by illumination changes in the cab; in addition, camera performance degrades noticeably after long-term operation, and long-term operation also records a large amount of redundant data.
Both schemes therefore have limitations when applied to railway locomotive driving-gesture recognition.
Disclosure of Invention
To address the shortcomings of existing gesture recognition technology, the invention aims to provide a human body gesture recognition method based on the fusion of millimeter waves and a laser radar, which automatically recognizes the gestures of a crew member in a railway locomotive driving environment and, on that basis, judges whether the gestures conform to the standard.
In order to achieve the purpose, the invention adopts the following technical scheme:
according to one aspect of the invention, a human body gesture recognition method based on the fusion of millimeter waves and a laser radar is provided, which comprises the following steps:
controlling the millimeter-wave radar to send a trigger signal to the laser radar when it detects the initial action of an operator's gesture;
controlling the laser radar to start collecting depth images of the operator at work after it receives the trigger signal;
and feeding the depth images into a trained neural network for human body gesture recognition, then detecting and outputting the recognition result.
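As an illustration only, the three steps above can be sketched as a minimal trigger-then-capture loop. The interface names (`mmwave_detects_start`, `lidar_capture`, `recognize`) are hypothetical placeholders for real hardware and model interfaces, not names from the patent:

```python
from dataclasses import dataclass
from typing import Callable, List, Optional

@dataclass
class GesturePipeline:
    # All three callables are stand-ins for real sensor/model interfaces.
    mmwave_detects_start: Callable[[], bool]   # millimeter-wave start-gesture monitor
    lidar_capture: Callable[[int], List]       # collects n depth frames once triggered
    recognize: Callable[[List], str]           # trained neural-network classifier
    n_frames: int = 5

    def step(self) -> Optional[str]:
        # Step 1: the millimeter-wave radar watches for the gesture's initial action.
        if not self.mmwave_detects_start():
            return None                        # no trigger: the lidar stays idle
        # Step 2: the trigger wakes the lidar, which collects depth images.
        frames = self.lidar_capture(self.n_frames)
        # Step 3: the depth images go to the trained network for recognition.
        return self.recognize(frames)
```

Keeping the lidar idle until the millimeter-wave trigger fires is what avoids the redundant data and component wear of continuous operation.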
In accordance with another aspect of the present invention, there is provided an electronic apparatus, wherein the electronic apparatus includes:
a processor; and
a memory arranged to store computer executable instructions that, when executed, cause the processor to perform the method.
According to another aspect of the present invention, there is provided a computer readable storage medium, wherein the computer readable storage medium stores one or more programs which, when executed by a processor, implement the method.
The invention adopts millimeter wave triggering and laser radar collection to obtain data, avoids the problems of data resource waste, performance reduction and the like caused by continuous work of the laser radar, is not influenced by illumination, temperature and climate, and has the advantages of strong environmental adaptability, high recognition rate and good real-time property.
The above is only an overview of the technical solutions of the present invention. To make the technical means of the invention clearer, and to make its objects, features and advantages more apparent and understandable, embodiments of the invention are described below.
Drawings
Various other advantages and benefits will become apparent to those of ordinary skill in the art upon reading the following detailed description of the preferred embodiments. The drawings are only for purposes of illustrating the preferred embodiments and are not to be construed as limiting the invention. Also, like reference numerals are used to refer to like elements throughout the drawings.
In the drawings:
FIG. 1 illustrates a flow diagram of a gesture recognition method of the present invention;
FIG. 2 illustrates the portion of the transmitted linear-frequency-modulated continuous millimeter-wave signal that is reflected back to the radar receiving antenna after encountering a target at a distance R;
FIG. 3 illustrates a neural network architecture diagram of the present invention;
FIG. 4 is a schematic structural diagram of an electronic device according to the present invention;
fig. 5 is a schematic structural diagram of a computer-readable storage medium according to the present invention.
Detailed Description
Exemplary embodiments of the present disclosure will be described in more detail below with reference to the accompanying drawings. While exemplary embodiments of the present disclosure are shown in the drawings, it should be understood that the present disclosure may be embodied in various forms and should not be limited to the embodiments set forth herein. Rather, these embodiments are provided so that this disclosure will be thorough and complete, and will fully convey the scope of the disclosure to those skilled in the art.
Example 1
As shown in fig. 1, the human body gesture recognition method based on the fusion of millimeter waves and laser radar of the present invention includes the following steps:
and S1, controlling the millimeter wave radar to send a trigger signal to the laser radar when the millimeter wave radar monitors the gesture starting action of an operator.
The method for monitoring the gesture's initial action is as follows: extract the feature information corresponding to the dynamic gesture from the received millimeter-wave electrical signal, the feature information comprising one or more of distance, azimuth angle, pitch angle and Doppler frequency; generate a feature vector from the feature information; and identify and analyze the feature vector to judge the category of the dynamic gesture.
In the above, the feature extraction is implemented with a multi-domain feature engineering technique based on the range domain-Doppler domain and the time domain-frequency domain. The technique classifies and combines time domain-frequency domain joint features (envelope frequency, peak frequency, frequency-component duration and peak-frequency dynamic range) with range domain-Doppler domain joint features (scattering-center range-Doppler track, range-velocity accumulation value, range-velocity dispersion range and multi-channel range-Doppler inter-frame difference) to form a continuous multi-frame dynamic feature vector sequence.
The identification is achieved using a multilayer perceptron.
For ease of understanding, a specific embodiment of step S1 is given below:
S11, install a millimeter-wave radar in the locomotive cab and power it on to generate a linear frequency-modulated continuous-wave signal with a given bandwidth and pulse width:
s_T(t) = A_T · exp[j2π(f_c·t + (S/2)·t²)],  0 ≤ t ≤ T_c
where the chirp start frequency f_c of the transmitted signal is 24 GHz or higher (preferably 60 GHz), the pulse duration T_c = 40 µs, the bandwidth B = 4 GHz, and the frequency-modulation rate (slope) S = 100 MHz/µs.
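Under the stated parameters, the chirp of step S11 can be sketched numerically. The complex-exponential form and the sampling rate `fs` are assumptions for illustration, since the patent shows the exact expression only as an image:

```python
import numpy as np

f_c = 60e9          # chirp start frequency, Hz (the preferred 60 GHz)
T_c = 40e-6         # pulse duration, s
B = 4e9             # sweep bandwidth, Hz
S = B / T_c         # chirp rate: 4 GHz / 40 us = 100 MHz/us

fs = 20e6                        # illustrative ADC sampling rate (not from the patent)
n = int(round(T_c * fs))         # 800 samples per chirp
t = np.arange(n) / fs            # fast-time axis within one chirp
tx = np.exp(1j * 2 * np.pi * (f_c * t + 0.5 * S * t ** 2))  # complex chirp samples
# The instantaneous frequency f_c + S*t sweeps linearly from f_c to f_c + B.
```

The slope S here comes straight from the stated bandwidth and pulse width, so S·T_c recovers the 4 GHz sweep.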
S12, convert the linear-frequency-modulated continuous millimeter-wave signal into a millimeter-wave transmission signal through phase shifting and amplification, and transmit it through the antenna unit in a direction aligned with the locomotive driver.
The phase shifting controls the launch angle of the transmit beam, and the amplification ensures that the transmitted signal has sufficient power.
S13, process the echo millimeter-wave signal by low-noise amplification, down-conversion, intermediate-frequency filtering and A/D sampling, and then perform ranging, angle measurement, target detection, Doppler information extraction, phase information extraction and similar processing.
Specifically, dynamic gesture recognition and monitoring of the locomotive driver are realized by extracting the detailed feature information contained in the driver's gesture-action echo and then classifying and recognizing that information. To fully exploit the high millimeter-wave carrier frequency and describe the small-scale detail features in the gesture echo more accurately, the following multi-domain joint features are used for dynamic feature extraction. As shown in fig. 2, after the transmitted linear-frequency-modulated continuous millimeter-wave signal encounters a target gesture at a distance R, the portion of the signal reflected back to the radar receiving antenna is received as:
s_R(t) = A_R · exp[j2π(f_c·(t − τ) + (S/2)·(t − τ)²)],  where τ = 2R/c is the round-trip delay and c is the speed of light
correlating the received signal with the transmitted signal, and performing low-pass filtering to obtain a baseband echo signal x (t), wherein the corresponding discrete digital signal is as follows:
x(n) = x(nT_s) = A · exp[j2π(f_b·nT_s + f_c·τ)],  with beat frequency f_b = S·τ = 2SR/c
where T_s is the sampling period.
Performing time-frequency analysis on this signal yields the time-frequency spectrogram of the echo signal:
X(n, k) = Σ_m x(m) · h(n − m) · e^(−j2πkm/M),  k = 0, 1, …, M − 1
Where h (m) is a window function that affects the time and frequency domain resolution in the time-frequency analysis.
The time domain energy characteristics based on time frequency analysis are as follows:
E(n) = Σ_{k=0…M−1} |X(n, k)|²
wherein M is the number of Doppler resolution cells.
The main Doppler frequency characteristics based on time-frequency analysis are as follows:
f_D(n) = argmax_k |X(n, k)|
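A compact numerical sketch of these time-frequency features (spectrogram, per-frame energy, dominant Doppler cell) on a toy echo. The Hann window, FFT length and hop size are illustrative choices, not values from the patent:

```python
import numpy as np

def spectrogram(x, M=64, hop=16):
    """|X(n, k)|: magnitude of the windowed short-time Fourier transform."""
    h = np.hanning(M)                        # window function h(m)
    frames = [x[i:i + M] * h for i in range(0, len(x) - M + 1, hop)]
    return np.abs(np.fft.fft(np.array(frames), axis=1))

fs = 1000.0
t = np.arange(1000) / fs
x = np.exp(2j * np.pi * 125.0 * t)           # toy echo: one 125 Hz Doppler line

X = spectrogram(x)                           # shape: (time frames, M Doppler cells)
E = (X ** 2).sum(axis=1)                     # time-domain energy feature E(n)
k_main = int(X.sum(axis=0).argmax())         # dominant Doppler cell over the burst
f_main = k_main * fs / 64                    # cell index -> frequency, here 125 Hz
```

The 125 Hz line falls exactly on cell 8 (bin width fs/M = 15.625 Hz), so the dominant-cell feature recovers it without leakage ambiguity.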
and (3) arranging the baseband signals of a plurality of pulse repetition periods according to the distance to sampling time (fast time) -pulse repetition period (slow time) to obtain an R-D map RD (R, v, T) of a distance domain-Doppler domain, wherein index variables R, v and T respectively represent distance, speed and frame time.
The range-profile feature based on the R-D map is:
P_r(r, T) = Σ_v |RD(r, v, T)|
similarly, the R-D based doppler signature for micromotion is:
P_v(v, T) = Σ_r |RD(r, v, T)|
the speed centroid characteristics based on the R-D diagram are as follows:
v_c(T) = Σ_{r=r_0…r_1} Σ_v v · |RD(r, v, T)| / Σ_{r=r_0…r_1} Σ_v |RD(r, v, T)|
where r_0 and r_1 are respectively the minimum and maximum ranges of the gesture-motion distribution.
The velocity dispersion range characteristic based on the R-D diagram is as follows:
σ_v(T) = sqrt( Σ_{r=r_0…r_1} Σ_v (v − v_c(T))² · |RD(r, v, T)| / Σ_{r=r_0…r_1} Σ_v |RD(r, v, T)| )
similarly, the distance dispersion range based on the R-D plot is characterized by:
σ_r(T) = sqrt( Σ_r Σ_v (r − r*(T))² · |RD(r, v, T)| / Σ_r Σ_v |RD(r, v, T)| )
where r*(T) is the range corresponding to the range-Doppler cell with the largest echo energy in the T-th frame of data, i.e.
r*(T) = argmax_r max_v |RD(r, v, T)|
The energy accumulation characteristic based on the R-D diagram is as follows:
E(T) = Σ_r Σ_v |RD(r, v, T)|²
the energy difference based on the R-D diagram is characterized in that:
ΔE(T) = E(T) − E(T − 1)
the multichannel accumulation characteristics based on the R-D map are as follows:
RD_Σ(r, v, T) = Σ_{k=1…K} RD_k(r, v, T)
where RD_k(r, v, T) is the R-D map of the k-th receive channel and K is the total number of receive channels.
The multichannel difference characteristic based on the R-D diagram is as follows:
M_ij(r, v, T) = RD_i(r, v, T) − RD_j(r, v, T)
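Several of the R-D-map features above can be sketched on a toy range-Doppler matrix. The formulas follow the text's descriptions (energy accumulation, velocity centroid, strongest-cell range, range dispersion); since the patent's exact expressions are images, treat these as assumed reconstructions:

```python
import numpy as np

rng = np.random.default_rng(0)
RD = 0.01 * rng.random((32, 16))       # toy R-D map: 32 range x 16 velocity cells
RD[12, 5] = 1.0                        # dominant gesture echo at (r=12, v=5)

r_idx = np.arange(32)[:, None]         # range indices, broadcastable over velocity
v_idx = np.arange(16)[None, :]         # velocity indices

P = np.abs(RD) ** 2                    # per-cell echo energy
E_T = P.sum()                          # energy accumulation E(T)
v_c = (v_idx * P).sum() / P.sum()      # velocity centroid over the whole map
r_star = int(np.unravel_index(P.argmax(), P.shape)[0])          # r*(T)
sigma_r = np.sqrt(((r_idx - r_star) ** 2 * P).sum() / P.sum())  # range dispersion
```

The multi-channel difference feature M_ij is then simply `RD_i - RD_j` element-wise for two same-shaped channel maps.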
and performing time sequence combination on the time domain-frequency domain characteristics and the distance domain-Doppler domain characteristics to form a characteristic queue, sending the characteristic queue to a multilayer perceptron to perform classification and identification of dynamic gestures, and giving a delay trigger signal to the laser radar when the gesture type is identified to belong to a set gesture initial action.
S2, controlling the laser radar to start collecting depth images of the operator at work after it receives the trigger signal.
The depth image has n frames, n being a positive integer greater than zero. To process all image data under the same standard, each depth-image frame undergoes normalization preprocessing, which standardizes the data format and improves recognition accuracy.
The neural network is a convolutional neural network with a 7-layer structure. Because the features of the preprocessed sample pictures are enhanced, a 7-layer convolutional network with a simple model structure and a small computational load meets the recognition requirement, balances recognition accuracy and efficiency well, and suits the demands of railway locomotive driving applications. Further, the training method of the neural network comprises: collecting depth images as sample pictures and labeling each sample picture according to its gesture type; then training the parameters of the neural network model with the sample pictures as input data and the labels as output data, the model being a convolutional neural network.
Specifically, the specific implementation of step S2 is as follows:
step S21, after the laser radar receives the delay trigger signal sent by the millimeter wave radar, starting an acquisition program, supposing that N frames of laser picture samples are acquired and respectively recorded as Gn(N is 1,2, …, N), the size of each frame of picture sample is I × J, i.e., the picture height dimension contains I pixels, and the width dimension contains J pixels;
and step S22, carry out normalization processing on the n-th frame picture sample. Determine the maximum pixel value in the picture sample, recorded as:
g_max = max_{i,j} G_n(i, j)
where G_n(i, j) represents the pixel value in row i, column j of the frame's sample picture, i ∈ {1, 2, …, I}, j ∈ {1, 2, …, J}. The picture sample data are normalized by this maximum so that pixel values are distributed between 0 and 255, computed per pixel as:
G'_n(i, j) = round( 255 · G_n(i, j) / g_max )
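Step S22 can be sketched as follows. Rounding to 8-bit integers is an assumption, since the source shows the scaling formula only as an image:

```python
import numpy as np

def normalize_depth_frame(G):
    """Scale a depth frame so its pixel values span 0..255 (step S22)."""
    g_max = G.max()                    # g_max = maximum over all pixels G_n(i, j)
    return np.round(255.0 * G / g_max).astype(np.uint8)

frame = np.array([[0, 500], [1000, 2000]], dtype=np.int32)
out = normalize_depth_frame(frame)     # the 2000 pixel maps to 255
```

Every frame is scaled by its own maximum, so frames captured at different depths end up in the same 0..255 format before recognition.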
and S23, sending the normalized sample obtained in the S22 into a trained convolutional neural network for recognition, and outputting a recognition result. The structural design and parameter training of the neural network comprise the following steps:
step S23.1, as shown in fig. 3, a neural network structure is built, and the neural network includes 7 layers of structures, which are:
Layer 1, a convolutional layer: the convolution kernel size is 5 × 5 × 1, the layer contains 6 convolution kernels in total, and the convolution stride is 1;
Layer 2, a pooling layer: the pooling kernel size is 2 × 1, the pooling mode is maximum pooling, and the pooling stride is 2;
Layer 3, a convolutional layer: the convolution kernel size is 5 × 5 × 6, the layer contains 16 convolution kernels in total, and the convolution stride is 1;
Layer 4, a pooling layer: the pooling kernel size is 2 × 1, the pooling mode is maximum pooling, and the pooling stride is 2;
Layer 5, a convolutional layer: the convolution kernel size is 5 × 5 × 16, the layer contains 64 convolution kernels in total, and the convolution stride is 1;
Layer 6, a fully connected layer: all output feature values of layer 5 are spliced in order into one long column vector serving as the input nodes of layer 6, which together with 84 output nodes form a fully connected layer;
Layer 7, the output layer: assuming M gesture types in total, the 84 nodes output by layer 6 serve as input nodes and form a fully connected layer with M output nodes.
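As a sanity check on this architecture, the spatial dimensions can be traced layer by layer. Valid (unpadded) convolutions, 2 × 2 pooling windows and a 32 × 32 input are assumptions here, since the translated text is ambiguous about padding, pooling width and input size:

```python
def conv_out(size, k=5, stride=1):
    """Output size of a 'valid' (unpadded) convolution along one axis."""
    return (size - k) // stride + 1

def pool_out(size, k=2, stride=2):
    """Output size of max pooling along one axis."""
    return (size - k) // stride + 1

side, ch = 32, 1                 # assumed square depth-image input, 1 channel
side = conv_out(side); ch = 6    # layer 1: 5x5x1 conv, 6 kernels   -> 28x28x6
side = pool_out(side)            # layer 2: max pool                -> 14x14x6
side = conv_out(side); ch = 16   # layer 3: 5x5x6 conv, 16 kernels  -> 10x10x16
side = pool_out(side)            # layer 4: max pool                -> 5x5x16
side = conv_out(side); ch = 64   # layer 5: 5x5x16 conv, 64 kernels -> 1x1x64
flat = side * side * ch          # layer 6 input: flattened column vector
# layer 6: fully connected 64 -> 84; layer 7: fully connected 84 -> M gesture types
```

Under these assumptions the flattened layer-5 output has exactly 64 values, which layer 6 maps to its 84 output nodes; this mirrors the classic LeNet-style dimension chain.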
S23.2, training-sample preprocessing: preprocess the collected sample data as in step S22 and make a label for each sample's gesture type. The label is a 0-1 vector; the label data of the n-th frame's training sample is denoted b_n. Assuming M gesture types in total, b_n has size M × 1 with the following element values:
b_n(m) = 1 if the n-th sample belongs to gesture type m, and b_n(m) = 0 otherwise,  m = 1, 2, …, M
and S23.3, training parameters of the neural network model by taking the sample picture as input data of the neural network model and taking the sample label vector as output data of the model.
The method of the present invention further comprises a step S3 for improving the accuracy of the recognition result.
S3, recording the recognition result of every depth-image frame and selecting the class recognized most often as the output recognition result.
The specific implementation of the step S3 is as follows:
step S31, recording the recognition result of the depth image of the nth frame, and if N is less than N, returning to the step S2 to execute, recognizing the sample of the (x + 1) th frame, and if x is equal to N, executing step S32;
and step S32, integrate the N frame recognition results and vote to output the final recognition result. Assuming M gesture types in total, the number of votes obtained by each gesture is denoted T_m (m = 1, 2, …, M), and the final result R is the class with the most votes:
R = argmax_m T_m,  m ∈ {1, 2, …, M}
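The vote of step S32 can be sketched directly. Tie-breaking is not specified in the source; here `Counter.most_common` resolves ties by first appearance:

```python
from collections import Counter

def vote(frame_results):
    """Majority vote over the N per-frame recognition results (step S32)."""
    tally = Counter(frame_results)        # T_m: votes for each gesture type m
    return tally.most_common(1)[0][0]     # R: the class with the most votes

final = vote(["stop", "proceed", "stop", "stop", "slow"])   # -> "stop"
```

Voting over N frames smooths out single-frame misclassifications before the result is output.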
In the invention, the millimeter-wave radar monitors the gesture's initial action and sends a trigger signal to the laser radar, which reduces the resource waste and performance degradation that long-term standby of the laser radar would cause.
The method of the present invention can be converted into program steps and devices that are stored in a computer storage medium and executed by being called by a controller. The devices are to be understood as functional modules implemented by computer programs; the computer programs may be stored in a computer-readable storage medium storing one or more programs which, when executed by a processor, implement the method.
The present invention also provides an electronic device, wherein the electronic device includes:
a processor; and
a memory arranged to store computer executable instructions that, when executed, cause the processor to perform the method described above.
It should be noted that:
the algorithms and displays presented herein are not inherently related to any particular computer, virtual machine, or other apparatus. Various general purpose devices may also be used in conjunction with the teachings herein. The required structure for constructing such a device will be apparent from the description above. Moreover, the present invention is not directed to any particular programming language. It is appreciated that a variety of programming languages may be used to implement the teachings of the present invention as described herein, and any descriptions of specific languages are provided above to disclose the best mode of practicing the invention.
In the description provided herein, numerous specific details are set forth. It is understood, however, that embodiments of the invention may be practiced without these specific details. In some instances, well-known methods, structures and techniques have not been shown in detail in order not to obscure an understanding of this description.
Similarly, it should be appreciated that in the above description of exemplary embodiments of the invention, various features are sometimes grouped together in a single embodiment, figure, or description thereof to streamline the disclosure and aid understanding of one or more of the inventive aspects. However, this method of disclosure should not be interpreted as reflecting an intention that the claimed invention requires more features than are expressly recited in each claim. Rather, as the following claims reflect, inventive aspects lie in less than all features of a single foregoing disclosed embodiment. Thus, the claims following the detailed description are hereby expressly incorporated into this detailed description, with each claim standing on its own as a separate embodiment of the invention.
Those skilled in the art will appreciate that the modules in the device in an embodiment may be adaptively changed and placed in one or more devices different from the embodiment. The modules or units or components of the embodiments may be combined into one module or unit or component and furthermore they may be divided into a plurality of sub-modules or sub-units or sub-components. All of the features disclosed in this specification (including any accompanying claims, abstract and drawings), and all of the processes or elements of any method or apparatus so disclosed, may be combined in any combination, except combinations where at least some of such features and/or processes or elements are mutually exclusive. Each feature disclosed in this specification (including any accompanying claims, abstract and drawings) may be replaced by alternative features serving the same, equivalent or similar purpose, unless expressly stated otherwise.
Furthermore, those skilled in the art will appreciate that while some embodiments described herein include some features included in other embodiments, rather than other features, combinations of features of different embodiments are meant to be within the scope of the invention and form different embodiments. For example, in the following claims, any of the claimed embodiments may be used in any combination.
The various component embodiments of the invention may be implemented in hardware, or in software modules running on one or more processors, or in a combination thereof. The present invention may also be embodied as apparatus or device programs (e.g., computer programs and computer program products) for performing a portion or all of the methods described herein. Such programs implementing the invention may be stored on computer-readable media or may be in the form of one or more signals. Such signals may be downloaded from an internet website or provided on a carrier signal or in any other form.
For example, fig. 4 shows a schematic structural diagram of an electronic device according to an embodiment of the invention. The electronic device conventionally comprises a processor 31 and a memory 32 arranged to store computer-executable instructions (program code). The memory 32 may be an electronic memory such as a flash memory, an EEPROM (electrically erasable programmable read only memory), an EPROM, a hard disk, or a ROM. The memory 32 has a storage space 33 storing program code 34 for performing any of the method steps in the embodiments. For example, the storage space 33 for the program code may comprise respective program codes 34 for implementing respective steps in the above method. The program code can be read from or written to one or more computer program products. These computer program products comprise a program code carrier such as a hard disk, a Compact Disc (CD), a memory card or a floppy disk. Such a computer program product is typically a computer readable storage medium such as described in fig. 5. The computer readable storage medium may have memory segments, memory spaces, etc. arranged similarly to the memory 32 in the electronic device of fig. 4. The program code may be compressed, for example, in a suitable form. In general, the memory unit stores program code 41 for performing the steps of the method according to the invention, i.e. program code readable by a processor such as 31, which when run by an electronic device causes the electronic device to perform the individual steps of the method described above.
It should be noted that the above-mentioned embodiments illustrate rather than limit the invention, and that those skilled in the art will be able to design alternative embodiments without departing from the scope of the appended claims. In the claims, any reference signs placed between parentheses shall not be construed as limiting the claim. The word "comprising" does not exclude the presence of elements or steps not listed in a claim. The word "a" or "an" preceding an element does not exclude the presence of a plurality of such elements. The invention may be implemented by means of hardware comprising several distinct elements, and by means of a suitably programmed computer. In the unit claims enumerating several means, several of these means may be embodied by one and the same item of hardware. The usage of the words first, second and third, etcetera do not indicate any ordering. These words may be interpreted as names.

Claims (10)

1. A human body gesture recognition method based on the fusion of millimeter waves and a laser radar is characterized by comprising the following steps:
controlling the millimeter wave radar to send a trigger signal to the laser radar when the millimeter wave radar monitors the gesture initial action of the operator;
controlling the laser radar to start to collect a depth image of an operator in a working state after receiving a trigger signal;
and sending the depth image into a trained neural network for human body gesture recognition, detecting and outputting a recognition result.
2. The method of claim 1, wherein monitoring the gesture's initial action comprises:
extracting characteristic information corresponding to the dynamic gesture according to the received millimeter wave electric signals, wherein the characteristic information comprises one or more of distance, azimuth angle, pitch angle and Doppler frequency;
and generating a characteristic vector according to the characteristic information, and identifying and analyzing the characteristic vector so as to judge the category of the dynamic gesture.
3. The method of claim 2,
extracting the characteristic information by adopting a multi-domain characteristic engineering technology based on a distance domain-Doppler domain and a time domain-frequency domain, and/or,
the identification is achieved using a multi-layer perceptron.
4. The method of claim 3, wherein the multi-domain feature engineering technique comprises:
a time domain-frequency domain joint feature composed of envelope frequency, peak frequency, frequency-component duration and peak-frequency dynamic range;
a range domain-Doppler domain joint feature composed of scattering-center range-Doppler track, range-velocity accumulation value, range-velocity dispersion range and multi-channel range-Doppler inter-frame difference;
and classifying and combining the features to form a continuous multi-frame dynamic feature vector sequence.
5. The method of claim 1, wherein the method of training the neural network comprises:
collecting a depth image as a sample picture, and labeling the sample picture according to the gesture type;
and training parameters of the neural network model by taking the sample picture as input data of the neural network model and taking the label as output data of the model, wherein the neural network model is a convolutional neural network.
6. The method of claim 5, wherein the convolutional neural network comprises a 7-layer structure:
layer 1 is a convolutional layer with a 5 × 5 × 1 kernel; the layer contains 6 convolution kernels in total, each with stride 1;
layer 2 is a pooling layer with a 2 × 2 × 1 pooling kernel, max pooling, and stride 2;
layer 3 is a convolutional layer with a 5 × 5 × 6 kernel; the layer contains 16 convolution kernels in total, each with stride 1;
layer 4 is a pooling layer with a 2 × 2 × 1 pooling kernel, max pooling, and stride 2;
layer 5 is a convolutional layer with a 5 × 5 × 16 kernel; the layer contains 64 convolution kernels in total, each with stride 1;
layer 6 is a fully connected layer: all output feature values of layer 5 are concatenated in sequence into a long column vector that serves as the layer's input nodes, which together with 84 output nodes form the fully connected layer;
layer 7 is the output layer: assuming M gesture categories, the 84 nodes output by layer 6 serve as input nodes to a fully connected layer with M output nodes.
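The 7-layer structure of claim 6 is a LeNet-style network, and its feature-map sizes can be traced layer by layer. The sketch below does this in plain Python; the 32 × 32 × 1 input size and the use of "valid" (no-padding) convolutions are assumptions not stated in the claim, chosen so that layer 5 produces a 1 × 1 × 64 map that flattens to a 64-element vector for layer 6.

```python
# Shape trace of the 7-layer CNN in claim 6 (illustrative sketch).
# Assumptions: 32 x 32 x 1 input, valid (no-padding) convolutions.

def conv_out(size, kernel, stride=1):
    """Spatial output size of a valid convolution."""
    return (size - kernel) // stride + 1

def pool_out(size, kernel=2, stride=2):
    """Spatial output size of a max-pooling layer."""
    return (size - kernel) // stride + 1

def trace_shapes(in_size=32, num_classes=10):
    shapes = []
    s = conv_out(in_size, 5); shapes.append((s, s, 6))   # layer 1: conv 5x5x1, 6 kernels
    s = pool_out(s);          shapes.append((s, s, 6))   # layer 2: max pool 2x2, stride 2
    s = conv_out(s, 5);       shapes.append((s, s, 16))  # layer 3: conv 5x5x6, 16 kernels
    s = pool_out(s);          shapes.append((s, s, 16))  # layer 4: max pool 2x2, stride 2
    s = conv_out(s, 5);       shapes.append((s, s, 64))  # layer 5: conv 5x5x16, 64 kernels
    shapes.append((s * s * 64,))                         # layer 6 input: flattened vector
    shapes.append((84,))                                 # layer 6 output: 84 nodes
    shapes.append((num_classes,))                        # layer 7 output: M gesture classes
    return shapes

print(trace_shapes(32, 10))
```

With these assumptions the trace yields 28 × 28 × 6 → 14 × 14 × 6 → 10 × 10 × 16 → 5 × 5 × 16 → 1 × 1 × 64 → 64 → 84 → M.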
7. The method of claim 6, wherein the depth image has n frames, n being a positive integer greater than zero, and recognition is performed after each frame of the depth image is normalized.
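The per-frame normalization of claim 7 could, for example, be a min-max scaling to [0, 1]; the claim does not specify the scheme, so the sketch below is one plausible choice, with hypothetical raw depth values.

```python
# Illustrative per-frame normalization for claim 7.
# Min-max scaling to [0, 1] is an assumed scheme, not stated in the patent.

def normalize_frame(frame):
    """Min-max normalize one depth frame (a list of rows) to [0, 1]."""
    lo = min(min(row) for row in frame)
    hi = max(max(row) for row in frame)
    span = (hi - lo) or 1.0  # guard against a constant frame
    return [[(v - lo) / span for v in row] for row in frame]

frame = [[500.0, 750.0], [1000.0, 500.0]]  # hypothetical raw depth values
print(normalize_frame(frame))  # -> [[0.0, 0.5], [1.0, 0.0]]
```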
8. The method of claim 1 or 7, wherein:
n frames of depth images are collected at a set frame rate, n being a positive integer greater than zero,
and the recognition result of each of the n depth-image frames is recorded, and, for the same target, the result with the highest recognition rate is selected as the output recognition result.
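Selecting the result with the highest recognition rate across the n frames, as claim 8 describes, amounts to a majority vote over per-frame labels. The sketch below shows this step; the gesture labels are hypothetical examples.

```python
# Illustrative frame-voting step for claim 8: for one target, output the
# gesture recognized most often across n frames. Labels are hypothetical.
from collections import Counter

def vote(frame_results):
    """Return the most frequently recognized gesture across n frames."""
    gesture, _count = Counter(frame_results).most_common(1)[0]
    return gesture

results = ["wave", "wave", "push", "wave", "push", "wave"]
print(vote(results))  # -> "wave"
```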
9. An electronic device, wherein the electronic device comprises:
a processor; and
a memory arranged to store computer-executable instructions which, when executed, cause the processor to perform the method of any one of claims 1 to 8.
10. A storage medium, wherein the storage medium stores one or more programs which, when executed by a processor, implement the method of any one of claims 1-8.
CN202110602532.2A 2021-05-31 2021-05-31 Human body gesture recognition method based on fusion of millimeter waves and laser radar, electronic device and storage medium Pending CN113420610A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202110602532.2A CN113420610A (en) 2021-05-31 2021-05-31 Human body gesture recognition method based on fusion of millimeter waves and laser radar, electronic device and storage medium

Publications (1)

Publication Number Publication Date
CN113420610A true CN113420610A (en) 2021-09-21

Family

ID=77713370

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202110602532.2A Pending CN113420610A (en) 2021-05-31 2021-05-31 Human body gesture recognition method based on fusion of millimeter waves and laser radar, electronic device and storage medium

Country Status (1)

Country Link
CN (1) CN113420610A (en)

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN108958490A * 2018-07-24 2018-12-07 OPPO (Chongqing) Intelligent Technology Co., Ltd. Electronic device, gesture recognition method thereof, and computer-readable storage medium
CN109653124A * 2018-12-07 2019-04-19 HIT Robot (Hefei) International Innovation Research Institute Access-control barrier gate detection system and method based on laser radar
WO2020258106A1 * 2019-06-26 2020-12-30 Guangdong OPPO Mobile Telecommunications Co., Ltd. Gesture recognition method and device, and positioning and tracking method and device
CN112782662A * 2021-01-30 2021-05-11 Hunan Senying Zhizao Technology Co., Ltd. Dynamic gesture recognition monitoring device

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
GUO Peng et al., "Research on gesture recognition based on depth images," Foreign Electronic Measurement Technology *

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN114611569A (en) * 2022-01-19 2022-06-10 北京理工大学 Radar target deep learning classification method and system based on knowledge assistance
CN114611569B (en) * 2022-01-19 2022-11-29 北京理工大学 Radar target deep learning classification method and system based on knowledge assistance

Similar Documents

Publication Publication Date Title
US6801155B2 (en) Method and apparatus for recognising a radar target
DE102019106204A1 (en) Decompression of ultrasonic signals that have been compressed using signal object classes
CN111880157B (en) Method and system for detecting target in radar image
CN111461037B (en) End-to-end gesture recognition method based on FMCW radar
de Oliveira et al. Deep convolutional autoencoder applied for noise reduction in range-Doppler maps of FMCW radars
US20200393558A1 (en) System and method of enhancing a performance of an electromagnetic sensor
CN112882009A (en) Radar micro Doppler target identification method based on amplitude and phase dual-channel network
CN113420610A (en) Human body gesture recognition method based on fusion of millimeter waves and laser radar, electronic device and storage medium
Tang et al. Human activity recognition based on mixed CNN with radar multi-spectrogram
CN112965060A (en) Detection method and device for vital sign parameters and method for detecting physical sign points
CN113608193A (en) Radar multi-target distance and speed estimation method based on UNet
CN113420961A (en) Railway locomotive driving safety auxiliary system based on intelligent sensing
Tang et al. SAR deception jamming target recognition based on the shadow feature
CN116008982A (en) Radar target identification method based on trans-scale feature aggregation network
CN113759362B (en) Method, device, equipment and storage medium for radar target data association
CN115267713A (en) Semantic segmentation based intermittent sampling interference identification and suppression method
CN115236749A (en) Living body detection method, apparatus and storage medium
US11170267B1 (en) Method, system and computer program product for region proposals
DE102021208627A1 (en) Method and processor circuit for operating a radar system with multiple antennas as well as a radar system and a motor vehicle
CN112698295A (en) Knowledge-assisted radar detection and tracking integrated method and system
Kapp-Schwoerer et al. Spatio-temporal segmentation and tracking of weather patterns with light-weight Neural Networks
DE102019009130A1 (en) Approximating compression method for ultrasonic sensor data
CN117169840A (en) Robust radar target detection method and system based on countermeasure defense
CN111123734A (en) Complex scene testing method and device for unmanned vehicle and storage medium
CN113792576B (en) Human behavior recognition method based on supervised domain adaptation and electronic equipment

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
RJ01 Rejection of invention patent application after publication

Application publication date: 20210921