CN112949447A - Gesture recognition method, system, apparatus, device, and medium - Google Patents

Gesture recognition method, system, apparatus, device, and medium

Info

Publication number
CN112949447A
CN112949447A CN202110212320.3A
Authority
CN
China
Prior art keywords
signal
frame
sub
recognized
gesture recognition
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202110212320.3A
Other languages
Chinese (zh)
Inventor
李伟
刘宗民
冀潮
曲峰
范西超
郭俊伟
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
BOE Technology Group Co Ltd
Beijing BOE Technology Development Co Ltd
Original Assignee
BOE Technology Group Co Ltd
Beijing BOE Technology Development Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by BOE Technology Group Co Ltd, Beijing BOE Technology Development Co Ltd filed Critical BOE Technology Group Co Ltd
Priority to CN202110212320.3A priority Critical patent/CN112949447A/en
Publication of CN112949447A publication Critical patent/CN112949447A/en
Pending legal-status Critical Current

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V40/00Recognition of biometric, human-related or animal-related patterns in image or video data
    • G06V40/10Human or animal bodies, e.g. vehicle occupants or pedestrians; Body parts, e.g. hands
    • G06V40/107Static hand or arm
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00Pattern recognition
    • G06F18/20Analysing
    • G06F18/24Classification techniques
    • G06F18/241Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches
    • G06F18/2415Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches based on parametric or probabilistic models, e.g. based on likelihood ratio or false acceptance rate versus a false rejection rate
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/04Architecture, e.g. interconnection topology
    • G06N3/045Combinations of networks
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/08Learning methods
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F2218/00Aspects of pattern recognition specially adapted for signal processing
    • G06F2218/08Feature extraction

Abstract

The application discloses a gesture recognition method, system, apparatus, device, and medium. The method comprises: acquiring signal vectors respectively corresponding to consecutive multi-frame echo signals; splicing the same-type sub-signal vectors corresponding to each frame of echo signal to obtain at least two types of signal matrices to be recognized; and respectively inputting at least two types of images to be recognized, corresponding to the at least two types of signal matrices to be recognized, into the recognition channels corresponding to the images to be recognized in a pre-trained gesture recognition model, to obtain a gesture recognition result. This addresses the problems of low gesture recognition efficiency and low accuracy.

Description

Gesture recognition method, system, apparatus, device, and medium
Technical Field
The present application relates generally to the field of digital signal processing, and more particularly, to a method, system, apparatus, device, and medium for gesture recognition.
Background
With the development of intelligent electronic devices, gesture recognition becomes an important human-computer interaction mode, and is receiving more and more attention in various fields.
In the related art, gesture recognition is commonly implemented based on vision technology, for example by performing gesture recognition with an optical lens alone, or with an optical lens combined with a depth lens.
However, vision-based gesture recognition generally suffers from low recognition accuracy and long processing time, and gesture recognition based on a combination of multiple types of lenses also raises the cost of the electronic device.
Disclosure of Invention
In view of the above-mentioned drawbacks and deficiencies of the prior art, it is desirable to provide a gesture recognition method, system, apparatus, device, and medium that improve gesture recognition accuracy and efficiency.
In a first aspect, the present application provides a gesture recognition method, including:
acquiring signal vectors respectively corresponding to consecutive multi-frame echo signals, wherein each signal vector comprises at least two types of sub-signal vectors and is determined based on all linear frequency modulation signals corresponding to that frame of echo signal;
splicing the same-type sub-signal vectors respectively corresponding to each frame of echo signal to obtain at least two types of signal matrices to be recognized; and
respectively inputting at least two types of images to be recognized, corresponding to the at least two types of signal matrices to be recognized, into the recognition channel corresponding to each type of image to be recognized in a pre-trained gesture recognition model, to obtain a gesture recognition result.
In a second aspect, the present application provides a gesture recognition system, which includes a signal transmitting antenna, a signal receiving antenna, a radio frequency unit, and a signal processing unit, wherein:
the radio frequency unit is used for controlling the signal transmitting antenna to transmit continuous multi-frame radar signals, each frame of radar signal comprising a plurality of linear frequency modulation signals;
the radio frequency unit is further used for acquiring the linear frequency modulation signals returned by the gesture and received by the signal receiving antenna, and sending the linear frequency modulation signals to the signal processing unit; and
the signal processing unit is used for acquiring signal vectors respectively corresponding to consecutive multi-frame echo signals, each signal vector comprising at least two types of sub-signal vectors and being determined based on all linear frequency modulation signals corresponding to that frame of echo signal; splicing the same-type sub-signal vectors respectively corresponding to each frame of echo signal to obtain at least two types of signal matrices to be recognized; and respectively inputting at least two types of images to be recognized, corresponding to the at least two types of signal matrices to be recognized, into the recognition channel corresponding to each type of image to be recognized in a pre-trained gesture recognition model, to obtain a gesture recognition result.
In a third aspect, the present application provides a gesture recognition apparatus, configured for:
acquiring signal vectors respectively corresponding to consecutive multi-frame echo signals, wherein each signal vector comprises at least two types of sub-signal vectors and is determined based on all linear frequency modulation signals corresponding to that frame of echo signal;
splicing the same-type sub-signal vectors respectively corresponding to each frame of echo signal to obtain at least two types of signal matrices to be recognized; and
respectively inputting at least two types of images to be recognized, corresponding to the at least two types of signal matrices to be recognized, into the recognition channel corresponding to each type of image to be recognized in a pre-trained gesture recognition model, to obtain a gesture recognition result.
In a fourth aspect, the present application provides a computer device comprising a memory, a processor, and a computer program stored in the memory and executable on the processor, the processor being adapted to perform the method according to the first aspect when executing the program.
In a fifth aspect, the present application provides a computer-readable storage medium having stored thereon a computer program for implementing the method of the first aspect.
The technical solutions provided by the embodiments of the application can have the following beneficial effects:
the gesture recognition method, system, apparatus, device, and medium provided by the embodiments of the application acquire signal vectors respectively corresponding to consecutive multi-frame echo signals; splice the same-type sub-signal vectors respectively corresponding to each frame of echo signal to obtain at least two types of signal matrices to be recognized; and respectively input at least two types of images to be recognized, corresponding to the at least two types of signal matrices to be recognized, into the recognition channels corresponding to the images to be recognized in the pre-trained gesture recognition model, to obtain a gesture recognition result. The gesture type can thus be determined quickly and accurately.
Drawings
Other features, objects and advantages of the present application will become more apparent upon reading of the following detailed description of non-limiting embodiments thereof, made with reference to the accompanying drawings in which:
fig. 1 is a schematic structural diagram of a gesture recognition system according to an embodiment of the present disclosure;
fig. 2 is a schematic flowchart of a gesture recognition method according to an embodiment of the present disclosure;
fig. 3 is a schematic flowchart of another gesture recognition method according to an embodiment of the present disclosure;
fig. 4 is a schematic structural diagram of a gesture recognition model according to an embodiment of the present disclosure;
fig. 5 is a schematic structural diagram of a gesture recognition apparatus according to an embodiment of the present disclosure;
fig. 6 is a schematic structural diagram of another gesture recognition apparatus according to an embodiment of the present disclosure;
fig. 7 is a schematic structural diagram of a computer device according to an embodiment of the present application.
Detailed Description
The present application will be described in further detail with reference to the following drawings and examples. It is to be understood that the specific embodiments described herein are merely illustrative of the relevant invention and not restrictive of the invention. It should be noted that, for convenience of description, only the portions related to the present invention are shown in the drawings.
It should be noted that the embodiments in the present application, and the features within those embodiments, may be combined with each other in the absence of conflict. The present application will be described in detail below with reference to the embodiments and the attached drawings.
Fig. 1 is a schematic structural diagram of a gesture recognition system according to an embodiment of the present application. As shown in fig. 1, the gesture recognition system includes a signal transmitting antenna 101, a signal receiving antenna 102, a radio frequency unit 103, and a signal processing unit 104. The signal transmitting antenna 101 and the signal receiving antenna 102 are respectively connected to the radio frequency unit 103, and the radio frequency unit 103 is connected to the signal processing unit 104. The signal transmitting antennas and signal receiving antennas may be MIMO antennas forming an antenna array, and the antenna array includes at least two signal transmitting antennas and at least two signal receiving antennas. For example, there may be two signal transmitting antennas and four signal receiving antennas, which is equivalent to eight virtual antennas.
When the gesture recognition system is in a working state, the radio frequency unit 103 is configured to control the signal transmitting antenna 101 to transmit continuous multiple frames of radar signals into the external space, where each frame of radar signal includes multiple linear frequency modulation signals (chirps). The radio frequency unit 103 is further configured to obtain each chirp signal, returned by the gesture and received by the signal receiving antenna 102, in each frame of echo signal, and to send the chirp signals to the signal processing unit 104. The number of chirp signals contained in each frame of radar signal transmitted by the signal transmitting antenna 101 is 2^n, where n is a positive integer; for example, each frame of radar signal may include 128 or 256 chirp signals, and the specific number may be determined based on actual needs, which is not limited in this embodiment of the application. It should be noted that, in the embodiment of the present application, for an antenna array comprising two signal transmitting antennas and four signal receiving antennas, each frame of radar signal may be transmitted by the two signal transmitting antennas alternately transmitting the multiple chirp signals, and the four signal receiving antennas may simultaneously receive each chirp signal returned by the gesture, so as to determine the angle information of the gesture.
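As a concrete illustration of the frame layout described above, the following sketch computes the virtual-antenna count and an alternating (time-division) chirp schedule; the function names and the 128-chirp example are illustrative, not from the patent:

```python
def virtual_antennas(n_tx: int, n_rx: int) -> int:
    """A MIMO array with n_tx transmitters and n_rx receivers behaves like
    n_tx * n_rx virtual receive antennas."""
    return n_tx * n_rx

def chirp_schedule(n_tx: int, chirps_per_frame: int):
    """Two TX antennas transmitting alternately: chirp i is sent by TX i % n_tx."""
    return [i % n_tx for i in range(chirps_per_frame)]

# the example configuration from the text: 2 TX, 4 RX, 128 = 2**7 chirps per frame
assert virtual_antennas(2, 4) == 8
schedule = chirp_schedule(2, 128)
assert schedule[:4] == [0, 1, 0, 1]
```

Because the two transmitters never fire simultaneously, each received chirp can be attributed to one TX/RX pair, which is what makes the 8-channel virtual array usable for angle estimation.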
The signal processing unit 104 is configured to obtain signal vectors respectively corresponding to consecutive multi-frame echo signals; splice the same-type sub-signal vectors respectively corresponding to each frame of echo signal to obtain at least two types of signal matrices to be recognized; and respectively input at least two types of images to be recognized, corresponding to the at least two types of signal matrices to be recognized, into the recognition channel corresponding to each type of image to be recognized in a pre-trained gesture recognition model, to obtain a gesture recognition result. Each signal vector comprises at least two types of sub-signal vectors and is determined based on all chirp signals corresponding to that frame of echo signal.
It can be understood that, in the embodiment of the present application, the radar signal is a signal transmitted by the signal transmitting antenna 101, the echo signal is a signal reflected by the gesture and then transmitted by the signal receiving antenna 102, and one frame of radar signal corresponds to one frame of echo signal.
An embodiment of the present application provides a gesture recognition method, which may be applied to the signal processing unit of the gesture recognition system shown in fig. 1. As shown in fig. 2, the method includes:
Step 201, acquiring signal vectors respectively corresponding to consecutive multi-frame echo signals.
In the embodiment of the present application, the consecutive multi-frame echo signals are a preset number of frames of echo signals including the current frame echo signal. The current frame echo signal may be the frame of echo signal corresponding to any one frame among the continuous multi-frame radar signals transmitted by the signal transmitting antenna, and the remaining frames among the consecutive multi-frame echo signals are historical frame echo signals that have already been received. The signal vector is determined based on all the chirp signals corresponding to each frame of echo signal, and includes at least two types of sub-signal vectors.
As shown in fig. 3, the process of determining the signal vector based on all the chirp signals corresponding to each frame of echo signal may be:
step S11 is to determine a feature signal corresponding to each frame of echo signal based on all the chirp signals corresponding to each frame of echo signal.
In an embodiment of the present application, determining the characteristic signal corresponding to each frame of echo signal based on all the chirp signals corresponding to that frame may include: when the gesture recognition system is in a working state, acquiring a chirp signal returned within the current frame echo signal; performing signal compensation on the chirp signal to obtain a chirp signal to be processed; performing a distance dimension Fourier transform on the chirp signal to be processed to obtain the distance dimension signal corresponding to the chirp signal; and storing that distance dimension signal.
Further, judging whether the linear frequency modulation signal is the last linear frequency modulation signal corresponding to the current frame echo signal, if so, determining to obtain each distance dimension signal corresponding to the current frame echo signal; and if not, continuously acquiring the next linear frequency modulation signal returned by the gesture, judging whether the linear frequency modulation signal is the last linear frequency modulation signal in the frame of echo signals or not until the last linear frequency modulation signal in the current frame of echo signals is acquired, and acquiring distance dimensional signals respectively associated with all linear frequency modulation signals corresponding to each frame of echo signals.
Then, for the distance dimension signals respectively associated with all the chirp signals corresponding to each frame of echo signal, the plurality of distance dimension signals are arranged in the order of the receiving times of the corresponding chirp signals to obtain a distance dimension signal sequence, so that the gesture type can be accurately determined based on this sequence. A two-dimensional Fourier transform is performed on the distance dimension signal sequence to obtain a distance Doppler signal, and peak detection is performed on the distance Doppler signal to obtain a peak signal. The peak signal is then processed to obtain the angle signal corresponding to the current frame echo signal, thereby determining the characteristic signal corresponding to each frame of echo signal. The peak signal may be processed based on angle-of-arrival (AOA) estimation to obtain the angle signal corresponding to the current frame echo signal.
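The per-frame processing chain described above (distance dimension FFT per chirp, a second FFT across chirps to form the distance Doppler signal, then peak detection) can be sketched in NumPy; the array shapes, function name, and the synthetic single-target frame are assumptions for illustration, not the patent's implementation:

```python
import numpy as np

def frame_features(chirps: np.ndarray):
    """chirps: complex array of shape (num_chirps, num_samples) for one frame.
    Returns the distance Doppler map and the index of its strongest peak."""
    range_fft = np.fft.fft(chirps, axis=1)          # distance dimension FFT per chirp
    # second FFT across the chirp sequence -> distance Doppler signal (2-D transform)
    range_doppler = np.fft.fftshift(np.fft.fft(range_fft, axis=0), axes=0)
    mag = np.abs(range_doppler)
    peak = np.unravel_index(np.argmax(mag), mag.shape)   # simple peak detection
    return range_doppler, peak

# A static target at distance bin 5: every chirp carries the same beat tone.
n_chirps, n_samples = 16, 64
tone = np.exp(2j * np.pi * 5 * np.arange(n_samples) / n_samples)
frame = np.tile(tone, (n_chirps, 1))
rd, peak = frame_features(frame)
assert peak == (n_chirps // 2, 5)   # zero Doppler is the middle row after fftshift
```

A moving hand would shift the peak off the zero-Doppler row; the AOA step (not shown) would then compare the phase of this peak across the virtual antennas.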
It should be noted that, in the embodiment of the present application, the characteristic signal may include a distance signal and an angle signal, or may include a distance signal, a peak signal, and an angle signal; the types of signals included in the characteristic signal may be determined based on actual needs, which is not limited in the embodiment of the application. The richer the signal types in the characteristic signal, the more information is available in the gesture recognition process, and the more accurate the recognition result of the gesture type.
Step S12, determining a signal vector corresponding to each frame of echo signal based on the feature signal.
In this step, the process of determining the signal vector corresponding to each frame of echo signal based on the feature signal may have the following two optional implementations:
In an alternative implementation, the characteristic signal includes a distance signal and an angle signal, and determining the signal vector corresponding to each frame of echo signal based on the characteristic signal may include: determining a distance sub-signal vector corresponding to the distance signal, and determining an angle sub-signal vector corresponding to the angle signal; and combining the distance sub-signal vector and the angle sub-signal vector to obtain the signal vector corresponding to each frame of echo signal. The distance sub-signal corresponding to the distance signal may include a distance dimension signal and/or a distance Doppler signal, so the distance sub-signal vector may include a distance dimension signal vector and/or a distance Doppler signal vector; the angle sub-signal corresponding to the angle signal may include a horizontal angle signal, a pitch angle signal, and/or a roll angle signal, so the angle sub-signal vector may include a horizontal angle signal vector, a pitch angle signal vector, and/or a roll angle signal vector.
In another alternative implementation, the feature signal includes a distance signal, a peak signal and an angle signal, and the process of determining the sub-signal vector corresponding to the echo signal of each frame based on the feature signal may include:
determining a distance sub-signal vector corresponding to the distance signal, determining an angle sub-signal vector corresponding to the angle signal, and determining a peak sub-signal vector corresponding to the peak signal; and combining the distance sub-signal vector, the peak sub-signal vector and the angle sub-signal vector to obtain a signal vector corresponding to each frame of echo signals, wherein the peak sub-signal corresponding to the peak signal may comprise the peak signal, and the peak sub-signal vector corresponding to the peak signal may comprise the peak signal vector.
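A minimal sketch of this combining step, concatenating the per-frame sub-signal vectors into one signal vector; the vector lengths are arbitrary examples (the actual lengths depend on the radar configuration):

```python
import numpy as np

# Illustrative per-frame sub-signal vectors (lengths are example values).
distance_vec = np.random.rand(64)   # distance sub-signal vector
peak_vec = np.random.rand(1)        # peak sub-signal vector
angle_vec = np.random.rand(32)      # angle sub-signal vector

# Combining the sub-signal vectors yields the frame's signal vector; the
# per-type slices are kept so each type can later be spliced separately.
signal_vector = np.concatenate([distance_vec, peak_vec, angle_vec])
assert signal_vector.shape == (64 + 1 + 32,)
```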
Step 202, splicing the same-type sub-signal vectors respectively corresponding to each frame of echo signal to obtain at least two types of signal matrices to be recognized.
In this step, for the same-type sub-signal vectors corresponding to each frame of echo signal, the multiple same-type sub-signal vectors may be spliced in the order of the receiving times of the frames of echo signals to obtain one type of signal matrix to be recognized; at least two types of signal matrices to be recognized are obtained in this manner. Because the gesture type is recognized based on the temporal characteristics of the echo signals, the accuracy of the gesture type recognition result can be ensured.
For example, assume that signal vectors corresponding to 50 frames of echo signals are acquired, and that the sub-signal vectors in each signal vector include a distance Doppler signal vector, a horizontal angle signal vector, and a pitch angle signal vector. Then, according to the order in which the 50 frames of echo signals were received, the 50 distance Doppler signal vectors are spliced into a distance Doppler signal matrix to be recognized, the 50 horizontal angle signal vectors are spliced into a horizontal angle signal matrix to be recognized, and the 50 pitch angle signal vectors are spliced into a pitch angle signal matrix to be recognized.
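The splicing in this example can be sketched as stacking the time-ordered per-frame vectors into a matrix; the vector length 64 is an arbitrary assumption:

```python
import numpy as np

def splice_frames(same_type_vectors):
    """Splice same-type sub-signal vectors from consecutive frames (already
    ordered by receiving time) into one signal matrix to be recognized,
    of shape (num_frames, vector_length)."""
    return np.stack(same_type_vectors, axis=0)

# e.g. 50 frames, each contributing a length-64 distance Doppler signal vector
doppler_vectors = [np.random.rand(64) for _ in range(50)]
doppler_matrix = splice_frames(doppler_vectors)
assert doppler_matrix.shape == (50, 64)
```

The same call, repeated for the horizontal angle and pitch angle vectors, yields the other two matrices to be recognized.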
Step 203, respectively inputting at least two types of images to be recognized, corresponding to the at least two types of signal matrices to be recognized, into the recognition channels corresponding to the images to be recognized in the pre-trained gesture recognition model, to obtain a gesture recognition result.
In the embodiment of the application, in order to improve the accuracy of gesture type recognition, a gesture recognition model comprising at least two recognition channels is designed. Each recognition channel is used for recognizing the image to be recognized corresponding to part of the gesture information, so that specific gesture information can be recognized in a targeted manner; the recognition results of the recognition channels are then combined to determine the final gesture type, ensuring the accuracy of the finally determined gesture type.
The gesture recognition model may include: at least two recognition channels, and a loss function layer (softmax) connected to the output ends of the at least two recognition channels. Each recognition channel comprises at least one feature extraction submodel and a fully connected layer connected in sequence, where each feature extraction submodel comprises a batch normalization layer (BatchNorm), a convolutional layer (Conv), a pooling layer (Max Pooling), and an activation layer (ReLU) connected in sequence; the fully connected layer includes a first linear layer (Linear1), a first activation layer (Relu1), a second linear layer (Linear2), a second activation layer (Relu2), and a third linear layer (Linear3) connected in sequence. The more feature extraction submodels there are, the more feature information can be extracted; but too many feature extraction submodels may cause the gesture recognition model to overfit and generalize poorly. Preferably, the number of feature extraction submodels is three, which ensures both the accuracy of the model recognition result and the generalization capability of the model.
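One feature extraction submodel (BatchNorm, Conv, Max Pooling, ReLU, in the order given above) can be sketched as a plain NumPy forward pass. The kernel, pooling size, inference-style normalization without learned parameters, and single-channel simplification are all illustrative assumptions, not the patent's actual parameters:

```python
import numpy as np

def batchnorm(x, eps=1e-5):
    # inference-style normalization over the whole map (no learned scale/shift)
    return (x - x.mean()) / np.sqrt(x.var() + eps)

def conv2d(x, kernel):
    # 'valid' 2-D convolution with a single kernel
    H, W = x.shape
    kh, kw = kernel.shape
    out = np.empty((H - kh + 1, W - kw + 1))
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            out[i, j] = np.sum(x[i:i + kh, j:j + kw] * kernel)
    return out

def maxpool(x, p=2):
    H, W = x.shape
    x = x[:H - H % p, :W - W % p]           # drop any ragged edge
    return x.reshape(H // p, p, W // p, p).max(axis=(1, 3))

def relu(x):
    return np.maximum(x, 0.0)

def feature_submodel(image, kernel):
    """BatchNorm -> Conv -> Max Pooling -> ReLU: one submodel's forward pass."""
    return relu(maxpool(conv2d(batchnorm(image), kernel)))

# e.g. a 50-frame x 64-bin signal matrix treated as the image to be recognized
features = feature_submodel(np.random.rand(50, 64), np.ones((3, 3)) / 9.0)
assert features.shape == (24, 31) and (features >= 0).all()
```

Chaining three such submodels and flattening into the Linear1–Relu1–Linear2–Relu2–Linear3 stack gives one recognition channel's output.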
As shown in fig. 4, fig. 4 shows a schematic structural diagram of a gesture recognition model provided in an embodiment of the present application. The gesture recognition model includes three recognition channels, namely a distance Doppler recognition channel, a pitch angle recognition channel, and a horizontal angle recognition channel, and each channel includes three feature extraction submodels, namely a first, a second, and a third feature extraction submodel.
It is understood that, in the embodiment of the present application, at least two recognition channels included in the gesture recognition model may be determined based on actual needs, which is not limited by the embodiment of the present application. For example, if it is required to determine a gesture type based on a distance signal and an angle signal corresponding to the gesture, the gesture recognition model may include a distance doppler recognition channel, a pitch angle recognition channel and a horizontal angle recognition channel, and the signal matrix to be recognized determined in step 202 includes: a distance Doppler signal matrix to be identified, a pitch angle signal matrix to be identified and a horizontal angle signal matrix to be identified.
It can be understood that, in the embodiment of the present application, the gesture recognition model needs to be trained in advance. Assuming the finally trained gesture recognition model is as shown in fig. 4, the training process may be: acquiring consecutive multi-frame sample echo signals for each of multiple gesture types, where each frame of sample echo signal includes multiple sample chirp signals; performing signal compensation on each sample chirp signal in each frame of sample echo signal to obtain a sample chirp signal to be processed; and performing a distance dimension Fourier transform on the sample chirp signal to be processed to obtain the sample distance dimension signal corresponding to that sample chirp signal. The gesture types may include, for example: sliding from left to right, from right to left, from bottom to top, or from top to bottom, rotating a finger clockwise, rotating a finger counterclockwise, and the like.
For each frame of sample echo signals, obtaining sample distance dimensional signals respectively associated with a plurality of sample linear frequency modulation signals corresponding to each frame of sample echo signals; arranging a plurality of sample distance dimensional signals according to the sequence of obtaining the receiving time of the sample linear frequency modulation signals corresponding to each sample distance dimensional signal to obtain a sample distance dimensional signal sequence; carrying out two-dimensional Fourier transform on the sample distance dimensional signal sequence to obtain a sample distance Doppler signal; carrying out peak detection on the sample range Doppler signal to obtain a sample peak signal; and processing the sample peak signal to obtain a sample angle signal corresponding to each frame of sample echo signal.
Further, a sample distance Doppler signal vector corresponding to the sample distance Doppler signal is determined, and a sample pitch angle signal vector and a sample horizontal angle signal vector corresponding to the sample angle signal are determined, to obtain the sample signal vector corresponding to each frame of sample echo signal. The sample distance Doppler signal vectors respectively corresponding to the frames of sample echo signals are spliced to obtain a sample distance Doppler signal matrix to be recognized; the sample pitch angle signal vectors are spliced to obtain a sample pitch angle signal matrix to be recognized; and the sample horizontal angle signal vectors are spliced to obtain a sample horizontal angle signal matrix to be recognized.
The sample distance Doppler image to be recognized corresponding to the sample distance Doppler signal matrix, the sample pitch angle image to be recognized corresponding to the sample pitch angle signal matrix, and the sample horizontal angle image to be recognized corresponding to the sample horizontal angle signal matrix are then respectively input into the distance Doppler recognition channel, the pitch angle recognition channel, and the horizontal angle recognition channel of the initial gesture recognition model, to obtain three sample images to be classified. The loss function layer processes the three sample images to be classified to obtain a sample classification result, and whether the model has converged is judged based on the loss function value of the loss function layer. If so, the model training is finished and the gesture recognition model is obtained; otherwise, the model parameters of the initial gesture recognition model are adjusted and the above process is repeated until the model converges, thereby obtaining the gesture recognition model.
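The loss function layer is only described as processing the three channel outputs into one classification result. One plausible fusion, sketched below, sums the per-class scores from the three channels before applying softmax; the summation rule and the six-class example are assumptions, not the patent's stated method:

```python
import numpy as np

def softmax(z):
    z = z - z.max()                 # subtract max for numerical stability
    e = np.exp(z)
    return e / e.sum()

def fuse_channels(channel_logits):
    """Sum per-class scores across recognition channels, then softmax."""
    return softmax(np.sum(np.asarray(channel_logits), axis=0))

# three channels, each scoring six hypothetical gesture classes
# (left/right/up/down slide, clockwise/counterclockwise rotation)
scores = np.array([
    [2.0, 0.1, 0.1, 0.0, 0.0, 0.0],   # distance Doppler channel
    [1.5, 0.2, 0.3, 0.1, 0.0, 0.0],   # pitch angle channel
    [1.8, 0.0, 0.4, 0.2, 0.1, 0.0],   # horizontal angle channel
])
probs = fuse_channels(scores)
assert probs.argmax() == 0 and abs(probs.sum() - 1.0) < 1e-9
```

During training, the cross-entropy between these fused probabilities and the sample's gesture label would serve as the loss function value used for the convergence check.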
In this step, the at least two types of images to be recognized corresponding to the at least two types of signal matrices to be recognized are respectively input into the recognition channels corresponding to each type of image to be recognized in the pre-trained gesture recognition model; each recognition channel outputs an image to be classified, and the loss function layer processes the images to be classified output by the recognition channels to obtain the gesture recognition result.
Each recognition channel processes the image to be recognized corresponding to the recognition channel, and the process of obtaining the image to be classified may include: the at least one feature extraction submodel connected in sequence extracts image features in the image to be identified to obtain a feature image corresponding to the image to be identified; and processing the characteristic image by the full connection layer to obtain an image to be classified.
Optionally, if the gesture recognition model provided in this embodiment is shown in fig. 3, a process of extracting image features in an image to be recognized by at least one feature extraction sub-model connected in sequence to obtain a feature image corresponding to the image to be recognized includes:
for the first feature extraction sub-model, the batch normalization layer performs normalization processing on the image to be identified to obtain a first normalized image; the convolution layer performs feature extraction on the first normalized image to obtain a first feature image; the pooling layer performs characteristic optimization processing on the first characteristic image to obtain a second characteristic image; and the activation layer performs characteristic activation processing on the second characteristic image to obtain a third characteristic image corresponding to the image to be identified.
For the second feature extraction sub-model, the batch normalization layer performs normalization processing on the third feature image to obtain a second normalized image; the convolution layer performs feature extraction on the second normalized image to obtain a fourth feature image; the pooling layer performs characteristic optimization processing on the fourth characteristic image to obtain a fifth characteristic image; and the activation layer performs characteristic activation processing on the fifth characteristic image to obtain a sixth characteristic image corresponding to the image to be identified.
For the third feature extraction sub-model, the batch normalization layer performs normalization processing on the sixth feature image to obtain a third normalized image; the convolution layer performs feature extraction on the third normalized image to obtain a seventh feature image; the pooling layer performs feature optimization processing on the seventh feature image to obtain an eighth feature image; and the activation layer performs feature activation processing on the eighth feature image to obtain the feature image corresponding to the image to be identified.
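One feature-extraction stage (batch normalization, then convolution, then pooling, then activation) can be sketched minimally in numpy. The 3×3 kernel, 2×2 max pooling, ReLU activation, and single-channel layout are assumptions for illustration; the patent does not fix these hyperparameters:

```python
import numpy as np

def batch_norm(x, eps=1e-5):
    # Normalize the input image (zero mean, unit variance)
    return (x - x.mean()) / np.sqrt(x.var() + eps)

def conv2d(x, kernel):
    # Valid-mode 2D cross-correlation: slides the kernel over the image
    kh, kw = kernel.shape
    out = np.empty((x.shape[0] - kh + 1, x.shape[1] - kw + 1))
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            out[i, j] = np.sum(x[i:i + kh, j:j + kw] * kernel)
    return out

def max_pool2d(x, size=2):
    # Feature optimization: keep the strongest response in each window
    h, w = x.shape[0] // size, x.shape[1] // size
    return x[:h * size, :w * size].reshape(h, size, w, size).max(axis=(1, 3))

def relu(x):
    # Feature activation
    return np.maximum(x, 0.0)

def feature_extraction_stage(image, kernel):
    normalized = batch_norm(image)          # normalized image
    features = conv2d(normalized, kernel)   # feature image
    pooled = max_pool2d(features)           # optimized feature image
    return relu(pooled)                     # activated feature image

out = feature_extraction_stage(np.random.rand(32, 32), np.ones((3, 3)))
```

Chaining three such stages, as in the three sub-models described above, progressively reduces the spatial size of the feature image before the full connection layer.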
In summary, the gesture recognition method provided in the embodiment of the present application can obtain signal vectors respectively corresponding to multiple consecutive frames of echo signals; splicing the same type of sub-signal vectors respectively corresponding to each frame of echo signal to obtain at least two types of signal matrixes to be identified; and respectively inputting at least two types of images to be recognized corresponding to the at least two types of signal matrixes to be recognized into recognition channels corresponding to the images to be recognized in the pre-trained gesture recognition model to obtain a gesture recognition result. The gesture type can be determined quickly and accurately.
An embodiment of the present application provides a gesture recognition apparatus, as shown in fig. 5, the apparatus 30 includes:
an obtaining module 301 configured to obtain signal vectors corresponding to the consecutive multiple frames of echo signals, respectively, where the signal vectors include at least two types of sub-signal vectors, and the signal vectors are determined based on all chirp signals corresponding to each frame of echo signals;
the splicing module 302 is configured to splice the same-type sub-signal vectors respectively corresponding to each frame of echo signal to obtain at least two types of signal matrixes to be identified;
the recognition module 303 is configured to input at least two types of images to be recognized corresponding to the at least two types of signal matrices to be recognized into recognition channels corresponding to each type of image to be recognized in a pre-trained gesture recognition model, respectively, so as to obtain a gesture recognition result.
Optionally, the gesture recognition model includes at least two recognition channels and a loss function layer connected to an output of each recognition channel, and the recognition module 303 is configured to:
each identification channel processes the image to be identified corresponding to the identification channel to obtain the image to be classified;
and processing at least two images to be classified by the loss function layer to obtain a gesture recognition result.
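At inference time the loss-function layer fuses the per-channel images to be classified into a single gesture class. Summing the channel scores before a softmax is one plausible fusion operator; the patent does not pin down the exact operation, so the sketch below is an assumption:

```python
import numpy as np

def softmax(z):
    # Numerically stable softmax over gesture-class scores
    e = np.exp(z - z.max())
    return e / e.sum()

def fuse_and_classify(channel_scores):
    """channel_scores: list of per-gesture score vectors, one per
    recognition channel. Returns the predicted class and probabilities."""
    fused = np.sum(channel_scores, axis=0)   # element-wise sum over channels
    probs = softmax(fused)
    return int(np.argmax(probs)), probs

# Three hypothetical channels voting over three gesture classes
label, probs = fuse_and_classify([np.array([0.1, 2.0, 0.3]),
                                  np.array([0.2, 1.5, 0.1]),
                                  np.array([0.0, 1.0, 0.2])])
```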
Optionally, the identification channel includes at least one feature extraction submodel and a full connection layer connected in sequence, and the identification module 303 is configured to:
the at least one feature extraction submodel connected in sequence extracts image features in the image to be identified to obtain a feature image corresponding to the image to be identified;
and processing the characteristic image by the full connection layer to obtain an image to be classified.
Optionally, as shown in fig. 6, the apparatus further includes a determining module 304 configured to:
determining a characteristic signal corresponding to each frame of echo signal based on all the linear frequency modulation signals corresponding to each frame of echo signal;
and determining a signal vector corresponding to each frame of echo signals based on the characteristic signals.
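Deriving a characteristic signal from all chirp signals of one frame can be illustrated for the distance signal. An FFT over fast time (the samples within each chirp) is the standard way to obtain a range profile in FMCW radar; the exact transforms used by the patent are defined elsewhere, so this is a hedged sketch:

```python
import numpy as np

def distance_signal(frame_chirps):
    """frame_chirps: (num_chirps, samples_per_chirp) array holding all
    chirp signals of one frame of echo signals. Returns a range profile
    averaged over the chirps of that frame."""
    range_fft = np.fft.fft(frame_chirps, axis=1)   # fast-time FFT per chirp
    return np.abs(range_fft).mean(axis=0)          # average over all chirps

# One frame of 64 chirps, 128 samples each (illustrative sizes)
profile = distance_signal(np.random.randn(64, 128))
```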
Optionally, the characteristic signal includes a distance signal and an angle signal, and the determining module 304 is configured to:
determine a distance sub-signal vector corresponding to the distance signal, and determine an angle sub-signal vector corresponding to the angle signal; and
combine the distance sub-signal vector and the angle sub-signal vector to obtain the signal vector corresponding to each frame of echo signals.
Optionally, the characteristic signal includes a distance signal, a peak signal and an angle signal, and the determining module 304 is configured to:
determining a distance sub-signal vector corresponding to the distance signal, determining an angle sub-signal vector corresponding to the angle signal, and determining a peak sub-signal vector corresponding to the peak signal;
and combining the distance sub-signal vector, the peak sub-signal vector and the angle sub-signal vector to obtain a signal vector corresponding to each frame of echo signals.
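The two steps above (combine the per-frame sub-signal vectors, then splice same-type sub-vectors across consecutive frames into the signal matrices to be identified) can be sketched as follows. The dict layout and vector lengths are illustrative assumptions:

```python
import numpy as np

def splice_frames(frames):
    """frames: list of per-frame signal vectors, each a dict mapping a
    sub-signal type ('distance', 'peak', 'angle') to its sub-signal
    vector. Returns one matrix per type, with one row per frame."""
    types = frames[0].keys()
    return {t: np.stack([f[t] for f in frames]) for t in types}

# 32 consecutive frames of echo signals, 64-point sub-signal vectors each
frames = [
    {"distance": np.zeros(64), "peak": np.zeros(64), "angle": np.zeros(64)}
    for _ in range(32)
]
matrices = splice_frames(frames)
```

Each resulting matrix is then rendered as an image to be recognized and fed to its corresponding recognition channel.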
In summary, the gesture recognition apparatus provided in the embodiment of the present application can obtain signal vectors corresponding to consecutive multi-frame echo signals, respectively; splicing the same type of sub-signal vectors respectively corresponding to each frame of echo signal to obtain at least two types of signal matrixes to be identified; and respectively inputting at least two types of images to be recognized corresponding to the at least two types of signal matrixes to be recognized into recognition channels corresponding to the images to be recognized in the pre-trained gesture recognition model to obtain a gesture recognition result. The gesture type can be determined quickly and accurately.
Fig. 7 is a diagram illustrating a computer device according to an exemplary embodiment, which includes a Central Processing Unit (CPU) 401 that can perform various appropriate actions and processes according to a program stored in a Read Only Memory (ROM) 402 or a program loaded from a storage section into a Random Access Memory (RAM) 403. In the RAM 403, various programs and data necessary for system operation are also stored. The CPU 401, the ROM 402, and the RAM 403 are connected to each other via a bus 404. An input/output (I/O) interface 405 is also connected to the bus 404.
The following components are connected to the I/O interface 405: an input section 406 including a keyboard, a mouse, and the like; an output section including a display such as a Cathode Ray Tube (CRT) or a Liquid Crystal Display (LCD), and a speaker; a storage section 408 including a hard disk and the like; and a communication section 409 including a network interface card such as a LAN card or a modem. The communication section 409 performs communication processing via a network such as the Internet. A drive 410 is also connected to the I/O interface 405 as needed. A removable medium 411 such as a magnetic disk, an optical disk, a magneto-optical disk, or a semiconductor memory is mounted on the drive 410 as necessary, so that a computer program read out therefrom is installed into the storage section 408 as necessary.
In particular, the processes described above in fig. 2-3 may be implemented as computer software programs, according to embodiments of the present application. For example, various embodiments of the present application include a computer program product comprising a computer program embodied on a computer readable medium, the computer program comprising program code for performing the method illustrated by the flow chart. In such an embodiment, the computer program may be downloaded and installed from a network via the communication section, and/or installed from a removable medium. The above-described functions defined in the system of the present application are executed when the computer program is executed by a Central Processing Unit (CPU) 401.
It should be noted that the computer readable medium shown in the present application may be a computer readable signal medium or a computer readable storage medium or any combination of the two. A computer readable storage medium may be, for example, but not limited to, an electronic, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or any combination of the foregoing. More specific examples of the computer readable storage medium may include, but are not limited to: an electrical connection having one or more wires, a portable computer diskette, a hard disk, a Random Access Memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), an optical fiber, a portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the foregoing. In the present application, a computer readable storage medium may be any tangible medium that can contain, or store a program for use by or in connection with an instruction execution system, apparatus, or device. In this application, however, a computer readable signal medium may include a propagated data signal with computer readable program code embodied therein, for example, in baseband or as part of a carrier wave. Such a propagated data signal may take many forms, including, but not limited to, electro-magnetic, optical, or any suitable combination thereof. A computer readable signal medium may also be any computer readable medium that is not a computer readable storage medium and that can communicate, propagate, or transport a program for use by or in connection with an instruction execution system, apparatus, or device. Program code embodied on a computer readable medium may be transmitted using any appropriate medium, including but not limited to: wireless, wire, fiber optic cable, RF, etc., or any suitable combination of the foregoing.
The flowchart and block diagrams in the figures illustrate the architecture, functionality, and operation of possible implementations of methods, apparatus, and computer program products according to various embodiments of the present application. In this regard, each block in the flowchart or block diagrams may represent a module, segment, or portion of code, which comprises one or more executable instructions for implementing the specified logical function(s). It should also be noted that, in some alternative implementations, the functions noted in the block may occur out of the order noted in the figures. For example, two blocks shown in succession may, in fact, be executed substantially concurrently, or the blocks may sometimes be executed in the reverse order, depending upon the functionality involved. It will also be noted that each block of the block diagrams or flowchart illustration, and combinations of blocks in the block diagrams or flowchart illustration, can be implemented by special purpose hardware-based systems which perform the specified functions or acts, or combinations of special purpose hardware and computer instructions.
The units described in the embodiments of the present application may be implemented by software or by hardware, and the described units or modules may also be provided in a processor, which may be described as: a processor including an acquisition module, a splicing module, and a recognition module. The names of these units or modules do not, in some cases, limit the units or modules themselves; for example, the acquisition module may also be described as a "signal vector acquisition module for acquiring signal vectors respectively corresponding to consecutive multiple frames of echo signals".
As another aspect, the present application also provides a computer-readable medium, which may be contained in the electronic device described in the above embodiments; or may exist separately without being assembled into the electronic device. The computer readable medium carries one or more programs which, when executed by the electronic device, cause the electronic device to implement the gesture recognition method as described in the above embodiments.
The above description covers only preferred embodiments of the application and is illustrative of the principles of the technology employed. A person skilled in the art will appreciate that the scope of the invention referred to in the present application is not limited to embodiments formed by the specific combination of the above-mentioned features, and also covers other embodiments formed by any combination of the above-mentioned features or their equivalents without departing from the inventive concept. For example, the above features may be replaced with (but not limited to) features having similar functions disclosed in the present application.

Claims (10)

1. A gesture recognition method, comprising:
acquiring signal vectors respectively corresponding to continuous multi-frame echo signals, wherein the signal vectors comprise at least two types of sub-signal vectors, and the signal vectors are determined based on all linear frequency modulation signals corresponding to each frame of echo signals;
splicing the same type of sub-signal vectors respectively corresponding to each frame of echo signal to obtain at least two types of signal matrixes to be identified;
and respectively inputting at least two types of images to be recognized corresponding to the at least two types of signal matrixes to be recognized into recognition channels corresponding to the images to be recognized in the pre-trained gesture recognition model to obtain a gesture recognition result.
2. The method according to claim 1, wherein the gesture recognition model includes at least two recognition channels and a loss function layer connected to an output end of each recognition channel, and the obtaining of the gesture recognition result by inputting at least two types of images to be recognized corresponding to at least two types of signal matrices to be recognized into the recognition channel corresponding to each type of image to be recognized in the pre-trained gesture recognition model respectively comprises:
each identification channel processes the image to be identified corresponding to the identification channel to obtain an image to be classified;
and processing at least two images to be classified by the loss function layer to obtain a gesture recognition result.
3. The method according to claim 2, wherein the recognition channel comprises at least one feature extraction submodel and a full connection layer which are connected in sequence, and each recognition channel processes the image to be recognized corresponding to the recognition channel to obtain an image to be classified, including:
the at least one feature extraction submodel connected in sequence extracts image features in the image to be identified to obtain a feature image corresponding to the image to be identified;
and processing the characteristic image by the full connection layer to obtain an image to be classified.
4. The method of any one of claims 1 to 3, wherein determining a signal vector based on all chirp signals corresponding to each frame of echo signals comprises:
determining a characteristic signal corresponding to each frame of echo signal based on all linear frequency modulation signals corresponding to each frame of echo signal;
and determining a signal vector corresponding to each frame of echo signals based on the characteristic signals.
5. The method of claim 4, wherein the feature signal comprises a distance signal and an angle signal, and wherein determining the signal vector corresponding to the echo signal of each frame based on the feature signal comprises:
determining a distance sub-signal vector corresponding to the distance signal, and determining an angle sub-signal vector corresponding to the angle signal;
and combining the distance sub-signal vector and the angle sub-signal vector to obtain a signal vector corresponding to each frame of echo signal.
6. The method of claim 4, wherein the feature signal comprises a distance signal, a peak signal and an angle signal, and wherein determining the signal vector corresponding to the echo signal of each frame based on the feature signal comprises:
determining a distance sub-signal vector corresponding to the distance signal, determining an angle sub-signal vector corresponding to the angle signal, and determining a peak sub-signal vector corresponding to the peak signal;
and combining the distance sub-signal vector, the peak sub-signal vector and the angle sub-signal vector to obtain a signal vector corresponding to each frame of echo signal.
7. A gesture recognition system is characterized in that the system comprises a signal transmitting antenna, a signal receiving antenna, a radio frequency unit and a signal processing unit,
the radio frequency unit is used for controlling the signal transmitting antenna to transmit continuous multiframe radar signals, and each frame of radar signal comprises a plurality of linear frequency modulation signals;
the radio frequency unit is further used for acquiring the chirp signals received by the signal receiving antenna and returned by the gesture, and sending the chirp signals to the signal processing unit;
the signal processing unit is used for acquiring signal vectors respectively corresponding to continuous multi-frame echo signals, the signal vectors comprise at least two types of sub-signal vectors, and the signal vectors are determined based on all linear frequency modulation signals corresponding to each frame of echo signals; splicing the same type of sub-signal vectors respectively corresponding to each frame of echo signal to obtain at least two types of signal matrixes to be identified; and respectively inputting at least two types of images to be recognized corresponding to the at least two types of signal matrixes to be recognized into recognition channels corresponding to the images to be recognized in the pre-trained gesture recognition model to obtain a gesture recognition result.
8. A gesture recognition apparatus, comprising:
an acquisition module configured to acquire signal vectors respectively corresponding to a plurality of continuous frames of echo signals, the signal vectors including at least two types of sub-signal vectors, the signal vectors being determined based on all chirp signals corresponding to each frame of echo signals;
the splicing module is configured to splice the same-type sub-signal vectors respectively corresponding to each frame of echo signals to obtain at least two types of signal matrixes to be identified;
and the recognition module is configured to input at least two types of images to be recognized corresponding to the at least two types of signal matrixes to be recognized into recognition channels corresponding to the images to be recognized in a pre-trained gesture recognition model respectively to obtain a gesture recognition result.
9. A computer device, characterized in that the computer device comprises a memory, a processor and a computer program stored in the memory and executable on the processor, the processor being adapted to implement the method according to any of claims 1-6 when executing the program.
10. A computer-readable storage medium, characterized in that a computer program is stored thereon for implementing the method according to any one of claims 1-6.
CN202110212320.3A 2021-02-25 2021-02-25 Gesture recognition method, system, apparatus, device, and medium Pending CN112949447A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202110212320.3A CN112949447A (en) 2021-02-25 2021-02-25 Gesture recognition method, system, apparatus, device, and medium

Publications (1)

Publication Number Publication Date
CN112949447A true CN112949447A (en) 2021-06-11

Family

ID=76246206

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202110212320.3A Pending CN112949447A (en) 2021-02-25 2021-02-25 Gesture recognition method, system, apparatus, device, and medium

Country Status (1)

Country Link
CN (1) CN112949447A (en)

Citations (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20160252607A1 (en) * 2015-02-27 2016-09-01 Texas Instruments Incorporated Gesture Recognition using Frequency Modulated Continuous Wave (FMCW) Radar with Low Angle Resolution
WO2018184233A1 (en) * 2017-04-07 2018-10-11 深圳市柔宇科技有限公司 Hand gesture recognition method and related device
US20200012351A1 (en) * 2018-07-04 2020-01-09 Baidu Online Network Technology (Beijing) Co., Ltd. Method, device and readable storage medium for processing control instruction based on gesture recognition
KR102107685B1 (en) * 2019-10-11 2020-05-07 주식회사 에이치랩 Method and apparutus for signal detecting and recognition
WO2020102943A1 (en) * 2018-11-19 2020-05-28 深圳市欢太科技有限公司 Method and apparatus for generating gesture recognition model, storage medium, and electronic device
CN111680539A (en) * 2020-04-14 2020-09-18 北京清雷科技有限公司 Dynamic gesture radar recognition method and device
CN112034446A (en) * 2020-08-27 2020-12-04 南京邮电大学 Gesture recognition system based on millimeter wave radar
KR20210001227A (en) * 2019-06-27 2021-01-06 한양대학교 산학협력단 Non-contact type gesture recognization apparatus and method

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
WANG YONG et al.: "Multi-dimensional Parameter Gesture Recognition Algorithm Based on FMCW Radar", Journal of Electronics & Information Technology, vol. 41, no. 4, 30 April 2019 (2019-04-30), pages 822-828 *


Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination