CN111382644A - Gesture recognition method and device, terminal equipment and computer readable storage medium - Google Patents

Gesture recognition method and device, terminal equipment and computer readable storage medium

Info

Publication number
CN111382644A
CN111382644A (application CN201811646013.0A)
Authority
CN
China
Prior art keywords
image
gesture
target
gesture recognition
image information
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN201811646013.0A
Other languages
Chinese (zh)
Inventor
陈炫言
霰心培
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
TCL Corp
TCL Research America Inc
Original Assignee
TCL Research America Inc
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by TCL Research America Inc filed Critical TCL Research America Inc
Priority to CN201811646013.0A priority Critical patent/CN111382644A/en
Publication of CN111382644A publication Critical patent/CN111382644A/en
Pending legal-status Critical Current

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V40/00Recognition of biometric, human-related or animal-related patterns in image or video data
    • G06V40/20Movements or behaviour, e.g. gesture recognition
    • G06V40/28Recognition of hand or arm movements, e.g. recognition of deaf sign language
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/04Architecture, e.g. interconnection topology
    • G06N3/045Combinations of networks

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Health & Medical Sciences (AREA)
  • Health & Medical Sciences (AREA)
  • General Physics & Mathematics (AREA)
  • Computing Systems (AREA)
  • Mathematical Physics (AREA)
  • Data Mining & Analysis (AREA)
  • Evolutionary Computation (AREA)
  • Biophysics (AREA)
  • Molecular Biology (AREA)
  • Biomedical Technology (AREA)
  • General Engineering & Computer Science (AREA)
  • Artificial Intelligence (AREA)
  • Computational Linguistics (AREA)
  • Software Systems (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Psychiatry (AREA)
  • Social Psychology (AREA)
  • Human Computer Interaction (AREA)
  • Multimedia (AREA)
  • Image Analysis (AREA)

Abstract

The application belongs to the technical field of computers and information, and provides a gesture recognition method which comprises: collecting ultrasonic image information and thermal imaging image information; inputting the collected ultrasonic image information and thermal imaging image information into a gesture recognition model for processing to obtain a predicted image output by the gesture recognition model; judging whether a target gesture image matching the predicted image exists among pre-stored gesture images; and, if such a target gesture image exists, acquiring the control instruction corresponding to the target gesture image and executing it. By feeding both ultrasonic and thermal imaging data into the gesture recognition model, the two sensing technologies compensate for each other's shortcomings, the limitations of any single device are balanced, and the recognition speed, recognition accuracy and robustness of the device are improved.

Description

Gesture recognition method and device, terminal equipment and computer readable storage medium
Technical Field
The present application belongs to the field of computer and information technology, and in particular, to a gesture recognition method, a gesture recognition apparatus, a terminal device, and a computer-readable storage medium.
Background
At present, gesture recognition devices on the market rely on a single sensing technology, such as infrared, radar, camera, or electric-field sensing. Each single technology has its own drawbacks: infrared photon detection is susceptible to ambient light; radar signals are easily absorbed by living tissue and reflected by metal; camera-based recognition consumes considerable power; and electric-field recognition works only at very short range. Reliance on a single device and technology therefore leads to hardware limitations, and the immaturity of any one technology causes latency and poor robustness.
Disclosure of Invention
In view of the above, embodiments of the present application provide a gesture recognition method, a gesture recognition apparatus, a terminal device and a computer-readable storage medium, so as to solve the prior-art problems of hardware limitations and slow recognition caused by reliance on a single gesture recognition device and a single technology.
A first aspect of an embodiment of the present application provides a gesture recognition method, including:
acquiring ultrasonic image information and thermal imaging image information;
inputting the acquired ultrasonic image information and the acquired thermal imaging image information into a gesture recognition model for processing to obtain a predicted image output by the gesture recognition model;
judging whether a target gesture image matched with the predicted image exists in prestored gesture images or not;
and if a target gesture image matched with the predicted image exists in the prestored gesture images, acquiring a control instruction corresponding to the target gesture image, and executing the control instruction.
A second aspect of an embodiment of the present application provides a gesture recognition apparatus, including:
the acquisition module is used for acquiring ultrasonic image information and thermal imaging image information;
the predicted image output module is used for inputting the acquired ultrasonic image information and the acquired thermal imaging image information into a gesture recognition model for processing to obtain a predicted image output by the gesture recognition model;
the judging module is used for judging whether a target gesture image matched with the predicted image exists in prestored gesture images or not;
and the processing module is used for acquiring a control instruction corresponding to the target gesture image and executing the control instruction when the target gesture image matched with the predicted image exists in the prestored gesture images.
A third aspect of the embodiments of the present application provides a terminal device, which includes a memory, a processor, and a computer program stored in the memory and executable on the processor, where the processor implements the steps of the gesture recognition method when executing the computer program.
A fourth aspect of embodiments of the present application provides a computer-readable storage medium, which stores a computer program, and the computer program, when executed by a processor, implements the steps of the gesture recognition method described above.
Compared with the prior art, the embodiment of the application has the advantages that:
in the embodiment of the application, ultrasonic image information and thermal imaging image information are first collected and then input into a gesture recognition model for processing, to obtain a predicted image output by the model. The predicted image is compared with pre-stored gesture images to determine whether a matching target gesture image exists; if so, the control instruction corresponding to the target gesture image is looked up in a gesture instruction database and executed by the device. Because both sensing technologies feed the same model, their respective shortcomings compensate for each other, improving the recognition speed, recognition accuracy and robustness of the device.
Drawings
In order to more clearly illustrate the technical solutions in the embodiments of the present application, the drawings needed in the description of the embodiments or the prior art are briefly introduced below. Obviously, the drawings described below show only some embodiments of the present application; those of ordinary skill in the art can derive other drawings from them without inventive effort.
Fig. 1 is a schematic flowchart illustrating an implementation process of a gesture recognition method according to an embodiment of the present application;
fig. 2 is a schematic flow chart illustrating an implementation of a gesture recognition method according to a second embodiment of the present application;
fig. 3 is a schematic diagram of a first neural network fusing an ultrasound image signal and a thermal imaging image signal to obtain a fused image in a gesture recognition method according to the second embodiment of the present application;
fig. 4 is a schematic diagram of a second neural network performing prediction processing on a fused image sent by a first neural network to obtain a predicted image in the gesture recognition method according to the second embodiment of the present application;
FIG. 5 is a schematic diagram of a gesture recognition apparatus provided in an embodiment of the present application;
fig. 6 is a schematic diagram of a terminal device provided in an embodiment of the present application.
Detailed Description
In the following description, for purposes of explanation and not limitation, specific details are set forth, such as particular system structures, techniques, etc. in order to provide a thorough understanding of the embodiments of the present application. It will be apparent, however, to one skilled in the art that the present application may be practiced in other embodiments that depart from these specific details. In other instances, detailed descriptions of well-known systems, devices, circuits, and methods are omitted so as not to obscure the description of the present application with unnecessary detail.
It will be understood that the terms "comprises" and/or "comprising," when used in this specification and the appended claims, specify the presence of stated features, integers, steps, operations, elements, and/or components, but do not preclude the presence or addition of one or more other features, integers, steps, operations, elements, components, and/or groups thereof.
It is also to be understood that the terminology used in the description of the present application herein is for the purpose of describing particular embodiments only and is not intended to be limiting of the application. As used in the specification of the present application and the appended claims, the singular forms "a," "an," and "the" are intended to include the plural forms as well, unless the context clearly indicates otherwise.
It should be further understood that the term "and/or" as used in this specification and the appended claims refers to and includes any and all possible combinations of one or more of the associated listed items.
As used in this specification and the appended claims, the term "if" may be interpreted contextually as "when", "upon" or "in response to a determination" or "in response to a detection". Similarly, the phrase "if it is determined" or "if a [ described condition or event ] is detected" may be interpreted contextually to mean "upon determining" or "in response to determining" or "upon detecting [ described condition or event ]" or "in response to detecting [ described condition or event ]".
In particular implementations, the terminal devices described in the embodiments of the present application include, but are not limited to, portable devices such as mobile phones, laptop computers, or tablet computers having touch-sensitive surfaces (e.g., touch-screen displays and/or touch pads). It should also be understood that in some embodiments the device is not a portable communication device but a desktop computer having a touch-sensitive surface (e.g., a touch-screen display and/or touchpad).
In the discussion that follows, a terminal device that includes a display and a touch-sensitive surface is described. However, it should be understood that the terminal device may include one or more other physical user interface devices such as a physical keyboard, mouse, and/or joystick.
The terminal device supports various applications, such as one or more of the following: a drawing application, a presentation application, a word processing application, a website creation application, a disc burning application, a spreadsheet application, a gaming application, a telephone application, a video conferencing application, an email application, an instant messaging application, an exercise support application, a photo management application, a digital camera application, a web browsing application, a digital music player application, and/or a digital video player application.
Various applications that may be executed on the terminal device may use at least one common physical user interface device, such as a touch-sensitive surface. One or more functions of the touch-sensitive surface and corresponding information displayed on the terminal can be adjusted and/or changed between applications and/or within respective applications. In this way, a common physical architecture (e.g., touch-sensitive surface) of the terminal can support various applications with user interfaces that are intuitive and transparent to the user.
In addition, in the description of the present application, the terms "first", "second", and the like are used only for distinguishing the description, and are not intended to indicate or imply relative importance.
In order to explain the technical solution described in the present application, the following description will be given by way of specific examples.
Fig. 1 is a schematic flow chart illustrating an implementation process of a gesture recognition method according to an embodiment of the present application, which is detailed as follows:
step S101, collecting ultrasonic image information and thermal imaging image information.
In this embodiment, an ultrasonic signal is transmitted toward a target object by a preset ultrasonic device, and the reflected signal is received. The ultrasonic device converts the reflected signal into an electrical signal and records its waveform in a storage unit; the target object is ranged and identified according to the Doppler effect and the ultrasonic time-of-flight principle, and a signal processing system finally processes the electrical signal to obtain the ultrasonic image information of the target object.
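By way of illustration only, the following minimal Python sketch shows the two ranging principles named above (time-of-flight distance and Doppler velocity). The constants, function names and example numbers are illustrative assumptions, not values taken from this disclosure:

```python
# Hypothetical sketch of ultrasonic time-of-flight ranging and Doppler
# velocity estimation; all names and constants are illustrative assumptions.
SPEED_OF_SOUND = 343.0  # m/s in air at roughly 20 degrees C

def time_of_flight_distance(echo_delay_s: float) -> float:
    """Distance to the target: the pulse travels out and back, so halve it."""
    return SPEED_OF_SOUND * echo_delay_s / 2.0

def doppler_velocity(f_emitted_hz: float, f_received_hz: float) -> float:
    """Radial velocity of the target from the Doppler shift of the echo.

    For a reflected wave the shift is doubled: f_d = 2 * v * f0 / c,
    so v = c * (f_received - f_emitted) / (2 * f_emitted).
    """
    return SPEED_OF_SOUND * (f_received_hz - f_emitted_hz) / (2.0 * f_emitted_hz)

# Example: a 40 kHz pulse returns after 2.5 ms, shifted up by 100 Hz.
print(time_of_flight_distance(2.5e-3))        # ~0.43 m
print(doppler_velocity(40_000.0, 40_100.0))   # ~0.43 m/s toward the sensor
```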
In this embodiment, a preset thermal imaging device collects a radiated power signal and converts it into an electrical signal; the spatial temperature distribution of the target object is simulated from the electrical signal, and the thermal imaging image information is formed from that temperature distribution.
In particular, the thermal imaging device may be an infrared detector.
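As a rough sketch of the step above, the snippet below turns a grid of raw infrared detector readings into an 8-bit thermal image. The linear power-to-temperature mapping and the temperature range are simplifying assumptions, not the calibration procedure of the patent:

```python
import numpy as np

# Hypothetical sketch: map raw infrared power readings to a grayscale
# thermal image via an assumed linear power-to-temperature relation.
def thermal_image_from_power(power_grid: np.ndarray,
                             t_min: float = 15.0,
                             t_max: float = 40.0) -> np.ndarray:
    p_lo, p_hi = power_grid.min(), power_grid.max()
    # Simulated spatial temperature distribution (degrees C).
    temperature = t_min + (power_grid - p_lo) / (p_hi - p_lo + 1e-12) * (t_max - t_min)
    # Normalize the temperature distribution to 0..255 pixel values.
    gray = (temperature - t_min) / (t_max - t_min) * 255.0
    return gray.astype(np.uint8)

raw = np.random.rand(60, 40)           # stand-in for detector output
img = thermal_image_from_power(raw)    # 60x40 thermal image, values 0..255
```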
In this embodiment, when the ultrasonic image information and the thermal imaging image information are acquired, a frequency controller controls the acquisition frequency of both.
Step S102, inputting the acquired ultrasonic image information and the acquired thermal imaging image information into a gesture recognition model for processing to obtain a predicted image output by the gesture recognition model;
In this method, training sample data is first obtained, the training sample data comprising ultrasonic image samples, thermal imaging image samples and the gesture images stored in a gesture image database;
the training sample data is then input into a neural network model for training to obtain the gesture recognition model, wherein the neural network extracts features from the ultrasonic image samples, the thermal imaging image samples and the stored gesture images respectively, and compares these features to output the predicted image with the highest probability.
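A minimal training-loop sketch for this step is given below, assuming `model` is a network that accepts the stacked ultrasound/thermal input and `loader` yields (images, labels) pairs; all names, shapes and hyperparameters are illustrative assumptions rather than the patent's specification:

```python
import torch
import torch.nn as nn

# Minimal sketch, assuming `model` maps fused sensor tensors to gesture
# class logits and `loader` yields (images, labels); values are assumed.
def train_gesture_model(model: nn.Module, loader, epochs: int = 10):
    optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)
    criterion = nn.CrossEntropyLoss()   # compares predictions to stored labels
    model.train()
    for _ in range(epochs):
        for images, labels in loader:
            optimizer.zero_grad()
            logits = model(images)      # forward pass through the network
            loss = criterion(logits, labels)
            loss.backward()             # backpropagate the matching error
            optimizer.step()
    return model
```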
Step S103, judging whether a target gesture image matched with the predicted image exists in prestored gesture images or not;
specifically, the predicted image is compared with the gesture images in the gesture image set. The comparison can be performed on selected features: the structural features of the predicted image and of each gesture image are extracted, and the pixel coincidence (overlap) ratio of those structural features is computed; if the ratio exceeds a preset threshold, the predicted image matches the gesture image. Of course, other comparison methods may be used, such as comparing the contours of features in the image, the distances between features, their positions, contrast, and so on.
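The pixel-overlap comparison just described could look like the following sketch, assuming both structural-feature maps have already been binarized to the same height and width; the 0.8 threshold is an illustrative stand-in for the preset value:

```python
import numpy as np

# Sketch of the pixel-coincidence comparison; threshold is an assumption.
def overlap_ratio(pred_feat: np.ndarray, stored_feat: np.ndarray) -> float:
    coincident = np.logical_and(pred_feat > 0, stored_feat > 0).sum()
    total = max((pred_feat > 0).sum(), 1)   # avoid division by zero
    return coincident / total

def matches(pred_feat: np.ndarray, stored_feat: np.ndarray,
            threshold: float = 0.8) -> bool:
    return overlap_ratio(pred_feat, stored_feat) >= threshold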
And step S104, if a target gesture image matched with the predicted image exists in the prestored gesture images, acquiring a control instruction corresponding to the target gesture image, and executing the control instruction.
In this embodiment, a gesture instruction database is also preset; it stores the control instruction corresponding to each of the pre-stored gesture images. If a target gesture image matching the predicted image exists among the pre-stored gesture images, the control instruction corresponding to the target gesture image is looked up in the gesture instruction database and executed by the device; otherwise, no response is made.
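The instruction database can be pictured as a simple lookup, as in the sketch below; both the gesture names and the command names are invented for illustration and are not taken from this disclosure:

```python
from typing import Optional

# Illustrative stand-in for the gesture instruction database; both keys
# and values are hypothetical examples, not the patent's command set.
GESTURE_INSTRUCTIONS = {
    "fist": "pause_playback",
    "open_hand": "resume_playback",
    "swipe_forward": "next_channel",
    "shake": "mute",
}

def dispatch(target_gesture: Optional[str]) -> Optional[str]:
    """Return the control instruction for a matched gesture; no match, no response."""
    if target_gesture is None:
        return None                      # no matching gesture image: do nothing
    return GESTURE_INSTRUCTIONS.get(target_gesture)
```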
In this embodiment, the gesture recognition model communicates with the device through a connector; the device receives the control instruction from the gesture recognition model through the connector and executes it.
In the embodiment of the application, ultrasonic image information and thermal imaging image information are first collected and then input into a gesture recognition model for processing, to obtain a predicted image output by the model; the predicted image is compared with the pre-stored gesture images to find whether a matching target gesture image exists, and if so, the control instruction corresponding to the target gesture image is looked up in the gesture instruction database and executed by the device.
Referring to fig. 2, it is a schematic diagram of an implementation flow of a gesture recognition method provided in the second embodiment of the present application, where the method may include:
step S201, acquiring ultrasound image information and thermal imaging image information.
For a specific implementation process of step S201, reference may be made to step S101, which is not described herein again.
Step S202, the ultrasonic image information and the thermal imaging image information are preprocessed through a first neural network respectively to form a first image data set and a second image data set.
Referring to fig. 3, in this step the ultrasonic image signal and the thermal imaging image signal are each preprocessed to form a first image data set and a second image data set. The preprocessing includes mirror flipping, resizing, color adjustment, contrast adjustment and the like on each image, after which the image data sets are formed.
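A plausible preprocessing pipeline for the operations named above is sketched below with torchvision; the target size and jitter strengths are assumptions, not values given in the patent:

```python
from torchvision import transforms

# Sketch of the preprocessing step: mirror flipping, resizing, color and
# contrast adjustment. Sizes and strengths are illustrative assumptions.
preprocess = transforms.Compose([
    transforms.Resize((64, 64)),                 # resize to a fixed input size
    transforms.RandomHorizontalFlip(p=0.5),      # mirror flipping
    transforms.ColorJitter(brightness=0.2,       # color adjustment
                           contrast=0.2),        # contrast adjustment
    transforms.ToTensor(),                       # HWC image -> CHW float tensor
])
# Applying `preprocess` to each ultrasonic / thermal image yields the
# first and second image data sets respectively.
```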
Step S203, a first preliminary image feature is obtained from the first image dataset, and a second preliminary image feature is obtained from the second image dataset.
In this step, the first preliminary image features and the second preliminary image features are extracted in the same manner: each formed image data set enters a convolutional layer for preliminary feature extraction, with the pixel values of the images as the input data. A convolutional layer contains multiple convolution kernels; different kernels extract different features from the input data, and the more kernels there are, the more input-data features are extracted.
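A one-layer sketch of this preliminary extraction is shown below; the channel count of 16 is an assumption illustrating that each kernel produces its own feature map:

```python
import torch
import torch.nn as nn

# Sketch of preliminary feature extraction: a convolutional layer whose
# input is raw pixel values. 16 kernels is an illustrative assumption;
# more kernels would extract more distinct features.
conv = nn.Conv2d(in_channels=1, out_channels=16, kernel_size=3, padding=1)
batch = torch.randn(8, 1, 64, 64)     # 8 single-channel 64x64 images
preliminary_features = conv(batch)    # -> (8, 16, 64, 64): one map per kernel
```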
Step S204, extracting a first target image characteristic meeting a first preset condition from the first preliminary image characteristic, and extracting a second target image characteristic meeting a second preset condition from the second preliminary image characteristic;
in this step, the first preliminary image features extracted by the convolutional layer enter a pooling layer, and the first target image features meeting the first preset condition are further extracted by a pooling algorithm; the second target image features are obtained in the same way. The first preset condition and the second preset condition refer to whether a target gesture image matching the predicted image exists among the pre-stored gesture images, and the first target feature image is obtained by continuously applying max pooling or average pooling to the feature map of the previous layer in the pooling layer. The pooling algorithm may specifically be max pooling, which selects the maximum value of an image region as that region's pooled value, or average pooling, which computes the average of the region as its pooled value.
After this further extraction by pooling, the parameters of the first neural network model are greatly reduced, which improves the model's training efficiency and enhances the neural network's adaptability to image changes.
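Both pooling options reduce to a few lines, as in this sketch; the 2x2 window is an illustrative choice:

```python
import torch
import torch.nn as nn

# The two pooling algorithms described above, applied to the preliminary
# feature maps; the 2x2 window size is an assumption.
max_pool = nn.MaxPool2d(kernel_size=2)   # keep the maximum of each 2x2 region
avg_pool = nn.AvgPool2d(kernel_size=2)   # keep the average of each 2x2 region

feats = torch.randn(8, 16, 64, 64)
target_feats = max_pool(feats)           # -> (8, 16, 32, 32): 4x fewer values
# Either pooling shrinks each map, cutting the parameters seen downstream.
```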
Step S205, fitting the extracted first target image feature and the second target image feature to obtain a fused image;
in this step, the first target image features and the second target image features extracted by pooling each represent a part of the image, i.e. local features. After entering the fully connected layer, they are connected and fitted to obtain a fused image, whose data is finally produced by the output layer. Connection fitting means that each node of the fully connected layer is connected to all nodes of the previous layer, so as to integrate the extracted features.
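The connection-and-fitting step could be sketched as follows: flatten both branches' target features, concatenate them, and pass them through a fully connected layer whose every output node sees all inputs. All sizes here are assumptions:

```python
import torch
import torch.nn as nn

# Sketch of connection fitting: concatenate the two branches' local
# features and integrate them in a fully connected layer. Sizes assumed.
ultra_feats = torch.randn(8, 16, 32, 32)    # first target image features
thermal_feats = torch.randn(8, 16, 32, 32)  # second target image features

flat = torch.cat([ultra_feats.flatten(1), thermal_feats.flatten(1)], dim=1)
fuse = nn.Linear(flat.shape[1], 1024)       # every output node sees all inputs
fused_image = fuse(flat)                     # -> (8, 1024) fused representation
```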
Step S206, pre-storing the gesture image to a first neural network, and performing feature extraction on the pre-stored gesture image through the first neural network to obtain a gesture image set with features;
in this step, existing gesture images are collected and customized into the required gesture images, which are put into the first neural network; preliminary feature extraction by the first neural network yields a gesture image set with preliminary features. However, the feature extraction of the first neural network alone cannot fully enrich and refine the gesture images in the set, and the recognition accuracy should improve further as data accumulates, so the gesture images need to be put into the second neural network for continued training and learning.
Step S207, a second neural network receives the gesture image set with the characteristics of the first neural network and the fusion image, and performs third preliminary image characteristic extraction and fourth preliminary image characteristic extraction on the gesture image in the gesture image set and the fusion image respectively;
referring to fig. 4, in this step, the gesture image set with features and the fusion image are respectively subjected to third preliminary image feature extraction and fourth preliminary image feature extraction in the second neural network, and different preliminary image features are extracted through a plurality of different convolution kernels. And through the training of the first neural network, the output fused images are a group of similar and clear gesture instruction images, at the moment, the output fused images are taken as a training set and are put into a second neural network for continuous screening, and the extraction of the images from the group of data sets is the extraction of the fourth preliminary features. The trained gesture recognition image is a gesture recognition image collected in advance, is equivalent to an image data set, is stored in a database, and has the function of comparing with a fusion image input by a first neural network to find a gesture image with the highest similarity. Meanwhile, because the neural network is a self-learning process, the third preliminary image feature extraction is carried out on the trained gesture image, the trained gesture image is compared and then continuously learned, because the fused image is input into the second neural network and enters the database, the gestures in the database are enriched, the gesture recognition accuracy is improved, and the gestures can be recognized more easily in the subsequent process.
Step S208, extracting a third target image characteristic meeting a third preset condition from the third preliminary image characteristic, and extracting a fourth target image characteristic meeting a fourth preset condition from the fourth preliminary image characteristic;
similarly, in this step a max pooling or average pooling algorithm is used to further extract, from the extracted third preliminary image features, the third target image features meeting the third preset condition, and from the extracted fourth preliminary image features, the fourth target image features meeting the fourth preset condition. Here, the third and fourth preset conditions refer to whether images with excessive noise exist among the third and fourth preliminary image features; the third and fourth target image features are the clearer images extracted through pooling.
Step S209, classifying the extracted fourth target image features to obtain a classified image, and extracting fifth target image features from the classified image;
in this step, all the fourth target image features extracted by pooling enter the fully connected layer for classification, yielding classified image data; the fully connected layer acts as a classifier and further extracts the fifth target image features. Specifically, classification is performed with a Softmax function, which maps the input values into the interval (0, 1) such that they sum to 1.
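The Softmax behavior just described can be checked in a few lines; the input scores below are arbitrary illustrative values:

```python
import torch

# Softmax as described: maps arbitrary scores into (0, 1), summing to 1.
logits = torch.tensor([2.0, 1.0, 0.1, -1.0])   # illustrative classifier scores
probs = torch.softmax(logits, dim=0)
print(probs)           # approx tensor([0.6381, 0.2347, 0.0954, 0.0318])
print(probs.sum())     # tensor(1.)
print(probs.argmax())  # index of the highest-probability (predicted) gesture
```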
Step S210, comparing and predicting the fifth target image feature and the third target image feature of the classified image, and obtaining a predicted image with the highest probability.
In this step, the Softmax layer applies the function to the fifth target image features of the classified image data, i.e. performs a probability transformation, and the image with the highest probability is taken as the predicted image. When the fifth target image features are compared with the third target image features, each pixel of each feature is compared: since an image consists of many pixels (for example 40 × 60, i.e. 2400 pixels in total), the probability calculation first counts how many pixels coincide at each position of each feature comparison, and then selects the predicted image with the highest probability, that is, the highest coincidence. As shown in fig. 4, these are the higher-probability gesture images such as making a fist, stretching the hand, moving forward, or shaking. Finally, the predicted images are output through the output layer.
Step S211, judging whether a target gesture image matched with the predicted image exists in prestored gesture images or not;
step S212, if a target gesture image matched with the predicted image exists in the prestored gesture images, acquiring a control instruction corresponding to the target gesture image, and executing the control instruction.
The specific implementation process of steps S211 and S212 may refer to steps S103 and S104, which are not described herein again.
It should be understood that, in the above embodiments, the sequence numbers of the steps do not imply an order of execution; the execution order should be determined by the steps' functions and internal logic, and should not limit the implementation of the embodiments of the present application.
Fig. 5 is a schematic diagram of a gesture recognition apparatus 3 provided in a third embodiment of the present application, and only the relevant portions of the embodiment of the present application are shown for convenience of description.
The gesture recognition device 3 may be a software unit, a hardware unit, or a combined software-and-hardware unit built into terminal devices such as tablet computers, notebooks, televisions and game consoles, or it may be integrated into such terminal devices as an independent accessory.
The gesture recognition apparatus 3 includes:
the acquisition module 31 is used for acquiring ultrasonic image information and thermal imaging image information;
a predicted image output module 32, configured to input the acquired ultrasound image information and the acquired thermal imaging image information to a neural network model for processing, so as to obtain a predicted image output by the neural network model;
the judging module 33 is configured to judge whether a target gesture image matched with the predicted image exists in pre-stored gesture images;
and the processing module 34 is configured to, when a target gesture image matching the predicted image exists in the pre-stored gesture images, acquire a control instruction corresponding to the target gesture image, and execute the control instruction.
Specifically, the acquisition module 31 further comprises:
the electrical signal conversion submodule 311 is configured to collect a power signal through a preset thermal imaging device, and convert the collected power signal into an electrical signal;
a simulation submodule 312 for simulating a spatial distribution of the temperature of the target object based on the electrical signal;
a thermal image information generation sub-module 313 for forming the thermal imaging image information according to the spatial distribution of the temperature of the target object.
Fig. 6 is a schematic diagram of a terminal device according to a fourth embodiment of the present application. As shown in fig. 6, the terminal device 4 of this embodiment includes: a processor 41, a memory 42, and a computer program 43, such as a gesture recognition program, stored in the memory 42 and executable on the processor 41. When executing the computer program 43, the processor 41 implements the steps in the gesture recognition method embodiments above: step S101 shown in fig. 1, acquiring ultrasonic image information and thermal imaging image information; step S102, inputting the acquired ultrasonic image information and thermal imaging image information into a neural network model for processing to obtain a predicted image output by the neural network model; step S103, judging whether a target gesture image matching the predicted image exists among pre-stored gesture images; and step S104, if such a target gesture image exists, acquiring the control instruction corresponding to the target gesture image and executing it. By this method, the processor 41 integrates ultrasonic technology, thermal imaging technology and the neural network so that they compensate for each other's shortcomings, improving the recognition speed, recognition accuracy and robustness of the device. Alternatively, when executing the computer program 43, the processor 41 implements the functions of the modules/units in the above device embodiments, such as the functions of modules 31 to 34 shown in fig. 5.
The terminal device 4 may be a desktop computer, a notebook, a palm computer, a cloud server, or other computing device, and may include, but is not limited to, the processor 41 and the memory 42. It will be understood by those skilled in the art that fig. 6 is merely an example of the terminal device 4 and does not constitute a limitation: it may include more or fewer components than shown, combine certain components, or use different components; for example, the terminal device 4 may also include input/output devices, network access devices, buses, etc.
The Processor 41 may be a Central Processing Unit (CPU), other general purpose Processor, a Digital Signal Processor (DSP), an Application Specific Integrated Circuit (ASIC), a Field Programmable Gate Array (FPGA) or other Programmable logic device, discrete Gate or transistor logic device, discrete hardware component, or the like. A general purpose processor may be a microprocessor or the processor may be any conventional processor or the like.
The memory 42 may be an internal storage unit of the terminal device 4, such as a hard disk or a memory of the terminal device 4. The memory 42 may also be an external storage device of the terminal device 4, such as a plug-in hard disk, a Smart Media Card (SMC), a Secure Digital (SD) Card, a Flash memory Card (Flash Card), and the like, which are provided on the terminal device 4. Further, the memory 42 may also include both an internal storage unit and an external storage device of the terminal device 4. The memory 42 is used for storing the computer program 43 and other programs and data required by the terminal device 4. The memory 42 may also be used to temporarily store data that has been output or is to be output.
In the above embodiments, the descriptions of the respective embodiments have respective emphasis, and reference may be made to the related descriptions of other embodiments for parts that are not described or illustrated in a certain embodiment.
It will be apparent to those skilled in the art that, for convenience and brevity of description, only the above-mentioned division of the functional units and modules is illustrated, and in practical applications, the above-mentioned function distribution may be performed by different functional units and modules according to needs, that is, the internal structure of the apparatus is divided into different functional units or modules to perform all or part of the above-mentioned functions. Each functional unit and module in the embodiments may be integrated in one processing unit, or each unit may exist alone physically, or two or more units are integrated in one unit, and the integrated unit may be implemented in a form of hardware, or in a form of software functional unit. In addition, specific names of the functional units and modules are only for convenience of distinguishing from each other, and are not used for limiting the protection scope of the present application. The specific working processes of the units and modules in the system may refer to the corresponding processes in the foregoing method embodiments, and are not described herein again.
Those of ordinary skill in the art will appreciate that the various illustrative elements and algorithm steps described in connection with the embodiments disclosed herein may be implemented as electronic hardware or combinations of computer software and electronic hardware. Whether such functionality is implemented as hardware or software depends upon the particular application and design constraints imposed on the implementation. Skilled artisans may implement the described functionality in varying ways for each particular application, but such implementation decisions should not be interpreted as causing a departure from the scope of the present application.
In the embodiments provided in the present application, it should be understood that the disclosed apparatus/terminal device and method may be implemented in other ways. For example, the above-described embodiments of the apparatus/terminal device are merely illustrative, and for example, the division of the modules or units is only one logical division, and there may be other divisions when actually implemented, for example, a plurality of units or components may be combined or integrated into another system, or some features may be omitted, or not executed. In addition, the shown or discussed mutual coupling or direct coupling or communication connection may be an indirect coupling or communication connection through some interfaces, devices or units, and may be in an electrical, mechanical or other form.
The units described as separate parts may or may not be physically separate, and parts displayed as units may or may not be physical units, may be located in one place, or may be distributed on a plurality of network units. Some or all of the units can be selected according to actual needs to achieve the purpose of the solution of the embodiment.
In addition, functional units in the embodiments of the present application may be integrated into one processing unit, or each unit may exist alone physically, or two or more units are integrated into one unit. The integrated unit can be realized in a form of hardware, and can also be realized in a form of a software functional unit.
The integrated modules/units, if implemented in the form of software functional units and sold or used as independent products, may be stored in a computer-readable storage medium. Based on such understanding, all or part of the flow in the methods of the embodiments described above can be realized by a computer program, which can be stored in a computer-readable storage medium and, when executed by a processor, realizes the steps of the above method embodiments. The computer program comprises computer program code, which may be in the form of source code, object code, an executable file, some intermediate form, etc. The computer-readable medium may include: any entity or device capable of carrying the computer program code, a recording medium, a USB disk, a removable hard disk, a magnetic disk, an optical disk, a computer memory, a read-only memory (ROM), a random access memory (RAM), an electrical carrier signal, a telecommunications signal, a software distribution medium, and the like. It should be noted that the content of the computer-readable medium may be increased or decreased as required by legislation and patent practice in a given jurisdiction; for example, in some jurisdictions the computer-readable medium does not include electrical carrier signals or telecommunications signals.
The above-mentioned embodiments are only used for illustrating the technical solutions of the present application, and not for limiting the same; although the present application has been described in detail with reference to the foregoing embodiments, it should be understood by those of ordinary skill in the art that: the technical solutions described in the foregoing embodiments may still be modified, or some technical features may be equivalently replaced; such modifications and substitutions do not substantially depart from the spirit and scope of the embodiments of the present application and are intended to be included within the scope of the present application.

Claims (10)

1. A gesture recognition method, comprising:
acquiring ultrasonic image information and thermal imaging image information;
inputting the acquired ultrasonic image information and the acquired thermal imaging image information into a gesture recognition model for processing to obtain a predicted image output by the gesture recognition model;
judging whether a target gesture image matched with the predicted image exists in prestored gesture images or not;
and if a target gesture image matched with the predicted image exists in the prestored gesture images, acquiring a control instruction corresponding to the target gesture image, and executing the control instruction.
2. The gesture recognition method according to claim 1, wherein acquiring the ultrasonic image information comprises:
transmitting an ultrasonic signal through a preset ultrasonic device, and receiving a reflected signal of the ultrasonic signal;
and converting the reflected signal into an electrical signal, and processing the electrical signal to obtain the ultrasonic image information of the target object reflecting the ultrasonic signal.
3. The gesture recognition method of claim 1, wherein acquiring the thermal imaging image information comprises:
acquiring a power signal through a preset thermal imaging device, and converting the acquired power signal into an electric signal;
simulating the temperature spatial distribution of the target object according to the electric signal;
and forming the thermal imaging image information according to the temperature space distribution of the target object.
4. The gesture recognition method according to any one of claims 1 to 3, characterized in that the gesture recognition method further comprises:
acquiring training sample data, wherein the training sample data comprises an ultrasonic image sample, a thermal imaging image sample and a gesture image stored in a gesture image database;
inputting the training sample data into a neural network model for training to obtain the gesture recognition model.
5. The gesture recognition method of claim 4, wherein the neural network model comprises a first neural network and a second neural network.
6. The gesture recognition method according to claim 5, wherein the step of inputting the acquired ultrasound image information and the thermal imaging image information to a gesture recognition model for processing to obtain a predicted image output by the gesture recognition model comprises:
respectively preprocessing the ultrasonic image information and the thermal imaging image information through the first neural network to form a first image data set and a second image data set;
obtaining a first preliminary image feature from the first image dataset and a second preliminary image feature from the second image dataset;
extracting a first target image feature meeting a first preset condition from the first preliminary image feature, and extracting a second target image feature meeting a second preset condition from the second preliminary image feature;
fitting the extracted first target image feature and the second target image feature to obtain a fused image;
pre-storing the gesture image to the first neural network, and performing feature extraction on the pre-stored gesture image through the first neural network to obtain a gesture image set with features;
a second neural network receives the gesture image set with the characteristics of the first neural network and the fusion image, and performs third preliminary image characteristic extraction and fourth preliminary image characteristic extraction on the gesture images in the gesture image set and the fusion image respectively;
extracting a third target image feature meeting a third preset condition from the third preliminary image feature, and extracting a fourth target image feature meeting a fourth preset condition from the fourth preliminary image feature;
classifying the extracted fourth target image features to obtain a classified image, and extracting fifth target image features from the classified image;
and comparing and predicting the fifth target image characteristic and the third target image characteristic of the classified image to obtain a predicted image with the highest probability.
7. The gesture recognition method according to claim 6, wherein the preprocessing of the ultrasound image information and the thermal imaging image information by the first neural network to form the first image data set and the second image data set includes mirror flipping, resizing, color adjustment, and contrast adjustment of the images.
8. A gesture recognition apparatus, comprising:
the acquisition module is used for acquiring ultrasonic image information and thermal imaging image information;
the predicted image output module is used for inputting the acquired ultrasonic image information and the acquired thermal imaging image information into a gesture recognition model for processing to obtain a predicted image output by the gesture recognition model;
the judging module is used for judging whether a target gesture image matched with the predicted image exists in prestored gesture images or not;
and the processing module is used for acquiring a control instruction corresponding to the target gesture image and executing the control instruction when the target gesture image matched with the predicted image exists in the prestored gesture images.
9. A terminal device comprising a memory, a processor and a computer program stored in the memory and executable on the processor, characterized in that the processor implements the steps of the method according to any of claims 1 to 7 when executing the computer program.
10. A computer-readable storage medium, in which a computer program is stored which, when being executed by a processor, carries out the steps of the method according to any one of claims 1 to 7.
CN201811646013.0A 2018-12-29 2018-12-29 Gesture recognition method and device, terminal equipment and computer readable storage medium Pending CN111382644A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201811646013.0A CN111382644A (en) 2018-12-29 2018-12-29 Gesture recognition method and device, terminal equipment and computer readable storage medium

Publications (1)

Publication Number Publication Date
CN111382644A true CN111382644A (en) 2020-07-07

Family

ID=71216662

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201811646013.0A Pending CN111382644A (en) 2018-12-29 2018-12-29 Gesture recognition method and device, terminal equipment and computer readable storage medium

Country Status (1)

Country Link
CN (1) CN111382644A (en)

Patent Citations (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN104463119A (en) * 2014-12-05 2015-03-25 苏州触达信息技术有限公司 Composite gesture recognition device based on ultrasound and vision and control method thereof
CN106446801A (en) * 2016-09-06 2017-02-22 清华大学 Micro-gesture identification method and system based on ultrasonic active detection
CN106599771A (en) * 2016-10-21 2017-04-26 上海未来伙伴机器人有限公司 Gesture image recognition method and system
US20180174046A1 (en) * 2016-12-15 2018-06-21 Beijing Kuangshi Technology Co., Ltd. Target detection method and device, neural network training method and device
CN107786549A (en) * 2017-10-16 2018-03-09 北京旷视科技有限公司 Adding method, device, system and the computer-readable medium of audio file
CN108198159A (en) * 2017-12-28 2018-06-22 努比亚技术有限公司 A kind of image processing method, mobile terminal and computer readable storage medium
CN108345387A (en) * 2018-03-14 2018-07-31 百度在线网络技术(北京)有限公司 Method and apparatus for output information

Cited By (10)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111832468A (en) * 2020-07-09 2020-10-27 平安科技(深圳)有限公司 Gesture recognition method and device based on biological recognition, computer equipment and medium
CN113297405A (en) * 2020-07-17 2021-08-24 阿里巴巴集团控股有限公司 Data processing method and system, computer readable storage medium and processing device
CN112527107A (en) * 2020-11-30 2021-03-19 京东方科技集团股份有限公司 Gesture recognition method and device, electronic equipment and storage medium
US11600116B2 (en) 2020-11-30 2023-03-07 Boe Technology Group Co., Ltd. Methods and apparatuses for recognizing gesture, electronic devices and storage media
CN112684887A (en) * 2020-12-28 2021-04-20 展讯通信(上海)有限公司 Application device and air gesture recognition method thereof
CN113642413A (en) * 2021-07-16 2021-11-12 新线科技有限公司 Control method, apparatus, device and medium
CN115809006A (en) * 2022-12-05 2023-03-17 北京拙河科技有限公司 Method and device for controlling manual instruction by picture
CN115809006B (en) * 2022-12-05 2023-08-08 北京拙河科技有限公司 Method and device for controlling manual instructions through picture
CN117765617A (en) * 2024-02-22 2024-03-26 深圳腾信百纳科技有限公司 instruction generation method, system and storage medium based on gesture behaviors of user
CN117765617B (en) * 2024-02-22 2024-05-17 深圳腾信百纳科技有限公司 Instruction generation method, system and storage medium based on gesture behaviors of user

Similar Documents

Publication Publication Date Title
CN111382644A (en) Gesture recognition method and device, terminal equipment and computer readable storage medium
WO2020253657A1 (en) Video clip positioning method and apparatus, computer device, and storage medium
WO2019242416A1 (en) Video image processing method and apparatus, computer readable storage medium and electronic device
US8879803B2 (en) Method, apparatus, and computer program product for image clustering
US8792722B2 (en) Hand gesture detection
Kang et al. Person re-identification between visible and thermal camera images based on deep residual CNN using single input
CN109189879B (en) Electronic book display method and device
CN103916647B (en) Gesture pre-processing of video stream with hold-off period to reduce platform power
US20120027263A1 (en) Hand gesture detection
CN109739223B (en) Robot obstacle avoidance control method and device, terminal device and storage medium
CN111209970A (en) Video classification method and device, storage medium and server
CN114402356A (en) Network model training method, image processing method and device and electronic equipment
Ma et al. Dynamic gesture contour feature extraction method using residual network transfer learning
Yang et al. Air-to-ground multimodal object detection algorithm based on feature association learning
Lahiani et al. Hand pose estimation system based on Viola-Jones algorithm for android devices
US20160140762A1 (en) Image processing device and image processing method
Li et al. Multiple factors influence coal and gangue image recognition method and experimental research based on deep learning
CN114360053A (en) Action recognition method, terminal and storage medium
Jayachitra et al. Terahertz video-based hidden object detection using YOLOv5m and mutation-enabled salp swarm algorithm for enhanced accuracy and faster recognition
Xi et al. Finger vein recognition based on the hyperinformation feature
CN111782041A (en) Typing method and device, equipment and storage medium
Hong et al. TAD-Net: An approach for real-time action detection based on temporal convolution network and graph convolution network in digital twin shop-floor
Li et al. Defect detection in vehicle mirror nonplanar surfaces with multi-scale atrous single-shot detect mechanism
Han et al. Crowded pedestrian detection with optimal bounding box relocation
CN113870221A (en) Reachable space detection method and device, vehicle-mounted terminal and storage medium

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination