CN110209273B - Gesture recognition method, interaction control method, device, medium and electronic equipment

Info

Publication number
CN110209273B
Authority
CN
China
Prior art keywords
gesture
image
gesture image
camera
electronic equipment
Prior art date
Legal status
Active
Application number
CN201910435353.7A
Other languages
Chinese (zh)
Other versions
CN110209273A
Inventor
黄锋华
Current Assignee
Guangdong Oppo Mobile Telecommunications Corp Ltd
Original Assignee
Guangdong Oppo Mobile Telecommunications Corp Ltd
Priority date
Filing date
Publication date
Application filed by Guangdong Oppo Mobile Telecommunications Corp Ltd
Priority to CN201910435353.7A
Publication of CN110209273A
Application granted
Publication of CN110209273B
Legal status: Active

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F 3/00 Input arrangements for transferring data to be processed into a form capable of being handled by the computer; Output arrangements for transferring data from processing unit to output unit, e.g. interface arrangements
    • G06F 3/01 Input arrangements or combined input and output arrangements for interaction between user and computer
    • G06F 3/017 Gesture based interaction, e.g. based on a set of recognized hand gestures
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V 10/00 Arrangements for image or video recognition or understanding
    • G06V 10/10 Image acquisition
    • G06V 10/12 Details of acquisition arrangements; Constructional details thereof
    • G06V 10/14 Optical characteristics of the device performing the acquisition or on the illumination arrangements
    • G06V 10/143 Sensing or illuminating at different wavelengths
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V 40/00 Recognition of biometric, human-related or animal-related patterns in image or video data
    • G06V 40/10 Human or animal bodies, e.g. vehicle occupants or pedestrians; Body parts, e.g. hands
    • G06V 40/107 Static hand or arm
    • G06V 40/113 Recognition of static hand signs
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V 40/00 Recognition of biometric, human-related or animal-related patterns in image or video data
    • G06V 40/20 Movements or behaviour, e.g. gesture recognition
    • G06V 40/28 Recognition of hand or arm movements, e.g. recognition of deaf sign language

Abstract

The disclosure provides a gesture recognition method and device, a human-computer interaction method and device, a computer-readable storage medium, and an electronic device, and belongs to the technical field of human-computer interaction. The gesture recognition method is applied to an electronic device that includes a first camera and a second camera arranged on the same side of the device, and comprises the following steps: acquiring a first gesture image through the first camera and a second gesture image through the second camera, wherein the first gesture image is a depth image and the second gesture image is a plane image; when it is detected that the first gesture image does not reach a preset quality standard, processing the second gesture image to recognize the gesture in the second gesture image; and when it is detected that the first gesture image reaches the preset quality standard, processing the first gesture image to recognize the gesture in the first gesture image. The method and device achieve a more robust gesture recognition algorithm under existing hardware conditions and improve the accuracy of gesture recognition.

Description

Gesture recognition method, interaction control method, device, medium and electronic equipment
Technical Field
The present disclosure relates to the field of human-computer interaction technologies, and in particular, to a gesture recognition method, an interaction control method, a gesture recognition apparatus, an interaction control apparatus, a computer-readable storage medium, and an electronic device.
Background
Gesture-based human-computer interaction recognizes a person's operating gestures, without contact with the device, by means of computer vision, graphics, and related technologies, and converts them into control instructions for the device. Gesture interaction is a new interaction mode following the mouse, the keyboard, and the touch screen; it frees interaction from the traditional dependence on input devices and is widely applied in fields such as virtual reality and augmented reality.
Gesture interaction has also been developed on mobile terminals such as smartphones and tablet computers; at present, mobile phones equipped with TOF (Time of Flight) cameras have appeared, which recognize gestures by shooting depth images for interaction control. However, the TOF camera on an existing mobile phone has limited capability: it cannot accurately detect depth information of an object too close to or too far from the camera, and it handles objects made of black or highly reflective materials, scenes with large illumination changes, and the like poorly. As a result, the robustness of the gesture recognition algorithm is low and normal interaction is affected.
Therefore, how to improve the accuracy of gesture recognition, implement an algorithm with higher robustness, and ensure normal interaction under the existing hardware condition is a problem to be solved urgently in the prior art.
It is to be noted that the information disclosed in the above background section is only for enhancement of understanding of the background of the present disclosure, and thus may include information that does not constitute prior art known to those of ordinary skill in the art.
Disclosure of Invention
The present disclosure provides a gesture recognition method, an interaction control method, a gesture recognition apparatus, an interaction control apparatus, a computer-readable storage medium, and an electronic device, thereby overcoming, at least to some extent, the problem in the prior art that a gesture cannot be recognized accurately.
Additional features and advantages of the disclosure will be set forth in the detailed description which follows, or in part will be obvious from the description, or may be learned by practice of the disclosure.
According to a first aspect of the present disclosure, a gesture recognition method is provided, which is applied to an electronic device, where the electronic device includes a first camera and a second camera, and the first camera and the second camera are disposed on the same side of the electronic device, and the method includes: acquiring a first gesture image through the first camera, and acquiring a second gesture image through the second camera, wherein the first gesture image is a depth image, and the second gesture image is a plane image; when the first gesture image is detected not to reach a preset quality standard, processing the second gesture image to identify a gesture in the second gesture image; when the first gesture image is detected to reach the preset quality standard, processing the first gesture image to recognize a gesture in the first gesture image.
In an exemplary embodiment of the disclosure, whether the first gesture image meets the preset quality standard is detected by: detecting whether the depth value of each pixel point in the first gesture image is invalid; counting the proportion of pixel points with invalid depth values in the first gesture image; and judging whether the proportion is smaller than a preset ratio threshold, and if so, determining that the first gesture image reaches the preset quality standard.
In an exemplary embodiment of the disclosure, whether the first gesture image meets the preset quality standard is detected by: converting the first gesture image into a plane image, and detecting whether the similarity between the first gesture image and the second gesture image reaches a preset similarity threshold; if so, the first gesture image is determined to reach the preset quality standard.
In an exemplary embodiment of the present disclosure, before the processing the first gesture image to identify the gesture in the first gesture image, the method further comprises: registering the first gesture image with the second gesture image; performing optimization processing on the registered first gesture image by using the registered second gesture image, wherein the optimization processing comprises any one or more of the following steps: edge filtering, hole filling and distortion correction.
In an exemplary embodiment of the present disclosure, the first camera is a TOF camera based on infrared light, and the first gesture image is a TOF image; the method further comprises: acquiring an infrared image through the first camera; and pre-processing the TOF image using the infrared image, wherein the pre-processing comprises any one or more of the following: image cropping, noise removal, and pixel point filtering based on depth value confidence.
In an exemplary embodiment of the present disclosure, the gesture in the first gesture image comprises information of a hand skeletal point and/or information of a hand gesture; the processing the first gesture image to identify a gesture in the first gesture image, comprising: recognizing the first gesture image by utilizing a first neural network model trained in advance to obtain information of the hand skeleton points; and/or recognizing the first gesture image by utilizing a pre-trained second neural network model to obtain the information of the hand gesture.
In an exemplary embodiment of the present disclosure, before identifying the first gesture image using the first neural network model or the second neural network model, the method further comprises: and performing background subtraction on the first gesture image to obtain a first gesture image only containing a hand foreground image.
According to a second aspect of the present disclosure, an interaction control method is provided, which is applied to an electronic device, where the electronic device includes a first camera and a second camera disposed on the same side of the electronic device, and the method includes: recognizing a gesture by any one of the gesture recognition methods described above; and executing a control instruction according to the gesture.
In an exemplary embodiment of the present disclosure, the gesture includes information of a hand skeletal point and/or information of a hand gesture; the executing the control instruction according to the gesture comprises: executing a control instruction corresponding to the hand gesture; and/or triggering to execute a control option where the mapping point is located according to the mapping point of the hand skeleton point in a graphical user interface of the electronic equipment.
According to a third aspect of the present disclosure, a gesture recognition apparatus is provided, which is applied to an electronic device, where the electronic device includes a first camera and a second camera, and the first camera and the second camera are disposed on the same side of the electronic device, and the apparatus includes: the image acquisition module is used for acquiring a first gesture image through the first camera and acquiring a second gesture image through the second camera, wherein the first gesture image is a depth image, and the second gesture image is a plane image; the first recognition module is used for processing the second gesture image to recognize the gesture in the second gesture image when the first gesture image is detected not to reach the preset quality standard; the second recognition module is used for processing the first gesture image to recognize the gesture in the first gesture image when the first gesture image is detected to reach the preset quality standard.
In an exemplary embodiment of the present disclosure, the gesture recognition apparatus further includes: a quality detection module; the quality detection module further comprises: a depth value detection unit, used for detecting whether the depth value of each pixel point in the first gesture image is invalid; an invalid ratio counting unit, used for counting the proportion of pixel points with invalid depth values in the first gesture image; and a quality standard judging unit, used for judging whether the proportion is smaller than a preset ratio threshold, and if so, determining that the first gesture image reaches the preset quality standard.
In an exemplary embodiment of the present disclosure, the gesture recognition apparatus further includes: and the quality detection module is used for converting the first gesture image into a plane image and detecting whether the similarity between the first gesture image and the second gesture image reaches a preset similarity threshold value, and if so, the first gesture image reaches the preset quality standard.
In an exemplary embodiment of the disclosure, the second recognition module is further configured to, before processing the first gesture image to recognize the gesture in the first gesture image, register the first gesture image with the second gesture image, and perform optimization processing on the registered first gesture image by using the registered second gesture image, where the optimization processing includes any one or more of the following: edge filtering, hole filling and distortion correction.
In an exemplary embodiment of the present disclosure, the first camera is a TOF camera based on infrared light, and the first gesture image is a TOF image; the image acquisition module is used for acquiring an infrared image through the first camera while acquiring the TOF image and the second gesture image; the gesture recognition apparatus further includes: a preprocessing module, configured to preprocess the TOF image by using the infrared image, where the preprocessing includes any one or more of the following: image cropping, noise removal, and pixel point filtering based on depth value confidence.
In an exemplary embodiment of the present disclosure, the gesture in the first gesture image comprises information of a hand skeletal point and/or information of a hand gesture; the second recognition module comprises a bone point recognition unit and/or a gesture recognition unit; the skeleton point recognition unit is used for recognizing the first gesture image by using a first neural network model trained in advance to obtain information of the hand skeleton points, and the gesture recognition unit is used for recognizing the first gesture image by using a second neural network model trained in advance to obtain information of the hand gesture.
In an exemplary embodiment of the present disclosure, the second identifying module further includes: a background subtraction unit configured to perform background subtraction on the first gesture image to obtain a first gesture image including only a hand foreground image before the first gesture image is recognized by the bone point recognition unit or the gesture recognition unit.
According to a fourth aspect of the present disclosure, an interactive control apparatus is provided, which is applied to an electronic device, where the electronic device includes a first camera and a second camera, the first camera and the second camera are disposed on the same side of the electronic device, and the apparatus includes: the image acquisition module is used for acquiring a first gesture image through the first camera and acquiring a second gesture image through the second camera; the first recognition module is used for processing the second gesture image to recognize the gesture in the second gesture image when the first gesture image is detected not to reach the preset quality standard; the second recognition module is used for processing the first gesture image to recognize the gesture in the first gesture image when the first gesture image is detected to reach the preset quality standard; and the instruction execution module is used for executing a control instruction according to the gesture.
In an exemplary embodiment of the present disclosure, the interactive control device includes all modules/units included in any one of the gesture recognition devices, and the instruction execution module.
In an exemplary embodiment of the present disclosure, the gesture comprises information of the hand skeletal points and/or information of the hand gesture; the instruction execution module comprises a first execution unit and/or a second execution unit; the first execution unit is used for executing a control instruction corresponding to the hand gesture, and the second execution unit is used for triggering and executing a control option where a mapping point is located according to the mapping point of the hand skeleton point in a graphical user interface of the electronic device.
According to a fifth aspect of the present disclosure, there is provided a computer-readable storage medium having stored thereon a computer program which, when executed by a processor, implements the gesture recognition method of any of the above or the interaction control method of any of the above.
According to a sixth aspect of the present disclosure, there is provided an electronic device comprising: the first camera is used for acquiring a first gesture image, and the first gesture image is a depth image; the second camera is used for acquiring a second gesture image, and the second gesture image is a plane image; a processor; and a memory for storing executable instructions of the processor; the first camera and the second camera are arranged on the same side of the electronic equipment; the processor is configured to perform, via execution of the executable instructions: the gesture recognition method of any one of the above items, so as to recognize the gesture in the first gesture image or the second gesture image; or the interactive control method of any one of the above, to recognize the gesture in the first gesture image or the second gesture image, and execute the control instruction according to the gesture.
Exemplary embodiments of the present disclosure have the following advantageous effects:
a first gesture image carrying depth information and a planar second gesture image are collected through the first camera and the second camera of the electronic device, respectively; when the quality of the first gesture image is high, the gesture is recognized by processing the first gesture image, and when that quality is low, the gesture is recognized by processing the second gesture image. Therefore, under existing hardware conditions, a more robust gesture recognition algorithm is realized, the influence of the depth camera's shortcomings on the recognition result is overcome, and the accuracy of gesture recognition is improved. Moreover, the exemplary embodiment is designed for application scenes such as mobile terminals: the image acquisition and image processing flows are simple and the gesture recognition algorithm is computationally light, so the embodiment has high applicability.
It is to be understood that both the foregoing general description and the following detailed description are exemplary and explanatory only and are not restrictive of the disclosure.
Drawings
The accompanying drawings, which are incorporated in and constitute a part of this specification, illustrate embodiments consistent with the present disclosure and together with the description, serve to explain the principles of the disclosure. It is to be understood that the drawings in the following description are merely exemplary of the disclosure, and that other drawings may be derived from those drawings by one of ordinary skill in the art without the exercise of inventive faculty.
FIG. 1 illustrates a flow chart of a method of gesture recognition in the present exemplary embodiment;
FIG. 2 illustrates a sub-flow diagram of a method of gesture recognition in the present exemplary embodiment;
FIG. 3 shows a flowchart of an interaction control method in the present exemplary embodiment;
fig. 4 is a block diagram showing a configuration of a gesture recognition apparatus in the present exemplary embodiment;
fig. 5 is a block diagram showing the structure of an interaction control apparatus in the present exemplary embodiment;
FIG. 6 illustrates a computer-readable storage medium for implementing the above-described method in the present exemplary embodiment;
fig. 7 shows a block diagram of an electronic device for implementing the above method in the present exemplary embodiment;
fig. 8 shows a block diagram of another electronic device for implementing the method in the exemplary embodiment.
Detailed Description
Example embodiments will now be described more fully with reference to the accompanying drawings. Example embodiments may, however, be embodied in many different forms and should not be construed as limited to the examples set forth herein; rather, these embodiments are provided so that this disclosure will be thorough and complete, and will fully convey the concept of example embodiments to those skilled in the art. The described features, structures, or characteristics may be combined in any suitable manner in one or more embodiments.
An exemplary embodiment of the present disclosure first provides a gesture recognition method, which may be applied to an electronic device including a first camera and a second camera; these may be two cameras or camera modules with different functions. The first camera and the second camera are arranged on the same side of the electronic device, usually both on the front or both on the back, to collect images from the same side of the device. The executing device of the present exemplary embodiment may be a mobile phone, a tablet computer, a smart television, a personal computer, or the like configured with two cameras. Taking a mobile phone as an example, the first camera and the second camera may be two front cameras or two rear cameras, or the first camera may be a front camera embedded in the front of the phone while the second camera is a pop-up front camera; the present disclosure is not limited in this respect.
Fig. 1 shows a method flow of the present exemplary embodiment, including steps S110 to S130:
and step S110, acquiring a first gesture image through the first camera, and acquiring a second gesture image through the second camera.
The first camera may be a depth camera, such as a TOF camera or a structured light camera, and shoots images containing depth information of the object, such as TOF images or structured-light three-dimensional images; the second camera is an ordinary planar imaging camera, and the plane image it shoots may be an RGB image, a grayscale image, or the like. In this exemplary embodiment, a user may start the gesture recognition function on a mobile phone to start the first camera and the second camera, and then perform a gesture operation in the area in front of the phone; the first camera and the second camera on the front of the phone capture a depth image and a plane image of the gesture, which are the first gesture image and the second gesture image, respectively. The first gesture image and the second gesture image are captured simultaneously; for example, the first camera and the second camera capture synchronously, so the two images belong to the same frame and record the same gesture.
Step S120, when the detected first gesture image does not reach the preset quality standard, processing the second gesture image to identify the gesture in the second gesture image;
in step S130, when the detected first gesture image reaches the preset quality standard, the first gesture image is processed to identify a gesture in the first gesture image.
In the exemplary embodiment, it is considered that the depth camera on the electronic device has limited capability: it cannot accurately detect the depth information of an object too close to or too far from the camera, and it handles black or highly reflective materials, scenes with large illumination changes, and the like poorly. When the quality of the first gesture image is poor, the accuracy of the image content is therefore low, and the image is difficult to use as a basis for gesture recognition. The preset quality standard of the exemplary embodiment determines whether the quality of the first gesture image meets the standard, i.e., whether the image is usable: if not, the gesture is recognized from the second gesture image; if so, the gesture is recognized from the first gesture image. In either case the aim is to correctly recognize the user's operating gesture. When the first gesture image has high quality, its depth information makes it richer than the planar second gesture image, recognition from it is more accurate, and step S130 is executed preferentially; otherwise, step S120 is executed.
In image processing, a deep learning model such as a pre-trained convolutional neural network model may be used: the image to be recognized is input into the model to obtain the gesture recognition result. Since the dimensions or channel counts of the first gesture image and the second gesture image usually differ, separate models may be trained for the two types of images; for example, the model for recognizing the first gesture image (an RGB-D image) takes 4-channel input, and the model for recognizing the second gesture image (an RGB image) takes 3-channel input. Image processing may also use a gesture comparison method: a plurality of standard gestures are predetermined, the gesture portion of the image to be recognized is extracted, the closest standard gesture is determined, and the gesture is recognized as that standard gesture. Generally, gesture recognition is a continuous process in which the first camera and the second camera collect depth images and plane images over consecutive frames, so the gesture in the current frame can also be judged in combination with the gesture in the previous frame; for example, the degree of coincidence between the previous frame and the current frame is detected, and if it is high, the user's gesture is considered unchanged and the previous frame's recognition result is reused for the current frame. Alternatively, steps S120 and S130 may start only after depth images and plane images of consecutive multiple frames are collected: if all, or more than a certain proportion, of the consecutive depth images reach the preset quality standard, the gesture is recognized from the hand's changes across the consecutive depth images; otherwise it is recognized from the hand's changes across the consecutive plane images. The present disclosure does not limit the specific manner of image processing.
Based on the above description, the exemplary embodiment collects a first gesture image carrying depth information and a planar second gesture image through the first camera and the second camera of the electronic device, respectively, recognizes the gesture by processing the first gesture image when its quality is high, and recognizes the gesture by processing the second gesture image when that quality is low, as sketched below. Therefore, under existing hardware conditions, a more robust gesture recognition algorithm is realized, the influence of the depth camera's shortcomings on the recognition result is overcome, and the accuracy of gesture recognition is improved. Moreover, the exemplary embodiment is designed for application scenes such as mobile terminals: the image acquisition and image processing flows are simple and the gesture recognition algorithm is computationally light, so the embodiment has high applicability.
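By way of illustration only, the branching logic of steps S110 to S130 can be sketched as follows in Python; the quality predicate and the two recognizers are caller-supplied and purely hypothetical, since the patent does not fix any concrete implementation:

```python
import numpy as np  # the gesture images are assumed to arrive as numpy arrays

def recognize_gesture(depth_img: np.ndarray,
                      plane_img: np.ndarray,
                      depth_usable,        # hypothetical quality predicate
                      recognize_depth,     # hypothetical depth-image recognizer
                      recognize_plane):    # hypothetical plane-image recognizer
    """Step S110 has already produced the two synchronized frames."""
    if depth_usable(depth_img):
        return recognize_depth(depth_img)   # step S130: depth image preferred
    return recognize_plane(plane_img)       # step S120: fall back to plane image
```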
Detecting the quality of the first gesture image mainly means detecting whether the depth information in the image accurately reflects the actual hand being shot. Based on this principle, the method and standard adopted may differ for different types of depth images, different kinds of image information, and different application scenes; the present disclosure does not limit them. Several specific detection methods are given below as examples.
(1) In an exemplary embodiment, as shown in fig. 2, detecting whether the first gesture image meets the preset quality standard may be implemented by the following steps S201 to S203:
step S201, detecting whether the depth value of each pixel point in the first gesture image is invalid;
step S202, counting the proportion of pixel points with invalid depth values in the first gesture image;
step S203, determining whether the ratio is smaller than a preset ratio threshold, and if so, the first gesture image reaches a preset quality standard.
When the first camera shoots the first gesture image, if the object (or part of it) lies outside the camera's detection range, or the scene contains abnormal illumination (for example, the hand image is overexposed by too-strong light), the camera cannot accurately measure the corresponding depth values, and the depth values of the affected pixel points are generally output as invalid or abnormal values. For example, when a TOF camera shoots a hand that is too far away, the time of flight it senses exceeds its upper limit, and the depth values of the hand's pixel points may be recorded as an upper-limit value or another abnormal value, so those depth values are not credible. If the proportion of such pixel points in the entire first gesture image is too high, the overall quality of the first gesture image is low. In this exemplary embodiment, the preset ratio threshold may be set to, e.g., 20% or 30% according to experience, scene requirements, the characteristics of the first camera, and so on; when the proportion of pixel points with invalid depth values is lower than the preset ratio threshold, the first gesture image reaches the preset quality standard.
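A minimal sketch of steps S201 to S203, assuming the camera driver marks invalid depth readings with a sentinel value of 0 and using the 20% figure mentioned above as the preset ratio threshold:

```python
import numpy as np

def meets_quality_standard(depth_img: np.ndarray,
                           invalid_value: float = 0.0,   # assumed sentinel
                           ratio_threshold: float = 0.2  # the 20% example above
                           ) -> bool:
    # S201/S202: count pixel points whose depth value is invalid
    invalid = np.count_nonzero(depth_img == invalid_value)
    # S203: the image is usable when the invalid proportion stays below the threshold
    return invalid / depth_img.size < ratio_threshold
```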
(2) In an exemplary embodiment, it may also be detected whether the first gesture image meets a preset quality standard by:
and converting the first gesture image into a plane image, detecting whether the similarity between the first gesture image and the second gesture image reaches a preset similarity threshold value, and if so, enabling the first gesture image to reach a preset quality standard.
When the similarity is detected, what is mainly detected is whether the content presented by the two images is similar, i.e., whether the first gesture image and the second gesture image capture the same object. The imaging quality of the second gesture image is generally high and its content clear and accurate, while a first gesture image with abnormal depth information may present abnormal content, producing a large difference between the two images. Before detecting similarity, the two images may need certain matching processing; the first gesture image is usually converted into a plane image for comparison with the second gesture image. In addition, the two images may be converted into plane images with the same color mode (for example, both into RGB images, HSL images, or grayscale images); if the positions or shooting angles of the two cameras differ greatly, the images may first be registered and then compared. Specific ways of detecting similarity include detecting the degree of coincidence of the two images, or detecting, through an image recognition model, the probability that the hands in the two images are the same. The preset similarity threshold is a preset standard measuring whether the similarity is sufficient, set according to actual conditions or experience; if the similarity reaches the threshold, the two images present the same content and the first gesture image reaches the preset quality standard.
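One possible realization of this check, assuming OpenCV, registered same-sized inputs, and normalized cross-correlation as the similarity measure (the patent leaves both the measure and the threshold open):

```python
import cv2
import numpy as np

def meets_quality_by_similarity(depth_img: np.ndarray,
                                plane_img: np.ndarray,
                                similarity_threshold: float = 0.7  # assumed value
                                ) -> bool:
    # Render the depth map as an 8-bit grayscale plane image.
    depth_gray = cv2.normalize(depth_img, None, 0, 255,
                               cv2.NORM_MINMAX).astype(np.uint8)
    plane_gray = cv2.cvtColor(plane_img, cv2.COLOR_BGR2GRAY)
    # Cross-correlating two same-sized images yields a single score in [-1, 1].
    score = cv2.matchTemplate(plane_gray, depth_gray, cv2.TM_CCOEFF_NORMED)[0, 0]
    return score >= similarity_threshold
```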
(3) In an exemplary embodiment, it may be further detected whether the first gesture image meets a preset quality standard by:
determining a threshold value for hand thickness;
and counting the depth value span (namely the maximum depth value-the minimum depth value) of the first gesture image, and if the depth value span does not exceed the threshold value, the first gesture image reaches a preset quality standard.
The threshold of hand thickness is the maximum thickness of the hand under various gestures, namely the extent of the hand in the direction perpendicular to the camera plane (the depth direction). When determining the threshold, camera parameters, the user's gesture habits, the screen size of the electronic device, and other factors are considered in combination with the application scene. Normally, the depth value span of the first gesture image should not exceed the hand-thickness threshold; if it does, there may be other interfering objects in the first gesture image, or the depth values have low accuracy, and the quality of the first gesture image is judged to be low. To improve the accuracy of this detection method, background subtraction may first be performed on the first gesture image to remove the background outside the hand; a foreground image mainly containing the hand is extracted, and the depth value span is counted within that foreground, which represents the detected hand thickness more accurately. In addition, the minimum thickness of the hand can also be considered to determine a range for hand thickness; if the depth value span falls within that range, the first gesture image reaches the preset quality standard.
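A sketch of the span check under the assumptions that depth is in millimetres, a foreground mask from background subtraction is available, and 120 mm is a plausible hand-thickness bound (all three are illustrative choices, not values from the patent):

```python
import numpy as np

def depth_span_plausible(depth_img: np.ndarray,
                         hand_mask: np.ndarray,
                         max_hand_thickness: float = 120.0) -> bool:
    hand_depths = depth_img[hand_mask > 0]   # restrict to the hand foreground
    if hand_depths.size == 0:
        return False                         # no hand found: treat as low quality
    span = float(hand_depths.max()) - float(hand_depths.min())
    return span <= max_hand_thickness        # span must fit within hand thickness
```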
It should be understood that, when the quality of the first gesture image is detected, the present exemplary embodiment may adopt any one of the above methods, or adopt a combination of multiple methods, for example, adopt the above methods (1) and (2) simultaneously, and when it is required to satisfy that the ratio is smaller than the preset ratio threshold and the similarity reaches the preset similarity threshold, it is determined that the first gesture image reaches the preset quality standard, and any other similar methods may also be adopted.
The first gesture image and the second gesture image may be combined in consideration of their advantages and disadvantages in different aspects. In an exemplary embodiment, if the first gesture image meets the preset quality standard, before processing the first gesture image to recognize the gesture in the first gesture image, the gesture recognition method may further include the following steps:
registering the first gesture image with the second gesture image;
and optimizing the registered first gesture image by using the registered second gesture image.
Registration refers to spatially matching the first gesture image and the second gesture image so that they can be compared directly. For example, the internal parameters of the first camera and the second camera and the external parameters between them may be calibrated in advance, and the image registration parameters, including translation, rotation, scaling, and so on, may be obtained from the transformation of these internal and external parameters; alternatively, feature points may be extracted from the first gesture image and the second gesture image respectively, and the registration parameters obtained from the correspondence between the feature points in the two images. The exemplary embodiment may register the first gesture image to the second gesture image, register the second gesture image to the first gesture image, or determine a standard coordinate system in advance and register both images into that coordinate system; thus, after registration, either one image or both images may have been transformed.
Since the imaging quality of the second gesture image is generally high, the second gesture image may be used to optimize the registered first gesture image, where the optimization may include any one or more of the following: edge filtering, hole filling, and distortion correction. Edge filtering means filtering the graphic edges in the first gesture image with reference to the graphic edges in the second gesture image, including smoothing, deburring, local fine-tuning, and the like. Hole filling means filling holes in the first gesture image with reference to the second gesture image, eliminating hole defects to obtain complete graphics. Distortion correction means correcting radial distortion, tangential distortion, and the like in the first gesture image with reference to the second gesture image, so as to eliminate graphic deformation and obtain a more accurate image. It should be understood that this exemplary embodiment may also adopt optimization methods other than the above. Through the optimization, the quality of the first gesture image can be improved, which is conducive to more accurate gesture recognition.
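As one illustrative realization, assuming OpenCV with the contrib ximgproc module and a zero sentinel for invalid depth, hole filling can be done by inpainting and edge filtering by a guided filter steered by the plane image; the radius and eps parameters are arbitrary tuning choices:

```python
import cv2
import numpy as np

def optimize_depth_with_plane(depth_img: np.ndarray,
                              plane_img: np.ndarray) -> np.ndarray:
    """Both images are assumed to be registered to the same viewpoint already."""
    depth8 = cv2.normalize(depth_img, None, 0, 255,
                           cv2.NORM_MINMAX).astype(np.uint8)
    hole_mask = (depth_img == 0).astype(np.uint8)            # assumed invalid sentinel
    filled = cv2.inpaint(depth8, hole_mask, 3, cv2.INPAINT_TELEA)  # hole filling
    guide = cv2.cvtColor(plane_img, cv2.COLOR_BGR2GRAY)
    # Edge-aware smoothing that borrows the sharper edges of the plane image.
    return cv2.ximgproc.guidedFilter(guide, filled, 8, 1e2)
```

Distortion correction is omitted from this sketch because it additionally needs the calibrated camera parameters (e.g. via cv2.undistort).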
In an exemplary embodiment, the first camera is an infrared light-based TOF camera and the first gesture image is a TOF image. The gesture recognition method may further include the steps of:
acquiring an infrared image through a first camera while acquiring a TOF image (namely a first gesture image) and a second gesture image;
the TOF image is preprocessed with infrared images.
The infrared image may be obtained by imaging with the infrared module in the TOF camera; it may include depth value confidence information for each pixel point of the TOF image, and may also include thermal, radiation, and similar information. The preprocessing may include any one or more of the following: image cropping, noise removal, and pixel point filtering based on depth value confidence. Image cropping means cutting out, with reference to the thermal information in the infrared image, the local region of the TOF image that mainly contains the hand, so as to remove content irrelevant to gesture recognition. Noise removal means removing interference information, noise points, and the like produced during TOF imaging, with reference to the imaging effect of the infrared image. Pixel point filtering based on depth value confidence means removing pixel points with low depth value confidence from the TOF image, improving the quality of the depth information. It should be understood that this exemplary embodiment may also adopt preprocessing other than the above. Through preprocessing, the quality of the first gesture image can be improved, the computation of subsequent gesture recognition can be reduced, and the recognition accuracy improved.
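A sketch of the confidence-based pixel filtering, assuming the infrared channel delivers a per-pixel confidence map scaled to [0, 1] and that 0 is the sentinel for an invalidated depth value (both are assumptions, since the data layout depends on the TOF module):

```python
import numpy as np

def filter_low_confidence(tof_depth: np.ndarray,
                          ir_confidence: np.ndarray,
                          confidence_floor: float = 0.5) -> np.ndarray:
    filtered = tof_depth.copy()
    # Invalidate every depth reading whose confidence falls below the floor.
    filtered[ir_confidence < confidence_floor] = 0
    return filtered
```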
It should be added that the optimization of the first gesture image using the second gesture image and the preprocessing of the first gesture image (TOF image) using the infrared image may be performed separately or in combination, which is not limited by this disclosure. For example, after the TOF image, the second gesture image, and the infrared image are acquired, the TOF image may first be preprocessed using the infrared image and then optimized using the second gesture image, yielding a higher-quality TOF image before subsequent step S120 or S130 is executed; this helps further improve the accuracy of gesture recognition.
In an exemplary embodiment, the gesture may include information of hand skeletal points and/or information of hand gestures; correspondingly, the step of processing the first gesture image to recognize the gesture in the first gesture image may be specifically implemented by the following steps:
recognizing a first gesture image by using a first neural network model trained in advance to obtain information of hand skeleton points; and/or
And recognizing the first gesture image by using a pre-trained second neural network model to obtain the information of the hand gesture.
According to scene requirements and image quality, hand feature points may be predetermined as skeleton points. They may include 21 skeleton points, namely 4 joint feature points for each finger plus a feature point for the palm; or only some skeleton points may be used, for example, when recognizing index-finger gestures, only the joint feature points of the index finger may serve as the hand skeleton points. In this exemplary embodiment, a large number of hand depth images may be manually annotated with skeleton points in advance as sample data to train the first neural network model; in the application stage of the model, after the first gesture image is input, the coordinates of the hand skeleton points are obtained.
The information of the hand gesture may be a gesture classification result. In this exemplary embodiment, a plurality of gestures may be predetermined and numbered, for example, a raised thumb is 1, a raised index finger is 2, and so on; gesture class numbers are then manually annotated on a large number of hand depth images to form sample data for training the second neural network model. In the application stage of the model, after the first gesture image is input, the hand gesture (i.e., gesture) classification result is obtained.
It should be noted that, in the exemplary embodiment, the first neural network model and the second neural network model may be used simultaneously to obtain two types of gesture recognition results, or only one of them may be used to obtain one type. After the information of the hand skeleton points is obtained with the first neural network model, the gesture posture can be estimated from the distribution of the skeleton points; alternatively, the skeleton point information can be added to the first gesture image, for example by marking the skeleton points in the image or concatenating their coordinates with the image into a feature matrix, and the second neural network model then processes the first gesture image (or feature matrix) enriched with skeleton point information, yielding a more accurate gesture recognition result.
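A minimal sketch of applying the two models, assuming PyTorch and pre-trained networks whose architectures, input preprocessing, and output shapes are entirely illustrative (the patent prescribes none of them):

```python
import torch

def recognize_depth_gesture(depth_tensor: torch.Tensor,
                            skeleton_net: torch.nn.Module,  # first model
                            gesture_net: torch.nn.Module):  # second model
    with torch.no_grad():
        joints = skeleton_net(depth_tensor)   # e.g. (N, 21, 2) skeleton coordinates
        logits = gesture_net(depth_tensor)    # (N, number_of_gesture_classes)
    return joints, logits.argmax(dim=1)       # coordinates plus gesture number
```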
Further, in an exemplary embodiment, before the first gesture image is recognized by the first neural network model or the second neural network model, background subtraction may be performed on it: for example, the background portion with overly large depth values may be subtracted, or a first gesture image of the background alone may be shot in advance and subtracted from the first gesture image containing the hand. This yields a first gesture image containing only the hand foreground for subsequent recognition, further reducing computation and improving recognition accuracy. Correspondingly, if the quality of the first gesture image is low and step S120 must process the second gesture image, a local plane image containing only the hand may be extracted by skin color detection, graphic detection, or the like, with an effect similar to the above background subtraction; the present disclosure is not limited in this respect.
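A sketch of the depth-threshold variant of background subtraction, assuming millimetre depth units and an illustrative 600 mm cut-off beyond which everything is treated as background:

```python
import numpy as np

def subtract_background(depth_img: np.ndarray,
                        max_hand_distance: float = 600.0) -> np.ndarray:
    foreground = depth_img.copy()
    # Zero out (invalidate) pixels too deep to plausibly belong to the hand.
    foreground[depth_img > max_hand_distance] = 0
    return foreground
```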
The exemplary embodiment of the present disclosure further provides an interaction control method, which may be applied to an electronic device, where the electronic device includes a first camera and a second camera, and the first camera and the second camera are disposed on the same side of the electronic device; in other words, the electronic device is the same as the electronic device that performs the gesture recognition method described above. The interaction control method comprises the following steps:
and recognizing the gesture through any gesture recognition method, and executing a control instruction according to the gesture.
The recognized gesture is a gesture of a user operating before the electronic device, namely a user gesture shot by the first camera or the second camera. The electronic equipment is internally provided with a control instruction for gesture operation, and after a specific gesture made by a user is recognized, the corresponding control instruction is triggered and executed according to the gesture.
In an exemplary embodiment, the gesture may include information of a hand skeleton point and/or information of a hand gesture; correspondingly, the step of executing the control instruction according to the gesture may specifically include:
executing a control instruction corresponding to the hand gesture; and/or
And triggering the control option where the mapping point is located according to the mapping point of the hand skeleton point in the graphical user interface of the electronic equipment.
The hand gesture and the control instruction have a preset correspondence: for example, an upward-pointing gesture corresponds to a pull-up-page instruction, a downward-pointing gesture to a pull-down-page instruction, and so on. Based on this correspondence, after the user's hand gesture is recognized, it can be converted into a control instruction on the electronic device. In addition, the position of a specific hand skeleton point in front of the first camera or the second camera can be mapped into the graphical user interface of the electronic device and executed as a click operation. For example, the user's index fingertip is projected onto the screen of the electronic device, and moving the index finger is equivalent to moving the click position on the screen; if the user holds the fingertip at a position for more than a certain time (for example, more than 3 seconds), the user is considered to have clicked that position, and if the position is an option button such as 'confirm' or 'cancel', execution of the corresponding option instruction is triggered.
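The dwell-click behaviour described above can be sketched as a small state machine; the 3-second dwell comes from the example in the text, while the pixel tolerance and the re-arming policy are assumptions:

```python
import time

class DwellClicker:
    """Reports a click when the mapped fingertip point stays put long enough."""

    def __init__(self, dwell_seconds: float = 3.0, tolerance_px: int = 20):
        self.dwell_seconds = dwell_seconds
        self.tolerance_px = tolerance_px
        self.anchor = None        # screen position where the fingertip settled
        self.since = 0.0          # when it settled there

    def update(self, x: int, y: int):
        """Feed the fingertip's screen coordinates each frame; returns the
        click position once the dwell time elapses, otherwise None."""
        now = time.monotonic()
        if (self.anchor is None
                or abs(x - self.anchor[0]) > self.tolerance_px
                or abs(y - self.anchor[1]) > self.tolerance_px):
            self.anchor, self.since = (x, y), now   # fingertip moved: re-anchor
            return None
        if now - self.since >= self.dwell_seconds:
            self.since = now                        # re-arm after triggering
            return self.anchor
        return None
```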
It should be noted that hand gestures and skeleton point mapping are two interactive control modes; this exemplary embodiment may adopt either one, or both at the same time. Fig. 3 shows a flow of the present exemplary embodiment, including:
step S301, acquiring an infrared image and a TOF image (namely the first gesture image) through a TOF camera of the electronic equipment, and acquiring a plane image (namely the second gesture image) through a plane camera;
step S302, utilizing infrared images to preprocess TOF images;
step S303, judging whether the TOF image after pretreatment reaches a preset quality standard;
if yes, step S304 is executed to optimize the TOF image using the plane image, step S305 to segment a gesture image from the TOF image by background subtraction, and step S306 to perform gesture recognition on the depth gesture image;
if not, step S307 is executed to segment a gesture image from the plane image by skin color detection or similar means, and then step S308 to perform gesture recognition on the plane gesture image;
The gesture recognition result contains two parts: the gesture type and specific skeleton point coordinates. The gesture type is used in step S309 to determine the control instruction corresponding to that type and execute it; the skeleton point coordinates are used in step S310 to be converted into a mapping point in the graphical user interface, triggering the control option where the mapping point is located. In this way, human-computer interaction is controlled according to the photographed user gestures, and two kinds of interaction control are performed with the two kinds of gesture recognition results, improving the diversity of interaction. A consolidated sketch of this flow follows.
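Wiring the earlier sketches together, one frame of the S301 to S310 flow might look as follows; the camera wrappers, the segmentation and recognition helpers, the screen mapping, and the ui object are all hypothetical stand-ins, not interfaces defined by the patent:

```python
def interaction_frame(tof_cam, plane_cam, ui):
    ir_img, tof_img = tof_cam.capture()                        # S301 (hypothetical API)
    plane_img = plane_cam.capture()                            # S301 (hypothetical API)
    tof_img = filter_low_confidence(tof_img, ir_img)           # S302: preprocessing
    if meets_quality_standard(tof_img):                        # S303: quality check
        tof_img = optimize_depth_with_plane(tof_img, plane_img)  # S304
        hand = subtract_background(tof_img)                    # S305
        gesture_id, joints = run_depth_recognition(hand)       # S306 (hypothetical)
    else:
        hand = segment_hand_by_skin_color(plane_img)           # S307 (hypothetical)
        gesture_id, joints = run_plane_recognition(hand)       # S308 (hypothetical)
    ui.execute_command_for(gesture_id)                         # S309: gesture-type path
    ui.trigger_option_at(map_to_screen(joints))                # S310: skeleton-point path
```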
The exemplary embodiment of the present disclosure further provides a gesture recognition apparatus, which may be applied to an electronic device, where the electronic device includes a first camera and a second camera, and the first camera and the second camera are disposed on the same side of the electronic device. As shown in fig. 4, the gesture recognition apparatus 400 may include: the image acquisition module 410 is configured to acquire a first gesture image through a first camera, and acquire a second gesture image through a second camera, where the first gesture image is a depth image, and the second gesture image is a plane image; the first recognition module 420 is configured to, when it is detected that the first gesture image does not reach the preset quality standard, process the second gesture image to recognize a gesture in the second gesture image; the second recognition module 430 is configured to, when it is detected that the first gesture image reaches the preset quality standard, process the first gesture image to recognize a gesture in the first gesture image.
In an exemplary embodiment, the gesture recognition apparatus 400 may further include: a quality detection module (not shown in the figures), which in turn may comprise: a depth value detection unit (not shown in the figure) for detecting whether the depth value of each pixel point in the first gesture image is invalid; an invalid ratio counting unit (not shown in the figure) for counting the ratio of the pixel points with invalid depth values in the first gesture image; a quality standard determining unit (not shown in the figure) for determining whether the ratio is smaller than a preset ratio threshold, and if so, the first gesture image reaches a preset quality standard.
In an exemplary embodiment, the gesture recognition apparatus 400 may further include: and a quality detection module (not shown in the figure) configured to convert the first gesture image into a planar image, and detect whether a similarity between the first gesture image and the second gesture image reaches a preset similarity threshold, where if so, the first gesture image reaches a preset quality standard.
In an exemplary embodiment, the second recognition module 430 is further configured to, before processing the first gesture image to recognize the gesture in the first gesture image, register the first gesture image with the second gesture image, and perform optimization processing on the registered first gesture image by using the registered second gesture image, where the optimization processing includes any one or more of the following: edge filtering, hole filling and distortion correction.
In an exemplary embodiment, the first camera may be an infrared light-based TOF camera, and the first gesture image may be a TOF image; the image acquisition module 410 is further configured to acquire an infrared image through the first camera while acquiring the TOF image and the second gesture image; the gesture recognition apparatus 400 may further include: a preprocessing module (not shown in the figure) for preprocessing the TOF image by using the infrared image, wherein the preprocessing includes any one or more of the following: image cropping, noise removal and pixel point filtering based on depth value confidence.
In an exemplary embodiment, the gesture in the first gesture image may comprise information of hand skeleton points and/or information of the hand gesture; the second recognition module 430 may include: a bone point recognition unit (not shown in the figures) and/or a gesture recognition unit (not shown in the figures); the bone point recognition unit is used for recognizing the first gesture image by using the first neural network model trained in advance to obtain the information of the hand skeleton points, and the gesture recognition unit is used for recognizing the first gesture image by using the second neural network model trained in advance to obtain the information of the hand gesture.
In an exemplary embodiment, the second identifying module 430 may further include: and a background subtraction unit (not shown) for performing background subtraction on the first gesture image to obtain a first gesture image including only the hand foreground image before the first gesture image is recognized by the bone point recognition unit or the gesture recognition unit.
The exemplary embodiment of the present disclosure further provides an interaction control apparatus, which may be applied to an electronic device, where the electronic device includes a first camera and a second camera, and the first camera and the second camera are disposed on the same side of the electronic device. As shown in fig. 5, the interactive control device 500 may include: the image acquisition module 510 is configured to acquire a first gesture image through a first camera and acquire a second gesture image through a second camera; the first recognition module 520 is configured to, when it is detected that the first gesture image does not reach the preset quality standard, process the second gesture image to recognize a gesture in the second gesture image; the second recognition module 530 is configured to, when it is detected that the first gesture image meets the preset quality standard, process the first gesture image to recognize a gesture in the first gesture image; and the instruction executing module 540 is configured to execute the control instruction according to the gesture.
In an exemplary embodiment, the interaction control apparatus 500 may alternatively include all the modules/units of any one of the gesture recognition apparatuses described above, together with the instruction execution module 540.
In an exemplary embodiment, the gesture may include information of hand skeleton points and/or information of a hand gesture; the instruction execution module 540 may include: a first execution unit (not shown in the figures) and/or a second execution unit (not shown in the figures); the first execution unit is used for executing a control instruction corresponding to the hand gesture, and the second execution unit is used for triggering execution of the control option at which a mapping point is located, according to the mapping point of the hand skeleton point in a graphical user interface of the electronic device.
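For the second execution unit, mapping a skeleton point into the graphical user interface and hit-testing the control options could be sketched as follows; the linear mapping and the tuple-based control list are assumptions.

```python
def map_to_screen(point_xy, frame_size, screen_size):
    """Linearly map a skeleton point from camera-frame coordinates to GUI
    coordinates; a real device would apply a calibrated transform."""
    (x, y), (fw, fh), (sw, sh) = point_xy, frame_size, screen_size
    return int(x / fw * sw), int(y / fh * sh)

def trigger_option(point_xy, frame_size, screen_size, controls):
    """Trigger the control option whose bounding box contains the mapped
    point; `controls` holds (x0, y0, x1, y1, action) tuples."""
    sx, sy = map_to_screen(point_xy, frame_size, screen_size)
    for x0, y0, x1, y1, action in controls:
        if x0 <= sx <= x1 and y0 <= sy <= y1:
            action()
            break
```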
The details of each module/unit in the above-mentioned apparatuses have been described in detail in the embodiments of the method section, and thus are not repeated here.
As will be appreciated by one skilled in the art, aspects of the present disclosure may be embodied as a system, method or program product. Accordingly, various aspects of the present disclosure may be embodied in the form of: an entirely hardware embodiment, an entirely software embodiment (including firmware, microcode, etc.), or an embodiment combining hardware and software aspects, which may all generally be referred to herein as a "circuit," "module," or "system."
Exemplary embodiments of the present disclosure also provide a computer-readable storage medium having stored thereon a program product capable of implementing the above-described method of the present specification. In some possible embodiments, various aspects of the disclosure may also be implemented in the form of a program product comprising program code for causing a terminal device to perform the steps according to various exemplary embodiments of the disclosure described in the above-mentioned "exemplary methods" section of this specification, when the program product is run on the terminal device.
Referring to fig. 6, a program product 600 for implementing the above method according to an exemplary embodiment of the present disclosure is described, which may employ a portable compact disc read only memory (CD-ROM) and include program code, and may be run on a terminal device, such as a personal computer. However, the program product of the present disclosure is not limited thereto, and in this document, a readable storage medium may be any tangible medium that can contain, or store a program for use by or in connection with an instruction execution system, apparatus, or device.
The program product may employ any combination of one or more readable media. The readable medium may be a readable signal medium or a readable storage medium. A readable storage medium may be, for example, but not limited to, an electronic, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or any combination of the foregoing. More specific examples (a non-exhaustive list) of the readable storage medium include: an electrical connection having one or more wires, a portable disk, a hard disk, a Random Access Memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), an optical fiber, a portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the foregoing.
A computer readable signal medium may include a propagated data signal with readable program code embodied therein, for example, in baseband or as part of a carrier wave. Such a propagated data signal may take many forms, including, but not limited to, electro-magnetic, optical, or any suitable combination thereof. A readable signal medium may also be any readable medium that is not a readable storage medium and that can communicate, propagate, or transport a program for use by or in connection with an instruction execution system, apparatus, or device.
Program code embodied on a readable medium may be transmitted using any appropriate medium, including but not limited to wireless, wireline, optical fiber cable, RF, etc., or any suitable combination of the foregoing.
Program code for carrying out operations for the present disclosure may be written in any combination of one or more programming languages, including an object-oriented programming language such as Java, C++, or the like and conventional procedural programming languages, such as the "C" programming language or similar programming languages. The program code may execute entirely on the user's computing device, partly on the user's device, as a stand-alone software package, partly on the user's computing device and partly on a remote computing device, or entirely on the remote computing device or server. In the case of a remote computing device, the remote computing device may be connected to the user computing device through any kind of network, including a Local Area Network (LAN) or a Wide Area Network (WAN), or may be connected to an external computing device (e.g., through the internet using an internet service provider).
Exemplary embodiments of the present disclosure also provide an electronic device capable of implementing the above method. An electronic device 700 according to such an exemplary embodiment of the present disclosure is described below with reference to fig. 7. The electronic device 700 shown in fig. 7 is only an example and should not bring any limitation to the functions and the scope of use of the embodiments of the present disclosure.
As shown in fig. 7, the electronic device 700 may include: the first camera 710 is used for acquiring a first gesture image, wherein the first gesture image is a depth image; the second camera 720 is used for acquiring a second gesture image, and the second gesture image is a plane image; a processor 730; and a memory 740 for storing executable instructions of the processor. The first camera 710 and the second camera 720 are disposed on the same side of the electronic device 700; processor 730 is configured to perform, via execution of executable instructions: any one of the gesture recognition methods in the exemplary embodiments of the present disclosure to recognize a gesture in the first gesture image or the second gesture image; or any one of the interactive control methods in the exemplary embodiments of the present disclosure, to recognize a gesture in the first gesture image or the second gesture image and execute a control instruction according to the gesture.
In an exemplary embodiment, as shown in FIG. 8, the electronic device 800 may be embodied in the form of a general-purpose computing device. The components of the electronic device 800 may include, but are not limited to: the at least one processing unit 810, the at least one storage unit 820, the bus 830 connecting the various system components (including the storage unit 820 and the processing unit 810), the display unit 840, the first camera 870, and the second camera 880.
The storage unit 820 stores program code that may be executed by the processing unit 810 to cause the processing unit 810 to perform the steps according to various exemplary embodiments of the present disclosure described in the above "exemplary method" section of this specification. For example, the processing unit 810 may perform the method steps shown in fig. 1, fig. 2, or fig. 3, among others.
The storage unit 820 may include readable media in the form of volatile storage units, such as a random access storage unit (RAM) 821 and/or a cache storage unit 822, and may further include a read-only storage unit (ROM) 823.
Storage unit 820 may also include a program/utility 824 having a set (at least one) of program modules 825, such program modules 825 including, but not limited to: an operating system, one or more application programs, other program modules, and program data, each of which, or some combination thereof, may comprise an implementation of a network environment.
Bus 830 may be any of several types of bus structures, including a memory unit bus or memory unit controller, a peripheral bus, an accelerated graphics port, and a processor or a local bus using any of a variety of bus architectures.
The electronic device 800 may also communicate with one or more external devices 900 (e.g., keyboard, pointing device, bluetooth device, etc.), with one or more devices that enable a user to interact with the electronic device 800, and/or with any devices (e.g., router, modem, etc.) that enable the electronic device 800 to communicate with one or more other computing devices. Such communication may occur via input/output (I/O) interfaces 850. Also, the electronic device 800 may communicate with one or more networks (e.g., a Local Area Network (LAN), a Wide Area Network (WAN), and/or a public network, such as the internet) via the network adapter 860. As shown, the network adapter 860 communicates with the other modules of the electronic device 800 via the bus 830. It should be appreciated that although not shown, other hardware and/or software modules may be used in conjunction with the electronic device 800, including but not limited to: microcode, device drivers, redundant processing units, external disk drive arrays, RAID systems, tape drives, and data backup storage systems, among others.
Through the above description of the embodiments, those skilled in the art will readily understand that the exemplary embodiments described herein may be implemented by software, or by software in combination with necessary hardware. Therefore, the technical solution according to the embodiments of the present disclosure may be embodied in the form of a software product, which may be stored in a non-volatile storage medium (which may be a CD-ROM, a USB flash drive, a removable hard disk, etc.) or on a network, and includes several instructions to enable a computing device (which may be a personal computer, a server, a terminal device, or a network device, etc.) to execute the method according to the exemplary embodiments of the present disclosure.
Furthermore, the above-described figures are merely schematic illustrations of processes included in methods according to exemplary embodiments of the present disclosure, and are not intended to be limiting. It will be readily understood that the processes shown in the above figures are not intended to indicate or limit the chronological order of the processes. In addition, it is also readily understood that these processes may be performed synchronously or asynchronously, e.g., in multiple modules.
It should be noted that although in the above detailed description several modules or units of the device for action execution are mentioned, such a division is not mandatory. Indeed, the features and functions of two or more modules or units described above may be embodied in one module or unit according to an exemplary embodiment of the present disclosure. Conversely, the features and functions of one module or unit described above may be further divided into embodiments by a plurality of modules or units.
Other embodiments of the disclosure will be apparent to those skilled in the art from consideration of the specification and practice of the disclosure disclosed herein. This application is intended to cover any variations, uses, or adaptations of the disclosure following, in general, the principles of the disclosure and including such departures from the present disclosure as come within known or customary practice within the art to which the disclosure pertains. It is intended that the specification and examples be considered as exemplary only, with a true scope and spirit of the disclosure being indicated by the following claims.
It will be understood that the present disclosure is not limited to the precise arrangements described above and shown in the drawings and that various modifications and changes may be made without departing from the scope thereof. The scope of the present disclosure is to be limited only by the terms of the appended claims.

Claims (13)

1. A gesture recognition method applied to electronic equipment, characterized in that the electronic equipment comprises a first camera and a second camera which are arranged on the same side of the electronic equipment, and the method comprises the following steps:
acquiring a first gesture image through the first camera, and acquiring a second gesture image through the second camera, wherein the first gesture image is a depth image, and the second gesture image is a plane image;
detecting whether a similarity between a plane image of the first gesture image and the second gesture image reaches a preset similarity threshold value, and if so, determining that the first gesture image reaches a preset quality standard; wherein the plane image of the first gesture image is obtained by converting the first gesture image into a plane image;
when the first gesture image does not reach the preset quality standard, processing the second gesture image to identify a gesture in the second gesture image;
when the first gesture image reaches the preset quality standard, processing the first gesture image to recognize a gesture in the first gesture image.
2. The method according to claim 1, wherein converting the first gesture image into a plane image and detecting whether the similarity between the first gesture image and the second gesture image reaches the preset similarity threshold, and if so, determining that the first gesture image reaches the preset quality standard, comprises:
detecting whether the similarity between the plane image of the first gesture image and the second gesture image reaches a preset similarity threshold value, wherein the plane image of the first gesture image is obtained by converting the first gesture image into a plane image;
detecting whether the depth value of each pixel point in the first gesture image is invalid or not;
counting the proportion of pixel points with invalid depth values in the first gesture image;
and if the similarity between the plane image of the first gesture image and the second gesture image reaches the preset similarity threshold value and the proportion is smaller than a preset proportion threshold value, determining that the first gesture image reaches the preset quality standard.
3. The method according to claim 1, wherein the similarity between the planar image of the first gesture image and the second gesture image is detected by:
detecting the coincidence degree of the plane image of the first gesture image and the second gesture image as the similarity; or detecting, as the similarity, a probability that the hand in the plane image of the first gesture image is the same as the hand in the second gesture image by using an image recognition model.
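One plausible reading of the "coincidence degree" above, assuming binary hand masks have already been segmented from the plane image of the first gesture image and from the second gesture image, is an intersection-over-union score; IoU is an assumption, as the claim fixes no formula.

```python
import numpy as np

def coincidence_degree(mask_depth: np.ndarray, mask_rgb: np.ndarray) -> float:
    """Intersection-over-union of two equally sized boolean hand masks,
    used here as an illustrative similarity score."""
    inter = np.logical_and(mask_depth, mask_rgb).sum()
    union = np.logical_or(mask_depth, mask_rgb).sum()
    return float(inter) / float(union) if union else 0.0
```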
4. The method of claim 1, wherein prior to processing the first gesture image to identify the gesture in the first gesture image, the method further comprises:
registering the first gesture image with the second gesture image;
performing optimization processing on the registered first gesture image by using the registered second gesture image, wherein the optimization processing comprises any one or more of the following steps: edge filtering, hole filling and distortion correction.
5. The method of claim 1, wherein the first camera is an infrared light-based time-of-flight camera and the first gesture image is a time-of-flight image; the method further comprises the following steps:
acquiring an infrared image through the first camera;
pre-processing the time-of-flight image using the infrared image, wherein the pre-processing comprises any one or more of: cropping, noise removal and pixel point filtering based on depth value confidence.
6. The method of claim 1, wherein the gesture in the first gesture image comprises information of a hand skeletal point and/or information of a hand pose;
the processing the first gesture image to identify a gesture in the first gesture image, comprising:
recognizing the first gesture image by utilizing a first neural network model trained in advance to obtain information of the hand skeleton points; and/or
recognizing the first gesture image by using a pre-trained second neural network model to obtain the information of the hand gesture.
7. The method of claim 6, wherein prior to identifying the first gesture image using the first neural network model or the second neural network model, the method further comprises:
performing background subtraction on the first gesture image to obtain a first gesture image containing only a hand foreground image.
8. An interaction control method applied to electronic equipment, characterized in that the electronic equipment comprises a first camera and a second camera which are arranged on the same side of the electronic equipment, and the method comprises the following steps:
recognizing a gesture by the gesture recognition method according to any one of claims 1 to 7;
and executing a control instruction according to the gesture.
9. The method of claim 8, wherein the gesture comprises information of a hand skeletal point and/or information of a hand gesture;
the executing the control instruction according to the gesture comprises:
executing a control instruction corresponding to the hand gesture; and/or
triggering execution of the control option where the mapping point is located, according to the mapping point of the hand skeleton point in a graphical user interface of the electronic equipment.
10. A gesture recognition apparatus applied to electronic equipment, characterized in that the electronic equipment comprises a first camera and a second camera which are arranged on the same side of the electronic equipment, and the apparatus comprises:
the image acquisition module is used for acquiring a first gesture image through the first camera and acquiring a second gesture image through the second camera, wherein the first gesture image is a depth image, and the second gesture image is a plane image;
the quality detection module is used for detecting whether the similarity between the plane image of the first gesture image and the second gesture image reaches a preset similarity threshold value, and if so, determining that the first gesture image reaches a preset quality standard, wherein the plane image of the first gesture image is obtained by converting the first gesture image into a plane image;
the first recognition module is used for processing the second gesture image to recognize the gesture in the second gesture image when the first gesture image does not reach the preset quality standard;
the second recognition module is used for processing the first gesture image when the first gesture image reaches the preset quality standard so as to recognize the gesture in the first gesture image.
11. An interaction control apparatus applied to electronic equipment, characterized in that the electronic equipment comprises a first camera and a second camera which are arranged on the same side of the electronic equipment, and the apparatus comprises:
the image acquisition module is used for acquiring a first gesture image through the first camera and acquiring a second gesture image through the second camera, wherein the first gesture image is a depth image, and the second gesture image is a plane image;
the quality detection module is used for detecting whether the similarity between the plane image of the first gesture image and the second gesture image reaches a preset similarity threshold value, and if so, determining that the first gesture image reaches a preset quality standard, wherein the plane image of the first gesture image is obtained by converting the first gesture image into a plane image;
the first recognition module is used for processing the second gesture image to recognize the gesture in the second gesture image when the first gesture image does not reach the preset quality standard;
the second recognition module is used for processing the first gesture image to recognize the gesture in the first gesture image when the first gesture image reaches the preset quality standard;
and the instruction execution module is used for executing a control instruction according to the gesture.
12. A computer-readable storage medium on which a computer program is stored, the computer program, when being executed by a processor, implementing the gesture recognition method according to any one of claims 1 to 7 or the interaction control method according to any one of claims 8 to 9.
13. An electronic device, comprising:
the first camera is used for acquiring a first gesture image, and the first gesture image is a depth image;
the second camera is used for acquiring a second gesture image, and the second gesture image is a plane image;
a processor; and
a memory for storing executable instructions of the processor;
the first camera and the second camera are arranged on the same side of the electronic equipment;
the processor is configured to perform, via execution of the executable instructions:
the gesture recognition method of any one of claims 1-7, to recognize a gesture in the first or second gesture image; or
The interactive control method of any one of claims 8 to 9, to recognize a gesture in the first gesture image or the second gesture image, and to execute a control instruction according to the gesture.
CN201910435353.7A 2019-05-23 2019-05-23 Gesture recognition method, interaction control method, device, medium and electronic equipment Active CN110209273B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201910435353.7A CN110209273B (en) 2019-05-23 2019-05-23 Gesture recognition method, interaction control method, device, medium and electronic equipment

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201910435353.7A CN110209273B (en) 2019-05-23 2019-05-23 Gesture recognition method, interaction control method, device, medium and electronic equipment

Publications (2)

Publication Number Publication Date
CN110209273A CN110209273A (en) 2019-09-06
CN110209273B true CN110209273B (en) 2022-03-01

Family

ID=67788439

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201910435353.7A Active CN110209273B (en) 2019-05-23 2019-05-23 Gesture recognition method, interaction control method, device, medium and electronic equipment

Country Status (1)

Country Link
CN (1) CN110209273B (en)

Families Citing this family (14)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112711324B (en) * 2019-10-24 2024-03-26 浙江舜宇智能光学技术有限公司 Gesture interaction method and system based on TOF camera
CN111027403B (en) * 2019-11-15 2023-06-06 深圳市瑞立视多媒体科技有限公司 Gesture estimation method, device, equipment and computer readable storage medium
CN111368800B (en) * 2020-03-27 2023-11-28 中国工商银行股份有限公司 Gesture recognition method and device
CN111651038A (en) * 2020-05-14 2020-09-11 香港光云科技有限公司 Gesture recognition control method based on ToF and control system thereof
CN111753715A (en) * 2020-06-23 2020-10-09 广东小天才科技有限公司 Method and device for shooting test questions in click-to-read scene, electronic equipment and storage medium
CN111814745A (en) * 2020-07-31 2020-10-23 Oppo广东移动通信有限公司 Gesture recognition method and device, electronic equipment and storage medium
CN112613384B (en) * 2020-12-18 2023-09-19 安徽鸿程光电有限公司 Gesture recognition method, gesture recognition device and control method of interactive display equipment
CN114979414A (en) * 2021-02-19 2022-08-30 中兴通讯股份有限公司 Shooting assistance method, electronic device, and storage medium
CN112861783A (en) * 2021-03-08 2021-05-28 北京华捷艾米科技有限公司 Hand detection method and system
CN113141502B (en) * 2021-03-18 2022-02-08 青岛小鸟看看科技有限公司 Camera shooting control method and device of head-mounted display equipment and head-mounted display equipment
CN113486765B (en) * 2021-06-30 2023-06-16 上海商汤临港智能科技有限公司 Gesture interaction method and device, electronic equipment and storage medium
CN113805993B (en) * 2021-09-03 2023-06-06 四川新网银行股份有限公司 Method for rapidly and continuously capturing images
CN116328276A (en) * 2021-12-22 2023-06-27 成都拟合未来科技有限公司 Gesture interaction method, system, device and medium based on body building device
CN114138121B (en) * 2022-02-07 2022-04-22 北京深光科技有限公司 User gesture recognition method, device and system, storage medium and computing equipment

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN102467235A (en) * 2010-11-12 2012-05-23 Lg电子株式会社 Method for user gesture recognition in multimedia device and multimedia device thereof
CN105096259A (en) * 2014-05-09 2015-11-25 株式会社理光 Depth value restoration method and system for depth image
CN106200904A (en) * 2016-06-27 2016-12-07 乐视控股(北京)有限公司 A kind of gesture identifying device, electronic equipment and gesture identification method
CN109544620A (en) * 2018-10-31 2019-03-29 Oppo广东移动通信有限公司 Image processing method and device, computer readable storage medium and electronic equipment

Family Cites Families (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US6930593B2 (en) * 2003-02-24 2005-08-16 Iteris, Inc. Lane tracking system employing redundant image sensing devices
US7924323B2 (en) * 2003-12-24 2011-04-12 Walker Digital, Llc Method and apparatus for automatically capturing and managing images
TWI462569B (en) * 2011-04-22 2014-11-21 Mstar Semiconductor Inc 3d video camera and associated control method
US8830302B2 (en) * 2011-08-24 2014-09-09 Lg Electronics Inc. Gesture-based user interface method and apparatus
WO2014168500A1 (en) * 2013-04-08 2014-10-16 Lsi Corporation Front-end architecture for image processing
JP2016039618A (en) * 2014-08-11 2016-03-22 ソニー株式会社 Information processing apparatus and information processing method

Patent Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN102467235A (en) * 2010-11-12 2012-05-23 Lg电子株式会社 Method for user gesture recognition in multimedia device and multimedia device thereof
CN105096259A (en) * 2014-05-09 2015-11-25 株式会社理光 Depth value restoration method and system for depth image
CN106200904A (en) * 2016-06-27 2016-12-07 乐视控股(北京)有限公司 A kind of gesture identifying device, electronic equipment and gesture identification method
CN109544620A (en) * 2018-10-31 2019-03-29 Oppo广东移动通信有限公司 Image processing method and device, computer readable storage medium and electronic equipment

Also Published As

Publication number Publication date
CN110209273A (en) 2019-09-06

Similar Documents

Publication Publication Date Title
CN110209273B (en) Gesture recognition method, interaction control method, device, medium and electronic equipment
CN111062312B (en) Gesture recognition method, gesture control device, medium and terminal equipment
CN109344793B (en) Method, apparatus, device and computer readable storage medium for recognizing handwriting in the air
CN103065134B (en) A kind of fingerprint identification device and method with information
EP4006831A1 (en) Image processing method and apparatus, server, medical image processing device and storage medium
CN108960163B (en) Gesture recognition method, device, equipment and storage medium
US10126858B2 (en) Touch display device and touch method thereof
WO2015172679A1 (en) Image processing method and device
US20130279756A1 (en) Computer vision based hand identification
EP4137991A1 (en) Pedestrian re-identification method and device
US20140204120A1 (en) Image processing device and image processing method
CN106648078B (en) Multi-mode interaction method and system applied to intelligent robot
CN110610127B (en) Face recognition method and device, storage medium and electronic equipment
CN109167893B (en) Shot image processing method and device, storage medium and mobile terminal
CN111539412B (en) Image analysis method, system, device and medium based on OCR
CN111259858B (en) Finger vein recognition system, method, device, electronic device and storage medium
CN113255516A (en) Living body detection method and device and electronic equipment
CN111460858B (en) Method and device for determining finger tip point in image, storage medium and electronic equipment
CN111144374A (en) Facial expression recognition method and device, storage medium and electronic equipment
CN111291749A (en) Gesture recognition method and device and robot
CN108255298B (en) Infrared gesture recognition method and device in projection interaction system
CN115951783A (en) Computer man-machine interaction method based on gesture recognition
CN113190160A (en) Input error correction method, computing device and medium for analyzing hand tremor false touch
CN111753836A (en) Character recognition method and device, computer readable medium and electronic equipment
CN114244884B (en) Video coding method applied to cloud game and based on eye tracking

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant