CN115909472A - Gesture recognition method, device, terminal and storage medium - Google Patents

Gesture recognition method, device, terminal and storage medium

Info

Publication number
CN115909472A
Authority
CN
China
Prior art keywords
image
gesture recognition
depth
target
corrected
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202110936045.XA
Other languages
Chinese (zh)
Inventor
张超
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Beijing Xiaomi Mobile Software Co Ltd
Original Assignee
Beijing Xiaomi Mobile Software Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Beijing Xiaomi Mobile Software Co Ltd
Priority to CN202110936045.XA
Publication of CN115909472A
Legal status: Pending

Abstract

The disclosure relates to a gesture recognition method, a gesture recognition apparatus, a terminal, and a storage medium. The gesture recognition method includes: acquiring an original depth image and an original color image, both of which include an image of a target to be recognized; and determining the gesture category of the target to be recognized according to the original depth image and the original color image. Because the gesture category of the target to be recognized is determined based on both the depth image and the color image of the target, the error caused by using only the color image for gesture recognition can be eliminated, the accuracy of gesture recognition is improved, and the user experience is improved.

Description

Gesture recognition method, device, terminal and storage medium
Technical Field
The present disclosure relates to the field of terminal technologies, and in particular, to a gesture recognition method and apparatus, a terminal, and a storage medium.
Background
As people rely more and more on terminals such as mobile phones, the functions offered by these terminals have become increasingly rich.
Currently, a terminal generally recognizes a static gesture either by a deep learning (or machine learning) method based on a color image or by a structural analysis method based on a color image. However, both approaches tend to perform poorly, and the accuracy of gesture recognition is therefore limited.
Disclosure of Invention
To overcome the problems in the related art, the present disclosure provides a gesture recognition method, apparatus, terminal, and storage medium.
According to a first aspect of the embodiments of the present disclosure, there is provided a gesture recognition method applied to a terminal, the method including:
acquiring an original depth image and an original color image, wherein the original depth image and the original color image both comprise an image of a target to be recognized;
and determining the posture category of the target to be recognized according to the original depth image and the original color image.
Optionally, the determining the gesture category of the target to be recognized according to the original depth image and the original color image includes:
correcting the original depth image and the original color image, and determining a corrected depth image and a corrected color image after correction;
and determining the posture category of the target to be recognized according to the corrected depth image and the corrected color image.
Optionally, the determining the posture category of the target to be recognized according to the corrected depth image and the corrected color image includes:
carrying out target detection on the corrected depth image and the corrected color image, and determining a first posture recognition image of the corrected depth image and a second posture recognition image of the corrected color image;
determining a target gesture recognition image from the first gesture recognition image and the second gesture recognition image;
and determining the gesture type of the target to be recognized according to the target gesture recognition image.
Optionally, the determining a target gesture recognition image from the first gesture recognition image and the second gesture recognition image comprises:
adjusting the image area of the second gesture recognition image according to the depth information of the image area corresponding to the second gesture recognition image in the corrected depth image to obtain a third gesture recognition image;
according to the image area of the third gesture recognition image, adjusting the image area of the first gesture recognition image to obtain a fourth gesture recognition image, wherein the image area of the fourth gesture recognition image is the same as the image area of the third gesture recognition image;
and performing image fusion processing on the third gesture recognition image and the fourth gesture recognition image to obtain the target gesture recognition image.
Optionally, the adjusting of the image region of the second gesture recognition image according to the depth information of the first gesture recognition image to obtain a third gesture recognition image includes:
determining an average depth value of depth information of the first gesture recognition image;
acquiring a unit depth value of each unit image of an image area of the second gesture recognition image in the corrected depth image;
determining a depth difference value of each unit depth value and the average depth value;
determining the unit depth value corresponding to the depth difference value which is greater than or equal to a set threshold value as a target unit depth value;
and removing the unit image corresponding to the target unit depth value from the second gesture recognition image to obtain a third gesture recognition image.
Optionally, the determining, according to the target gesture recognition image, the gesture category of the target to be recognized includes:
and obtaining the gesture category of the target to be recognized based on the gesture recognition model and the target gesture recognition image.
Optionally, the gesture recognition model is determined by:
obtaining a plurality of training sample pairs for gesture recognition, wherein the training sample pairs comprise gesture recognition image samples and gesture category samples of the gesture recognition image samples;
and training an original classifier model based on the plurality of training samples to obtain the gesture recognition model.
Optionally, the original depth image is captured by a depth camera of the terminal, and the original color image is captured by a color camera of the terminal,
the performing a correction process on the original depth image and the original color image to determine a corrected depth image and a corrected color image after correction includes:
acquiring dual-camera calibration information of the depth camera and the color camera;
and correcting the original depth image and the original color image according to the dual-camera calibration information to obtain the corrected depth image and the corrected color image.
According to a second aspect of the embodiments of the present disclosure, there is provided a gesture recognition apparatus applied to a terminal, the apparatus including:
an acquisition module, configured to acquire an original depth image and an original color image, wherein the original depth image and the original color image both comprise an image of a target to be recognized;
and the determining module is used for determining the posture category of the target to be recognized according to the original depth image and the original color image.
Optionally, the determining module is specifically configured to:
correcting the original depth image and the original color image, and determining a corrected depth image and a corrected color image after correction;
and determining the posture type of the target to be recognized according to the corrected depth image and the corrected color image.
Optionally, the determining module is configured to:
carrying out target detection on the corrected depth image and the corrected color image, and determining a first posture recognition image of the corrected depth image and a second posture recognition image of the corrected color image;
determining a target gesture recognition image from the first gesture recognition image and the second gesture recognition image;
and determining the gesture category of the target to be recognized according to the target gesture recognition image.
Optionally, the determining module is configured to:
adjusting the image area of the second gesture recognition image according to the depth information of the image area corresponding to the second gesture recognition image in the corrected depth image to obtain a third gesture recognition image;
according to the image area of the third gesture recognition image, adjusting the image area of the first gesture recognition image to obtain a fourth gesture recognition image, wherein the image area of the fourth gesture recognition image is the same as the image area of the third gesture recognition image;
and performing image fusion processing on the third gesture recognition image and the fourth gesture recognition image to obtain the target gesture recognition image.
Optionally, the determining module is configured to:
determining an average depth value of depth information of the first gesture recognition image;
acquiring a unit depth value of each unit image of an image area of the second gesture recognition image in the corrected depth image;
determining a depth difference value of each unit depth value and the average depth value;
determining the unit depth value corresponding to the depth difference value which is greater than or equal to a set threshold value as a target unit depth value;
and removing the unit image corresponding to the target unit depth value from the second gesture recognition image to obtain a third gesture recognition image.
Optionally, the determining module is configured to:
and obtaining the gesture category of the target to be recognized based on the gesture recognition model and the target gesture recognition image.
Optionally, the gesture recognition model is determined by:
obtaining a plurality of training sample pairs for gesture recognition, wherein the training sample pairs comprise gesture recognition image samples and gesture category samples of the gesture recognition image samples;
and training an original classifier model based on the plurality of training samples to obtain the gesture recognition model.
Optionally, the original depth image is captured by a depth camera of the terminal, and the original color image is captured by a color camera of the terminal,
the acquisition module is used for acquiring dual-camera calibration information of the depth camera and the color camera;
and the determining module is used for correcting the original depth image and the original color image according to the dual-camera calibration information to obtain the corrected depth image and the corrected color image.
According to a third aspect of the embodiments of the present disclosure, there is provided a terminal, including:
a processor;
a memory for storing processor-executable instructions;
wherein the processor is configured to perform the method of the first aspect.
According to a fourth aspect of embodiments of the present disclosure, there is provided a non-transitory computer-readable storage medium having instructions which, when executed by a processor of a terminal, enable the terminal to perform the method according to the first aspect.
The technical solutions provided by the embodiments of the present disclosure may have the following beneficial effects: the gesture category of the target to be recognized is determined based on both the depth image and the color image of the target, so the error caused by using only the color image for gesture recognition can be eliminated, the accuracy of gesture recognition is improved, and the user experience is improved.
It is to be understood that both the foregoing general description and the following detailed description are exemplary and explanatory only and are not restrictive of the disclosure.
Drawings
The accompanying drawings, which are incorporated in and constitute a part of this specification, illustrate embodiments consistent with the present disclosure and together with the description, serve to explain the principles of the disclosure.
FIG. 1 is a block diagram illustrating a method of gesture recognition in accordance with an exemplary embodiment.
FIG. 2-1 is a flow diagram illustrating a method of gesture recognition according to an example embodiment.
FIG. 2-2 is a flow diagram illustrating a method of gesture recognition, according to an example embodiment.
FIG. 2-3 is a flow diagram illustrating a method of gesture recognition according to an exemplary embodiment.
FIG. 2-4 is a flow diagram illustrating a method of gesture recognition according to an exemplary embodiment.
FIG. 2-5 is a flow diagram illustrating a manner of determining a gesture recognition model according to an exemplary embodiment.
FIG. 2-6 is a flow diagram illustrating a method of gesture recognition according to an exemplary embodiment.
FIG. 2-7 is a flow diagram illustrating a method of gesture recognition according to an exemplary embodiment.
Fig. 2a is a schematic diagram of an original color image shown in accordance with an exemplary embodiment.
FIG. 2b is a schematic diagram illustrating a second gesture recognition image according to an example embodiment.
FIG. 2c is a schematic illustration of a third gesture recognition image shown in accordance with an exemplary embodiment.
FIG. 3 is a block diagram illustrating a gesture recognition apparatus according to an example embodiment.
Fig. 4 is a block diagram of a terminal shown in accordance with an example embodiment.
Detailed Description
Reference will now be made in detail to the exemplary embodiments, examples of which are illustrated in the accompanying drawings. When the following description refers to the accompanying drawings, like numbers in different drawings represent the same or similar elements unless otherwise indicated. The implementations described in the exemplary embodiments below are not intended to represent all implementations consistent with the present disclosure. Rather, they are merely examples of apparatus and methods consistent with certain aspects of the disclosure, as detailed in the appended claims.
The disclosure provides a gesture recognition method applied to a terminal. In the method, the gesture type of the target to be recognized is determined based on the depth image and the color image of the target to be recognized, so that the error caused by only using the color image for gesture recognition can be eliminated, the accuracy of gesture recognition is improved, and the use experience of a user is improved.
In one exemplary embodiment, a gesture recognition method is provided, which is applied to a terminal. Referring to fig. 1 and 2-1, the method includes:
s110, acquiring an original depth image and an original color image;
and S120, determining the posture category of the target to be recognized according to the original depth image and the original color image.
Gesture recognition includes recognition of static gestures and recognition of dynamic gestures, and the two are relative concepts: a static gesture refers to a stationary gesture captured in a single image, while a dynamic gesture refers to a moving gesture captured across a sequence of images. The present method is mainly concerned with static gesture recognition.
In step S110, the original depth image and the original color image each include an image of an object to be recognized, which may be a human gesture or an object gesture or the like.
The original depth image refers to an image including depth information of an object to be recognized, that is, the original depth image may include only the depth information of the object to be recognized, or may include both the depth information of the object to be recognized and depth information of other things.
The original depth image may be captured by a depth camera of the terminal. For example, a user may control the depth camera of the terminal to capture an image through the terminal's photographing function; after being captured by the depth camera, the original depth image is transmitted to a processor of the terminal, which then obtains the original depth image. In general, an original depth image obtained by shooting includes the depth information of the object to be recognized (e.g., a person's gesture) and the depth information of other things (e.g., the environment in which the person is located).
The original color image refers to an image including color information (RGB information) of the object to be recognized; that is, the original color image may include only the color information of the object to be recognized, or may include both the color information of the object to be recognized and color information of other things.
The original color image may be captured by a color Camera (RGB Camera) of the terminal. For example, the user may also control a color camera of the terminal to capture an image through a photographing function of the terminal, thereby obtaining an original color image. In general, an original color image obtained by shooting includes color information of an object to be recognized (e.g., a posture of a person) and color information of other things (e.g., an environment in which the person is located).
In addition, the original depth image and the original color image may also be obtained by other means. For example, it can be downloaded from the internet, and will not be described herein.
In step S120, the target to be recognized may be more precisely located according to the depth information of the original depth image and the color information of the original color image, and the gesture recognition image of the target to be recognized may be determined, so that the gesture category of the target to be recognized may be more accurately determined.
Compared with performing gesture recognition using only a color image, this method can eliminate the recognition errors that arise when the color image contains identical, repeated, or weak textures. Based on both the original depth image and the original color image of the target to be recognized, the target can be located more precisely, so the gesture category of the target to be recognized can be determined more accurately, which improves the accuracy of gesture recognition and the user experience.
In one exemplary embodiment, a gesture recognition method is provided, which is applied to a terminal. In the method, referring to fig. 1 and fig. 2-2, determining the gesture category of the target to be recognized according to the original depth image and the original color image may include:
s210, carrying out correction processing on the original depth image and the original color image, and determining a corrected depth image and a corrected color image after correction;
and S220, determining the posture type of the target to be recognized according to the corrected depth image and the corrected color image.
In step S210, when the original depth image is obtained by the depth camera and the original color image is obtained by the color camera, the depth camera and the color camera may be controlled to shoot synchronously to obtain the corresponding original depth image and the corresponding original color image.
However, since the positions of the depth camera and the color camera are not exactly the same, it is necessary to perform correction processing on the original depth image and the original color image and to determine a corrected depth image and a corrected color image, so that the positions of the target to be recognized in the corrected depth image and in the corrected color image are aligned.
Wherein, when the original depth image is captured by a depth camera and the original color image is captured by a color camera, as shown in fig. 2-3, the steps may include:
S211, acquiring dual-camera calibration information of the depth camera and the color camera;
and S212, performing correction processing on the original depth image and the original color image according to the dual-camera calibration information to obtain a corrected depth image and a corrected color image.
In step S211, the dual-camera calibration information may be determined based on Zhang Zhengyou's camera calibration method. Of course, the dual-camera calibration information may also be determined by other methods, which are not described herein again.
In step S212, the original depth image and the original color image may be subjected to a rectification process based on the dual-camera calibration information, i.e., the two images are aligned. The aligned images are determined as the corrected depth image and the corrected color image, respectively. The region of the target to be recognized in the corrected depth image is then the same as its region in the corrected color image.
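By way of illustration, one common way to realize this alignment is to reproject the depth map into the color camera's coordinate frame using the dual-camera calibration parameters. The Python/NumPy sketch below assumes the intrinsic matrices K_d and K_c of the depth and color cameras and the rotation R and translation t between them are already available (for example, from Zhang Zhengyou's calibration); the function name and interface are illustrative assumptions and are not prescribed by this disclosure.

```python
import numpy as np

def align_depth_to_color(depth, K_d, K_c, R, t, color_shape):
    """Reproject a depth map (e.g., in mm) from the depth camera into the
    color camera frame, so that corresponding pixels are aligned."""
    h, w = depth.shape
    v, u = np.mgrid[0:h, 0:w]
    z = depth.astype(np.float64)
    valid = z > 0
    # Back-project valid depth pixels to 3-D points in the depth-camera frame.
    x = (u - K_d[0, 2]) * z / K_d[0, 0]
    y = (v - K_d[1, 2]) * z / K_d[1, 1]
    pts = np.stack([x, y, z], axis=-1)[valid]            # (N, 3)
    # Transform the points into the color-camera frame.
    pts_c = pts @ R.T + t.reshape(1, 3)
    z_c = pts_c[:, 2]
    front = z_c > 1e-6                                   # keep points in front of the camera
    pts_c, z_c = pts_c[front], z_c[front]
    # Project into the color image plane.
    uc = np.round(pts_c[:, 0] * K_c[0, 0] / z_c + K_c[0, 2]).astype(int)
    vc = np.round(pts_c[:, 1] * K_c[1, 1] / z_c + K_c[1, 2]).astype(int)
    aligned = np.zeros(color_shape[:2], dtype=np.float64)
    inside = (uc >= 0) & (uc < color_shape[1]) & (vc >= 0) & (vc < color_shape[0])
    aligned[vc[inside], uc[inside]] = z_c[inside]
    return aligned                                       # corrected depth image in the color frame
```

The corrected color image can then be used as captured (or undistorted with the color camera's distortion coefficients), since both images are now expressed on the same pixel grid.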
In step S220, the target to be recognized may be more precisely located and the posture recognition image of the target to be recognized may be determined according to the depth information of the corrected depth image after the alignment is corrected and the color information of the corrected color image, so as to more accurately determine the posture category of the target to be recognized.
According to the method, the original depth image and the original color image which are shot by the terminal are corrected to obtain the corrected depth image and the corrected color image which are aligned after correction, and then the target to be recognized can be positioned more accurately based on the corrected depth image and the corrected color image, so that the posture type of the target to be recognized is determined more accurately, the accuracy of posture recognition is improved, and the use experience of a user is improved.
In one exemplary embodiment, a gesture recognition method is provided, which is applied to a terminal. In the method, referring to fig. 1 and fig. 2-4, determining the gesture category of the target to be recognized according to the corrected depth image and the corrected color image may include:
s310, carrying out target detection on the corrected depth image and the corrected color image, and determining a first posture recognition image of the corrected depth image and a second posture recognition image of the corrected color image;
s320, determining a target gesture recognition image according to the first gesture recognition image and the second gesture recognition image;
and S330, determining the gesture type of the target to be recognized according to the target gesture recognition image.
In step S310, target detection methods mainly include edge detection, human body detection, energy-based methods, and moving target detection. Detecting the target in the corrected depth image is similar to detecting a target in a conventional grayscale image; that is, for detection the depth image can be treated as a grayscale image whose gray values are simply replaced by depth values.
For example, when target detection is performed by an edge detection method, the region where the target to be recognized is located (i.e., the ROI) in the corrected depth image and the corrected color image may be determined by an edge-detection operator (e.g., the Sobel operator or the Laplacian operator), thereby determining the first gesture recognition image and the second gesture recognition image.
For another example, when target detection is performed by a human body detection method, the region where the target to be recognized is located (i.e., the ROI) in the corrected depth image and the corrected color image may be determined by a human body detector (e.g., a Haar cascade detector), thereby determining the first gesture recognition image and the second gesture recognition image.
Note that ROI stands for region of interest. In machine vision and image processing, the region to be processed is outlined on the image in the form of a rectangle, circle, ellipse, irregular polygon, or the like, and is called the region of interest. Machine vision software such as Halcon, OpenCV, and MATLAB commonly provides operators and functions for obtaining an ROI, after which the image is further processed.
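As a concrete illustration of such target detection, the sketch below locates a rough ROI by edge detection with the Sobel operator using OpenCV (version 4 assumed); a depth map can be passed in and treated like a grayscale image, as noted above. The helper name detect_roi and the exact post-processing are illustrative choices rather than requirements of the method.

```python
import cv2
import numpy as np

def detect_roi(image):
    """Roughly locate the region of interest (ROI) by edge detection.
    Works on a color image or on a depth map treated as a grayscale image."""
    gray = image if image.ndim == 2 else cv2.cvtColor(image, cv2.COLOR_BGR2GRAY)
    gray = cv2.normalize(gray, None, 0, 255, cv2.NORM_MINMAX).astype(np.uint8)
    # Sobel gradient magnitude as a simple edge map.
    gx = cv2.Sobel(gray, cv2.CV_32F, 1, 0)
    gy = cv2.Sobel(gray, cv2.CV_32F, 0, 1)
    edges = cv2.convertScaleAbs(cv2.magnitude(gx, gy))
    _, mask = cv2.threshold(edges, 0, 255, cv2.THRESH_BINARY + cv2.THRESH_OTSU)
    contours, _ = cv2.findContours(mask, cv2.RETR_EXTERNAL, cv2.CHAIN_APPROX_SIMPLE)
    if not contours:
        return None
    x, y, w, h = cv2.boundingRect(max(contours, key=cv2.contourArea))
    return x, y, w, h

# Example usage: the first/second gesture recognition images are then the crops
# of the corrected depth image and corrected color image inside their ROIs.
# x, y, w, h = detect_roi(corrected_color)
# second_gesture_image = corrected_color[y:y + h, x:x + w]
```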
In step S320, the first gesture recognition image theoretically includes only depth information of the target to be recognized, and the second gesture recognition image theoretically includes only color information of the target to be recognized. In this step, the first posture recognition image and the second posture recognition image may be subjected to image fusion processing, and the image obtained by the image fusion processing may be regarded as the target posture recognition image.
In step S330, a gesture in the target gesture recognition image may be recognized, thereby determining a gesture class of the target to be recognized.
Wherein, the gesture class of the target to be recognized can be obtained based on a gesture recognition model (for example, a neural network model) and the target gesture recognition image. For example, a target gesture recognition image may be input into a gesture recognition model, and the gesture recognition model may output a gesture class corresponding to the target gesture recognition image, thereby determining a gesture class of the target to be recognized.
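For example, if the gesture recognition model is a neural network classifier (a PyTorch model is assumed here purely for illustration), the inference step of S330 might look like the following sketch, where fused_roi stands for the target gesture recognition image and class_names lists the gesture categories the model was trained on; these names are assumptions made for this example.

```python
import numpy as np
import torch

def predict_gesture(model, fused_roi, class_names):
    """Feed the target gesture recognition image (H x W x C array) to a trained
    classifier and return the predicted gesture category."""
    x = torch.from_numpy(fused_roi.astype(np.float32) / 255.0)
    x = x.permute(2, 0, 1).unsqueeze(0)          # to N x C x H x W
    model.eval()
    with torch.no_grad():
        logits = model(x)
    return class_names[int(logits.argmax(dim=1))]
```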
Referring to fig. 2-5, the gesture recognition model may be determined in the following manner:
s33-1, acquiring a plurality of training sample pairs for gesture recognition;
s33-2, training the original classifier model based on the plurality of training samples to obtain a gesture recognition model.
In step S33-1, each training sample pair includes a gesture recognition image sample and a gesture class sample of the gesture recognition image sample. Wherein the gesture recognition image sample can be downloaded from the internet, and the gesture category sample of the gesture recognition image sample can be determined manually or by other means.
In step S33-2, the original classifier model may be trained using the plurality of training sample pairs; that is, during training the gesture recognition image samples in the training sample pairs are used as input samples and the gesture category samples in the training sample pairs are used as output samples. A loss function may be used to determine whether training is complete: when the model's gesture recognition accuracy reaches a set accuracy, the model at that point can be determined as the gesture recognition model. The set accuracy may be chosen according to actual requirements, for example 98%.
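The disclosure does not fix a particular classifier architecture. By way of illustration only, the following PyTorch sketch trains a small convolutional classifier on training sample pairs of fused 4-channel (color plus depth) gesture recognition images and their gesture category labels; the class and function names, the 4-channel input, and the hyper-parameters are assumptions made for this example.

```python
import torch
import torch.nn as nn

class GestureClassifier(nn.Module):
    """Small CNN taking a fused 4-channel (B, G, R, depth) gesture ROI."""
    def __init__(self, num_classes):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(4, 16, 3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
            nn.Conv2d(16, 32, 3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
            nn.AdaptiveAvgPool2d(1),
        )
        self.classifier = nn.Linear(32, num_classes)

    def forward(self, x):
        return self.classifier(self.features(x).flatten(1))

def train_gesture_model(loader, num_classes, epochs=10, lr=1e-3):
    """loader yields (gesture_image_sample, gesture_category_sample) pairs;
    the cross-entropy loss is monitored to decide when training is complete."""
    model = GestureClassifier(num_classes)
    opt = torch.optim.Adam(model.parameters(), lr=lr)
    loss_fn = nn.CrossEntropyLoss()
    for _ in range(epochs):
        for images, labels in loader:
            opt.zero_grad()
            loss_fn(model(images), labels).backward()
            opt.step()
    return model
```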
According to the method, the first posture recognition image and the second posture recognition image of the target to be recognized are determined from the corrected depth image and the corrected color image through target detection, and then the posture category of the target to be recognized is determined according to the first posture recognition image and the second posture recognition image, so that the accuracy of posture recognition can be improved, and the use experience of a user is improved.
In one exemplary embodiment, a gesture recognition method is provided and applied to a terminal. In the method, referring to fig. 1 and fig. 2-6, determining the target gesture recognition image according to the first gesture recognition image and the second gesture recognition image may include:
s410, adjusting the image area of the second gesture recognition image according to the depth information of the image area corresponding to the second gesture recognition image in the corrected depth image to obtain a third gesture recognition image;
s420, adjusting the image area of the first gesture recognition image according to the image area of the third gesture recognition image to obtain a fourth gesture recognition image;
and S430, carrying out image fusion processing on the third gesture recognition image and the fourth gesture recognition image to obtain a target gesture recognition image.
The ROI can generally only be roughly determined by the target detection algorithm; that is, the first gesture recognition image and the second gesture recognition image are not, strictly speaking, images of only the target to be recognized: they may not include the entire target to be recognized, or they may include images of other things besides the target to be recognized.
The method can more accurately determine the region where the target to be identified is located by adjusting the ROI region.
In step S410, the image area of the first gesture recognition image may be referred to as the first image area, and the image area of the second gesture recognition image as the second image area; the first image area and the second image area are generally not identical in size. Since the corrected depth image and the corrected color image are aligned, the first image area corresponds to a first depth image in the corrected depth image and to a first color image in the corrected color image, where the first depth image is the first gesture recognition image. Similarly, the second image area corresponds to a second depth image in the corrected depth image and to a second color image in the corrected color image, where the second color image is the second gesture recognition image.
In this step, the depth information of the image area corresponding to the second gesture recognition image in the corrected depth image is the depth information of the second image area in the corrected depth image, that is, the depth information of the second depth image.
This step adjusts the image area of the second gesture recognition image (i.e., the second color image), that is, the second image area, based on the depth information of the second depth image, and the image corresponding to the adjusted image area in the corrected color image is recorded as the third gesture recognition image.
When the second gesture recognition image is determined by target detection, if part of the background in the corrected color image has colors and textures that are the same as or similar to those of the image of the target to be recognized, that part of the background is also easily detected as the target, so the second gesture recognition image contains part of the background image.
This part of the background contained in the second gesture recognition image is referred to as the second excess image, and its image area as the second excess region. In the second depth image corresponding to the second image area in the corrected depth image, the depth values of pixels in the second excess region generally differ significantly from the depth values of pixels in the other regions; based on this difference, the second excess region can be determined. The image corresponding to the second excess region is then removed from the second gesture recognition image to obtain the third gesture recognition image. The third gesture recognition image corresponds more closely to the target to be recognized and better reflects its color information, which improves the reliability of the color image of the target to be recognized.
For example, an original color image is shown in fig. 2a, and a second gesture recognition image determined by target detection is shown in fig. 2b. Owing to the limitations of target detection, the second gesture recognition image still contains excess image content and does not reflect the target to be recognized (a heart-shaped gesture) well. After the second gesture recognition image is adjusted using the depth information, the third gesture recognition image shown in fig. 2c is obtained; the excess content of the second gesture recognition image has been removed, so the third gesture recognition image reflects the target to be recognized better.
In step S420, the image area of the fourth gesture recognition image is the same as the image area of the third gesture recognition image.
The image area of the third gesture recognition image may be referred to as the third image area; the third image area and the second excess region together constitute the second image area. Since the third gesture recognition image corresponds closely to the target to be recognized, the third image area reflects the image area where the target to be recognized is located.
The first gesture recognition image, being determined by target detection, generally contains considerable noise, such as holes and flying pixels. In this step, the first image area of the first gesture recognition image may be adjusted to the same area as the third image area, and the image corresponding to that area in the corrected depth image is then determined as the fourth gesture recognition image. That is, the image area of the fourth gesture recognition image is the same as the third image area, so the noise can be better eliminated and the reliability of the depth image of the target to be recognized is improved.
In step S430, the image area of the third gesture recognition image is the same as the image area of the fourth gesture recognition image, that is, the third gesture recognition image and the fourth gesture recognition image have the same size, and both can better reflect the image information of the target to be recognized. The third posture recognition image reflects color information of the target to be recognized, and the fourth posture recognition image reflects depth information (also called distance information) of the target to be recognized.
In this step, the third gesture recognition image and the fourth gesture recognition image are combined into one image by image fusion processing, and the result is recorded as the target gesture recognition image. The gesture category determined on the basis of this target gesture recognition image is more reliable.
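The disclosure does not prescribe a specific fusion scheme. One simple interpretation, assumed here purely for illustration, is to stack the color channels of the third gesture recognition image with the normalized depth channel of the fourth gesture recognition image into a single multi-channel array; the function name fuse_rois is an assumption of this sketch.

```python
import cv2
import numpy as np

def fuse_rois(third_image, fourth_image):
    """Fuse the third (color) and fourth (depth) gesture recognition images,
    which have identical image areas, into one 4-channel target image."""
    assert third_image.shape[:2] == fourth_image.shape[:2]
    depth_norm = cv2.normalize(fourth_image.astype(np.float32), None, 0, 255,
                               cv2.NORM_MINMAX).astype(np.uint8)
    return np.dstack([third_image, depth_norm])   # H x W x 4: B, G, R, depth
```

The resulting array is what would be fed to the gesture recognition model in the inference sketch shown earlier.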
According to the method, the region where the image of the target to be recognized is located is determined more accurately through adjustment of the ROI region, so that the image information of the target to be recognized is determined more accurately, the accuracy of gesture recognition can be improved, and the use experience of a user is improved.
In one exemplary embodiment, a gesture recognition method is provided, which is applied to a terminal. Referring to fig. 1 and 2-7, in the method, adjusting an image area of the second gesture recognition image according to the depth information of the first gesture recognition image to obtain a third gesture recognition image may include:
s510, determining an average depth value of the depth information of the first gesture recognition image;
s520, acquiring a unit depth value of each unit image of the image area of the second posture recognition image in the corrected depth image;
s530, determining a depth difference value between each unit depth value and the average depth value;
s540, determining the unit depth value corresponding to the depth difference value which is greater than or equal to the set threshold value as a target unit depth value;
and S550, removing the unit image corresponding to the target unit depth value from the second gesture recognition image to obtain a third gesture recognition image.
In step S510, the first gesture recognition image includes a plurality of depth values, which constitute depth information of the first gesture recognition image. In this step, an average depth value may be determined from the depth values (also called distance values) of all unit images of the first gesture recognition image.
Note that if depth information was already used during target detection to determine the first gesture recognition image, the average value of that depth information may also be taken as the average depth value of the first gesture recognition image in this step.
In step S520, the unit depth value of each unit image of the image area of the second gesture recognition image in the corrected depth image refers to the depth value, in the corrected depth image, of each unit image of the second image area; this depth value is recorded as the unit depth value.
The unit image refers to an image with a minimum unit, for example, the unit image may be a pixel, and this step is used to obtain a depth value of each pixel of the second image area in the corrected depth image, and is denoted as a pixel depth value.
In step S530, the average depth value may be subtracted from each unit depth value to determine a depth difference value of each unit depth value from the average depth value. Alternatively, each unit depth value is subtracted from the average depth value, respectively, to determine each depth difference value.
In step S540, after all the depth difference values are determined, the depth difference value greater than or equal to the set threshold may be determined as a target depth difference value, and the unit depth value corresponding to the target depth difference value may be determined as a target unit depth value.
The set threshold may be configured before or after the terminal leaves the factory, and may be modified after delivery to better meet different user requirements. For example, the set threshold may be 100 mm, 150 mm, or 500 mm.
In step S550, after the target unit depth value is determined, the unit image corresponding to the target unit depth value may be recorded as a target unit image, and then all the target unit images may be removed from the second gesture recognition image, and the removed image may be determined as a third gesture recognition image.
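Putting steps S510 to S550 together, the following Python/NumPy sketch (with pixels as the unit images) removes from the second gesture recognition image every pixel whose depth differs from the average depth of the first gesture recognition image by at least the set threshold. Treating the depth difference as an absolute difference, removing pixels by zeroing them, and the 150 mm default are assumptions made for this illustration.

```python
import numpy as np

def refine_second_image_by_depth(second_image, second_area_depth,
                                 first_image_depth, threshold_mm=150.0):
    """second_image: color crop (second gesture recognition image).
    second_area_depth: corrected-depth values over the same image area.
    first_image_depth: depth values of the first gesture recognition image."""
    valid = first_image_depth > 0
    avg_depth = float(first_image_depth[valid].mean())          # S510: average depth value
    unit_depths = second_area_depth.astype(np.float32)          # S520: per-pixel unit depth values
    diff = np.abs(unit_depths - avg_depth)                      # S530: depth differences
    excess = diff >= threshold_mm                                # S540: target unit depth values
    third_image = second_image.copy()
    third_image[excess] = 0                                      # S550: remove excess unit images
    return third_image
```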
In the method, the second gesture recognition image is adjusted according to the depth information, so that a third gesture recognition image which can reflect the target to be recognized more accurately is obtained, and the reliability of subsequent gesture recognition is improved.
In one exemplary embodiment, a gesture recognition apparatus is provided, which is applied to a terminal. The apparatus is used to implement the gesture recognition method described above. Illustratively, referring to fig. 1 and 3, the apparatus may include an acquisition module 101 and a determining module 102, wherein,
the acquisition module 101 is used for acquiring an original depth image and an original color image, wherein the original depth image and the original color image both comprise an image of a target to be recognized;
and the determining module 102 is configured to determine a posture category of the target to be recognized according to the original depth image and the original color image.
In one exemplary embodiment, a gesture recognition apparatus is provided, which is applied to a terminal. Referring to fig. 1 and fig. 3, in the apparatus, the determining module 102 is specifically configured to:
correcting the original depth image and the original color image, and determining a corrected depth image and a corrected color image after correction;
and determining the posture category of the target to be recognized according to the corrected depth image and the corrected color image.
In one exemplary embodiment, a gesture recognition apparatus is provided and applied to a terminal. Referring to fig. 1 and fig. 3, in the apparatus, the determining module 102 is further configured to:
carrying out target detection on the corrected depth image and the corrected color image, and determining a first posture recognition image of the corrected depth image and a second posture recognition image of the corrected color image;
determining a target gesture recognition image according to the first gesture recognition image and the second gesture recognition image;
and determining the gesture type of the target to be recognized according to the target gesture recognition image.
In one exemplary embodiment, a gesture recognition apparatus is provided, which is applied to a terminal. Referring to fig. 1 and 3, in the apparatus, the determining module 102 is further configured to:
adjusting the image area of the second gesture recognition image according to the depth information of the image area corresponding to the second gesture recognition image in the corrected depth image to obtain a third gesture recognition image;
adjusting the image area of the first gesture recognition image according to the image area of the third gesture recognition image to obtain a fourth gesture recognition image, wherein the image area of the fourth gesture recognition image is the same as the image area of the third gesture recognition image;
and performing image fusion processing on the third gesture recognition image and the fourth gesture recognition image to obtain a target gesture recognition image.
In one exemplary embodiment, a gesture recognition apparatus is provided and applied to a terminal. Referring to fig. 1 and fig. 3, in the apparatus, the determining module 102 is further configured to:
determining an average depth value of depth information of the first gesture recognition image;
acquiring a unit depth value of each unit image in the corrected depth image in an image area of the second gesture recognition image;
determining a depth difference value between each unit depth value and the average depth value;
determining a unit depth value corresponding to the depth difference value which is greater than or equal to the set threshold value as a target unit depth value;
and removing the unit image corresponding to the target unit depth value from the second gesture recognition image to obtain a third gesture recognition image.
In one exemplary embodiment, a gesture recognition apparatus is provided and applied to a terminal. Referring to fig. 1 and fig. 3, in the apparatus, the determining module 102 is further configured to:
and obtaining the gesture type of the target to be recognized based on the gesture recognition model and the target gesture recognition image.
In one exemplary embodiment, a gesture recognition apparatus is provided and applied to a terminal. In the device, a gesture recognition model is determined by:
acquiring a plurality of training sample pairs for gesture recognition, wherein the training sample pairs comprise gesture recognition image samples and gesture category samples of the gesture recognition image samples;
and training the original classifier model based on a plurality of training samples to obtain a gesture recognition model.
In one exemplary embodiment, a gesture recognition apparatus is provided and applied to a terminal. Referring to fig. 1 and 3, in the apparatus, an original depth image is photographed by a depth camera of a terminal, an original color image is photographed by a color camera of the terminal,
the acquisition module 101 is further configured to acquire dual-camera calibration information of the depth camera and the color camera;
the determining module 102 is further configured to perform a correction process on the original depth image and the original color image according to the dual-camera calibration information, so as to obtain a corrected depth image and a corrected color image.
In one exemplary embodiment, a terminal is provided, such as a mobile phone, a laptop, a tablet, a wearable device, and the like.
Referring to fig. 4, terminal 400 may include one or more of the following components: a processing component 402, a memory 404, a power component 406, a multimedia component 408, an audio component 410, an interface for input/output (I/O) 412, a sensor component 414, and a communication component 416.
The processing component 402 generally controls overall operation of the terminal 400, such as operations associated with display, telephone calls, data communications, camera operations, and recording operations. The processing component 402 may include one or more processors 420 to execute instructions to perform all or a portion of the steps of the methods described above. Further, the processing component 402 can include one or more modules that facilitate interaction between the processing component 402 and other components. For example, the processing component 402 can include a multimedia module to facilitate interaction between the multimedia component 408 and the processing component 402.
The memory 404 is configured to store various types of data to support operations at the terminal 400. Examples of such data include instructions for any application or method operating on the terminal 400, contact data, phonebook data, messages, pictures, videos, and so forth. The memory 404 may be implemented by any type of volatile or non-volatile storage device or a combination thereof, such as static random access memory (SRAM), electrically erasable programmable read-only memory (EEPROM), erasable programmable read-only memory (EPROM), programmable read-only memory (PROM), read-only memory (ROM), magnetic memory, flash memory, or a magnetic or optical disk.
The power components 406 provide power to the various components of the terminal 400. The power components 406 may include a power management system, one or more power sources, and other components associated with generating, managing, and distributing power for the terminal 400.
The multimedia component 408 includes a screen providing an output interface between the terminal 400 and the user. In some embodiments, the screen may include a Liquid Crystal Display (LCD) and a Touch Panel (TP). If the screen includes a touch panel, the screen may be implemented as a touch screen to receive an input signal from a user. The touch panel includes one or more touch sensors to sense touch, slide, and gestures on the touch panel. The touch sensor may not only sense the boundary of a touch or slide action, but also detect the duration and pressure associated with the touch or slide operation. In some embodiments, the multimedia component 408 includes a front camera module and/or a rear camera module. The front camera module and/or the rear camera module can receive external multimedia data when the terminal 400 is in an operation mode, such as a shooting mode or a video mode. Each front camera module and rear camera module may be a fixed optical lens system or have a focal length and optical zoom capability.
The audio component 410 is configured to output and/or input audio signals. For example, the audio component 410 includes a Microphone (MIC) configured to receive external audio signals when the terminal 400 is in an operating mode, such as a call mode, a recording mode, and a voice recognition mode. The received audio signals may further be stored in the memory 404 or transmitted via the communication component 416. In some embodiments, audio component 410 also includes a speaker for outputting audio signals.
The I/O interface 412 provides an interface between the processing component 402 and peripheral interface modules, which may be keyboards, click wheels, buttons, etc. These buttons may include, but are not limited to: a home button, a volume button, a start button, and a lock button.
The sensor component 414 includes one or more sensors for providing various aspects of status assessment for the terminal 400. For example, the sensor assembly 414 can detect an open/closed state of the terminal 400, relative positioning of components, such as a display and keypad of the terminal 400, the sensor assembly 414 can also detect a change in position of the terminal 400 or a component of the terminal 400, the presence or absence of user contact with the terminal 400, orientation or acceleration/deceleration of the terminal 400, and a change in temperature of the terminal 400. The sensor assembly 414 may include a proximity sensor configured to detect the presence of a nearby object without any physical contact. The sensor assembly 414 may also include a light sensor, such as a CMOS or CCD image sensor, for use in imaging applications. In some embodiments, the sensor assembly 414 may also include an acceleration sensor, a gyroscope sensor, a magnetic sensor, a pressure sensor, or a temperature sensor.
The communication component 416 is configured to facilitate communication, in a wired or wireless manner, between the terminal 400 and other terminals. The terminal 400 can access a wireless network based on a communication standard, such as WiFi, 2G, 3G, 4G, 5G, or a combination thereof. In an exemplary embodiment, the communication component 416 receives broadcast signals or broadcast-related information from an external broadcast management system via a broadcast channel. In an exemplary embodiment, the communication component 416 further includes a near field communication (NFC) module to facilitate short-range communication. For example, the NFC module may be implemented based on radio frequency identification (RFID) technology, infrared data association (IrDA) technology, ultra-wideband (UWB) technology, Bluetooth (BT) technology, and other technologies.
In an exemplary embodiment, the terminal 400 may be implemented by one or more application specific integrated circuits (ASICs), digital signal processors (DSPs), digital signal processing devices (DSPDs), programmable logic devices (PLDs), field programmable gate arrays (FPGAs), controllers, micro-controllers, microprocessors, or other electronic components for performing the above-described methods.
In an exemplary embodiment, a non-transitory computer-readable storage medium comprising instructions, such as the memory 404 comprising instructions, executable by the processor 420 of the terminal 400 to perform the above-described method is also provided. For example, the non-transitory computer-readable storage medium may be a ROM, a random access memory (RAM), a CD-ROM, a magnetic tape, a floppy disk, an optical data storage device, and the like. The instructions in the storage medium, when executed by a processor of the terminal, enable the terminal to perform the methods shown in the above-described embodiments.
Other embodiments of the disclosure will be apparent to those skilled in the art from consideration of the specification and practice of the disclosure disclosed herein. This application is intended to cover any variations, uses, or adaptations of the disclosure following, in general, the principles of the disclosure and including such departures from the present disclosure as come within known or customary practice in the art to which the disclosure pertains. It is intended that the specification and examples be considered as exemplary only, with a true scope and spirit of the disclosure being indicated by the following claims.
It will be understood that the present disclosure is not limited to the precise arrangements described above and shown in the drawings and that various modifications and changes may be made without departing from the scope thereof. The scope of the present disclosure is limited only by the appended claims.

Claims (18)

1. A gesture recognition method is applied to a terminal, and is characterized by comprising the following steps:
acquiring an original depth image and an original color image, wherein the original depth image and the original color image both comprise an image of a target to be recognized;
and determining the posture category of the target to be recognized according to the original depth image and the original color image.
2. The method according to claim 1, wherein the determining the gesture class of the object to be recognized according to the original depth image and the original color image comprises:
correcting the original depth image and the original color image, and determining a corrected depth image and a corrected color image after correction;
and determining the posture type of the target to be recognized according to the corrected depth image and the corrected color image.
3. The method according to claim 2, wherein the determining the posture category of the object to be recognized according to the corrected depth image and the corrected color image comprises:
carrying out target detection on the corrected depth image and the corrected color image, and determining a first posture recognition image of the corrected depth image and a second posture recognition image of the corrected color image;
determining a target gesture recognition image from the first gesture recognition image and the second gesture recognition image;
and determining the gesture category of the target to be recognized according to the target gesture recognition image.
4. The method of claim 3, wherein determining a target gesture recognition image from the first gesture recognition image and the second gesture recognition image comprises:
adjusting the image area of the second gesture recognition image according to the depth information of the image area corresponding to the second gesture recognition image in the corrected depth image to obtain a third gesture recognition image;
adjusting the image area of the first gesture recognition image according to the image area of the third gesture recognition image to obtain a fourth gesture recognition image, wherein the image area of the fourth gesture recognition image is the same as the image area of the third gesture recognition image;
and performing image fusion processing on the third gesture recognition image and the fourth gesture recognition image to obtain the target gesture recognition image.
5. The method of claim 4, wherein adjusting the image area of the second gesture recognition image based on the depth information of the first gesture recognition image to obtain a third gesture recognition image comprises:
determining an average depth value of depth information of the first gesture recognition image;
acquiring a unit depth value of each unit image of an image area of the second gesture recognition image in the corrected depth image;
determining a depth difference value of each unit depth value and the average depth value;
determining the unit depth value corresponding to the depth difference value which is greater than or equal to a set threshold value as a target unit depth value;
and removing the unit image corresponding to the target unit depth value from the second gesture recognition image to obtain a third gesture recognition image.
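The following sketch is offered outside the claims to make the depth-based filtering of claim 5 concrete. It assumes a "unit image" is a single pixel and uses an arbitrary example threshold; both are assumptions, not requirements of the disclosure.

```python
# Illustrative sketch only of the per-unit depth filtering in claim 5.
import numpy as np

def filter_by_depth(second_img, second_img_depth, first_img_depth, threshold_mm=80):
    """second_img: color crop; second_img_depth: depth values of the same area in
    the corrected depth image; first_img_depth: the first gesture recognition image."""
    # Average depth value of the depth information of the first gesture recognition image.
    avg_depth = float(np.mean(first_img_depth))
    # Depth difference of each unit (pixel) against the average depth value.
    depth_diff = np.abs(second_img_depth.astype(np.float32) - avg_depth)
    # Units whose difference is greater than or equal to the set threshold are
    # treated as target unit depth values and removed (zeroed out) here.
    keep = depth_diff < threshold_mm
    third_img = second_img.copy()
    third_img[~keep] = 0
    return third_img
```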
6. The method according to claim 3, wherein the determining the gesture category of the target to be recognized according to the target gesture recognition image comprises:
and obtaining the gesture category of the target to be recognized based on the gesture recognition model and the target gesture recognition image.
7. The method of claim 6, wherein the gesture recognition model is determined by:
obtaining a plurality of training sample pairs for gesture recognition, wherein each training sample pair comprises a gesture recognition image sample and a gesture category sample of the gesture recognition image sample;
and training an original classifier model based on the plurality of training sample pairs to obtain the gesture recognition model.
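Outside the claims, a minimal training sketch is given below. The claims do not fix the classifier architecture; a linear SVM over flattened target gesture recognition images is used purely as an example of fitting an "original classifier model" on (image sample, gesture category) pairs, and it assumes all crops have been resized to a common shape beforehand.

```python
# Illustrative sketch only: train a simple classifier on gesture sample pairs.
import numpy as np
from sklearn.svm import LinearSVC

def train_gesture_recognition_model(sample_pairs):
    """sample_pairs: iterable of (gesture_image: np.ndarray, gesture_category: int),
    all images assumed to share the same shape."""
    X = np.stack([img.reshape(-1) for img, _ in sample_pairs]).astype(np.float32)
    y = np.array([label for _, label in sample_pairs])
    model = LinearSVC()   # stands in for the "original classifier model"
    model.fit(X, y)       # training yields the gesture recognition model
    return model

def predict_gesture_category(model, target_gesture_image):
    return int(model.predict(target_gesture_image.reshape(1, -1))[0])
```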
8. The method according to any one of claims 2 to 7, wherein the original depth image is captured by a depth camera of the terminal and the original color image is captured by a color camera of the terminal, and
the performing correction processing on the original depth image and the original color image to determine the corrected depth image and the corrected color image comprises:
acquiring dual-camera calibration information of the depth camera and the color camera;
and correcting the original depth image and the original color image according to the dual-camera calibration information to obtain the corrected depth image and the corrected color image.
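The sketch below is not part of the claims. It assumes that the dual-camera calibration information consists of each camera's intrinsic matrix and distortion coefficients plus the rotation R and translation T between the two cameras, and that both images share the same resolution; stereo rectification with OpenCV is shown only as one common way to correct and align the two images.

```python
# Illustrative sketch only: correct depth and color images using
# dual-camera (stereo) calibration parameters.
import cv2
import numpy as np

def correct_images(depth_img, color_img, K_d, dist_d, K_c, dist_c, R, T):
    size = (color_img.shape[1], color_img.shape[0])  # (width, height), same for both cameras
    # Compute rectification transforms from the dual-camera calibration data.
    R1, R2, P1, P2, Q, roi1, roi2 = cv2.stereoRectify(
        K_d, dist_d, K_c, dist_c, size, R, T)
    map_d1, map_d2 = cv2.initUndistortRectifyMap(K_d, dist_d, R1, P1, size, cv2.CV_32FC1)
    map_c1, map_c2 = cv2.initUndistortRectifyMap(K_c, dist_c, R2, P2, size, cv2.CV_32FC1)
    # Nearest-neighbor interpolation preserves valid depth values; linear is fine for color.
    corrected_depth = cv2.remap(depth_img, map_d1, map_d2, cv2.INTER_NEAREST)
    corrected_color = cv2.remap(color_img, map_c1, map_c2, cv2.INTER_LINEAR)
    return corrected_depth, corrected_color
```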
9. A gesture recognition apparatus applied to a terminal, the apparatus comprising:
an acquisition module configured to acquire an original depth image and an original color image, wherein the original depth image and the original color image each comprise an image of a target to be recognized;
and a determining module configured to determine a gesture category of the target to be recognized according to the original depth image and the original color image.
10. The apparatus of claim 9, wherein the determining module is specifically configured to:
performing correction processing on the original depth image and the original color image to determine a corrected depth image and a corrected color image;
and determining the gesture category of the target to be recognized according to the corrected depth image and the corrected color image.
11. The apparatus of claim 10, wherein the determining module is configured to:
performing target detection on the corrected depth image and the corrected color image, and determining a first gesture recognition image of the corrected depth image and a second gesture recognition image of the corrected color image;
determining a target gesture recognition image from the first gesture recognition image and the second gesture recognition image;
and determining the gesture category of the target to be recognized according to the target gesture recognition image.
12. The apparatus of claim 11, wherein the determining module is configured to:
adjusting the image area of the second gesture recognition image according to the depth information of the image area corresponding to the second gesture recognition image in the corrected depth image to obtain a third gesture recognition image;
adjusting the image area of the first gesture recognition image according to the image area of the third gesture recognition image to obtain a fourth gesture recognition image, wherein the image area of the fourth gesture recognition image is the same as the image area of the third gesture recognition image;
and performing image fusion processing on the third gesture recognition image and the fourth gesture recognition image to obtain the target gesture recognition image.
13. The apparatus of claim 12, wherein the determining module is configured to:
determining an average depth value of depth information of the first gesture recognition image;
acquiring a unit depth value of each unit image of an image area of the second gesture recognition image in the corrected depth image;
determining a depth difference value of each unit depth value and the average depth value;
determining the unit depth value corresponding to the depth difference value which is greater than or equal to a set threshold value as a target unit depth value;
and removing the unit image corresponding to the target unit depth value from the second gesture recognition image to obtain a third gesture recognition image.
14. The apparatus of claim 11, wherein the determining module is configured to:
and obtaining the gesture category of the target to be recognized based on the gesture recognition model and the target gesture recognition image.
15. The apparatus of claim 14, wherein the gesture recognition model is determined by:
obtaining a plurality of training sample pairs for gesture recognition, wherein each training sample pair comprises a gesture recognition image sample and a gesture category sample of the gesture recognition image sample;
and training an original classifier model based on the plurality of training sample pairs to obtain the gesture recognition model.
16. The apparatus according to any one of claims 10 to 15, wherein the original depth image is captured by a depth camera of the terminal and the original color image is captured by a color camera of the terminal,
the acquisition module is further configured to acquire dual-camera calibration information of the depth camera and the color camera; and
the determining module is configured to perform correction processing on the original depth image and the original color image according to the dual-camera calibration information to obtain the corrected depth image and the corrected color image.
17. A terminal, characterized in that the terminal comprises:
a processor;
a memory for storing processor-executable instructions;
wherein the processor is configured to perform the method of any one of claims 1 to 8.
18. A non-transitory computer-readable storage medium, wherein instructions in the storage medium, when executed by a processor of a terminal, enable the terminal to perform the method of any of claims 1-8.
CN202110936045.XA 2021-08-16 2021-08-16 Gesture recognition method, device, terminal and storage medium Pending CN115909472A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202110936045.XA CN115909472A (en) 2021-08-16 2021-08-16 Gesture recognition method, device, terminal and storage medium

Publications (1)

Publication Number Publication Date
CN115909472A true CN115909472A (en) 2023-04-04

Family

ID=86488449

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination