CN107133361B

CN107133361B - Gesture recognition method and device and terminal equipment

Info

Publication number: CN107133361B
Application number: CN201710398580.8A
Authority: CN
Inventors: 万韶华
Original assignee: Beijing Xiaomi Mobile Software Co Ltd
Current assignee: Beijing Xiaomi Mobile Software Co Ltd
Priority date: 2017-05-31
Filing date: 2017-05-31
Publication date: 2020-02-07
Anticipated expiration: 2037-05-31
Also published as: CN107133361A

Abstract

The disclosure relates to a gesture recognition method, a gesture recognition device and terminal equipment, wherein the method comprises the following steps: acquiring a gesture video to be recognized; further, determining a gesture video set to which the gesture video to be recognized belongs according to the similarity between the gesture video to be recognized and a preset gesture video in a preset database; the preset database comprises at least one type of gesture video set, and each type of gesture video set comprises at least one preset gesture video. It can be seen that, in contrast to the prior art, another implementation of gesture recognition is provided in the embodiments of the present disclosure.

Description

Gesture recognition method and device and terminal equipment

Technical Field

The present disclosure relates to the technical field of electronic devices, and in particular, to a gesture recognition method and apparatus, and a terminal device.

Background

With the increasing demand of users for convenience in use of electronic products, hands-free operation or gesture recognition will become a key factor for distinguishing high-end electronic products from other similar electronic products.

In the prior art, a video of a gesture to be recognized is shot through an infrared camera, and a movement track of a hand skeleton joint point is determined according to the position of the hand skeleton joint point in each frame of gesture image in the video of the gesture to be recognized, so that the gesture to be recognized is determined according to the movement track.

Disclosure of Invention

In order to overcome the problems in the related art, the present disclosure provides a gesture recognition method, device and terminal device.

According to a first aspect of the embodiments of the present disclosure, there is provided a gesture recognition method, including:

acquiring a gesture video to be recognized;

determining a gesture video set to which the gesture video to be recognized belongs according to the similarity between the gesture video to be recognized and a preset gesture video in a preset database; the preset database comprises at least one type of gesture video set, and each type of gesture video set comprises at least one preset gesture video.

The technical scheme provided by the embodiment of the disclosure can have the following beneficial effects: acquiring a gesture video to be recognized; further, according to the similarity between the gesture video to be recognized and a preset gesture video in a preset database, determining a gesture video set to which the gesture video to be recognized belongs, so as to determine that the gesture to be recognized is a preset gesture in the preset gesture video included in the gesture video set to which the gesture video to be recognized belongs. It can be seen that, in contrast to the prior art, another implementation of gesture recognition is provided in the embodiments of the present disclosure.

In one possible design, the determining, according to the similarity between the gesture video to be recognized and a preset gesture video in a preset database, a gesture video set to which the gesture video to be recognized belongs includes:

performing an acquisition operation, the acquisition operation comprising: according to the type of a first gesture video set in the preset database, dividing the gesture video set in the preset database into a first type of gesture video and a second type of gesture video; and acquiring a support vector machine of the gesture video to be recognized according to the first type of gesture video, the second type of gesture video and the similarity between the gesture video to be recognized and each preset gesture video; the first gesture video set is any type of gesture video set in the preset database; the preset gesture videos included in the first gesture video set belong to the first type of gesture videos, and the preset gesture videos included in other gesture video sets except the first gesture video set in the preset database belong to the second type of gesture videos;

when the support vector machine is larger than 0, determining that the gesture video to be recognized belongs to the first gesture video set;

when the support vector machine is not larger than 0, taking any one of the other types of gesture video sets in the preset database as a new first gesture video set, returning to execute the obtaining operation to obtain a new first type of gesture video and a new second type of gesture video according to the new first type of gesture video set, and obtaining a new support vector machine of the gesture video to be recognized according to the new first type of gesture video, the new second type of gesture video and the similarity between the gesture video to be recognized and each preset gesture video until the new support vector machine is larger than 0, and determining that the gesture video to be recognized belongs to the new first gesture video set.

The technical scheme provided by the embodiment of the disclosure can have the following beneficial effects: the method comprises the steps of providing an implementation mode for determining a gesture video set to which a gesture video to be recognized belongs according to the similarity between the gesture video to be recognized and a preset gesture video in a preset database. Compared with the prior art, the purpose of accurately determining the gesture video to be recognized is achieved, so that the gesture to be recognized in the gesture video to be recognized is accurately determined.

In one possible design, the obtaining a support vector machine of the gesture video to be recognized according to the first type of gesture video, the second type of gesture video, and the similarity between the gesture video to be recognized and each preset gesture video includes:

determining the label factors of the first type of gesture videos and the second type of gesture videos; the label factor of the first type of gesture video is equal to a first preset value belonging to a positive number, the label factor of the second type of gesture video is equal to a second preset value belonging to a negative number, and the absolute values of the first preset value and the second preset value are the same;

and acquiring a support vector machine of the gesture video to be recognized according to the label factors of the first type of gesture video, the label factors of the second type of gesture video and the similarity between the gesture video to be recognized and each preset gesture video.

In one possible design, the obtaining a support vector machine of the gesture video to be recognized according to the tag factor of the first type of gesture video, the tag factor of the second type of gesture video, and the similarity between the gesture video to be recognized and each preset gesture video includes:

determining a gesture video matrix X to be recognized corresponding to the gesture video to be recognized according to the characteristic sequence of each frame of gesture image to be recognized in the gesture video to be recognized; wherein, the mth column of the gesture video matrix X to be recognized comprises: a feature sequence corresponding to the mth frame of the gesture video to be recognized and the gesture image to be recognized, wherein m is an integer greater than or equal to 1;

determining a preset gesture video matrix Y corresponding to the ith preset gesture video according to the characteristic sequence of each frame of preset gesture image in the ith preset gesture video in the preset databaseⁱ(ii) a Wherein i is an integer greater than or equal to 1 and less than or equal to N, N is the number of preset gesture videos included in the preset database, and the preset gesture video matrix YⁱColumn m of (d) contains: a feature sequence corresponding to an m-th frame of preset gesture images of the ith preset gesture video;

according to the formulaDetermining a support vector machine f (X) of the gesture video to be recognized, wherein sign () represents a sign function α_iRepresents a first weighting coefficient, y_iA label factor, κ (Y), representing the ith preset gesture videoⁱX) represents the gesture video matrix X to be recognized and the preset gesture video matrix YⁱB represents a first preset constant; wherein if the ith preset gesture video belongs to the first type of gesture video, then the y_iEqual to the first preset value, if the ith preset gesture video belongs to the second type of gesture video, the y is_iEqual to said second preset value.

The technical scheme provided by the embodiment of the disclosure can have the following beneficial effects: according to the tag factors of the first type of gesture videos, the tag factors of the second type of gesture videos and the similarity between the gesture video to be recognized and each preset gesture video, an implementation mode of a support vector machine of the gesture video to be recognized is obtained, so that whether the gesture video to be recognized belongs to the first gesture video set or not is further judged.

In one possible design, the method further includes:

according to the formula

Determining the gesture video matrix X to be recognized and the preset gesture video matrix YⁱSimilarity between K (Y)ⁱ，X)；

The K represents the number of video matrixes included in each layer of the time pyramid comprising L layers, the gesture video matrix X to be recognized and the preset gesture video matrix corresponding to each preset gesture video in the preset database are divided into the number of video matrixes included in each layer of the time pyramid comprising L layers according to the same division rule, and the K is 2^lL represents the l-th layer, l is a natural number greater than or equal to 0, k represents the k-th video matrix of each layer, and X_lkRepresenting the kth video matrix of the ith layer of the time pyramid corresponding to the gesture video matrix X to be recognized,

representing the preset gesture video matrix YⁱThe kth video matrix of the l-th layer of the corresponding temporal pyramid,represents said

And said X_lkSimilarity between them, μ_lkRepresenting the second weighting factor.

In one possible design, the method further includes:

according to the formula

Determining the

And said X_lkSimilarity between them

Wherein exp () represents an exponential functionThe number of the first and second groups is,

represents said

And said X_lkAnd gamma represents a second predetermined constant.

In one possible design, the method further includes:

according to the formulaDetermining the

And said X_lkThe distance between

Wherein the content of the first and second substances,

representing the function of the euclidean distance,

representing a first preset sparse affine sequence,

representing a second preset sparse affine sequence.

According to a second aspect of the embodiments of the present disclosure, there is provided a gesture recognition apparatus including:

the acquisition module is configured to acquire a gesture video to be recognized;

the first determining module is configured to determine a gesture video set to which the gesture video to be recognized belongs according to the similarity between the gesture video to be recognized and a preset gesture video in a preset database; the preset database comprises at least one type of gesture video set, and each type of gesture video set comprises at least one preset gesture video.

In one possible design, the first determining module includes:

an acquisition submodule configured to perform an acquisition operation, the acquisition operation including: according to the type of a first gesture video set in the preset database, dividing the gesture video set in the preset database into a first type of gesture video and a second type of gesture video; and acquiring a support vector machine of the gesture video to be recognized according to the first type of gesture video, the second type of gesture video and the similarity between the gesture video to be recognized and each preset gesture video; the first gesture video set is any type of gesture video set in the preset database; the preset gesture videos included in the first gesture video set belong to the first type of gesture videos, and the preset gesture videos included in other gesture video sets except the first gesture video set in the preset database belong to the second type of gesture videos;

a first determining submodule configured to determine that the gesture video to be recognized belongs to the first gesture video set when the support vector machine is greater than 0;

and the second determining submodule is configured to, when the support vector machine is not greater than 0, take any one of other types of gesture video sets in the preset database as a new first gesture video set, return to the acquiring submodule to execute the acquiring operation, acquire a new first type of gesture video and a new second type of gesture video according to the new first type of gesture video and the new second type of gesture video, and acquire a new support vector machine of the gesture video to be recognized according to the new first type of gesture video, the new second type of gesture video and the similarity between the gesture video to be recognized and each preset gesture video, and determine that the gesture video to be recognized belongs to the new first gesture video set until the new support vector machine is greater than 0.

In one possible design, the obtaining sub-module includes:

a determining unit configured to determine a tag factor of the first type of gesture video and a tag factor of the second type of gesture video; the label factor of the first type of gesture video is equal to a first preset value belonging to a positive number, the label factor of the second type of gesture video is equal to a second preset value belonging to a negative number, and the absolute values of the first preset value and the second preset value are the same;

the acquisition unit is configured to acquire a support vector machine of the gesture video to be recognized according to the label factor of the first type of gesture video, the label factor of the second type of gesture video and the similarity between the gesture video to be recognized and each preset gesture video.

In one possible design, the obtaining unit is specifically configured to:

according to the formulaDetermining a support vector machine f (X) of the gesture video to be recognized, wherein sign () represents a sign function α_iRepresents a first weighting coefficient, y_iA label factor, κ (Y), representing the ith preset gesture videoⁱX) represents the gesture video matrix X to be recognized and the preset gesture video matrix YⁱBetweenB represents a first predetermined constant; wherein if the ith preset gesture video belongs to the first type of gesture video, then the y_iEqual to the first preset value, if the ith preset gesture video belongs to the second type of gesture video, the y is_iEqual to said second preset value.

In one possible design, the apparatus further includes:

a second determination module configured to determine a formula

representing the preset gesture video matrix YⁱThe kth video matrix of the l-th layer of the corresponding temporal pyramid,

represents saidAnd said X_lkSimilarity between them, μ_lkRepresenting the second weighting factor.

In one possible design, the apparatus further includes:

a third determination module configured to determine a formula

Determining the

And said X_lkSimilarity between themWherein exp () represents an exponential function,

represents said

And said X_lkAnd gamma represents a second predetermined constant.

In one possible design, the apparatus further includes:

a fourth determination module configured to determine a formula

Determining the

And said X_lkThe distance between

Wherein the content of the first and second substances,

representing the function of the euclidean distance,

representing a first preset sparse affine sequence,representing a second preset sparse affine sequence.

According to a third aspect of the embodiments of the present disclosure, there is provided a terminal device, including: a processor and a memory for storing processor-executable instructions;

the processor is configured to:

acquiring a gesture video to be recognized;

The technical scheme provided by the embodiment of the disclosure can have the following beneficial effects: the method and the device for recognizing the gesture and the terminal equipment are provided, and the gesture video to be recognized is obtained; further, according to the similarity between the gesture video to be recognized and a preset gesture video in a preset database, determining a gesture video set to which the gesture video to be recognized belongs, so as to determine that the gesture to be recognized is a preset gesture in the preset gesture video included in the gesture video set to which the gesture video to be recognized belongs. It can be seen that, in contrast to the prior art, another implementation of gesture recognition is provided in the embodiments of the present disclosure.

It is to be understood that both the foregoing general description and the following detailed description are exemplary and explanatory only and are not restrictive of the disclosure.

Drawings

The accompanying drawings, which are incorporated in and constitute a part of this specification, illustrate embodiments consistent with the invention and together with the description, serve to explain the principles of the invention.

FIG. 1A is a flow diagram illustrating a method of gesture recognition in accordance with an exemplary embodiment;

FIG. 1B is a schematic diagram illustrating image segmentation according to an exemplary embodiment;

FIG. 2 is a flow diagram illustrating a method of gesture recognition in accordance with another exemplary embodiment;

FIG. 3 is a flow diagram illustrating a method of gesture recognition in accordance with another exemplary embodiment;

FIG. 4 is a block diagram illustrating a first embodiment of a gesture recognition apparatus in accordance with an illustrative embodiment;

FIG. 5 is a block diagram illustrating a second embodiment of a gesture recognition apparatus in accordance with an illustrative embodiment;

FIG. 6 is a block diagram illustrating a third embodiment of a gesture recognition apparatus in accordance with an illustrative embodiment;

FIG. 7 is a block diagram illustrating a fourth embodiment of a gesture recognition apparatus in accordance with an illustrative embodiment;

FIG. 8 is a block diagram illustrating a fifth embodiment of a gesture recognition apparatus in accordance with an illustrative embodiment;

FIG. 9 is a block diagram illustrating a sixth embodiment of a gesture recognition apparatus in accordance with an illustrative embodiment;

FIG. 10 is a block diagram illustrating a terminal device according to an example embodiment;

fig. 11 is a block diagram illustrating a terminal device 1200 according to an example embodiment.

With the foregoing drawings in mind, certain embodiments of the disclosure have been shown and described in more detail below. These drawings and written description are not intended to limit the scope of the disclosed concepts in any way, but rather to illustrate the concepts of the disclosure to those skilled in the art by reference to specific embodiments.

Detailed Description

Reference will now be made in detail to the exemplary embodiments, examples of which are illustrated in the accompanying drawings. When the following description refers to the accompanying drawings, like numbers in different drawings represent the same or similar elements unless otherwise indicated. The embodiments described in the following exemplary embodiments do not represent all embodiments consistent with the present invention. Rather, they are merely examples of apparatus and methods consistent with certain aspects of the invention, as detailed in the appended claims.

First, words related to the present disclosure will be explained:

the terminal devices to which the present disclosure relates may include, but are not limited to: the terminal may be a smart phone, a tablet computer, an electronic reader, a personal digital assistant, a smart television, smart glasses, or other terminals having an image capturing function, which is not limited in the embodiments of the present disclosure.

The Histogram of Oriented Gradient (HOG) feature to which the present disclosure relates is a feature descriptor used for object detection in computer vision and image processing. The HOG features are constructed by calculating and counting the histogram of gradient direction of local area of image.

Similar to the HOG feature, the Optical Flow Histogram (HOF) feature according to the present disclosure is to perform weighted statistics on the Optical Flow direction to obtain an Optical Flow direction information Histogram. Because the size of the target changes with time, the dimensionality of the corresponding optical flow feature descriptor also changes, and meanwhile, the calculation of the optical flow is sensitive to background noise, scale change and motion direction, a feature which can represent time domain action information and is insensitive to scale and motion direction based on the optical flow needs to be found, and the HOF is proposed based on the requirement.

Next, an application scenario of the embodiment of the present disclosure is introduced:

with the increasing demand of users for convenience in use of electronic products, hands-free operation or gesture recognition will become a key factor for distinguishing high-end electronic products from other similar electronic products. Therefore, research on gesture recognition technology is a very important research direction.

Another implementation manner of gesture recognition is provided in the embodiments of the present disclosure, and specific implementation manners are as follows:

the following describes a gesture recognition method, a gesture recognition device, and a terminal device according to embodiments of the present disclosure in detail with reference to the accompanying drawings.

Fig. 1A is a flow diagram illustrating a gesture recognition method according to an exemplary embodiment, and fig. 1B is a schematic diagram illustrating image segmentation according to an exemplary embodiment. The execution subject of this embodiment may be a gesture recognition apparatus in the terminal device, and the apparatus may be implemented by software and/or hardware. As shown in fig. 1A, the scheme of the present embodiment may include the following steps:

in step S101, a gesture video to be recognized is acquired.

In this step, the gesture recognition apparatus obtains a gesture video to be recognized through the image acquisition unit, optionally, the gesture video to be recognized includes: at least one frame of gesture image to be recognized, wherein each frame of gesture image to be recognized comprises: the gesture to be recognized. Optionally, the image acquisition unit may be any one of: the color camera and the infrared camera may also be other units having an image capturing function, which is not limited in the embodiment of the present disclosure.

Optionally, the implementation manner of obtaining the gesture video to be recognized by the image acquisition unit at least includes the following:

the first realizable way: the gesture recognition device acquires an original video (including at least one frame of original color image) through a color camera, and segments a hand image and a background image in each frame of original color image in the original video by adopting an image segmentation method based on skin color detection to obtain the gesture video to be recognized, which includes at least one frame of gesture image to be recognized (only including the hand image). For example, as shown in fig. 1B, an image segmentation method based on skin color detection is adopted to segment a hand image and a background image in an original color image of a certain frame, so as to obtain a gesture image to be recognized, which only includes the hand image. Optionally, a specific implementation manner of the image segmentation method based on skin color detection in the embodiment of the present disclosure may refer to an image segmentation method based on skin color detection in the prior art, which is not limited in the embodiment of the present disclosure.

The second realizable way: the gesture recognition device acquires an original video (including at least one frame of original depth image) through an infrared camera, and divides a hand image and a background image in each frame of original depth image in the original video by adopting an infrared image division method based on the infrared camera to obtain the gesture video to be recognized, which includes at least one frame of gesture image to be recognized (only including the hand image). Optionally, a specific implementation manner of the infrared image segmentation method based on the infrared camera in the embodiment of the present disclosure may refer to an infrared image segmentation method in the prior art, which is not limited in the embodiment of the present disclosure.

Of course, the implementation manner of obtaining the gesture video to be recognized through the image acquisition unit may also include other implementation manners, which is not limited in the embodiment of the present disclosure.

In step S102, a gesture video set to which the gesture video to be recognized belongs is determined according to the similarity between the gesture video to be recognized and a preset gesture video in a preset database.

In the embodiment of the present disclosure, a preset database is preset in the gesture recognition device, and optionally, the preset database includes: at least one type of gesture video set (such as a power-on gesture video set, a power-off gesture video set, a channel changing gesture video set and the like); wherein each type of gesture video set comprises: at least one preset gesture video (for example, the power-on gesture video set comprises at least one preset power-on gesture video, and the power-off gesture video set comprises at least one preset power-off gesture video, etc.).

In this step, the gesture recognition device determines a gesture video set to which the gesture video to be recognized belongs according to the similarity between the gesture video to be recognized and each preset gesture video in a preset database. Optionally, the gesture recognition device determines, according to a similarity between the gesture video to be recognized and each preset gesture video in a preset database, whether the gesture video to be recognized belongs to a first gesture video set in the preset database (where the first gesture video set is any type of gesture video set in the preset database, for example, the first gesture video set is a power-on gesture video set); if the gesture video to be recognized is determined to belong to the first gesture video set, ending the process; if the gesture video to be recognized is determined not to belong to the first gesture video set, continuously judging whether the gesture video to be recognized belongs to a second gesture video set in the preset database (the second gesture video set is any other type of gesture video set except the first gesture video set in the preset database, for example, the second gesture video set is a shutdown gesture video set); if the gesture video to be recognized is determined to belong to the second gesture video set, ending the process; if the gesture video to be recognized does not belong to the second gesture video set, continuously judging whether the gesture video to be recognized belongs to a third gesture video set in the preset database (the third gesture video set is any other type of gesture video set except the first gesture video set and the second gesture video set in the preset database, for example, the third gesture video set is a channel change gesture video set), … …, and repeating the steps until the gesture video set to be recognized belongs to the gesture video set.

Optionally, the gesture recognition device determines a gesture video set (e.g., a second gesture video set) to which the gesture video to be recognized belongs, that is, determines a gesture to be recognized in the gesture video to be a preset gesture in a preset gesture video included in the gesture video set (e.g., the second gesture video set) to determine the gesture to be recognized, so as to determine a target operation corresponding to the gesture to be recognized further according to the gesture to be recognized and preset mapping information (including a corresponding relationship between at least one preset gesture and the target operation). For example, when the gesture to be recognized is determined to be a preset power-on gesture, the determined target operation corresponding to the gesture to be recognized is a power-on operation.

In the embodiment, a gesture video to be recognized is obtained; further, according to the similarity between the gesture video to be recognized and a preset gesture video in a preset database, determining a gesture video set to which the gesture video to be recognized belongs, so as to determine that the gesture to be recognized is a preset gesture in the preset gesture video included in the gesture video set to which the gesture video to be recognized belongs. It can be seen that, in contrast to the prior art, another implementation of gesture recognition is provided in the embodiments of the present disclosure.

FIG. 2 is a flow chart illustrating a method of gesture recognition according to another exemplary embodiment. On the basis of the above embodiment, as shown in fig. 2, step S102 includes:

in step S102A, an acquisition operation is performed, the acquisition operation including: according to the type of a first gesture video set in the preset database, dividing the gesture video set in the preset database into a first type of gesture video and a second type of gesture video; and acquiring a support vector machine of the gesture video to be recognized according to the first type of gesture video, the second type of gesture video and the similarity between the gesture video to be recognized and each preset gesture video.

The first gesture video set is any type of gesture video set in the preset database; the preset gesture videos included in the first gesture video set belong to the first type of gesture videos, and the preset gesture videos included in other gesture video sets except the first gesture video set in the preset database belong to the second type of gesture videos.

In this step, in order to determine whether the gesture video to be recognized belongs to the first gesture video set (e.g., a power-on gesture video set) in the preset database, the gesture recognition apparatus first divides the gesture video set in the preset database into a first type of gesture video (e.g., a power-on gesture video) and a second type of gesture video (e.g., a non-power-on gesture video) according to the type of the first gesture video set (e.g., the power-on gesture video set). For example, the gesture recognition device divides all preset gesture videos in the preset database into a power-on gesture video and a non-power-on gesture video.

Further, the gesture recognition device acquires a Support Vector Machine (SVM) of the gesture video to be recognized according to the first type of gesture video (for example, a power-on type gesture video), the second type of gesture video (for example, a non-power-on type gesture video), and the similarity between the gesture video to be recognized and each preset gesture video. Optionally, the gesture recognition device determines the label factor of the first type of gesture video (e.g., power-on type gesture video) and the label factor of the second type of gesture video (e.g., non-power-on type gesture video); the label factor of the first type of gesture video is equal to a first preset value (for example, 1) belonging to a positive number, the label factor of the second type of gesture video is equal to a second preset value (for example, -1) belonging to a negative number, and the absolute values of the first preset value and the second preset value are the same; further, a support vector machine of the gesture video to be recognized is obtained according to the label factor of the first type of gesture video, the label factor of the second type of gesture video and the similarity between the gesture video to be recognized and each preset gesture video.

Of course, the gesture recognition device may also obtain a support vector machine of the gesture video to be recognized in other manners according to the first type of gesture video, the second type of gesture video, and the similarity between the gesture video to be recognized and each preset gesture video, which is not limited in the embodiment of the present disclosure.

When the support vector machine is greater than 0, executing step S102B; when the support vector machine is not greater than 0, determining that the gesture video to be recognized does not belong to the first gesture video set, and executing step S102C.

In step S102B, it is determined that the gesture video to be recognized belongs to the first gesture video set.

In step S102C, taking any one of the other types of gesture video sets in the preset database as a new first gesture video set, returning to execute the obtaining operation to obtain a new first type of gesture video and a new second type of gesture video according to the new first gesture video set, and obtaining a new support vector machine of the gesture video to be recognized according to the new first type of gesture video, the new second type of gesture video, and a similarity between the gesture video to be recognized and each of the preset gesture videos, until the new support vector machine is greater than 0, determining that the gesture video to be recognized belongs to the new first gesture video set.

In this step, the gesture recognition device uses any one of the other types of gesture video sets (for example, a second gesture video set) in the preset database as a new first gesture video set to determine whether the gesture video to be recognized belongs to the new first gesture video set (for example, the second gesture video set) in the preset database; further, returning to execute the obtaining operation, so as to divide the gesture video set in the preset database into a new first type of gesture video (for example, a shutdown type gesture video) and a new second type of gesture video (for example, a non-shutdown type gesture video) according to the type of the new first gesture video set (for example, a second gesture video set, and the second gesture video set is a shutdown gesture video set), and obtain a new support vector machine of the gesture video to be recognized according to the new first type of gesture video, the new second type of gesture video, and the similarity between the gesture video to be recognized and each of the preset gesture videos; when the new support vector machine is larger than 0, determining that the gesture video to be recognized belongs to the new first gesture video set (for example, a second gesture video set); when the new support vector machine is not larger than 0, taking any one of other types of gesture video sets (such as a third gesture video set) in the preset database as a new first gesture video set to judge whether the gesture video to be recognized belongs to the new first gesture video set (such as the third gesture video set) in the preset database; further, returning to execute the acquiring operation, dividing the gesture video sets in the preset database into a new first type gesture video (such as a channel changing type gesture video) and a new second type gesture video (such as a non-channel changing type gesture video) according to the types of the new first gesture video set (such as a third gesture video set and the third gesture video set is a channel changing gesture video set), and acquiring a new support vector machine of the gesture video to be recognized according to the new first type of gesture video, the new second type of gesture video and the similarity between the gesture video to be recognized and each preset gesture video, … …, and repeating the steps until the new support vector machine is larger than 0, and determining that the gesture video to be recognized belongs to a new first gesture video set.

Optionally, a new first type of gesture video and a new second type of gesture video are obtained according to the new first type of gesture video set, and an implementation manner of a new support vector machine for the gesture video to be recognized is obtained according to the new first type of gesture video, the new second type of gesture video, and a similarity between the gesture video to be recognized and each preset gesture video, which may be referred to relevant parts of step S102A in the embodiment of the present disclosure, and is not described herein again.

In an embodiment of the present disclosure, by performing an obtaining operation, the obtaining operation includes: according to the type of a first gesture video set in the preset database, dividing the gesture video set in the preset database into a first type of gesture video and a second type of gesture video; and acquiring a support vector machine of the gesture video to be recognized according to the first type of gesture video, the second type of gesture video and the similarity between the gesture video to be recognized and each preset gesture video; when the support vector machine is larger than 0, determining that the gesture video to be recognized belongs to the first gesture video set; when the support vector machine is not larger than 0, taking any one of the other types of gesture video sets in the preset database as a new first gesture video set, returning to execute the obtaining operation to obtain a new first type of gesture video and a new second type of gesture video according to the new first type of gesture video set, and obtaining a new support vector machine of the gesture video to be recognized according to the new first type of gesture video, the new second type of gesture video and the similarity between the gesture video to be recognized and each preset gesture video until the new support vector machine is larger than 0, and determining that the gesture video to be recognized belongs to the new first gesture video set. Therefore, the purpose of accurately determining the gesture video to be recognized is achieved, and the gesture to be recognized in the gesture video to be recognized is accurately determined.

FIG. 3 is a flow chart illustrating a method of gesture recognition according to another exemplary embodiment. On the basis of the foregoing embodiment, as shown in fig. 3, the obtaining a support vector machine of the gesture video to be recognized according to the tag factor of the first type of gesture video, the tag factor of the second type of gesture video, and the similarity between the gesture video to be recognized and each preset gesture video includes:

in step S301, a gesture video matrix X to be recognized corresponding to the gesture video to be recognized is determined according to a feature sequence of each frame of gesture image to be recognized in the gesture video to be recognized.

In the step, a gesture recognition device determines a gesture video matrix X to be recognized corresponding to the gesture video to be recognized according to a feature sequence of each frame of gesture image to be recognized in the gesture video to be recognized; wherein, the mth column of the gesture video matrix X to be recognized comprises: and m is an integer greater than or equal to 1 (namely, the maximum value of m is the total frame number of the gesture images to be recognized included in the gesture video to be recognized). Alternatively, the signature sequence may be a combination of one or more of: HOG characteristic sequence and HOF characteristic sequence; of course, the characteristic sequence may also include other sequences, which are not limited in the embodiments of the present disclosure.

Assuming that the feature sequence includes an HOF feature sequence, optionally, the gesture recognition apparatus determines the HOF feature sequence of each frame of gesture image to be recognized in the gesture video to be recognized by extracting an optical flow (optical flow) of each frame of gesture image to be recognized (including only a hand image) in the gesture video to be recognized and according to the optical flow of each frame of gesture image to be recognized in the gesture video to be recognized. Optionally, an implementation process of determining the HOF feature sequence of each frame of gesture image to be recognized in the gesture video to be recognized according to the optical flow of each frame of gesture image to be recognized in the gesture video to be recognized may refer to an implementation process of determining the HOF feature of an image according to the optical flow of an image in the prior art, which is not limited in the embodiment of the present disclosure. Of course, the HOF feature sequence of each frame of gesture image to be recognized in the gesture video to be recognized may also be determined in other ways, which is not limited in the embodiment of the present disclosure.

Assuming that the feature sequence includes a HOG feature sequence, optionally, the gesture recognition apparatus determines the HOG feature sequence of each frame of gesture image to be recognized in the gesture video to be recognized by extracting three primary colors (RGB) of each frame of gesture image to be recognized (including only hand images) in the gesture video to be recognized and according to the RGB of each frame of gesture image to be recognized in the gesture video to be recognized. Optionally, an implementation process of determining the HOG feature sequence of each frame of the gesture image to be recognized in the gesture video to be recognized according to RGB of each frame of the gesture image to be recognized in the gesture video to be recognized may refer to an implementation process of determining the HOG feature of an image according to RGB of an image in the prior art, which is not limited in the embodiment of the present disclosure. Of course, the HOG feature sequence of each frame of gesture image to be recognized in the gesture video to be recognized may also be determined in other ways, which is not limited in the embodiment of the present disclosure.

Assuming that the feature sequence includes an HOG feature sequence and an HOF feature sequence, the above-mentioned portion of "determining the HOG feature sequence of each frame of the gesture image to be recognized in the gesture video to be recognized" and the portion of "determining the HOF feature sequence of each frame of the gesture image to be recognized in the gesture video to be recognized" are combined, and details are not repeated here.

In step S302, according to a feature sequence of each frame of preset gesture image in an ith preset gesture video in the preset database, a preset gesture video matrix Y corresponding to the ith preset gesture video is determinedⁱ。

In this step, the gesture recognition device determines a preset gesture video moment corresponding to the ith preset gesture video according to the feature sequence of each frame of preset gesture image in the ith preset gesture video in the preset databaseMatrix Yⁱ(ii) a Wherein i is an integer greater than or equal to 1 and less than or equal to N, N is the number of preset gesture videos included in the preset database, and the preset gesture video matrix YⁱColumn m of (d) contains: and a feature sequence corresponding to the m-th frame of the ith preset gesture video is preset (namely, the maximum value of m is the total frame number of the preset gesture images included in the ith preset gesture video). Alternatively, the signature sequence may be a combination of one or more of: HOG characteristic sequence and HOF characteristic sequence; of course, the characteristic sequence may also include other sequences, which are not limited in the embodiments of the present disclosure.

Optionally, an implementation manner of determining the feature sequence of each frame of the preset gesture image in the ith preset gesture video may refer to the relevant part of "determining the feature sequence of each frame of the gesture image to be recognized in the gesture video to be recognized", and details are not repeated here.

In step S303, according to the formulaDetermining a support vector machine f (X) of the gesture video to be recognized.

In this step, the gesture recognition device determines the support vector machine of the gesture video to be recognized by taking the similarity between the gesture video to be recognized and each preset gesture video as a kernel function of the support vector machine. Optionally, the gesture recognition means is according to said formula

Determining a support vector machine f (X) of the gesture video to be recognized, wherein sign () represents a sign function α_iRepresents a first weighting coefficient, y_iA label factor, κ (Y), representing the ith preset gesture videoⁱX) represents the gesture video matrix X to be recognized and the preset gesture video matrix YⁱB represents a first preset constant; if the ith preset gesture video belongs to the first type of gesture video (for example, the startup type of gesture video), the ith preset gesture video is a first type of gesture videoY is_iEqual to the first preset value, if the ith preset gesture video belongs to the second type of gesture video (for example, a non-starting type gesture video), the y_iEqual to said second preset value.

Optionally, the formula can also be used

The support vector machine f (x) of the gesture video to be recognized is determined by other equivalent or deformation formulas, which are not limited in the embodiment of the present disclosure.

Optionally, in this embodiment of the present disclosure, an achievable manner of obtaining a new support vector machine of the gesture video to be recognized according to the tag factor of the new first type of gesture video, the tag factor of the new second type of gesture video, and the similarity between the gesture video to be recognized and each preset gesture video may be referred to as the achievable manner of obtaining the support vector machine of the gesture video to be recognized according to the tag factor of the first type of gesture video, the tag factor of the second type of gesture video, and the similarity between the gesture video to be recognized and each preset gesture video, which is not described herein again in this embodiment of the present disclosure.

Optionally, in this disclosure, the size of the sequence number of the step is not limited to the order of execution, and the execution order of each step may be adjusted appropriately, which is not limited in this disclosure.

In the embodiment of the disclosure, how to obtain an implementation manner of a support vector machine of a gesture video to be recognized according to a tag factor of the first type of gesture video, a tag factor of the second type of gesture video, and a similarity between the gesture video to be recognized and each preset gesture video is provided, so as to further judge whether the gesture video to be recognized belongs to the first gesture video set.

Further, on the basis of the above embodiments, in the embodiment of the present disclosure, an implementable manner of determining the similarity between the gesture video to be recognized and any preset gesture video in the preset database (for example, the ith preset gesture video in the preset database) is explained:

in the embodiment of the disclosure, the gesture recognition device divides the gesture video matrix X to be recognized and the preset gesture video matrix corresponding to each preset gesture video in the preset database into a time pyramid comprising L layers according to the same division rule; wherein each layer comprises K video matrices, K2^lAnd l represents the l-th layer, l is a natural number which is greater than or equal to 0, and k represents the k-th video matrix of each layer. For example, the layer 0 (l ═ 0) of the time pyramid corresponding to the gesture video matrix X to be recognized includes: a complete gesture video matrix X to be recognized of the gesture video to be recognized and a preset gesture video matrix Yⁱ(the preset gesture video matrix corresponding to the ith preset gesture video in the preset database) the 0 th layer (l ═ 0) of the time pyramid comprises: the complete preset gesture video matrix Y of the ith preset gesture videoⁱ(ii) a The 1 st layer (l ═ 1) of the time pyramid corresponding to the gesture video matrix X to be recognized includes: the gesture video to be recognized is divided into two sub-gesture videos to be recognized, the two sub-gesture videos to be recognized respectively correspond to video matrixes, and a preset gesture video matrix Y is setⁱThe 1 st layer (l ═ 1) of the corresponding temporal pyramid includes: the video matrixes corresponding to the two sub-preset gesture videos after the ith preset gesture video is divided into two sub-preset gesture videos are respectively arranged; and so on.

In this step, the gesture recognition device is based on a formulaDetermining the gesture video matrix X to be recognized and the preset gesture video matrix YⁱSimilarity between K (Y)ⁱX); wherein, X_lkRepresenting the kth video matrix of the ith layer of the time pyramid corresponding to the gesture video matrix X to be recognized,

representing the preset gesture video matrix YⁱKth video of l-th layer of corresponding temporal pyramidThe matrix is a matrix of a plurality of matrices,

represents said

And said X_lkSimilarity between them, μ_lkRepresenting the second weighting factor. Optionally, the μ_lk＝1/2^L-1Of course, the μ_lkAnd may be equal to other values, which are not limited in the embodiments of the present disclosure. Optionally, the gesture recognition device can also be according to the formula

Determining the gesture video matrix X to be recognized and the preset gesture video matrix Y by other equivalent or deformation formulasⁱSimilarity between K (Y)ⁱX), which is not limited in the embodiments of the present disclosure.

Alternatively, the gesture recognition means is according to a formula

Determining the

And said X_lkSimilarity between them

Wherein exp () represents an exponential function,

represents said

And said X_lkAnd gamma represents a second predetermined constant. Optionally, the gesture recognition device can also be according to a formulaOther equivalent or deformation formulas for determining said

And said X_lkThe similarity between the two is not limited in the embodiments of the present disclosure.

Optionally, the gesture recognition device is based on sparse affine packages

Determining the

And said X_lkThe distance between

Alternatively, the gesture recognition means is according to a formula

Determining theAnd said X_lkThe distance between

Wherein the content of the first and second substances,

representing the function of the euclidean distance,representing a first preset sparse affine sequence,

representing a second preset sparse affine sequence. Optionally, the gesture recognition device can also be according to the formula

Other equivalent or deformation formulas for determining said

And said X_lkThe distance between

This is not a limitation in the embodiments of the present disclosure.

Optionally, the following embodiments of the disclosure are directed to determining sparse affine packages

Explains the realizations of:

determining a sparse affine package under the assumption of a sample gesture video W and a sample to-be-recognized gesture video ZCan be realized as follows:

wherein the content of the first and second substances,

represents said βⁱP represents βⁱThe number of preset sparse affine coefficients included (optionally, the same as the number of columns of W), β_nRepresents the nth preset sparse affine coefficient of β, and q represents β_nThe number of preset sparse affine coefficients included (optionally, the same number of columns as Z), arg () representing the parameter-solving function (optionally, so that

Reach β of minimum valueⁱAnd β), min () represents the minimum function,₁representing absolute value functions, and λ representing a third predetermined constant (e.g., 0.1, 0.01, etc.). the first three equations described above in this paragraph are combined to solve βⁱTo obtain

And solving β to obtain

Of course, in the embodiment of the present disclosure, the sparse affine package may also be determined by other manners

This is not a limitation in the embodiments of the present disclosure.

In the disclosed embodiments, the determination is made by a sparse affine package based(the Preset gesture video matrix YⁱThe kth video matrix of the l-th layer of the corresponding temporal pyramid) and X_lk(the kth video matrix of the l layer of the time pyramid corresponding to the gesture video matrix X to be recognized) are obtained

Further, according to the

Determining the

And said X_lkSimilarity between them

Further, according to theDetermining the gesture video matrix X to be recognized and the preset gesture video matrix YⁱSimilarity between K (Y)ⁱX); further, according to the tag factor of the ith preset gesture video (for example, the tag factor of the first type of gesture video or the tag factor of the second type of gesture video) and the k (Y)ⁱX) determining a support vector machine f (X) of the gesture video to be recognized so as to further judge the gesture video to be recognized according to the support vector machine f (X)Identifying whether a gesture video belongs to the first gesture video set. Compared with the prior art, in the embodiment, the similarity between the gesture video to be recognized and the ith preset gesture video determined based on the sparse affine package is used as the kernel function of the support vector machine, so that the accuracy of gesture recognition is high.

Fig. 4 is a block diagram illustrating a first embodiment of a gesture recognition apparatus according to an exemplary embodiment. As shown in fig. 4, the gesture recognition apparatus 40 includes:

an obtaining module 401 configured to obtain a gesture video to be recognized;

a first determining module 402, configured to determine a gesture video set to which the gesture video to be recognized belongs according to similarity between the gesture video to be recognized and a preset gesture video in a preset database; the preset database comprises at least one type of gesture video set, and each type of gesture video set comprises at least one preset gesture video.

In the gesture recognition apparatus provided by the embodiment of the present disclosure, the obtaining module 401 obtains a gesture video to be recognized; further, the first determining module 402 determines a gesture video set to which the gesture video to be recognized belongs according to the similarity between the gesture video to be recognized and a preset gesture video in a preset database, so as to determine that the gesture to be recognized is a preset gesture in a preset gesture video included in the gesture video set to which the gesture video to be recognized belongs. It can be seen that, in contrast to the prior art, another implementation of gesture recognition is provided in the embodiments of the present disclosure.

On the basis of the embodiment shown in fig. 4, fig. 5 is a block diagram of a second embodiment of a gesture recognition apparatus according to an exemplary embodiment. Referring to fig. 5, the first determining module 402 includes:

an acquisition submodule 402A configured to perform an acquisition operation, the acquisition operation including: according to the type of a first gesture video set in the preset database, dividing the gesture video set in the preset database into a first type of gesture video and a second type of gesture video; and acquiring a support vector machine of the gesture video to be recognized according to the first type of gesture video, the second type of gesture video and the similarity between the gesture video to be recognized and each preset gesture video; the first gesture video set is any type of gesture video set in the preset database; the preset gesture videos included in the first gesture video set belong to the first type of gesture videos, and the preset gesture videos included in other gesture video sets except the first gesture video set in the preset database belong to the second type of gesture videos;

a first determining submodule 402B configured to determine that the gesture video to be recognized belongs to the first gesture video set when the support vector machine is greater than 0;

a second determining submodule 402C configured to, when the support vector machine is not greater than 0, regard any one of the other types of gesture video sets in the preset database as a new first gesture video set, return to the obtaining submodule 402A to perform the obtaining operation, obtain a new first type of gesture video and a new second type of gesture video according to the new first gesture video set, and obtain a new support vector machine of the gesture video to be recognized according to the new first type of gesture video, the new second type of gesture video, and a similarity between the gesture video to be recognized and each of the preset gesture videos, until the new support vector machine is greater than 0, determine that the gesture video to be recognized belongs to the new first gesture video set.

On the basis of the embodiment shown in fig. 5, fig. 6 is a block diagram of a third embodiment of a gesture recognition apparatus according to an exemplary embodiment. Referring to fig. 6, the acquisition submodule 402A includes:

a determining unit 402a1 configured to determine a tag factor of the first type of gesture video and a tag factor of the second type of gesture video; the label factor of the first type of gesture video is equal to a first preset value belonging to a positive number, the label factor of the second type of gesture video is equal to a second preset value belonging to a negative number, and the absolute values of the first preset value and the second preset value are the same;

the obtaining unit 402a2 is configured to obtain a support vector machine of the gesture video to be recognized according to the tag factor of the first type of gesture video, the tag factor of the second type of gesture video, and the similarity between the gesture video to be recognized and each preset gesture video.

Optionally, the obtaining unit 402a2 is specifically configured to:

according to the formula

Determining a support vector machine f (X) of the gesture video to be recognized, wherein sign () represents a sign function α_iRepresents a first weighting coefficient, y_iA label factor, κ (Y), representing the ith preset gesture videoⁱX) represents the gesture video matrix X to be recognized and the preset gesture video matrix YⁱB represents a first preset constant; wherein if the ith preset gesture video belongs to the first type of gesture video, then the y_iEqual to the first preset value, if the ith preset gesture video belongs to the second type of gesture video, the y is_iEqual to said second preset value.

On the basis of the embodiment shown in fig. 6, fig. 7 is a block diagram of a fourth embodiment of a gesture recognition apparatus according to an exemplary embodiment. Referring to fig. 7, the gesture recognition apparatus 40 further includes:

a second determination module 403 configured to determine a formulaDetermining the gesture video matrix X to be recognized and the preset gesture video matrix YⁱSimilarity between K (Y)ⁱ，X)；

represents said

On the basis of the embodiment shown in fig. 7, fig. 8 is a block diagram of a fifth embodiment of a gesture recognition apparatus according to an exemplary embodiment. Referring to fig. 8, the gesture recognition apparatus 40 further includes:

a third determination module 404 configured to determine a value based on a formula

Determining the

represents said Y_l ⁱ _kAnd said X_lkAnd gamma represents a second predetermined constant.

On the basis of the embodiment shown in fig. 8, fig. 9 is a block diagram of a sixth embodiment of a gesture recognition apparatus according to an exemplary embodiment. Referring to fig. 9, the gesture recognition apparatus 40 further includes:

a fourth determination module 405 configured to determine a formula

Determining the

And said X_lkThe distance between

Wherein the content of the first and second substances,

representing the function of the euclidean distance,

representing a first preset sparse affine sequence,

representing a second preset sparse affine sequence.

The gesture recognition device provided by any one of the embodiments is used in the technical scheme of any one of the embodiments of the gesture recognition method disclosed by the disclosure, the implementation principle and the technical effect are similar, and a gesture video to be recognized is obtained; further, according to the similarity between the gesture video to be recognized and a preset gesture video in a preset database, determining a gesture video set to which the gesture video to be recognized belongs, so as to determine that the gesture to be recognized is a preset gesture in the preset gesture video included in the gesture video set to which the gesture video to be recognized belongs. It can be seen that, in contrast to the prior art, another implementation of gesture recognition is provided in the embodiments of the present disclosure.

The internal functional modules and the structural schematic of the gesture recognition apparatus are described above, and the execution subject of the gesture recognition apparatus should be a terminal device, and fig. 10 is a block diagram of a terminal device according to an exemplary embodiment. Referring to fig. 10, the terminal device includes: a processor and a memory for storing processor-executable instructions;

the processor is configured to:

acquiring a gesture video to be recognized;

In the above embodiments of the terminal device, it should be understood that the Processor may be a Central Processing Unit (CPU), other general-purpose processors, a Digital Signal Processor (DSP), an Application Specific Integrated Circuit (ASIC), etc. The general-purpose processor may be a microprocessor, or the processor may be any conventional processor, and the aforementioned memory may be a read-only memory (ROM), a Random Access Memory (RAM), a flash memory, a hard disk, or a solid state disk. The steps of a method disclosed in connection with the embodiments of the present disclosure may be embodied directly in a hardware processor, or in a combination of hardware and software modules.

Fig. 11 is a block diagram illustrating a terminal device 1200 according to an example embodiment. Referring to fig. 11, terminal device 1200 may include one or more of the following components: processing component 1202, memory 1204, power component 1206, multimedia component 1208, audio component 1210, input/output (I/O) interface 1212, sensor component 1214, and communications component 1216.

The processing component 1202 generally controls overall operation of the terminal device 1200, such as operations associated with display, data communication, multimedia operations, and recording operations. The processing components 1202 may include one or more processors 1220 to execute instructions to perform all or a portion of the steps of the methods described above. Further, the processing component 1202 can include one or more modules that facilitate interaction between the processing component 1202 and other components. For example, the processing component 1202 can include a multimedia module to facilitate interaction between the multimedia component 1208 and the processing component 1202.

The memory 1204 is configured to store various types of data to support operation at the terminal device 1200. Examples of such data include instructions for any application or method operating on terminal device 1200, various types of data, messages, pictures, videos, and so forth. The memory 1204 may be implemented by any type or combination of volatile or non-volatile memory devices such as Static Random Access Memory (SRAM), electrically erasable programmable read-only memory (EEPROM), erasable programmable read-only memory (EPROM), programmable read-only memory (PROM), read-only memory (ROM), magnetic memory, flash memory, magnetic or optical disks.

Power supply components 1206 provide power to the various components of terminal device 1200. Power components 1206 may include a power management system, one or more power sources, and other components associated with generating, managing, and distributing power for terminal device 1200.

The multimedia component 1208 includes a screen providing an output interface between the terminal device 1200 and a user. In some embodiments, the screen may include a Liquid Crystal Display (LCD) and a Touch Panel (TP). If the screen includes a touch panel, the screen may be implemented as a touch screen to receive an input signal from a user. The touch panel includes one or more touch sensors to sense touch, slide, and gestures on the touch panel. The touch sensor may not only sense the boundary of a touch or slide action, but also detect the duration and pressure associated with the touch or slide operation.

Audio component 1210 is configured to output and/or input audio signals. For example, the audio component 1210 includes a Microphone (MIC) configured to receive an external audio signal when the terminal apparatus 1200 is in an operation mode, such as a call mode, a recording mode, and a voice recognition mode. The received audio signals may further be stored in the memory 1204 or transmitted via the communication component 1216. In some embodiments, audio assembly 1210 further includes a speaker for outputting audio signals.

The I/O interface 1212 provides an interface between the processing component 1202 and peripheral interface modules, which may be keyboards, click wheels, buttons, etc.

The sensor assembly 1214 includes one or more sensors for providing various aspects of state assessment for the terminal device 1200. For example, sensor assembly 1214 may detect an open/closed state of terminal device 1200, the relative positioning of components, such as a display and keypad of terminal device 1200, sensor assembly 1214 may also detect a change in position of terminal device 1200 or a component of terminal device 1200, the presence or absence of user contact with terminal device 1200, orientation or acceleration/deceleration of terminal device 1200, and a change in temperature of terminal device 1200. The sensor assembly 1214 may include a proximity sensor configured to detect the presence of a nearby object in the absence of any physical contact. The sensor assembly 1214 may also include a light sensor, such as a CMOS or CCD image sensor, for use in imaging applications. In some embodiments, the sensor assembly 1214 may also include an acceleration sensor, a gyroscope sensor, a magnetic sensor, a pressure sensor, or a temperature sensor.

Communications component 1216 is configured to facilitate communications between terminal device 1200 and other devices in a wired or wireless manner. The terminal device 1200 may access a wireless network based on a communication standard, such as WiFi, 2G or 3G, or a combination thereof. In an exemplary embodiment, the communication component 1216 receives the broadcast signal or broadcast related information from an external broadcast management system via a broadcast channel. In an exemplary embodiment, the communications component 1216 also includes a Near Field Communication (NFC) module to facilitate short-range communications. For example, the NFC module may be implemented based on Radio Frequency Identification (RFID) technology, infrared data association (IrDA) technology, Ultra Wideband (UWB) technology, Bluetooth (BT) technology, and other technologies.

In an exemplary embodiment, the terminal device 1200 may be implemented by one or more Application Specific Integrated Circuits (ASICs), Digital Signal Processors (DSPs), Digital Signal Processing Devices (DSPDs), Programmable Logic Devices (PLDs), Field Programmable Gate Arrays (FPGAs), controllers, micro-controllers, microprocessors or other electronic components for performing the above-described methods.

In an exemplary embodiment, a non-transitory computer readable storage medium comprising instructions, such as memory 1204 comprising instructions, executable by processor 1220 of terminal device 1200 to perform the above-described method is also provided. For example, the non-transitory computer readable storage medium may be a ROM, a Random Access Memory (RAM), a CD-ROM, a magnetic tape, a floppy disk, an optical data storage device, and the like.

A non-transitory computer readable storage medium in which instructions, when executed by a processing component of a terminal device 1200, enable the terminal device 1200 to perform a gesture recognition method, the method comprising:

acquiring a gesture video to be recognized;

Optionally, the determining, according to the similarity between the gesture video to be recognized and a preset gesture video in a preset database, a gesture video set to which the gesture video to be recognized belongs includes:

Optionally, the obtaining a support vector machine of the gesture video to be recognized according to the first type of gesture video, the second type of gesture video, and the similarity between the gesture video to be recognized and each preset gesture video includes:

Optionally, the obtaining a support vector machine of the gesture video to be recognized according to the tag factor of the first type of gesture video, the tag factor of the second type of gesture video, and the similarity between the gesture video to be recognized and each preset gesture video includes:

Optionally, the method further comprises:

according to the formula

represents said

And the above-mentionedX_lkSimilarity between them, μ_lkRepresenting the second weighting factor.

Optionally, the method further comprises:

according to the formula

Determining the

And said X_lkSimilarity between them

Wherein exp () represents an exponential function,

represents saidAnd said X_lkAnd gamma represents a second predetermined constant.

Optionally, the method further comprises:

according to the formula

Determining the

And said X_lkThe distance between

Wherein the content of the first and second substances,

representing the function of the euclidean distance,

Other embodiments of the invention will be apparent to those skilled in the art from consideration of the specification and practice of the invention disclosed herein. This application is intended to cover any variations, uses, or adaptations of the invention following, in general, the principles of the invention and including such departures from the present disclosure as come within known or customary practice within the art to which the invention pertains. It is intended that the specification and examples be considered as exemplary only, with a true scope and spirit of the invention being indicated by the following claims.

It will be understood that the invention is not limited to the precise arrangements described above and shown in the drawings and that various modifications and changes may be made without departing from the scope thereof. The scope of the invention is limited only by the appended claims.

Claims

1. A gesture recognition method, comprising:

acquiring a gesture video to be recognized;

determining a gesture video set to which the gesture video to be recognized belongs according to the similarity between the gesture video to be recognized and a preset gesture video in a preset database; the preset database comprises at least one type of gesture video set, and each type of gesture video set comprises at least one preset gesture video;

the determining a gesture video set to which the gesture video to be recognized belongs according to the similarity between the gesture video to be recognized and a preset gesture video in a preset database comprises the following steps:

when the support vector machine is not larger than 0, taking any one of the other types of gesture video sets in the preset database as a new first gesture video set, returning to execute the acquisition operation to acquire a new first type of gesture video and a new second type of gesture video according to the new first gesture video set, and acquiring a new support vector machine of the gesture video to be recognized according to the new first type of gesture video, the new second type of gesture video and the similarity between the gesture video to be recognized and each preset gesture video until the new support vector machine is larger than 0, and determining that the gesture video to be recognized belongs to the new first gesture video set;

the support vector machine for acquiring the gesture videos to be recognized according to the first type of gesture videos, the second type of gesture videos and the similarity between the gesture videos to be recognized and each preset gesture video comprises:

acquiring a support vector machine of the gesture video to be recognized according to the label factor of the first type of gesture video, the label factor of the second type of gesture video and the similarity between the gesture video to be recognized and each preset gesture video;

the obtaining of the support vector machine of the gesture video to be recognized according to the tag factors of the first type of gesture video, the tag factors of the second type of gesture video and the similarity between the gesture video to be recognized and each preset gesture video includes:

according to the formula

2. The method of claim 1, further comprising:

according to the formula

represents said

3. The method of claim 2, further comprising:

according to the formula

Determining the

And said X_lkSimilarity between them

Wherein exp () represents an exponential function,

4. The method of claim 3, further comprising:

according to the formula

Determining the

And said X_lkThe distance betweenWherein the content of the first and second substances,

representing the function of the euclidean distance,

representing a first preset sparse affine sequence,

representing a second preset sparse affine sequence.

5. A gesture recognition apparatus, comprising:

the first determining module is configured to determine a gesture video set to which the gesture video to be recognized belongs according to the similarity between the gesture video to be recognized and a preset gesture video in a preset database; the preset database comprises at least one type of gesture video set, and each type of gesture video set comprises at least one preset gesture video;

wherein the first determining module comprises:

a second determining submodule configured to, when the support vector machine is not greater than 0, take any one of other types of gesture video sets in the preset database as a new first gesture video set, return to the obtaining submodule to perform the obtaining operation, obtain a new first type of gesture video and a new second type of gesture video according to the new first gesture video set, and obtain a new support vector machine of the gesture video to be recognized according to the new first type of gesture video, the new second type of gesture video, and a similarity between the gesture video to be recognized and each of the preset gesture videos, until the new support vector machine is greater than 0, determine that the gesture video to be recognized belongs to the new first gesture video set;

the acquisition submodule includes:

the acquisition unit is configured to acquire a support vector machine of the gesture video to be recognized according to the label factor of the first type of gesture video, the label factor of the second type of gesture video and the similarity between the gesture video to be recognized and each preset gesture video;

the obtaining unit is specifically configured to:

according to the formula

6. The apparatus of claim 5, further comprising:

a second determination module configured to determine a formulaDetermining the gesture video matrix X to be recognized and the preset gesture video matrix YⁱSimilarity between K (Y)ⁱ，X)；