CN107133361B - Gesture recognition method and device and terminal equipment - Google Patents


Info

Publication number
CN107133361B
CN107133361B (application number CN201710398580.8A)
Authority
CN
China
Prior art keywords
gesture
gesture video
preset
video
recognized
Prior art date
Legal status
Active
Application number
CN201710398580.8A
Other languages
Chinese (zh)
Other versions
CN107133361A (en)
Inventor
万韶华 (Wan Shaohua)
Current Assignee
Beijing Xiaomi Mobile Software Co Ltd
Original Assignee
Beijing Xiaomi Mobile Software Co Ltd
Priority date
Filing date
Publication date
Application filed by Beijing Xiaomi Mobile Software Co Ltd
Priority to CN201710398580.8A
Publication of CN107133361A
Application granted
Publication of CN107133361B

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F 16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F 16/70 Information retrieval of video data
    • G06F 16/73 Querying
    • G06F 16/732 Query formulation
    • G06F 16/7335 Graphical querying, e.g. query-by-region, query-by-sketch, query-by-trajectory, GUIs for designating a person/face/object as a query predicate
    • G06F 18/00 Pattern recognition
    • G06F 18/20 Analysing
    • G06F 18/24 Classification techniques
    • G06F 18/241 Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches
    • G06F 18/2411 Classification techniques based on the proximity to a decision surface, e.g. support vector machines
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V 20/00 Scenes; Scene-specific elements
    • G06V 20/40 Scenes; Scene-specific elements in video content
    • G06V 20/46 Extracting features or characteristics from the video content, e.g. video fingerprints, representative shots or key frames


Abstract

The disclosure relates to a gesture recognition method, a gesture recognition device and a terminal device. The method includes: acquiring a gesture video to be recognized; and determining a gesture video set to which the gesture video to be recognized belongs according to the similarity between the gesture video to be recognized and preset gesture videos in a preset database, wherein the preset database comprises at least one type of gesture video set, and each type of gesture video set comprises at least one preset gesture video. Compared with the prior art, the embodiments of the present disclosure thus provide an alternative implementation of gesture recognition.

Description

Gesture recognition method and device and terminal equipment
Technical Field
The present disclosure relates to the technical field of electronic devices, and in particular, to a gesture recognition method and apparatus, and a terminal device.
Background
As users increasingly demand convenience from electronic products, hands-free operation and gesture recognition are becoming key factors that distinguish high-end electronic products from similar products.
In the prior art, a video of the gesture to be recognized is captured by an infrared camera, the movement track of the hand skeleton joint points is determined from their positions in each frame of gesture image in the video, and the gesture to be recognized is then determined from that movement track.
Disclosure of Invention
In order to overcome the problems in the related art, the present disclosure provides a gesture recognition method, device and terminal device.
According to a first aspect of the embodiments of the present disclosure, there is provided a gesture recognition method, including:
acquiring a gesture video to be recognized;
determining a gesture video set to which the gesture video to be recognized belongs according to the similarity between the gesture video to be recognized and a preset gesture video in a preset database; the preset database comprises at least one type of gesture video set, and each type of gesture video set comprises at least one preset gesture video.
The technical solution provided by the embodiments of the disclosure can have the following beneficial effects: a gesture video to be recognized is acquired; then, according to the similarity between the gesture video to be recognized and preset gesture videos in a preset database, the gesture video set to which the gesture video to be recognized belongs is determined, so that the gesture to be recognized can be identified as the preset gesture represented by that set. Compared with the prior art, the embodiments of the present disclosure thus provide an alternative implementation of gesture recognition.
In one possible design, the determining, according to the similarity between the gesture video to be recognized and a preset gesture video in a preset database, a gesture video set to which the gesture video to be recognized belongs includes:
performing an acquisition operation, the acquisition operation comprising: dividing the preset gesture videos in the preset database into a first type of gesture video and a second type of gesture video according to the type of a first gesture video set in the preset database; and obtaining a support vector machine decision value for the gesture video to be recognized according to the first type of gesture video, the second type of gesture video and the similarity between the gesture video to be recognized and each preset gesture video; wherein the first gesture video set is any type of gesture video set in the preset database; the preset gesture videos included in the first gesture video set belong to the first type of gesture video, and the preset gesture videos included in the gesture video sets other than the first gesture video set in the preset database belong to the second type of gesture video;
when the decision value is greater than 0, determining that the gesture video to be recognized belongs to the first gesture video set;
when the decision value is not greater than 0, taking any one of the other types of gesture video sets in the preset database as a new first gesture video set and returning to perform the acquisition operation, namely obtaining a new first type of gesture video and a new second type of gesture video according to the new first gesture video set, and obtaining a new decision value for the gesture video to be recognized according to the new first type of gesture video, the new second type of gesture video and the similarity between the gesture video to be recognized and each preset gesture video, until the new decision value is greater than 0 and the gesture video to be recognized is determined to belong to the new first gesture video set.
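The one-vs-rest iteration above can be sketched as follows. This is a simplified illustration rather than the patent's implementation: the weighting coefficients `alpha` and constant `b` would in practice come from training a per-set classifier, and uniform weights are assumed here.

```python
import numpy as np

def assign_gesture_set(kappa, set_of, n_sets, alpha, b=0.0):
    """One-vs-rest assignment sketch: for each candidate set, label its
    preset videos +1 (first type) and all others -1 (second type),
    evaluate the SVM decision value, and return the first set for which
    that value is positive.

    kappa  -- (N,) similarities between the query video and each preset video
    set_of -- (N,) index of the gesture video set each preset video belongs to
    alpha  -- (N,) weighting coefficients (assumed given; trained in practice)
    """
    for s in range(n_sets):
        y = np.where(set_of == s, 1.0, -1.0)      # label factors
        decision = np.sum(alpha * y * kappa) + b  # f(X) before taking the sign
        if decision > 0:
            return s
    return None  # the query matched no preset gesture set
```

With the similarities concentrated on one set's preset videos, that set is the first to produce a positive decision value and is returned.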
The technical solution provided by the embodiments of the disclosure can have the following beneficial effects: an implementation is provided for determining the gesture video set to which the gesture video to be recognized belongs according to the similarity between the gesture video to be recognized and the preset gesture videos in the preset database. Compared with the prior art, the gesture video to be recognized can be classified accurately, so that the gesture to be recognized in the gesture video is accurately determined.
In one possible design, the obtaining a support vector machine decision value for the gesture video to be recognized according to the first type of gesture video, the second type of gesture video, and the similarity between the gesture video to be recognized and each preset gesture video includes:
determining the label factor of the first type of gesture video and the label factor of the second type of gesture video; wherein the label factor of the first type of gesture video is equal to a first preset value which is a positive number, the label factor of the second type of gesture video is equal to a second preset value which is a negative number, and the first preset value and the second preset value have the same absolute value;
and obtaining a support vector machine decision value for the gesture video to be recognized according to the label factor of the first type of gesture video, the label factor of the second type of gesture video and the similarity between the gesture video to be recognized and each preset gesture video.
In one possible design, the obtaining a support vector machine decision value for the gesture video to be recognized according to the label factor of the first type of gesture video, the label factor of the second type of gesture video, and the similarity between the gesture video to be recognized and each preset gesture video includes:
determining a gesture video matrix X to be recognized corresponding to the gesture video to be recognized according to the feature sequence of each frame of gesture image to be recognized in the gesture video to be recognized; wherein the m-th column of the gesture video matrix X to be recognized comprises the feature sequence corresponding to the m-th frame gesture image to be recognized of the gesture video to be recognized, m being an integer greater than or equal to 1;
determining a preset gesture video matrix Y_i corresponding to the i-th preset gesture video according to the feature sequence of each frame of preset gesture image in the i-th preset gesture video in the preset database; wherein i is an integer greater than or equal to 1 and less than or equal to N, N is the number of preset gesture videos included in the preset database, and the m-th column of the preset gesture video matrix Y_i comprises the feature sequence corresponding to the m-th frame preset gesture image of the i-th preset gesture video;
determining the support vector machine decision value f(X) of the gesture video to be recognized according to the formula
f(X) = sign( Σ_{i=1}^{N} α_i · y_i · κ(Y_i, X) + b )
wherein sign() denotes the sign function, α_i denotes a first weighting coefficient, y_i denotes the label factor of the i-th preset gesture video, κ(Y_i, X) denotes the similarity between the gesture video matrix X to be recognized and the preset gesture video matrix Y_i, and b denotes a first preset constant; wherein if the i-th preset gesture video belongs to the first type of gesture video, y_i is equal to the first preset value, and if the i-th preset gesture video belongs to the second type of gesture video, y_i is equal to the second preset value.
The technical solution provided by the embodiments of the disclosure can have the following beneficial effects: an implementation is provided for obtaining the support vector machine decision value of the gesture video to be recognized according to the label factor of the first type of gesture video, the label factor of the second type of gesture video and the similarity between the gesture video to be recognized and each preset gesture video, so that whether the gesture video to be recognized belongs to the first gesture video set can be further judged.
In one possible design, the method further includes:
determining the similarity κ(Y_i, X) between the gesture video matrix X to be recognized and the preset gesture video matrix Y_i according to the formula
κ(Y_i, X) = Σ_{l=0}^{L} Σ_{k=1}^{K} μ_{lk} · κ(Y_i^{lk}, X^{lk})
wherein the gesture video matrix X to be recognized and the preset gesture video matrix corresponding to each preset gesture video in the preset database are divided, according to the same division rule, into the video matrices of a temporal pyramid comprising L layers; K denotes the number of video matrices included in each layer, K = 2^l; l denotes the l-th layer, l being a natural number greater than or equal to 0; k denotes the k-th video matrix of each layer; X^{lk} denotes the k-th video matrix of the l-th layer of the temporal pyramid corresponding to the gesture video matrix X to be recognized; Y_i^{lk} denotes the k-th video matrix of the l-th layer of the temporal pyramid corresponding to the preset gesture video matrix Y_i; κ(Y_i^{lk}, X^{lk}) denotes the similarity between Y_i^{lk} and X^{lk}; and μ_{lk} denotes a second weighting coefficient.
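The temporal pyramid sum can be sketched as follows: layer l splits the frame axis into 2^l segments (the same division rule for both matrices), and the per-segment similarities are combined with weights μ_{lk}. The base kernel and the weights are left as parameters, since the patent treats them as preset.

```python
import numpy as np

def pyramid_kernel(X, Y, L, base_kernel, mu):
    """kappa(Y, X) = sum over l = 0..L and k = 1..2^l of
    mu[l][k] * base_kernel(Y_lk, X_lk), where X_lk / Y_lk are the k-th
    of 2^l column-wise (time-axis) segments of X / Y at layer l."""
    total = 0.0
    for l in range(L + 1):
        xs = np.array_split(X, 2 ** l, axis=1)  # same division rule
        ys = np.array_split(Y, 2 ** l, axis=1)  # for both matrices
        for k in range(2 ** l):
            total += mu[l][k] * base_kernel(ys[k], xs[k])
    return total
```

With a constant base kernel of 1 and unit weights, a two-layer pyramid (L = 1) contributes 1 + 2 = 3 terms, which makes the summation structure easy to check.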
In one possible design, the method further includes:
determining the similarity κ(Y_i^{lk}, X^{lk}) between Y_i^{lk} and X^{lk} according to the formula
κ(Y_i^{lk}, X^{lk}) = exp( −γ · D(Y_i^{lk}, X^{lk}) )
wherein exp() denotes the exponential function, D(Y_i^{lk}, X^{lk}) denotes the distance between Y_i^{lk} and X^{lk}, and γ denotes a second preset constant.
In one possible design, the method further includes:
determining the distance D(Y_i^{lk}, X^{lk}) between Y_i^{lk} and X^{lk} according to the formula
D(Y_i^{lk}, X^{lk}) = d( Y_i^{lk} · S_1, X^{lk} · S_2 )
wherein d(·, ·) denotes the Euclidean distance function, S_1 denotes a first preset sparse affine sequence, and S_2 denotes a second preset sparse affine sequence.
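The segment-level distance and the exponential kernel built on it can be sketched as below. The exact arrangement of the sparse affine sequences is not recoverable from the text (the original formula is rendered as an image), so `S1` and `S2` are hypothetical placeholders applied by right-multiplication, with identity used when they are omitted.

```python
import numpy as np

def segment_distance(Y_seg, X_seg, S1=None, S2=None):
    """D(Y_lk, X_lk) sketch: Euclidean distance between the two segment
    matrices after applying the first/second preset sparse affine
    sequences S1, S2 (hypothetical placeholders; identity if omitted)."""
    A = Y_seg @ S1 if S1 is not None else Y_seg
    B = X_seg @ S2 if S2 is not None else X_seg
    return float(np.linalg.norm(A - B))

def segment_similarity(Y_seg, X_seg, gamma, S1=None, S2=None):
    """kappa(Y_lk, X_lk) = exp(-gamma * D(Y_lk, X_lk)), as in the
    preceding design."""
    return float(np.exp(-gamma * segment_distance(Y_seg, X_seg, S1, S2)))
```

Identical segments give a distance of 0 and therefore a similarity of exactly 1, the maximum of the kernel.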
According to a second aspect of the embodiments of the present disclosure, there is provided a gesture recognition apparatus including:
the acquisition module is configured to acquire a gesture video to be recognized;
the first determining module is configured to determine a gesture video set to which the gesture video to be recognized belongs according to the similarity between the gesture video to be recognized and a preset gesture video in a preset database; the preset database comprises at least one type of gesture video set, and each type of gesture video set comprises at least one preset gesture video.
In one possible design, the first determining module includes:
an acquisition submodule configured to perform an acquisition operation, the acquisition operation comprising: dividing the preset gesture videos in the preset database into a first type of gesture video and a second type of gesture video according to the type of a first gesture video set in the preset database; and obtaining a support vector machine decision value for the gesture video to be recognized according to the first type of gesture video, the second type of gesture video and the similarity between the gesture video to be recognized and each preset gesture video; wherein the first gesture video set is any type of gesture video set in the preset database; the preset gesture videos included in the first gesture video set belong to the first type of gesture video, and the preset gesture videos included in the gesture video sets other than the first gesture video set in the preset database belong to the second type of gesture video;
a first determining submodule configured to determine that the gesture video to be recognized belongs to the first gesture video set when the decision value is greater than 0;
and a second determining submodule configured to, when the decision value is not greater than 0, take any one of the other types of gesture video sets in the preset database as a new first gesture video set and trigger the acquisition submodule to perform the acquisition operation again, namely to obtain a new first type of gesture video and a new second type of gesture video according to the new first gesture video set and to obtain a new decision value for the gesture video to be recognized according to the new first type of gesture video, the new second type of gesture video and the similarity between the gesture video to be recognized and each preset gesture video, until the new decision value is greater than 0 and the gesture video to be recognized is determined to belong to the new first gesture video set.
In one possible design, the obtaining sub-module includes:
a determining unit configured to determine the label factor of the first type of gesture video and the label factor of the second type of gesture video; wherein the label factor of the first type of gesture video is equal to a first preset value which is a positive number, the label factor of the second type of gesture video is equal to a second preset value which is a negative number, and the first preset value and the second preset value have the same absolute value;
an obtaining unit configured to obtain a support vector machine decision value for the gesture video to be recognized according to the label factor of the first type of gesture video, the label factor of the second type of gesture video and the similarity between the gesture video to be recognized and each preset gesture video.
In one possible design, the obtaining unit is specifically configured to:
determining a gesture video matrix X to be recognized corresponding to the gesture video to be recognized according to the feature sequence of each frame of gesture image to be recognized in the gesture video to be recognized; wherein the m-th column of the gesture video matrix X to be recognized comprises the feature sequence corresponding to the m-th frame gesture image to be recognized of the gesture video to be recognized, m being an integer greater than or equal to 1;
determining a preset gesture video matrix Y_i corresponding to the i-th preset gesture video according to the feature sequence of each frame of preset gesture image in the i-th preset gesture video in the preset database; wherein i is an integer greater than or equal to 1 and less than or equal to N, N is the number of preset gesture videos included in the preset database, and the m-th column of the preset gesture video matrix Y_i comprises the feature sequence corresponding to the m-th frame preset gesture image of the i-th preset gesture video;
determining the support vector machine decision value f(X) of the gesture video to be recognized according to the formula
f(X) = sign( Σ_{i=1}^{N} α_i · y_i · κ(Y_i, X) + b )
wherein sign() denotes the sign function, α_i denotes a first weighting coefficient, y_i denotes the label factor of the i-th preset gesture video, κ(Y_i, X) denotes the similarity between the gesture video matrix X to be recognized and the preset gesture video matrix Y_i, and b denotes a first preset constant; wherein if the i-th preset gesture video belongs to the first type of gesture video, y_i is equal to the first preset value, and if the i-th preset gesture video belongs to the second type of gesture video, y_i is equal to the second preset value.
In one possible design, the apparatus further includes:
a second determining module configured to determine the similarity κ(Y_i, X) between the gesture video matrix X to be recognized and the preset gesture video matrix Y_i according to the formula
κ(Y_i, X) = Σ_{l=0}^{L} Σ_{k=1}^{K} μ_{lk} · κ(Y_i^{lk}, X^{lk})
wherein the gesture video matrix X to be recognized and the preset gesture video matrix corresponding to each preset gesture video in the preset database are divided, according to the same division rule, into the video matrices of a temporal pyramid comprising L layers; K denotes the number of video matrices included in each layer, K = 2^l; l denotes the l-th layer, l being a natural number greater than or equal to 0; k denotes the k-th video matrix of each layer; X^{lk} denotes the k-th video matrix of the l-th layer of the temporal pyramid corresponding to the gesture video matrix X to be recognized; Y_i^{lk} denotes the k-th video matrix of the l-th layer of the temporal pyramid corresponding to the preset gesture video matrix Y_i; κ(Y_i^{lk}, X^{lk}) denotes the similarity between Y_i^{lk} and X^{lk}; and μ_{lk} denotes a second weighting coefficient.
In one possible design, the apparatus further includes:
a third determining module configured to determine the similarity κ(Y_i^{lk}, X^{lk}) between Y_i^{lk} and X^{lk} according to the formula
κ(Y_i^{lk}, X^{lk}) = exp( −γ · D(Y_i^{lk}, X^{lk}) )
wherein exp() denotes the exponential function, D(Y_i^{lk}, X^{lk}) denotes the distance between Y_i^{lk} and X^{lk}, and γ denotes a second preset constant.
In one possible design, the apparatus further includes:
a fourth determining module configured to determine the distance D(Y_i^{lk}, X^{lk}) between Y_i^{lk} and X^{lk} according to the formula
D(Y_i^{lk}, X^{lk}) = d( Y_i^{lk} · S_1, X^{lk} · S_2 )
wherein d(·, ·) denotes the Euclidean distance function, S_1 denotes a first preset sparse affine sequence, and S_2 denotes a second preset sparse affine sequence.
According to a third aspect of the embodiments of the present disclosure, there is provided a terminal device, including: a processor and a memory for storing processor-executable instructions;
the processor is configured to:
acquiring a gesture video to be recognized;
determining a gesture video set to which the gesture video to be recognized belongs according to the similarity between the gesture video to be recognized and a preset gesture video in a preset database; the preset database comprises at least one type of gesture video set, and each type of gesture video set comprises at least one preset gesture video.
The technical solution provided by the embodiments of the disclosure can have the following beneficial effects: a gesture recognition method, a gesture recognition device and a terminal device are provided, in which a gesture video to be recognized is acquired; then, according to the similarity between the gesture video to be recognized and preset gesture videos in a preset database, the gesture video set to which the gesture video to be recognized belongs is determined, so that the gesture to be recognized can be identified as the preset gesture represented by that set. Compared with the prior art, the embodiments of the present disclosure thus provide an alternative implementation of gesture recognition.
It is to be understood that both the foregoing general description and the following detailed description are exemplary and explanatory only and are not restrictive of the disclosure.
Drawings
The accompanying drawings, which are incorporated in and constitute a part of this specification, illustrate embodiments consistent with the invention and together with the description, serve to explain the principles of the invention.
FIG. 1A is a flow diagram illustrating a method of gesture recognition in accordance with an exemplary embodiment;
FIG. 1B is a schematic diagram illustrating image segmentation according to an exemplary embodiment;
FIG. 2 is a flow diagram illustrating a method of gesture recognition in accordance with another exemplary embodiment;
FIG. 3 is a flow diagram illustrating a method of gesture recognition in accordance with another exemplary embodiment;
FIG. 4 is a block diagram illustrating a first embodiment of a gesture recognition apparatus in accordance with an illustrative embodiment;
FIG. 5 is a block diagram illustrating a second embodiment of a gesture recognition apparatus in accordance with an illustrative embodiment;
FIG. 6 is a block diagram illustrating a third embodiment of a gesture recognition apparatus in accordance with an illustrative embodiment;
FIG. 7 is a block diagram illustrating a fourth embodiment of a gesture recognition apparatus in accordance with an illustrative embodiment;
FIG. 8 is a block diagram illustrating a fifth embodiment of a gesture recognition apparatus in accordance with an illustrative embodiment;
FIG. 9 is a block diagram illustrating a sixth embodiment of a gesture recognition apparatus in accordance with an illustrative embodiment;
FIG. 10 is a block diagram illustrating a terminal device according to an example embodiment;
fig. 11 is a block diagram illustrating a terminal device 1200 according to an example embodiment.
With the foregoing drawings in mind, certain embodiments of the disclosure have been shown and described in more detail below. These drawings and written description are not intended to limit the scope of the disclosed concepts in any way, but rather to illustrate the concepts of the disclosure to those skilled in the art by reference to specific embodiments.
Detailed Description
Reference will now be made in detail to the exemplary embodiments, examples of which are illustrated in the accompanying drawings. When the following description refers to the accompanying drawings, like numbers in different drawings represent the same or similar elements unless otherwise indicated. The embodiments described in the following exemplary embodiments do not represent all embodiments consistent with the present invention. Rather, they are merely examples of apparatus and methods consistent with certain aspects of the invention, as detailed in the appended claims.
First, terms used in the present disclosure are explained:
the terminal devices to which the present disclosure relates may include, but are not limited to: the terminal may be a smart phone, a tablet computer, an electronic reader, a personal digital assistant, a smart television, smart glasses, or other terminals having an image capturing function, which is not limited in the embodiments of the present disclosure.
The Histogram of Oriented Gradients (HOG) feature to which the present disclosure relates is a feature descriptor used for object detection in computer vision and image processing. A HOG feature is constructed by computing and accumulating histograms of gradient orientations over local regions of an image.
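The core HOG building block described above can be sketched in a few lines: the gradient-orientation histogram of a single image cell, weighted by gradient magnitude. This is a minimal illustration only, not the full block-normalized HOG descriptor.

```python
import numpy as np

def hog_cell_histogram(cell, n_bins=9):
    """Orientation histogram of one image cell: bin the unsigned gradient
    direction (0-180 degrees) and accumulate gradient magnitudes."""
    gy, gx = np.gradient(cell.astype(float))       # per-pixel gradients
    mag = np.hypot(gx, gy)                         # gradient magnitude
    ang = np.degrees(np.arctan2(gy, gx)) % 180.0   # unsigned orientation
    bins = (ang / (180.0 / n_bins)).astype(int) % n_bins
    hist = np.zeros(n_bins)
    np.add.at(hist, bins.ravel(), mag.ravel())     # magnitude-weighted vote
    return hist
```

A cell with a purely horizontal intensity ramp puts all its weight into the 0-degree bin, which is a quick sanity check.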
Similar to the HOG feature, the Histogram of Optical Flow (HOF) feature to which the present disclosure relates is obtained by weighted statistics over optical flow directions, yielding a histogram of optical flow direction information. Because the size of the target changes over time, the dimensionality of a raw optical flow descriptor also changes, and optical flow computation is sensitive to background noise, scale changes, and motion direction. A flow-based feature was therefore needed that represents temporal motion information while remaining insensitive to scale and motion direction; HOF was proposed to meet this requirement.
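The weighted direction statistics described above can be sketched as below. The dense optical flow field (per-pixel u, v components) is assumed to have been computed already by some flow method; normalizing the histogram is what makes the descriptor's length independent of target size.

```python
import numpy as np

def hof(flow_u, flow_v, n_bins=8):
    """Histogram of Optical Flow sketch: bin the flow direction
    (0-360 degrees), weight each vote by flow magnitude, and normalize
    so the descriptor sums to 1 regardless of region size."""
    mag = np.hypot(flow_u, flow_v)
    ang = np.degrees(np.arctan2(flow_v, flow_u)) % 360.0
    bins = (ang / (360.0 / n_bins)).astype(int) % n_bins
    hist = np.zeros(n_bins)
    np.add.at(hist, bins.ravel(), mag.ravel())
    s = hist.sum()
    return hist / s if s > 0 else hist
```

A uniform rightward flow field concentrates all normalized weight in the first bin.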
Next, an application scenario of the embodiment of the present disclosure is introduced:
As users increasingly demand convenience from electronic products, hands-free operation and gesture recognition are becoming key factors that distinguish high-end electronic products from similar products. Research on gesture recognition technology is therefore a very important research direction.
In the prior art, a video of the gesture to be recognized is captured by an infrared camera, the movement track of the hand skeleton joint points is determined from their positions in each frame of gesture image in the video, and the gesture to be recognized is then determined from that movement track.
Another implementation manner of gesture recognition is provided in the embodiments of the present disclosure, and specific implementation manners are as follows:
the following describes a gesture recognition method, a gesture recognition device, and a terminal device according to embodiments of the present disclosure in detail with reference to the accompanying drawings.
Fig. 1A is a flow diagram illustrating a gesture recognition method according to an exemplary embodiment, and fig. 1B is a schematic diagram illustrating image segmentation according to an exemplary embodiment. This embodiment may be executed by a gesture recognition apparatus in the terminal device, which may be implemented in software and/or hardware. As shown in fig. 1A, the scheme of the present embodiment may include the following steps:
in step S101, a gesture video to be recognized is acquired.
In this step, the gesture recognition apparatus obtains the gesture video to be recognized through an image acquisition unit. Optionally, the gesture video to be recognized includes at least one frame of gesture image to be recognized, and each frame of gesture image to be recognized contains the gesture to be recognized. Optionally, the image acquisition unit may be a color camera or an infrared camera, or any other unit having an image capturing function, which is not limited in the embodiments of the present disclosure.
Optionally, the implementation manner of obtaining the gesture video to be recognized by the image acquisition unit at least includes the following:
the first realizable way: the gesture recognition device acquires an original video (including at least one frame of original color image) through a color camera, and segments a hand image and a background image in each frame of original color image in the original video by adopting an image segmentation method based on skin color detection to obtain the gesture video to be recognized, which includes at least one frame of gesture image to be recognized (only including the hand image). For example, as shown in fig. 1B, an image segmentation method based on skin color detection is adopted to segment a hand image and a background image in an original color image of a certain frame, so as to obtain a gesture image to be recognized, which only includes the hand image. Optionally, a specific implementation manner of the image segmentation method based on skin color detection in the embodiment of the present disclosure may refer to an image segmentation method based on skin color detection in the prior art, which is not limited in the embodiment of the present disclosure.
The second realizable way: the gesture recognition device acquires an original video (including at least one frame of original depth image) through an infrared camera, and divides a hand image and a background image in each frame of original depth image in the original video by adopting an infrared image division method based on the infrared camera to obtain the gesture video to be recognized, which includes at least one frame of gesture image to be recognized (only including the hand image). Optionally, a specific implementation manner of the infrared image segmentation method based on the infrared camera in the embodiment of the present disclosure may refer to an infrared image segmentation method in the prior art, which is not limited in the embodiment of the present disclosure.
Of course, the implementation manner of obtaining the gesture video to be recognized through the image acquisition unit may also include other implementation manners, which is not limited in the embodiment of the present disclosure.
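The skin-color segmentation of the first realizable way can be sketched as a simple per-pixel threshold in the YCbCr color space. This is a minimal illustration, not the segmentation method of the disclosure; the function name and the Cb/Cr thresholds are assumptions taken from commonly cited skin-color ranges.

```python
import numpy as np

def segment_hand_by_skin_color(rgb_frame):
    """Crude skin-color segmentation: keep pixels whose Cb/Cr values
    fall in a commonly used skin range, zero out the background.
    rgb_frame: uint8 array of shape (H, W, 3)."""
    rgb = rgb_frame.astype(np.float32)
    r, g, b = rgb[..., 0], rgb[..., 1], rgb[..., 2]
    # RGB -> Cb/Cr (ITU-R BT.601 conversion)
    cb = 128 - 0.168736 * r - 0.331264 * g + 0.5 * b
    cr = 128 + 0.5 * r - 0.418688 * g - 0.081312 * b
    # Widely quoted skin cluster in the Cb/Cr plane (an assumption,
    # not thresholds taken from the disclosure)
    mask = (cb >= 77) & (cb <= 127) & (cr >= 133) & (cr <= 173)
    segmented = rgb_frame.copy()
    segmented[~mask] = 0  # suppress the background, keep the hand region
    return segmented, mask
```

Applying this per frame to an original color video yields a sequence of images containing only skin-colored regions, analogous to the gesture images to be recognized described above.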
In step S102, a gesture video set to which the gesture video to be recognized belongs is determined according to the similarity between the gesture video to be recognized and a preset gesture video in a preset database.
In the embodiment of the present disclosure, a preset database is preset in the gesture recognition device, and optionally, the preset database includes: at least one type of gesture video set (such as a power-on gesture video set, a power-off gesture video set, a channel changing gesture video set and the like); wherein each type of gesture video set comprises: at least one preset gesture video (for example, the power-on gesture video set comprises at least one preset power-on gesture video, and the power-off gesture video set comprises at least one preset power-off gesture video, etc.).
In this step, the gesture recognition device determines the gesture video set to which the gesture video to be recognized belongs according to the similarity between the gesture video to be recognized and each preset gesture video in a preset database. Optionally, the gesture recognition device first judges, according to these similarities, whether the gesture video to be recognized belongs to a first gesture video set in the preset database (where the first gesture video set is any type of gesture video set in the preset database, for example, a power-on gesture video set); if the gesture video to be recognized is determined to belong to the first gesture video set, the process ends; otherwise, the gesture recognition device continues to judge whether the gesture video to be recognized belongs to a second gesture video set in the preset database (where the second gesture video set is any type of gesture video set other than the first gesture video set, for example, a shutdown gesture video set); if the gesture video to be recognized is determined to belong to the second gesture video set, the process ends; otherwise, the gesture recognition device continues to judge whether the gesture video to be recognized belongs to a third gesture video set in the preset database (where the third gesture video set is any type of gesture video set other than the first and second gesture video sets, for example, a channel-change gesture video set), and so on, until the gesture video set to which the gesture video to be recognized belongs is determined.
Optionally, once the gesture recognition device determines the gesture video set (e.g., the second gesture video set) to which the gesture video to be recognized belongs, it determines that the gesture to be recognized in the gesture video is the preset gesture of the preset gesture videos included in that gesture video set, so that a target operation corresponding to the gesture to be recognized can further be determined according to the gesture to be recognized and preset mapping information (including a correspondence between at least one preset gesture and a target operation). For example, when the gesture to be recognized is determined to be a preset power-on gesture, the target operation corresponding to the gesture to be recognized is determined to be a power-on operation.
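The lookup against the preset mapping information can be as simple as a dictionary. The gesture labels and operation names below are hypothetical placeholders for illustration, not identifiers from the disclosure.

```python
# Hypothetical mapping table: preset gesture label -> target operation.
PRESET_MAPPING = {
    "power_on_gesture": "power_on",
    "power_off_gesture": "power_off",
    "change_channel_gesture": "change_channel",
}

def target_operation_for(recognized_gesture):
    """Look up the target operation for a recognized preset gesture;
    return None when the gesture has no mapped operation."""
    return PRESET_MAPPING.get(recognized_gesture)
```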
In the embodiment, a gesture video to be recognized is obtained; further, according to the similarity between the gesture video to be recognized and a preset gesture video in a preset database, determining a gesture video set to which the gesture video to be recognized belongs, so as to determine that the gesture to be recognized is a preset gesture in the preset gesture video included in the gesture video set to which the gesture video to be recognized belongs. It can be seen that, in contrast to the prior art, another implementation of gesture recognition is provided in the embodiments of the present disclosure.
FIG. 2 is a flow chart illustrating a method of gesture recognition according to another exemplary embodiment. On the basis of the above embodiment, as shown in fig. 2, step S102 includes:
in step S102A, an acquisition operation is performed, the acquisition operation including: according to the type of a first gesture video set in the preset database, dividing the gesture video set in the preset database into a first type of gesture video and a second type of gesture video; and acquiring a support vector machine of the gesture video to be recognized according to the first type of gesture video, the second type of gesture video and the similarity between the gesture video to be recognized and each preset gesture video.
The first gesture video set is any type of gesture video set in the preset database; the preset gesture videos included in the first gesture video set belong to the first type of gesture videos, and the preset gesture videos included in other gesture video sets except the first gesture video set in the preset database belong to the second type of gesture videos.
In this step, in order to determine whether the gesture video to be recognized belongs to the first gesture video set (e.g., a power-on gesture video set) in the preset database, the gesture recognition apparatus first divides the gesture video set in the preset database into a first type of gesture video (e.g., a power-on gesture video) and a second type of gesture video (e.g., a non-power-on gesture video) according to the type of the first gesture video set (e.g., the power-on gesture video set). For example, the gesture recognition device divides all preset gesture videos in the preset database into a power-on gesture video and a non-power-on gesture video.
Further, the gesture recognition device acquires a Support Vector Machine (SVM) of the gesture video to be recognized according to the first type of gesture video (for example, a power-on type gesture video), the second type of gesture video (for example, a non-power-on type gesture video), and the similarity between the gesture video to be recognized and each preset gesture video. Optionally, the gesture recognition device determines the label factor of the first type of gesture video (e.g., power-on type gesture video) and the label factor of the second type of gesture video (e.g., non-power-on type gesture video); the label factor of the first type of gesture video is equal to a first preset value (for example, 1) belonging to a positive number, the label factor of the second type of gesture video is equal to a second preset value (for example, -1) belonging to a negative number, and the absolute values of the first preset value and the second preset value are the same; further, a support vector machine of the gesture video to be recognized is obtained according to the label factor of the first type of gesture video, the label factor of the second type of gesture video and the similarity between the gesture video to be recognized and each preset gesture video.
Of course, the gesture recognition device may also obtain a support vector machine of the gesture video to be recognized in other manners according to the first type of gesture video, the second type of gesture video, and the similarity between the gesture video to be recognized and each preset gesture video, which is not limited in the embodiment of the present disclosure.
When the support vector machine is greater than 0, executing step S102B; when the support vector machine is not greater than 0, determining that the gesture video to be recognized does not belong to the first gesture video set, and executing step S102C.
In step S102B, it is determined that the gesture video to be recognized belongs to the first gesture video set.
In step S102C, taking any one of the other types of gesture video sets in the preset database as a new first gesture video set, returning to execute the obtaining operation to obtain a new first type of gesture video and a new second type of gesture video according to the new first gesture video set, and obtaining a new support vector machine of the gesture video to be recognized according to the new first type of gesture video, the new second type of gesture video, and a similarity between the gesture video to be recognized and each of the preset gesture videos, until the new support vector machine is greater than 0, determining that the gesture video to be recognized belongs to the new first gesture video set.
In this step, the gesture recognition device takes any one of the other types of gesture video sets (for example, a second gesture video set) in the preset database as a new first gesture video set, so as to judge whether the gesture video to be recognized belongs to the new first gesture video set (for example, the second gesture video set); further, the obtaining operation is executed again, so that the gesture videos in the preset database are divided into a new first type of gesture video (for example, a shutdown type gesture video) and a new second type of gesture video (for example, a non-shutdown type gesture video) according to the type of the new first gesture video set (for example, the second gesture video set, where the second gesture video set is a shutdown gesture video set), and a new support vector machine of the gesture video to be recognized is obtained according to the new first type of gesture video, the new second type of gesture video, and the similarity between the gesture video to be recognized and each of the preset gesture videos. When the new support vector machine is greater than 0, it is determined that the gesture video to be recognized belongs to the new first gesture video set (for example, the second gesture video set); when the new support vector machine is not greater than 0, any one of the remaining types of gesture video sets (for example, a third gesture video set) in the preset database is taken as a new first gesture video set, and the judgment is repeated: the obtaining operation is executed again, the gesture videos in the preset database are divided into a new first type of gesture video (for example, a channel-change type gesture video) and a new second type of gesture video (for example, a non-channel-change type gesture video) according to the type of the new first gesture video set (for example, the third gesture video set, where the third gesture video set is a channel-change gesture video set), and a new support vector machine of the gesture video to be recognized is obtained according to the new first type of gesture video, the new second type of gesture video, and the similarity between the gesture video to be recognized and each preset gesture video; and so on, until the new support vector machine is greater than 0 and it is determined that the gesture video to be recognized belongs to the new first gesture video set.
Optionally, for the implementation of obtaining a new first type of gesture video and a new second type of gesture video according to the new first gesture video set, and of obtaining a new support vector machine of the gesture video to be recognized according to the new first type of gesture video, the new second type of gesture video, and the similarity between the gesture video to be recognized and each preset gesture video, reference may be made to the relevant parts of step S102A in the embodiment of the present disclosure, and details are not repeated here.
In the embodiment of the present disclosure, an obtaining operation is performed, the obtaining operation including: dividing the gesture videos in the preset database into a first type of gesture video and a second type of gesture video according to the type of a first gesture video set in the preset database; and obtaining a support vector machine of the gesture video to be recognized according to the first type of gesture video, the second type of gesture video, and the similarity between the gesture video to be recognized and each preset gesture video. When the support vector machine is greater than 0, it is determined that the gesture video to be recognized belongs to the first gesture video set; when the support vector machine is not greater than 0, any one of the other types of gesture video sets in the preset database is taken as a new first gesture video set, and the obtaining operation is executed again to obtain a new first type of gesture video and a new second type of gesture video according to the new first gesture video set, and to obtain a new support vector machine of the gesture video to be recognized according to the new first type of gesture video, the new second type of gesture video, and the similarity between the gesture video to be recognized and each preset gesture video, until the new support vector machine is greater than 0 and it is determined that the gesture video to be recognized belongs to the new first gesture video set. Therefore, the gesture video set to which the gesture video to be recognized belongs is accurately determined, and the gesture to be recognized in the gesture video to be recognized is accurately determined.
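The decision flow of steps S102A to S102C amounts to a one-vs-rest loop over the gesture video sets. A minimal sketch, assuming the support vector machine evaluation is available as a pluggable `decision_value(video, labeled_videos)` function; all names here are illustrative, not from the disclosure.

```python
def classify_one_vs_rest(video, gesture_sets, decision_value):
    """Try each gesture-video set in turn as the 'first' set:
    label its preset videos +1 and all others -1 (the label factors),
    evaluate the SVM decision value, and stop at the first set with a
    positive response.
    gesture_sets: dict mapping set name -> list of preset videos.
    decision_value(video, labeled): SVM output for that labeling."""
    for name, _ in gesture_sets.items():
        labeled = []
        for set_name, videos in gesture_sets.items():
            label = 1 if set_name == name else -1  # label factors +-1
            labeled.extend((v, label) for v in videos)
        if decision_value(video, labeled) > 0:
            return name        # the video belongs to this set
    return None                # no gesture video set matched
```

In the worst case every set is tried once, so the number of SVM evaluations grows linearly with the number of gesture video sets in the preset database.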
FIG. 3 is a flow chart illustrating a method of gesture recognition according to another exemplary embodiment. On the basis of the foregoing embodiment, as shown in fig. 3, the obtaining a support vector machine of the gesture video to be recognized according to the label factor of the first type of gesture video, the label factor of the second type of gesture video, and the similarity between the gesture video to be recognized and each preset gesture video includes:
in step S301, a gesture video matrix X to be recognized corresponding to the gesture video to be recognized is determined according to a feature sequence of each frame of gesture image to be recognized in the gesture video to be recognized.
In this step, the gesture recognition device determines a gesture video matrix X to be recognized corresponding to the gesture video to be recognized according to the feature sequence of each frame of gesture image to be recognized in the gesture video to be recognized; wherein the m-th column of the gesture video matrix X to be recognized contains the feature sequence corresponding to the m-th frame of gesture image to be recognized, and m is an integer greater than or equal to 1 (namely, the maximum value of m is the total frame number of the gesture images to be recognized included in the gesture video to be recognized). Optionally, the feature sequence may be a combination of one or more of: an HOG feature sequence and an HOF feature sequence; of course, the feature sequence may also include other sequences, which is not limited in the embodiments of the present disclosure.
Assuming that the feature sequence includes an HOF (histogram of optical flow) feature sequence, optionally, the gesture recognition apparatus determines the HOF feature sequence of each frame of gesture image to be recognized in the gesture video to be recognized by extracting the optical flow of each frame of gesture image to be recognized (including only the hand image) in the gesture video to be recognized, and according to the optical flow of each frame of gesture image to be recognized. Optionally, the implementation process of determining the HOF feature sequence of each frame of gesture image to be recognized according to its optical flow may refer to prior-art implementations of determining the HOF feature of an image according to the optical flow of the image, which is not limited in the embodiment of the present disclosure. Of course, the HOF feature sequence of each frame of gesture image to be recognized in the gesture video to be recognized may also be determined in other ways, which is not limited in the embodiment of the present disclosure.
Assuming that the feature sequence includes a HOG feature sequence, optionally, the gesture recognition apparatus determines the HOG feature sequence of each frame of gesture image to be recognized in the gesture video to be recognized by extracting three primary colors (RGB) of each frame of gesture image to be recognized (including only hand images) in the gesture video to be recognized and according to the RGB of each frame of gesture image to be recognized in the gesture video to be recognized. Optionally, an implementation process of determining the HOG feature sequence of each frame of the gesture image to be recognized in the gesture video to be recognized according to RGB of each frame of the gesture image to be recognized in the gesture video to be recognized may refer to an implementation process of determining the HOG feature of an image according to RGB of an image in the prior art, which is not limited in the embodiment of the present disclosure. Of course, the HOG feature sequence of each frame of gesture image to be recognized in the gesture video to be recognized may also be determined in other ways, which is not limited in the embodiment of the present disclosure.
Assuming that the feature sequence includes an HOG feature sequence and an HOF feature sequence, the above-mentioned portion of "determining the HOG feature sequence of each frame of the gesture image to be recognized in the gesture video to be recognized" and the portion of "determining the HOF feature sequence of each frame of the gesture image to be recognized in the gesture video to be recognized" are combined, and details are not repeated here.
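Step S301 can be sketched by computing one feature vector per frame and stacking the vectors as matrix columns. The per-frame descriptor below is a toy global gradient-orientation histogram standing in for a real HOG/HOF feature sequence; it is an assumption for illustration, not the feature extraction of the disclosure.

```python
import numpy as np

def frame_feature(gray_frame, bins=8):
    """Toy HOG-like descriptor: a single global histogram of gradient
    orientations, weighted by gradient magnitude (a stand-in for the
    per-frame HOG/HOF feature sequence)."""
    gy, gx = np.gradient(gray_frame.astype(np.float64))
    mag = np.hypot(gx, gy)
    ang = np.mod(np.arctan2(gy, gx), np.pi)  # orientations folded into [0, pi)
    hist, _ = np.histogram(ang, bins=bins, range=(0.0, np.pi), weights=mag)
    total = hist.sum()
    return hist / total if total > 0 else hist  # L1-normalise

def video_matrix(frames, bins=8):
    """Column m holds the feature sequence of frame m, as in the
    matrix X (or Yi) of the embodiment."""
    return np.column_stack([frame_feature(f, bins) for f in frames])
```

A real implementation would replace `frame_feature` with block-wise HOG over the segmented hand image, or HOF over the optical flow between consecutive frames.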
In step S302, according to a feature sequence of each frame of preset gesture image in an ith preset gesture video in the preset database, a preset gesture video matrix Y corresponding to the ith preset gesture video is determinedi
In this step, the gesture recognition device determines a preset gesture video moment corresponding to the ith preset gesture video according to the feature sequence of each frame of preset gesture image in the ith preset gesture video in the preset databaseMatrix Yi(ii) a Wherein i is an integer greater than or equal to 1 and less than or equal to N, N is the number of preset gesture videos included in the preset database, and the preset gesture video matrix YiColumn m of (d) contains: and a feature sequence corresponding to the m-th frame of the ith preset gesture video is preset (namely, the maximum value of m is the total frame number of the preset gesture images included in the ith preset gesture video). Alternatively, the signature sequence may be a combination of one or more of: HOG characteristic sequence and HOF characteristic sequence; of course, the characteristic sequence may also include other sequences, which are not limited in the embodiments of the present disclosure.
Optionally, an implementation manner of determining the feature sequence of each frame of the preset gesture image in the ith preset gesture video may refer to the relevant part of "determining the feature sequence of each frame of the gesture image to be recognized in the gesture video to be recognized", and details are not repeated here.
In step S303, a support vector machine f(X) of the gesture video to be recognized is determined according to the formula

f(X) = sign( Σ_{i=1}^{N} αi · yi · κ(Yi, X) + b )

In this step, the gesture recognition device determines the support vector machine of the gesture video to be recognized by taking the similarity between the gesture video to be recognized and each preset gesture video as the kernel function of the support vector machine; wherein sign() represents a sign function, αi represents a first weighting coefficient, yi represents the label factor of the i-th preset gesture video, κ(Yi, X) represents the similarity between the gesture video matrix X to be recognized and the preset gesture video matrix Yi, and b represents a first preset constant. If the i-th preset gesture video belongs to the first type of gesture video (for example, the power-on type of gesture video), yi is equal to the first preset value; if the i-th preset gesture video belongs to the second type of gesture video (for example, a non-power-on type of gesture video), yi is equal to the second preset value.

Optionally, the support vector machine f(X) of the gesture video to be recognized may also be determined by other formulas equivalent to or derived from the above formula, which is not limited in the embodiment of the present disclosure.
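The decision function above can be written directly in a few lines of NumPy; the kernel κ is passed in as a callable, and the parameter names (`alphas`, `labels`, `b`) follow the formula's αi, yi, and b. This is a sketch of the formula, not production SVM code.

```python
import numpy as np

def svm_decision(X, preset_videos, labels, alphas, b, kernel):
    """Kernel-SVM decision for the video matrix X:
    f(X) = sign( sum_i alpha_i * y_i * kernel(Y_i, X) + b ).
    preset_videos: list of preset matrices Y_i; labels: +1/-1 label
    factors y_i; alphas: first weighting coefficients; b: constant."""
    score = sum(a * y * kernel(Y, X)
                for a, y, Y in zip(alphas, labels, preset_videos)) + b
    return np.sign(score), score
```

A positive sign then corresponds to "the gesture video to be recognized belongs to the first gesture video set" in steps S102B/S102C.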
Optionally, in this embodiment of the present disclosure, for a realizable manner of obtaining a new support vector machine of the gesture video to be recognized according to the label factor of the new first type of gesture video, the label factor of the new second type of gesture video, and the similarity between the gesture video to be recognized and each preset gesture video, reference may be made to the above realizable manner of obtaining the support vector machine of the gesture video to be recognized according to the label factor of the first type of gesture video, the label factor of the second type of gesture video, and the similarity between the gesture video to be recognized and each preset gesture video, which is not described herein again.
Optionally, in this disclosure, the sequence numbers of the steps do not limit the execution order; the execution order of the steps may be adjusted appropriately, which is not limited in this disclosure.
In the embodiment of the disclosure, an implementation manner of obtaining the support vector machine of the gesture video to be recognized according to the label factor of the first type of gesture video, the label factor of the second type of gesture video, and the similarity between the gesture video to be recognized and each preset gesture video is provided, so as to further judge whether the gesture video to be recognized belongs to the first gesture video set.
Further, on the basis of the above embodiments, in the embodiment of the present disclosure, an implementable manner of determining the similarity between the gesture video to be recognized and any preset gesture video in the preset database (for example, the ith preset gesture video in the preset database) is explained:
In the embodiment of the disclosure, the gesture recognition device divides the gesture video matrix X to be recognized, and the preset gesture video matrix corresponding to each preset gesture video in the preset database, into time pyramids comprising L layers according to the same division rule; wherein the l-th layer comprises K video matrices, K = 2^l, l is a natural number greater than or equal to 0, and k represents the k-th video matrix of each layer. For example, the 0th layer (l = 0) of the time pyramid corresponding to the gesture video matrix X to be recognized includes the complete gesture video matrix X to be recognized, and the 0th layer (l = 0) of the time pyramid corresponding to the preset gesture video matrix Yi (the preset gesture video matrix corresponding to the i-th preset gesture video in the preset database) includes the complete preset gesture video matrix Yi of the i-th preset gesture video; the 1st layer (l = 1) of the time pyramid corresponding to the gesture video matrix X to be recognized includes the video matrices respectively corresponding to the two sub gesture videos obtained by dividing the gesture video to be recognized into two, and the 1st layer (l = 1) of the time pyramid corresponding to the preset gesture video matrix Yi includes the video matrices respectively corresponding to the two sub preset gesture videos obtained by dividing the i-th preset gesture video into two; and so on.
In this step, the gesture recognition device determines the similarity κ(Yi, X) between the gesture video matrix X to be recognized and the preset gesture video matrix Yi according to the formula

κ(Yi, X) = Σ_l Σ_k μlk · κ(Yi_lk, Xlk)

wherein Xlk represents the k-th video matrix of the l-th layer of the time pyramid corresponding to the gesture video matrix X to be recognized, Yi_lk represents the k-th video matrix of the l-th layer of the time pyramid corresponding to the preset gesture video matrix Yi, κ(Yi_lk, Xlk) represents the similarity between Yi_lk and Xlk, and μlk represents a second weighting coefficient. Optionally, μlk = 1/2^(L−1); of course, μlk may also be equal to other values, which is not limited in the embodiments of the present disclosure. Optionally, the gesture recognition device may also determine the similarity κ(Yi, X) between the gesture video matrix X to be recognized and the preset gesture video matrix Yi by other formulas equivalent to or derived from the above formula, which is not limited in the embodiments of the present disclosure.
Optionally, the gesture recognition device determines the similarity κ(Yi_lk, Xlk) between Yi_lk (the k-th video matrix of the l-th layer of the time pyramid corresponding to the preset gesture video matrix Yi) and Xlk (the k-th video matrix of the l-th layer of the time pyramid corresponding to the gesture video matrix X to be recognized) according to the formula

κ(Yi_lk, Xlk) = exp( −γ · d(Yi_lk, Xlk) )

wherein exp() represents an exponential function, d(Yi_lk, Xlk) represents the distance between Yi_lk and Xlk, and γ represents a second preset constant. Optionally, the gesture recognition device may also determine the similarity between Yi_lk and Xlk by other formulas equivalent to or derived from the above formula, which is not limited in the embodiments of the present disclosure.
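The temporal-pyramid kernel and the exp(−γ·d) similarity above can be combined into one short routine. Note that the segment distance used here, the Euclidean distance between mean feature columns, is a deliberate simplification standing in for the sparse-affine-package distance of the disclosure; the weights, L, and γ are illustrative defaults.

```python
import numpy as np

def pyramid_kernel(X, Y, L=2, gamma=0.5, weight=None):
    """Temporal-pyramid similarity between two video matrices whose
    columns are per-frame features: layer l splits the frame axis into
    2**l segments, and the segment similarities exp(-gamma * d) are
    combined with weights mu_lk.  The segment distance d used here
    (Euclidean distance of mean columns) is a simple stand-in for the
    sparse-affine-package distance of the embodiment."""
    if weight is None:
        weight = 1.0 / 2 ** (L - 1)       # the mu_lk of the text
    total = 0.0
    for l in range(L):
        x_segs = np.array_split(X, 2 ** l, axis=1)
        y_segs = np.array_split(Y, 2 ** l, axis=1)
        for xs, ys in zip(x_segs, y_segs):
            d = np.linalg.norm(xs.mean(axis=1) - ys.mean(axis=1))
            total += weight * np.exp(-gamma * d)
    return total
```

Finer layers compare shorter temporal segments, so the kernel rewards videos whose feature sequences agree not just globally but also segment by segment.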
Optionally, the gesture recognition device determines the distance d(Yi_lk, Xlk) between Yi_lk and Xlk based on a sparse affine package. Optionally, the gesture recognition device determines this distance according to the formula

d(Yi_lk, Xlk) = D( Yi_lk · βi*, Xlk · β* )

wherein D(,) represents the Euclidean distance function, βi* represents a first preset sparse affine sequence, and β* represents a second preset sparse affine sequence. Optionally, the gesture recognition device may also determine the distance d(Yi_lk, Xlk) between Yi_lk and Xlk by other formulas equivalent to or derived from the above formula, which is not limited in the embodiments of the present disclosure.
Optionally, the following explains a realizable manner of determining the sparse affine sequences βi* and β* of the sparse affine package in the embodiments of the present disclosure:

assuming a sample gesture video W and a sample gesture video Z to be recognized, the sparse affine package may be determined as follows:

(βi*, β*) = arg min ( ‖W·βi − Z·β‖₂² + λ·(‖βi‖₁ + ‖β‖₁) ),
subject to Σ_{n=1}^{p} βi(n) = 1 and Σ_{n=1}^{q} β(n) = 1;

wherein βi(n) represents the n-th preset sparse affine coefficient of βi, p represents the number of preset sparse affine coefficients included in βi (optionally, the same as the number of columns of W), β(n) represents the n-th preset sparse affine coefficient of β, q represents the number of preset sparse affine coefficients included in β (optionally, the same as the number of columns of Z), arg min() represents the function of solving for the parameters βi and β that minimize the objective, ‖·‖₁ represents the ℓ1 norm, and λ represents a third preset constant (e.g., 0.1, 0.01, etc.). Solving the above combined equations for βi yields βi*, and solving them for β yields β*. Of course, in the embodiment of the present disclosure, the sparse affine package may also be determined in other manners, which is not limited in the embodiments of the present disclosure.
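For intuition, dropping the ℓ1 sparsity term (λ = 0) reduces the sparse affine package to the classical distance between two affine hulls, which has a linear KKT solution. The sketch below solves that simplified problem with NumPy; it is not the sparse solver of the disclosure, and the function name is an assumption.

```python
import numpy as np

def affine_hull_distance(W, Z):
    """Distance between the affine hulls of the columns of W and Z:
    minimise ||W b1 - Z b2||_2 subject to sum(b1) = 1, sum(b2) = 1.
    This drops the l1 sparsity term (lambda = 0) of the sparse affine
    package and solves the resulting equality-constrained least
    squares via its KKT system."""
    p, q = W.shape[1], Z.shape[1]
    A = np.hstack([W, -Z])            # A @ [b1; b2] = W b1 - Z b2
    C = np.zeros((2, p + q))
    C[0, :p] = 1.0                    # sum(b1) = 1
    C[1, p:] = 1.0                    # sum(b2) = 1
    # KKT system: [2 A^T A, C^T; C, 0] [theta; nu] = [0; 1]
    n = p + q
    kkt = np.zeros((n + 2, n + 2))
    kkt[:n, :n] = 2.0 * A.T @ A
    kkt[:n, n:] = C.T
    kkt[n:, :n] = C
    rhs = np.concatenate([np.zeros(n), np.ones(2)])
    sol, *_ = np.linalg.lstsq(kkt, rhs, rcond=None)  # handles singular KKT
    theta = sol[:n]
    return float(np.linalg.norm(A @ theta))
```

Re-adding the λ‖·‖₁ penalties selects sparse combinations of frames on each hull, which is what distinguishes the sparse affine package from this plain affine-hull distance.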
In the disclosed embodiments, the distance d(Yi_lk, Xlk) between Yi_lk (the k-th video matrix of the l-th layer of the time pyramid corresponding to the preset gesture video matrix Yi) and Xlk (the k-th video matrix of the l-th layer of the time pyramid corresponding to the gesture video matrix X to be recognized) is determined based on the sparse affine package; further, the similarity κ(Yi_lk, Xlk) between Yi_lk and Xlk is determined according to the distance d(Yi_lk, Xlk); further, the similarity κ(Yi, X) between the gesture video matrix X to be recognized and the preset gesture video matrix Yi is determined according to κ(Yi_lk, Xlk); further, the support vector machine f(X) of the gesture video to be recognized is determined according to the label factor of the i-th preset gesture video (for example, the label factor of the first type of gesture video or the label factor of the second type of gesture video) and κ(Yi, X), so as to further judge, according to the support vector machine f(X), whether the gesture video to be recognized belongs to the first gesture video set. Compared with the prior art, in this embodiment, the similarity between the gesture video to be recognized and the i-th preset gesture video determined based on the sparse affine package is used as the kernel function of the support vector machine, so that the accuracy of gesture recognition is high.
Fig. 4 is a block diagram illustrating a first embodiment of a gesture recognition apparatus according to an exemplary embodiment. As shown in fig. 4, the gesture recognition apparatus 40 includes:
an obtaining module 401 configured to obtain a gesture video to be recognized;
a first determining module 402, configured to determine a gesture video set to which the gesture video to be recognized belongs according to similarity between the gesture video to be recognized and a preset gesture video in a preset database; the preset database comprises at least one type of gesture video set, and each type of gesture video set comprises at least one preset gesture video.
In the gesture recognition apparatus provided by the embodiment of the present disclosure, the obtaining module 401 obtains a gesture video to be recognized; further, the first determining module 402 determines a gesture video set to which the gesture video to be recognized belongs according to the similarity between the gesture video to be recognized and a preset gesture video in a preset database, so as to determine that the gesture to be recognized is a preset gesture in a preset gesture video included in the gesture video set to which the gesture video to be recognized belongs. It can be seen that, in contrast to the prior art, another implementation of gesture recognition is provided in the embodiments of the present disclosure.
On the basis of the embodiment shown in fig. 4, fig. 5 is a block diagram of a second embodiment of a gesture recognition apparatus according to an exemplary embodiment. Referring to fig. 5, the first determining module 402 includes:
an acquisition submodule 402A configured to perform an acquisition operation, the acquisition operation including: according to the type of a first gesture video set in the preset database, dividing the gesture video set in the preset database into a first type of gesture video and a second type of gesture video; and acquiring a support vector machine of the gesture video to be recognized according to the first type of gesture video, the second type of gesture video and the similarity between the gesture video to be recognized and each preset gesture video; the first gesture video set is any type of gesture video set in the preset database; the preset gesture videos included in the first gesture video set belong to the first type of gesture videos, and the preset gesture videos included in other gesture video sets except the first gesture video set in the preset database belong to the second type of gesture videos;
a first determining submodule 402B configured to determine that the gesture video to be recognized belongs to the first gesture video set when the support vector machine is greater than 0;
a second determining submodule 402C configured to, when the support vector machine is not greater than 0, regard any one of the other types of gesture video sets in the preset database as a new first gesture video set, return to the obtaining submodule 402A to perform the obtaining operation, obtain a new first type of gesture video and a new second type of gesture video according to the new first gesture video set, and obtain a new support vector machine of the gesture video to be recognized according to the new first type of gesture video, the new second type of gesture video, and a similarity between the gesture video to be recognized and each of the preset gesture videos, until the new support vector machine is greater than 0, determine that the gesture video to be recognized belongs to the new first gesture video set.
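The one-vs-rest loop performed by submodules 402A, 402B, and 402C can be sketched as follows. This is a minimal sketch: `svm_score` stands in for the support-vector-machine evaluation described above, and all names are hypothetical, not from the disclosure.

```python
def classify_gesture(video_matrix, gesture_sets, svm_score):
    """One-vs-rest sketch: treat each preset gesture-video set in turn as
    the first (positive) gesture video set and every other set as the
    second (negative) type, until the SVM output is positive.
    Returns the index of the matching set, or None if none matches."""
    for set_index in range(len(gesture_sets)):
        presets, labels = [], []
        for j, video_set in enumerate(gesture_sets):
            for preset in video_set:
                presets.append(preset)
                # Label factor: positive for the candidate set, negative otherwise.
                labels.append(1.0 if j == set_index else -1.0)
        if svm_score(video_matrix, presets, labels) > 0:
            return set_index  # gesture video belongs to this set
    return None  # SVM output never exceeded 0
```

When the output is not greater than 0 for one candidate set, the loop simply retries with the next set as the new first gesture video set, mirroring submodule 402C.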
On the basis of the embodiment shown in fig. 5, fig. 6 is a block diagram of a third embodiment of a gesture recognition apparatus according to an exemplary embodiment. Referring to fig. 6, the acquisition submodule 402A includes:
a determining unit 402a1 configured to determine a tag factor of the first type of gesture video and a tag factor of the second type of gesture video; the label factor of the first type of gesture video is equal to a first preset value belonging to a positive number, the label factor of the second type of gesture video is equal to a second preset value belonging to a negative number, and the absolute values of the first preset value and the second preset value are the same;
the obtaining unit 402a2 is configured to obtain a support vector machine of the gesture video to be recognized according to the tag factor of the first type of gesture video, the tag factor of the second type of gesture video, and the similarity between the gesture video to be recognized and each preset gesture video.
Optionally, the obtaining unit 402a2 is specifically configured to:
determining a to-be-recognized gesture video matrix X corresponding to the gesture video to be recognized according to the feature sequence of each frame of to-be-recognized gesture image in the gesture video to be recognized; wherein the m-th column of the to-be-recognized gesture video matrix X comprises: the feature sequence corresponding to the m-th frame of to-be-recognized gesture image of the gesture video to be recognized, and m is an integer greater than or equal to 1;
determining a preset gesture video matrix Yᵢ corresponding to the i-th preset gesture video according to the feature sequence of each frame of preset gesture image in the i-th preset gesture video in the preset database; wherein i is an integer greater than or equal to 1 and less than or equal to N, N is the number of preset gesture videos included in the preset database, and the m-th column of the preset gesture video matrix Yᵢ comprises: the feature sequence corresponding to the m-th frame of preset gesture image of the i-th preset gesture video;
according to the formula

f(X) = sign( Σ_{i=1}^{N} αᵢ · yᵢ · κ(Yᵢ, X) + b ),

determining a support vector machine f(X) of the gesture video to be recognized, wherein sign() represents a sign function, αᵢ represents a first weighting coefficient, yᵢ represents the label factor of the i-th preset gesture video, κ(Yᵢ, X) represents the similarity between the to-be-recognized gesture video matrix X and the preset gesture video matrix Yᵢ, and b represents a first preset constant; wherein, if the i-th preset gesture video belongs to the first type of gesture video, yᵢ is equal to the first preset value, and if the i-th preset gesture video belongs to the second type of gesture video, yᵢ is equal to the second preset value.
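A minimal numeric sketch of the decision function f(X) = sign(Σᵢ αᵢ·yᵢ·κ(Yᵢ, X) + b), assuming the per-video similarities κ(Yᵢ, X) have already been computed; the function and argument names are illustrative only:

```python
def svm_decision(similarities, labels, alphas, b):
    """f(X) = sign(sum_i alpha_i * y_i * kappa(Y_i, X) + b), where
    similarities[i] = kappa(Y_i, X) and labels[i] is the +/- label factor."""
    s = sum(a * y * k for a, y, k in zip(alphas, labels, similarities)) + b
    return 1 if s > 0 else (-1 if s < 0 else 0)
```

A value greater than 0 means the video is assigned to the first gesture video set, matching the judgment performed by submodule 402B.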
On the basis of the embodiment shown in fig. 6, fig. 7 is a block diagram of a fourth embodiment of a gesture recognition apparatus according to an exemplary embodiment. Referring to fig. 7, the gesture recognition apparatus 40 further includes:
a second determination module 403 configured to determine, according to the formula

κ(Yᵢ, X) = Σ_{l=0}^{L−1} Σ_{k=1}^{2^l} μ_{lk} · κ(Yᵢ^{lk}, X^{lk}),

the similarity κ(Yᵢ, X) between the to-be-recognized gesture video matrix X and the preset gesture video matrix Yᵢ;

wherein the to-be-recognized gesture video matrix X and the preset gesture video matrix corresponding to each preset gesture video in the preset database are divided, according to the same division rule, into a temporal pyramid comprising L layers; K represents the number of video matrices included in the l-th layer of the temporal pyramid, K = 2^l; l represents the l-th layer and is a natural number greater than or equal to 0; k represents the k-th video matrix of each layer; X^{lk} represents the k-th video matrix of the l-th layer of the temporal pyramid corresponding to the to-be-recognized gesture video matrix X; Yᵢ^{lk} represents the k-th video matrix of the l-th layer of the temporal pyramid corresponding to the preset gesture video matrix Yᵢ; κ(Yᵢ^{lk}, X^{lk}) represents the similarity between Yᵢ^{lk} and X^{lk}; and μ_{lk} represents a second weighting coefficient.
On the basis of the embodiment shown in fig. 7, fig. 8 is a block diagram of a fifth embodiment of a gesture recognition apparatus according to an exemplary embodiment. Referring to fig. 8, the gesture recognition apparatus 40 further includes:
a third determination module 404 configured to determine, according to the formula

κ(Yᵢ^{lk}, X^{lk}) = exp( −γ · D(Yᵢ^{lk}, X^{lk})² ),

the similarity κ(Yᵢ^{lk}, X^{lk}) between Yᵢ^{lk} and X^{lk}, wherein exp() represents an exponential function, D(Yᵢ^{lk}, X^{lk}) represents the distance between Yᵢ^{lk} and X^{lk}, and γ represents a second preset constant.
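The mapping from sub-matrix distance to similarity can be sketched as a Gaussian (RBF) kernel; squaring the distance inside the exponential is an assumption consistent with common practice, since the text only names exp() and the constant γ:

```python
import math

def rbf_similarity(distance, gamma):
    """kappa = exp(-gamma * d**2): a distance of 0 maps to similarity 1,
    and the similarity decays toward 0 as the distance grows."""
    return math.exp(-gamma * distance ** 2)
```
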
On the basis of the embodiment shown in fig. 8, fig. 9 is a block diagram of a sixth embodiment of a gesture recognition apparatus according to an exemplary embodiment. Referring to fig. 9, the gesture recognition apparatus 40 further includes:
a fourth determination module 405 configured to determine, according to the formula

D(Yᵢ^{lk}, X^{lk}) = min_{u,v} ‖ Yᵢ^{lk}·u − X^{lk}·v ‖₂,

the distance D(Yᵢ^{lk}, X^{lk}) between Yᵢ^{lk} and X^{lk}, wherein ‖·‖₂ represents the Euclidean distance function, u represents a first preset sparse affine sequence, and v represents a second preset sparse affine sequence.
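The distance between the two affine hulls, the minimum over affine coefficient vectors of ‖Yᵢ^{lk}·u − X^{lk}·v‖₂, can be sketched with a least-squares projection. This sketch uses NumPy and omits the sparsity constraint on the coefficient sequences, which would need an extra regularizer; it is an illustration of the affine-hull distance, not the disclosure's exact procedure:

```python
import numpy as np

def affine_hull_distance(Y, X):
    """min over affine combinations (coefficients summing to 1) of
    ||Y @ u - X @ v||_2, computed by projecting the gap between the two
    base points onto the span of the combined direction vectors and
    measuring the residual."""
    y0, x0 = Y[:, :1], X[:, :1]
    # Direction vectors spanning each hull relative to its base point.
    U = np.hstack([Y[:, 1:] - y0, X[:, 1:] - x0])
    gap = (x0 - y0).ravel()
    if U.size == 0:
        return float(np.linalg.norm(gap))
    # Least-squares projection of the gap onto span(U).
    coef, *_ = np.linalg.lstsq(U, gap, rcond=None)
    residual = gap - U @ coef
    return float(np.linalg.norm(residual))
```

Two identical sub-matrices give distance 0, which the Gaussian kernel above maps to the maximum similarity 1.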
The gesture recognition apparatus provided by any of the above embodiments may be used to execute the technical solution of any embodiment of the gesture recognition method of the present disclosure; the implementation principle and technical effect are similar: a gesture video to be recognized is obtained; further, according to the similarity between the gesture video to be recognized and a preset gesture video in a preset database, the gesture video set to which the gesture video to be recognized belongs is determined, so that the gesture to be recognized is determined to be a preset gesture in the preset gesture videos included in that set. It can be seen that, in contrast to the prior art, another implementation of gesture recognition is provided in the embodiments of the present disclosure.
The internal functional modules and the structural schematic of the gesture recognition apparatus are described above, and the execution subject of the gesture recognition apparatus should be a terminal device, and fig. 10 is a block diagram of a terminal device according to an exemplary embodiment. Referring to fig. 10, the terminal device includes: a processor and a memory for storing processor-executable instructions;
the processor is configured to:
acquiring a gesture video to be recognized;
determining a gesture video set to which the gesture video to be recognized belongs according to the similarity between the gesture video to be recognized and a preset gesture video in a preset database; the preset database comprises at least one type of gesture video set, and each type of gesture video set comprises at least one preset gesture video.
In the above embodiments of the terminal device, it should be understood that the Processor may be a Central Processing Unit (CPU), other general-purpose processors, a Digital Signal Processor (DSP), an Application Specific Integrated Circuit (ASIC), etc. The general-purpose processor may be a microprocessor, or the processor may be any conventional processor, and the aforementioned memory may be a read-only memory (ROM), a Random Access Memory (RAM), a flash memory, a hard disk, or a solid state disk. The steps of a method disclosed in connection with the embodiments of the present disclosure may be embodied directly in a hardware processor, or in a combination of hardware and software modules.
Fig. 11 is a block diagram illustrating a terminal device 1200 according to an example embodiment. Referring to fig. 11, terminal device 1200 may include one or more of the following components: processing component 1202, memory 1204, power component 1206, multimedia component 1208, audio component 1210, input/output (I/O) interface 1212, sensor component 1214, and communications component 1216.
The processing component 1202 generally controls overall operation of the terminal device 1200, such as operations associated with display, data communication, multimedia operations, and recording operations. The processing components 1202 may include one or more processors 1220 to execute instructions to perform all or a portion of the steps of the methods described above. Further, the processing component 1202 can include one or more modules that facilitate interaction between the processing component 1202 and other components. For example, the processing component 1202 can include a multimedia module to facilitate interaction between the multimedia component 1208 and the processing component 1202.
The memory 1204 is configured to store various types of data to support operation at the terminal device 1200. Examples of such data include instructions for any application or method operating on terminal device 1200, various types of data, messages, pictures, videos, and so forth. The memory 1204 may be implemented by any type or combination of volatile or non-volatile memory devices such as Static Random Access Memory (SRAM), electrically erasable programmable read-only memory (EEPROM), erasable programmable read-only memory (EPROM), programmable read-only memory (PROM), read-only memory (ROM), magnetic memory, flash memory, magnetic or optical disks.
Power supply components 1206 provide power to the various components of terminal device 1200. Power components 1206 may include a power management system, one or more power sources, and other components associated with generating, managing, and distributing power for terminal device 1200.
The multimedia component 1208 includes a screen providing an output interface between the terminal device 1200 and a user. In some embodiments, the screen may include a Liquid Crystal Display (LCD) and a Touch Panel (TP). If the screen includes a touch panel, the screen may be implemented as a touch screen to receive an input signal from a user. The touch panel includes one or more touch sensors to sense touch, slide, and gestures on the touch panel. The touch sensor may not only sense the boundary of a touch or slide action, but also detect the duration and pressure associated with the touch or slide operation.
Audio component 1210 is configured to output and/or input audio signals. For example, the audio component 1210 includes a Microphone (MIC) configured to receive an external audio signal when the terminal apparatus 1200 is in an operation mode, such as a call mode, a recording mode, and a voice recognition mode. The received audio signals may further be stored in the memory 1204 or transmitted via the communication component 1216. In some embodiments, audio assembly 1210 further includes a speaker for outputting audio signals.
The I/O interface 1212 provides an interface between the processing component 1202 and peripheral interface modules, which may be keyboards, click wheels, buttons, etc.
The sensor assembly 1214 includes one or more sensors for providing various aspects of state assessment for the terminal device 1200. For example, sensor assembly 1214 may detect an open/closed state of terminal device 1200 and the relative positioning of components, such as the display and keypad of terminal device 1200. Sensor assembly 1214 may also detect a change in position of terminal device 1200 or a component of terminal device 1200, the presence or absence of user contact with terminal device 1200, the orientation or acceleration/deceleration of terminal device 1200, and a change in temperature of terminal device 1200. The sensor assembly 1214 may include a proximity sensor configured to detect the presence of a nearby object in the absence of any physical contact. The sensor assembly 1214 may also include a light sensor, such as a CMOS or CCD image sensor, for use in imaging applications. In some embodiments, the sensor assembly 1214 may also include an acceleration sensor, a gyroscope sensor, a magnetic sensor, a pressure sensor, or a temperature sensor.
Communications component 1216 is configured to facilitate communications between terminal device 1200 and other devices in a wired or wireless manner. The terminal device 1200 may access a wireless network based on a communication standard, such as WiFi, 2G or 3G, or a combination thereof. In an exemplary embodiment, the communication component 1216 receives the broadcast signal or broadcast related information from an external broadcast management system via a broadcast channel. In an exemplary embodiment, the communications component 1216 also includes a Near Field Communication (NFC) module to facilitate short-range communications. For example, the NFC module may be implemented based on Radio Frequency Identification (RFID) technology, infrared data association (IrDA) technology, Ultra Wideband (UWB) technology, Bluetooth (BT) technology, and other technologies.
In an exemplary embodiment, the terminal device 1200 may be implemented by one or more Application Specific Integrated Circuits (ASICs), Digital Signal Processors (DSPs), Digital Signal Processing Devices (DSPDs), Programmable Logic Devices (PLDs), Field Programmable Gate Arrays (FPGAs), controllers, micro-controllers, microprocessors or other electronic components for performing the above-described methods.
In an exemplary embodiment, a non-transitory computer readable storage medium comprising instructions, such as memory 1204 comprising instructions, executable by processor 1220 of terminal device 1200 to perform the above-described method is also provided. For example, the non-transitory computer readable storage medium may be a ROM, a Random Access Memory (RAM), a CD-ROM, a magnetic tape, a floppy disk, an optical data storage device, and the like.
A non-transitory computer readable storage medium in which instructions, when executed by a processing component of a terminal device 1200, enable the terminal device 1200 to perform a gesture recognition method, the method comprising:
acquiring a gesture video to be recognized;
determining a gesture video set to which the gesture video to be recognized belongs according to the similarity between the gesture video to be recognized and a preset gesture video in a preset database; the preset database comprises at least one type of gesture video set, and each type of gesture video set comprises at least one preset gesture video.
Optionally, the determining, according to the similarity between the gesture video to be recognized and a preset gesture video in a preset database, a gesture video set to which the gesture video to be recognized belongs includes:
performing an acquisition operation, the acquisition operation comprising: according to the type of a first gesture video set in the preset database, dividing the gesture video set in the preset database into a first type of gesture video and a second type of gesture video; and acquiring a support vector machine of the gesture video to be recognized according to the first type of gesture video, the second type of gesture video and the similarity between the gesture video to be recognized and each preset gesture video; the first gesture video set is any type of gesture video set in the preset database; the preset gesture videos included in the first gesture video set belong to the first type of gesture videos, and the preset gesture videos included in other gesture video sets except the first gesture video set in the preset database belong to the second type of gesture videos;
when the support vector machine is larger than 0, determining that the gesture video to be recognized belongs to the first gesture video set;
when the support vector machine is not larger than 0, taking any one of the other types of gesture video sets in the preset database as a new first gesture video set, returning to execute the obtaining operation to obtain a new first type of gesture video and a new second type of gesture video according to the new first type of gesture video set, and obtaining a new support vector machine of the gesture video to be recognized according to the new first type of gesture video, the new second type of gesture video and the similarity between the gesture video to be recognized and each preset gesture video until the new support vector machine is larger than 0, and determining that the gesture video to be recognized belongs to the new first gesture video set.
Optionally, the obtaining a support vector machine of the gesture video to be recognized according to the first type of gesture video, the second type of gesture video, and the similarity between the gesture video to be recognized and each preset gesture video includes:
determining the label factors of the first type of gesture videos and the second type of gesture videos; the label factor of the first type of gesture video is equal to a first preset value belonging to a positive number, the label factor of the second type of gesture video is equal to a second preset value belonging to a negative number, and the absolute values of the first preset value and the second preset value are the same;
and acquiring a support vector machine of the gesture video to be recognized according to the label factors of the first type of gesture video, the label factors of the second type of gesture video and the similarity between the gesture video to be recognized and each preset gesture video.
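The label-factor assignment described above can be sketched as follows; using ±1 for the first and second preset values follows the common SVM convention and satisfies the equal-absolute-value requirement, but the actual values are a design choice:

```python
def label_factors(num_videos_per_set, positive_set_index, value=1.0):
    """+value for preset videos in the first (positive) gesture video set,
    -value for all other sets; the absolute values are identical,
    as the method requires."""
    labels = []
    for j, n in enumerate(num_videos_per_set):
        labels.extend([value if j == positive_set_index else -value] * n)
    return labels
```
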
Optionally, the obtaining a support vector machine of the gesture video to be recognized according to the tag factor of the first type of gesture video, the tag factor of the second type of gesture video, and the similarity between the gesture video to be recognized and each preset gesture video includes:
determining a to-be-recognized gesture video matrix X corresponding to the gesture video to be recognized according to the feature sequence of each frame of to-be-recognized gesture image in the gesture video to be recognized; wherein the m-th column of the to-be-recognized gesture video matrix X comprises: the feature sequence corresponding to the m-th frame of to-be-recognized gesture image of the gesture video to be recognized, and m is an integer greater than or equal to 1;
determining a preset gesture video matrix Yᵢ corresponding to the i-th preset gesture video according to the feature sequence of each frame of preset gesture image in the i-th preset gesture video in the preset database; wherein i is an integer greater than or equal to 1 and less than or equal to N, N is the number of preset gesture videos included in the preset database, and the m-th column of the preset gesture video matrix Yᵢ comprises: the feature sequence corresponding to the m-th frame of preset gesture image of the i-th preset gesture video;
according to the formula

f(X) = sign( Σ_{i=1}^{N} αᵢ · yᵢ · κ(Yᵢ, X) + b ),

determining a support vector machine f(X) of the gesture video to be recognized, wherein sign() represents a sign function, αᵢ represents a first weighting coefficient, yᵢ represents the label factor of the i-th preset gesture video, κ(Yᵢ, X) represents the similarity between the to-be-recognized gesture video matrix X and the preset gesture video matrix Yᵢ, and b represents a first preset constant; wherein, if the i-th preset gesture video belongs to the first type of gesture video, yᵢ is equal to the first preset value, and if the i-th preset gesture video belongs to the second type of gesture video, yᵢ is equal to the second preset value.
Optionally, the method further comprises:
according to the formula

κ(Yᵢ, X) = Σ_{l=0}^{L−1} Σ_{k=1}^{2^l} μ_{lk} · κ(Yᵢ^{lk}, X^{lk}),

determining the similarity κ(Yᵢ, X) between the to-be-recognized gesture video matrix X and the preset gesture video matrix Yᵢ;

wherein the to-be-recognized gesture video matrix X and the preset gesture video matrix corresponding to each preset gesture video in the preset database are divided, according to the same division rule, into a temporal pyramid comprising L layers; K represents the number of video matrices included in the l-th layer of the temporal pyramid, K = 2^l; l represents the l-th layer and is a natural number greater than or equal to 0; k represents the k-th video matrix of each layer; X^{lk} represents the k-th video matrix of the l-th layer of the temporal pyramid corresponding to the to-be-recognized gesture video matrix X; Yᵢ^{lk} represents the k-th video matrix of the l-th layer of the temporal pyramid corresponding to the preset gesture video matrix Yᵢ; κ(Yᵢ^{lk}, X^{lk}) represents the similarity between Yᵢ^{lk} and X^{lk}; and μ_{lk} represents a second weighting coefficient.
Optionally, the method further comprises:
according to the formula

κ(Yᵢ^{lk}, X^{lk}) = exp( −γ · D(Yᵢ^{lk}, X^{lk})² ),

determining the similarity κ(Yᵢ^{lk}, X^{lk}) between Yᵢ^{lk} and X^{lk}, wherein exp() represents an exponential function, D(Yᵢ^{lk}, X^{lk}) represents the distance between Yᵢ^{lk} and X^{lk}, and γ represents a second preset constant.
Optionally, the method further comprises:
according to the formula

D(Yᵢ^{lk}, X^{lk}) = min_{u,v} ‖ Yᵢ^{lk}·u − X^{lk}·v ‖₂,

determining the distance D(Yᵢ^{lk}, X^{lk}) between Yᵢ^{lk} and X^{lk}, wherein ‖·‖₂ represents the Euclidean distance function, u represents a first preset sparse affine sequence, and v represents a second preset sparse affine sequence.
Other embodiments of the invention will be apparent to those skilled in the art from consideration of the specification and practice of the invention disclosed herein. This application is intended to cover any variations, uses, or adaptations of the invention following, in general, the principles of the invention and including such departures from the present disclosure as come within known or customary practice within the art to which the invention pertains. It is intended that the specification and examples be considered as exemplary only, with a true scope and spirit of the invention being indicated by the following claims.
It will be understood that the invention is not limited to the precise arrangements described above and shown in the drawings and that various modifications and changes may be made without departing from the scope thereof. The scope of the invention is limited only by the appended claims.

Claims (9)

1. A gesture recognition method, comprising:
acquiring a gesture video to be recognized;
determining a gesture video set to which the gesture video to be recognized belongs according to the similarity between the gesture video to be recognized and a preset gesture video in a preset database; the preset database comprises at least one type of gesture video set, and each type of gesture video set comprises at least one preset gesture video;
the determining a gesture video set to which the gesture video to be recognized belongs according to the similarity between the gesture video to be recognized and a preset gesture video in a preset database comprises the following steps:
performing an acquisition operation, the acquisition operation comprising: according to the type of a first gesture video set in the preset database, dividing the gesture video set in the preset database into a first type of gesture video and a second type of gesture video; and acquiring a support vector machine of the gesture video to be recognized according to the first type of gesture video, the second type of gesture video and the similarity between the gesture video to be recognized and each preset gesture video; the first gesture video set is any type of gesture video set in the preset database; the preset gesture videos included in the first gesture video set belong to the first type of gesture videos, and the preset gesture videos included in other gesture video sets except the first gesture video set in the preset database belong to the second type of gesture videos;
when the support vector machine is larger than 0, determining that the gesture video to be recognized belongs to the first gesture video set;
when the support vector machine is not larger than 0, taking any one of the other types of gesture video sets in the preset database as a new first gesture video set, returning to execute the acquisition operation to acquire a new first type of gesture video and a new second type of gesture video according to the new first gesture video set, and acquiring a new support vector machine of the gesture video to be recognized according to the new first type of gesture video, the new second type of gesture video and the similarity between the gesture video to be recognized and each preset gesture video until the new support vector machine is larger than 0, and determining that the gesture video to be recognized belongs to the new first gesture video set;
the support vector machine for acquiring the gesture videos to be recognized according to the first type of gesture videos, the second type of gesture videos and the similarity between the gesture videos to be recognized and each preset gesture video comprises:
determining the label factors of the first type of gesture videos and the second type of gesture videos; the label factor of the first type of gesture video is equal to a first preset value belonging to a positive number, the label factor of the second type of gesture video is equal to a second preset value belonging to a negative number, and the absolute values of the first preset value and the second preset value are the same;
acquiring a support vector machine of the gesture video to be recognized according to the label factor of the first type of gesture video, the label factor of the second type of gesture video and the similarity between the gesture video to be recognized and each preset gesture video;
the obtaining of the support vector machine of the gesture video to be recognized according to the tag factors of the first type of gesture video, the tag factors of the second type of gesture video and the similarity between the gesture video to be recognized and each preset gesture video includes:
determining a to-be-recognized gesture video matrix X corresponding to the gesture video to be recognized according to the feature sequence of each frame of to-be-recognized gesture image in the gesture video to be recognized; wherein the m-th column of the to-be-recognized gesture video matrix X comprises: the feature sequence corresponding to the m-th frame of to-be-recognized gesture image of the gesture video to be recognized, and m is an integer greater than or equal to 1;
determining a preset gesture video matrix Yᵢ corresponding to the i-th preset gesture video according to the feature sequence of each frame of preset gesture image in the i-th preset gesture video in the preset database; wherein i is an integer greater than or equal to 1 and less than or equal to N, N is the number of preset gesture videos included in the preset database, and the m-th column of the preset gesture video matrix Yᵢ comprises: the feature sequence corresponding to the m-th frame of preset gesture image of the i-th preset gesture video;
according to the formula

f(X) = sign( Σ_{i=1}^{N} αᵢ · yᵢ · κ(Yᵢ, X) + b ),

determining a support vector machine f(X) of the gesture video to be recognized, wherein sign() represents a sign function, αᵢ represents a first weighting coefficient, yᵢ represents the label factor of the i-th preset gesture video, κ(Yᵢ, X) represents the similarity between the to-be-recognized gesture video matrix X and the preset gesture video matrix Yᵢ, and b represents a first preset constant; wherein, if the i-th preset gesture video belongs to the first type of gesture video, yᵢ is equal to the first preset value, and if the i-th preset gesture video belongs to the second type of gesture video, yᵢ is equal to the second preset value.
2. The method of claim 1, further comprising:
according to the formula

κ(Yᵢ, X) = Σ_{l=0}^{L−1} Σ_{k=1}^{2^l} μ_{lk} · κ(Yᵢ^{lk}, X^{lk}),

determining the similarity κ(Yᵢ, X) between the to-be-recognized gesture video matrix X and the preset gesture video matrix Yᵢ;

wherein the to-be-recognized gesture video matrix X and the preset gesture video matrix corresponding to each preset gesture video in the preset database are divided, according to the same division rule, into a temporal pyramid comprising L layers; K represents the number of video matrices included in the l-th layer of the temporal pyramid, K = 2^l; l represents the l-th layer and is a natural number greater than or equal to 0; k represents the k-th video matrix of each layer; X^{lk} represents the k-th video matrix of the l-th layer of the temporal pyramid corresponding to the to-be-recognized gesture video matrix X; Yᵢ^{lk} represents the k-th video matrix of the l-th layer of the temporal pyramid corresponding to the preset gesture video matrix Yᵢ; κ(Yᵢ^{lk}, X^{lk}) represents the similarity between Yᵢ^{lk} and X^{lk}; and μ_{lk} represents a second weighting coefficient.
3. The method of claim 2, further comprising:
according to the formula

κ(Yᵢ^{lk}, X^{lk}) = exp( −γ · D(Yᵢ^{lk}, X^{lk})² ),

determining the similarity κ(Yᵢ^{lk}, X^{lk}) between Yᵢ^{lk} and X^{lk}, wherein exp() represents an exponential function, D(Yᵢ^{lk}, X^{lk}) represents the distance between Yᵢ^{lk} and X^{lk}, and γ represents a second preset constant.
4. The method of claim 3, further comprising:
according to the formula d(Y^i_lk, X_lk) = ‖X_lk·s₁ − Y^i_lk·s₂‖₂, determining the distance d(Y^i_lk, X_lk) between Y^i_lk and X_lk; wherein ‖·‖₂ represents the Euclidean distance function, s₁ represents a first preset sparse affine sequence, and s₂ represents a second preset sparse affine sequence.
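Claims 3 and 4 together define the per-segment kernel: a distance built from the two preset sparse affine sequences, passed through exp(−γ·d). A minimal sketch; treating the sparse affine sequences as column weight vectors applied to each segment matrix is an assumption, as the claims do not fix their exact role.

```python
import numpy as np

def segment_distance(X_lk, Y_lk, s1, s2):
    """Claim-4 style distance: Euclidean norm of the difference between
    each segment matrix projected by its preset sparse affine sequence
    (s1, s2 assumed here to be per-frame weight vectors)."""
    return float(np.linalg.norm(X_lk @ s1 - Y_lk @ s2))

def gaussian_kernel(X_lk, Y_lk, s1, s2, gamma):
    """Claim-3 kernel: kappa = exp(-gamma * d(Y_lk, X_lk))."""
    return float(np.exp(-gamma * segment_distance(X_lk, Y_lk, s1, s2)))
```

Identical segments give distance 0 and hence kernel value 1; larger distances decay exponentially at a rate set by the second preset constant γ.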
5. A gesture recognition apparatus, comprising:
the acquisition module is configured to acquire a gesture video to be recognized;
the first determining module is configured to determine a gesture video set to which the gesture video to be recognized belongs according to the similarity between the gesture video to be recognized and a preset gesture video in a preset database; the preset database comprises at least one type of gesture video set, and each type of gesture video set comprises at least one preset gesture video;
wherein the first determining module comprises:
an acquisition submodule configured to perform an acquisition operation, the acquisition operation including: according to the type of a first gesture video set in the preset database, dividing the gesture video set in the preset database into a first type of gesture video and a second type of gesture video; and acquiring a support vector machine of the gesture video to be recognized according to the first type of gesture video, the second type of gesture video and the similarity between the gesture video to be recognized and each preset gesture video; the first gesture video set is any type of gesture video set in the preset database; the preset gesture videos included in the first gesture video set belong to the first type of gesture videos, and the preset gesture videos included in other gesture video sets except the first gesture video set in the preset database belong to the second type of gesture videos;
a first determining submodule configured to determine that the gesture video to be recognized belongs to the first gesture video set when the support vector machine is greater than 0;
a second determining submodule configured to, when the support vector machine is not greater than 0, take any one of other types of gesture video sets in the preset database as a new first gesture video set, return to the obtaining submodule to perform the obtaining operation, obtain a new first type of gesture video and a new second type of gesture video according to the new first gesture video set, and obtain a new support vector machine of the gesture video to be recognized according to the new first type of gesture video, the new second type of gesture video, and a similarity between the gesture video to be recognized and each of the preset gesture videos, until the new support vector machine is greater than 0, determine that the gesture video to be recognized belongs to the new first gesture video set;
the acquisition submodule includes:
a determining unit configured to determine a tag factor of the first type of gesture video and a tag factor of the second type of gesture video; the label factor of the first type of gesture video is equal to a first preset value belonging to a positive number, the label factor of the second type of gesture video is equal to a second preset value belonging to a negative number, and the absolute values of the first preset value and the second preset value are the same;
the acquisition unit is configured to acquire a support vector machine of the gesture video to be recognized according to the label factor of the first type of gesture video, the label factor of the second type of gesture video and the similarity between the gesture video to be recognized and each preset gesture video;
the obtaining unit is specifically configured to:
determining a gesture video matrix X to be recognized corresponding to the gesture video to be recognized according to the feature sequence of each frame of gesture image to be recognized in the gesture video to be recognized; wherein the m-th column of the gesture video matrix X to be recognized comprises: the feature sequence corresponding to the m-th frame gesture image to be recognized of the gesture video to be recognized, and m is an integer greater than or equal to 1;
determining a preset gesture video matrix Y_i corresponding to the i-th preset gesture video according to the feature sequence of each frame of preset gesture image in the i-th preset gesture video in the preset database; wherein i is an integer greater than or equal to 1 and less than or equal to N, N is the number of preset gesture videos included in the preset database, and the m-th column of the preset gesture video matrix Y_i comprises: the feature sequence corresponding to the m-th frame preset gesture image of the i-th preset gesture video;
according to the formula f(X) = sign(Σ_{i=1}^{N} α_i·y_i·κ(Y_i, X) + b), determining a support vector machine f(X) of the gesture video to be recognized; wherein sign() represents a sign function, α_i represents a first weighting coefficient, y_i represents the label factor of the i-th preset gesture video, κ(Y_i, X) represents the similarity between the gesture video matrix X to be recognized and the preset gesture video matrix Y_i, and b represents a first preset constant; wherein if the i-th preset gesture video belongs to the first type of gesture video, y_i is equal to the first preset value, and if the i-th preset gesture video belongs to the second type of gesture video, y_i is equal to the second preset value.
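The one-vs-rest iteration described by the submodules of claim 5 reduces to: treat each gesture video set in turn as the positive class and stop at the first set whose SVM decision value is greater than 0. A sketch under the assumption that the decision value for each candidate set has been precomputed; the dict-based lookup and set names are illustrative.

```python
def classify_gesture(decision_values):
    """One-vs-rest loop: decision_values maps each gesture-set name to
    the SVM value f(X) obtained with that set as the positive class.
    The video belongs to the first set whose value is positive."""
    for set_name, score in decision_values.items():
        if score > 0:
            return set_name
    return None  # no gesture set matched
```

In the claimed apparatus the loop terminates as soon as a positive decision value is found, so later sets need not be evaluated at all.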
6. The apparatus of claim 5, further comprising:
a second determination module configured to determine a formulaDetermining the gesture video matrix X to be recognized and the preset gesture video matrix YiSimilarity between K (Y)i,X);
The K represents the number of video matrixes included in each layer of the time pyramid comprising L layers, the gesture video matrix X to be recognized and the preset gesture video matrix corresponding to each preset gesture video in the preset database are divided into the number of video matrixes included in each layer of the time pyramid comprising L layers according to the same division rule, and the K is 2lL represents the l-th layer, l is a natural number greater than or equal to 0, k represents the k-th video matrix of each layer, and XlkRepresenting the kth video matrix of the ith layer of the time pyramid corresponding to the gesture video matrix X to be recognized,
Figure FDA0002216982440000051
representing the preset gesture video matrix YiThe kth video matrix of the l-th layer of the corresponding temporal pyramid,
Figure FDA0002216982440000052
represents saidAnd said XlkSimilarity between them, μlkRepresenting the second weighting factor.
7. The apparatus of claim 6, further comprising:
a third determining module configured to determine, according to the formula κ(Y^i_lk, X_lk) = exp(−γ·d(Y^i_lk, X_lk)), the similarity κ(Y^i_lk, X_lk) between Y^i_lk and X_lk; wherein exp() represents an exponential function, d(Y^i_lk, X_lk) represents the distance between Y^i_lk and X_lk, and γ represents a second preset constant.
8. The apparatus of claim 7, further comprising:
a fourth determining module configured to determine, according to the formula d(Y^i_lk, X_lk) = ‖X_lk·s₁ − Y^i_lk·s₂‖₂, the distance d(Y^i_lk, X_lk) between Y^i_lk and X_lk; wherein ‖·‖₂ represents the Euclidean distance function, s₁ represents a first preset sparse affine sequence, and s₂ represents a second preset sparse affine sequence.
9. A terminal device, comprising: a processor and a memory for storing processor-executable instructions;
the processor is configured to:
acquiring a gesture video to be recognized;
determining a gesture video set to which the gesture video to be recognized belongs according to the similarity between the gesture video to be recognized and a preset gesture video in a preset database; the preset database comprises at least one type of gesture video set, and each type of gesture video set comprises at least one preset gesture video;
the determining a gesture video set to which the gesture video to be recognized belongs according to the similarity between the gesture video to be recognized and a preset gesture video in a preset database comprises the following steps:
performing an acquisition operation, the acquisition operation comprising: according to the type of a first gesture video set in the preset database, dividing the gesture video set in the preset database into a first type of gesture video and a second type of gesture video; and acquiring a support vector machine of the gesture video to be recognized according to the first type of gesture video, the second type of gesture video and the similarity between the gesture video to be recognized and each preset gesture video; the first gesture video set is any type of gesture video set in the preset database; the preset gesture videos included in the first gesture video set belong to the first type of gesture videos, and the preset gesture videos included in other gesture video sets except the first gesture video set in the preset database belong to the second type of gesture videos;
when the support vector machine is larger than 0, determining that the gesture video to be recognized belongs to the first gesture video set;
when the support vector machine is not larger than 0, taking any one of the other types of gesture video sets in the preset database as a new first gesture video set, returning to execute the acquisition operation to acquire a new first type of gesture video and a new second type of gesture video according to the new first gesture video set, and acquiring a new support vector machine of the gesture video to be recognized according to the new first type of gesture video, the new second type of gesture video and the similarity between the gesture video to be recognized and each preset gesture video until the new support vector machine is larger than 0, and determining that the gesture video to be recognized belongs to the new first gesture video set;
the acquiring a support vector machine of the gesture video to be recognized according to the first type of gesture video, the second type of gesture video and the similarity between the gesture video to be recognized and each preset gesture video comprises:
determining the label factors of the first type of gesture videos and the second type of gesture videos; the label factor of the first type of gesture video is equal to a first preset value belonging to a positive number, the label factor of the second type of gesture video is equal to a second preset value belonging to a negative number, and the absolute values of the first preset value and the second preset value are the same;
acquiring a support vector machine of the gesture video to be recognized according to the label factor of the first type of gesture video, the label factor of the second type of gesture video and the similarity between the gesture video to be recognized and each preset gesture video;
the obtaining of the support vector machine of the gesture video to be recognized according to the tag factors of the first type of gesture video, the tag factors of the second type of gesture video and the similarity between the gesture video to be recognized and each preset gesture video includes:
determining a gesture video matrix X to be recognized corresponding to the gesture video to be recognized according to the feature sequence of each frame of gesture image to be recognized in the gesture video to be recognized; wherein the m-th column of the gesture video matrix X to be recognized comprises: the feature sequence corresponding to the m-th frame gesture image to be recognized of the gesture video to be recognized, and m is an integer greater than or equal to 1;
determining a preset gesture video matrix Y_i corresponding to the i-th preset gesture video according to the feature sequence of each frame of preset gesture image in the i-th preset gesture video in the preset database; wherein i is an integer greater than or equal to 1 and less than or equal to N, N is the number of preset gesture videos included in the preset database, and the m-th column of the preset gesture video matrix Y_i comprises: the feature sequence corresponding to the m-th frame preset gesture image of the i-th preset gesture video;
according to the formula f(X) = sign(Σ_{i=1}^{N} α_i·y_i·κ(Y_i, X) + b), determining a support vector machine f(X) of the gesture video to be recognized; wherein sign() represents a sign function, α_i represents a first weighting coefficient, y_i represents the label factor of the i-th preset gesture video, κ(Y_i, X) represents the similarity between the gesture video matrix X to be recognized and the preset gesture video matrix Y_i, and b represents a first preset constant; wherein if the i-th preset gesture video belongs to the first type of gesture video, y_i is equal to the first preset value, and if the i-th preset gesture video belongs to the second type of gesture video, y_i is equal to the second preset value.
CN201710398580.8A 2017-05-31 2017-05-31 Gesture recognition method and device and terminal equipment Active CN107133361B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201710398580.8A CN107133361B (en) 2017-05-31 2017-05-31 Gesture recognition method and device and terminal equipment


Publications (2)

Publication Number Publication Date
CN107133361A CN107133361A (en) 2017-09-05
CN107133361B true CN107133361B (en) 2020-02-07

Family

ID=59734033

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201710398580.8A Active CN107133361B (en) 2017-05-31 2017-05-31 Gesture recognition method and device and terminal equipment

Country Status (1)

Country Link
CN (1) CN107133361B (en)

Families Citing this family (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN108022543B (en) * 2017-11-27 2021-05-07 深圳中科呼图电子商务有限公司 Advertisement autonomous demonstration method and system, advertisement machine and application
CN108268835A (en) * 2017-12-28 2018-07-10 努比亚技术有限公司 sign language interpretation method, mobile terminal and computer readable storage medium
CN108596079B (en) * 2018-04-20 2021-06-15 歌尔光学科技有限公司 Gesture recognition method and device and electronic equipment
CN109284689A (en) * 2018-08-27 2019-01-29 苏州浪潮智能软件有限公司 A method of In vivo detection is carried out using gesture identification

Family Cites Families (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN102855488A (en) * 2011-06-30 2013-01-02 北京三星通信技术研究有限公司 Three-dimensional gesture recognition method and system
CN103092332A (en) * 2011-11-08 2013-05-08 苏州中茵泰格科技有限公司 Digital image interactive method and system of television
US20150138078A1 (en) * 2013-11-18 2015-05-21 Eyal Krupka Hand pose recognition using boosted look up tables
US20150177842A1 (en) * 2013-12-23 2015-06-25 Yuliya Rudenko 3D Gesture Based User Authorization and Device Control Methods
CN103745228B (en) * 2013-12-31 2017-01-11 清华大学 Dynamic gesture identification method on basis of Frechet distance
CN104299004B (en) * 2014-10-23 2018-05-01 浙江大学 A kind of gesture identification method based on multiple features fusion and finger tip detection

Also Published As

Publication number Publication date
CN107133361A (en) 2017-09-05

Similar Documents

Publication Publication Date Title
US11532180B2 (en) Image processing method and device and storage medium
TWI781359B (en) Face and hand association detection method and device, electronic device and computer-readable storage medium
CN106651955B (en) Method and device for positioning target object in picture
CN108121952B (en) Face key point positioning method, device, equipment and storage medium
CN111310616B (en) Image processing method and device, electronic equipment and storage medium
US10007841B2 (en) Human face recognition method, apparatus and terminal
RU2625340C1 (en) Method and device for processing video file identifier
US11455491B2 (en) Method and device for training image recognition model, and storage medium
KR20200131305A (en) Keypoint detection method, device, electronic device and storage medium
US20170220846A1 (en) Fingerprint template input method, device and medium
CN107133361B (en) Gesture recognition method and device and terminal equipment
US12008167B2 (en) Action recognition method and device for target object, and electronic apparatus
CN106648063B (en) Gesture recognition method and device
CN111435432B (en) Network optimization method and device, image processing method and device and storage medium
CN106557759B (en) Signpost information acquisition method and device
WO2020048392A1 (en) Application virus detection method, apparatus, computer device, and storage medium
RU2632578C2 (en) Method and device of characteristic extraction
CN110781323A (en) Method and device for determining label of multimedia resource, electronic equipment and storage medium
WO2020114236A1 (en) Keypoint detection method and apparatus, electronic device, and storage medium
CN105354560A (en) Fingerprint identification method and device
TW202036476A (en) Method, device and electronic equipment for image processing and storage medium thereof
CN107977636B (en) Face detection method and device, terminal and storage medium
CN110619325A (en) Text recognition method and device
CN107729886B (en) Method and device for processing face image
TWI770531B (en) Face recognition method, electronic device and storage medium thereof

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant