CN107368181B - Gesture recognition method and device - Google Patents


Info

Publication number
CN107368181B
Authority
CN
China
Prior art keywords
gesture
sliding window
window size
video sequence
sequence frame
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN201610316842.7A
Other languages
Chinese (zh)
Other versions
CN107368181A (en)
Inventor
刘丽艳 (Liu Liyan)
赵颖 (Zhao Ying)
梁玲燕 (Liang Lingyan)
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Liguang Co
Original Assignee
Liguang Co
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Liguang Co
Priority to CN201610316842.7A
Publication of CN107368181A
Application granted
Publication of CN107368181B
Legal status: Active

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F3/00Input arrangements for transferring data to be processed into a form capable of being handled by the computer; Output arrangements for transferring data from processing unit to output unit, e.g. interface arrangements
    • G06F3/01Input arrangements or combined input and output arrangements for interaction between user and computer
    • G06F3/017Gesture based interaction, e.g. based on a set of recognized hand gestures
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V40/00Recognition of biometric, human-related or animal-related patterns in image or video data
    • G06V40/20Movements or behaviour, e.g. gesture recognition
    • G06V40/28Recognition of hand or arm movements, e.g. recognition of deaf sign language

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • General Engineering & Computer Science (AREA)
  • Human Computer Interaction (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Health & Medical Sciences (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • General Health & Medical Sciences (AREA)
  • Psychiatry (AREA)
  • Social Psychology (AREA)
  • Multimedia (AREA)
  • Image Analysis (AREA)
  • User Interface Of Digital Computer (AREA)

Abstract

The invention provides a gesture recognition method and device, relates to a man-machine interaction technology, and can improve the accuracy of gesture recognition. The gesture recognition method comprises the following steps: acquiring a first sliding window size, and acquiring at least one gesture video sequence frame according to the first sliding window size; matching at least one gesture video sequence frame with a preset gesture template respectively to obtain a first similarity parameter between a first gesture video sequence frame with a recognized user gesture and the preset gesture template; acquiring the size of a reference sliding window, and acquiring a reference gesture video sequence frame from the first gesture video sequence frame according to the size of the reference sliding window; acquiring a reference similarity parameter between a reference gesture video sequence frame and a preset gesture template; when the reference similarity parameter is smaller than the first similarity parameter, the reference sliding window size is taken as a sliding window size for recognizing the user gesture. The invention is mainly used in the gesture recognition technology.

Description

Gesture recognition method and device
Technical Field
The invention relates to a man-machine interaction technology, in particular to a gesture recognition method and device.
Background
Gesture recognition is a key component of natural human-machine interaction and helps build a more fluid "human-machine conversation" between people and machines. The machine may be a computer, a projector, or one of the wearable devices introduced in recent years.
In existing gesture recognition techniques, a person often has to "learn" how to interact with a machine. For example, the machine prompts the user to "please come a little closer" or "please wave your hand faster," and the user then makes the corresponding gestures according to these prompts. However, such "learning" is not what users want. In a more natural human-computer interaction, users expect the computer to adapt to the person.
Template-matching methods and learning-based methods are the two approaches most widely applied in gesture recognition. Since a gesture takes time to complete, sliding windows are widely used to locate the start and end of a gesture in successive video frames. However, different cameras have different frame rates, and different people may perform the same gesture at different speeds. As a result, the same gesture can span a different number of video frames from one instance to another. If a sliding window with a fixed size is adopted, the accuracy of gesture recognition therefore suffers.
Disclosure of Invention
In view of this, the present invention provides a gesture recognition method and device, which can improve the accuracy of gesture recognition.
In order to solve the above technical problem, the present invention provides a gesture recognition method, including:
acquiring a first sliding window size, and acquiring at least one gesture video sequence frame according to the first sliding window size;
matching the at least one gesture video sequence frame with a preset gesture template respectively to obtain a first similarity parameter between a first gesture video sequence frame with a recognized user gesture and the preset gesture template;
acquiring a reference sliding window size, and acquiring a reference gesture video sequence frame from the first gesture video sequence frame according to the reference sliding window size, wherein the reference sliding window size is different from the first sliding window size;
acquiring a reference similarity parameter between the reference gesture video sequence frame and the preset gesture template;
and when the reference similarity parameter is smaller than the first similarity parameter, taking the reference sliding window size as a sliding window size for recognizing the user gesture.
Wherein, when the gesture made by the user is an initial gesture, the acquiring the first sliding window size includes:
identifying the initial gesture of the user to obtain an initial gesture video sequence frame;
acquiring an initial motion speed of the user's hand in the initial gesture video sequence frame;
and calculating the size of the first sliding window according to a preset movement speed, the size of a preset sliding window and the initial movement speed.
Wherein the initial motion speed of the user's hand in the initial gesture video sequence frames is obtained by the following formula:

v_user = (f_current / (m − 1)) · Σ_{i=0}^{m−2} ‖P_{i+1} − P_i‖

wherein v_user represents the initial motion speed, m represents the number of frames in the initial gesture video sequence (m > 0), f_current represents the frame rate, and P_i (i = 0, …, m−2) represents the position of the center of the user's hand in each initial gesture video sequence frame.
Wherein the first sliding window size is calculated from a preset movement speed, a preset sliding window size and the initial movement speed by the following formula:

size_user = size_common · (v_common / v_user)

wherein size_user represents the first sliding window size, v_common represents the preset movement speed, size_common represents the preset sliding window size, and v_user represents the initial movement speed.
Wherein, when the gesture made by the user is not an initial gesture, the acquiring the first sliding window size includes: and acquiring the first sliding window size stored in a storage unit.
Wherein the first similarity parameter is a first dynamic time warping distance; the acquiring a first similarity parameter between a first gesture video sequence frame identifying a user gesture and a preset gesture template comprises:
and acquiring a first dynamic time warping distance between a first gesture video sequence frame for recognizing the user gesture and a preset gesture template by using a dynamic time warping method.
Wherein the reference sliding window size comprises a second sliding window size and a third sliding window size; wherein the first sliding window size is greater than the second sliding window size and less than the third sliding window size;
the obtaining a reference sliding window size and obtaining a reference gesture video sequence frame from the first gesture video sequence frame according to the reference sliding window size includes:
obtaining a first sub-reference gesture video sequence frame from the first gesture video sequence frame according to the second sliding window size;
obtaining a second sub-reference gesture video sequence frame from the first gesture video sequence frame according to the third sliding window size.
Wherein the reference similarity parameter comprises a second dynamic time warping distance between the first sub-reference gesture video sequence frame and the preset gesture template, and a third dynamic time warping distance between the second sub-reference gesture video sequence frame and the preset gesture template;
the acquiring of the reference similarity parameter between the reference gesture video sequence frame and the preset gesture template comprises:
acquiring a second dynamic time warping distance between the first sub-reference gesture video sequence frame and the preset gesture template by using a dynamic time warping method;
and acquiring a third dynamic time warping distance between the second sub-reference gesture video sequence frame and the preset gesture template by using a dynamic time warping method.
Wherein the taking the reference sliding window size as the sliding window size for recognizing the user gesture when the reference similarity parameter is smaller than the first similarity parameter includes:
when the second dynamic time warping distance is smaller than the first dynamic time warping distance, taking the second sliding window size as a sliding window size for recognizing a user gesture;
and when the third dynamic time warping distance is smaller than the first dynamic time warping distance, taking the third sliding window size as the sliding window size for recognizing the user gesture.
Wherein, when the second dynamic time warping distance is smaller than the first dynamic time warping distance, taking the second sliding window size as a sliding window size for recognizing a user gesture includes:
when the second dynamic time warping distance is smaller than the first dynamic time warping distance, accumulating the count value corresponding to the second dynamic time warping distance;
and if the count value corresponding to the second dynamic time warping distance exceeds a first threshold value within a preset time period, taking the second sliding window size as the sliding window size for identifying the user gesture.
Wherein, when the third dynamic time warping distance is less than the first dynamic time warping distance, taking the third sliding window size as a sliding window size for recognizing a user gesture includes:
when the third dynamic time warping distance is smaller than the first dynamic time warping distance, accumulating the count value corresponding to the third dynamic time warping distance;
and if the count value corresponding to the third dynamic time warping distance exceeds a second threshold value within a preset time period, taking the third sliding window size as the sliding window size for identifying the user gesture.
In a second aspect, the present invention provides a gesture recognition apparatus, including:
the first video sequence frame acquisition module is used for acquiring the size of a first sliding window and acquiring at least one gesture video sequence frame according to the size of the first sliding window;
the first parameter acquisition module is used for respectively matching the at least one gesture video sequence frame with a preset gesture template to acquire a first similarity parameter between the first gesture video sequence frame with the recognized user gesture and the preset gesture template;
a second video sequence frame obtaining module, configured to obtain a reference sliding window size, and obtain a reference gesture video sequence frame from the first gesture video sequence frame according to the reference sliding window size, where the reference sliding window size is different from the first sliding window size;
the second parameter acquisition module is used for acquiring a reference similarity parameter between the reference gesture video sequence frame and the preset gesture template;
and the parameter processing module is used for taking the reference sliding window size as the sliding window size for recognizing the user gesture when the reference similarity parameter is smaller than the first similarity parameter.
Wherein the first video sequence frame acquisition module comprises:
the first gesture recognition submodule is used for recognizing the initial gesture of the user when the gesture made by the user is the initial gesture to obtain an initial gesture video sequence frame;
the speed acquisition submodule is used for acquiring the initial motion speed of the hand of the user in the initial gesture video sequence frame;
the first sliding window size obtaining submodule is used for calculating the first sliding window size according to a preset movement speed, a preset sliding window size and the initial movement speed;
and the first video sequence frame acquisition submodule is used for acquiring at least one gesture video sequence frame according to the size of the first sliding window.
Wherein the first video sequence frame acquisition module comprises:
a second sliding window size obtaining submodule, configured to obtain the first sliding window size stored in the storage unit when the gesture performed by the user is not the initial gesture;
and the second video sequence frame acquisition submodule is used for acquiring at least one gesture video sequence frame according to the size of the first sliding window.
Wherein the first similarity parameter is a first dynamic time warping distance; the first parameter obtaining module is specifically configured to: and acquiring a first dynamic time warping distance between a first gesture video sequence frame for recognizing the user gesture and a preset gesture template by using a dynamic time warping method.
Wherein the reference sliding window size comprises a second sliding window size and a third sliding window size; wherein the first sliding window size is greater than the second sliding window size and less than the third sliding window size;
the second video sequence frame acquisition module comprises:
a third video sequence frame obtaining submodule, configured to obtain a first sub-reference gesture video sequence frame from the first gesture video sequence frame according to the size of the second sliding window;
and the fourth video sequence frame acquisition submodule is used for acquiring a second sub-reference gesture video sequence frame from the first gesture video sequence frame according to the size of the third sliding window.
Wherein the reference similarity parameter comprises a second dynamic time warping distance between the first sub-reference gesture video sequence frame and the preset gesture template, and a third dynamic time warping distance between the second sub-reference gesture video sequence frame and the preset gesture template;
the second parameter obtaining module comprises:
the first parameter obtaining submodule is used for obtaining a second dynamic time warping distance between the first sub-reference gesture video sequence frame and the preset gesture template by using a dynamic time warping method;
and the second parameter obtaining submodule is used for obtaining a third dynamic time warping distance between the second sub-reference gesture video sequence frame and the preset gesture template by using a dynamic time warping method.
Wherein the parameter processing module comprises:
a comparison submodule, configured to compare the second dynamic time warping distance with the first dynamic time warping distance, and compare the third dynamic time warping distance with the first dynamic time warping distance, respectively;
a first parameter selection submodule, configured to, when the second dynamic time warping distance is smaller than the first dynamic time warping distance, take the second sliding window size as a sliding window size for recognizing a user gesture;
and the second parameter selection submodule is used for taking the third sliding window size as the sliding window size for recognizing the user gesture when the third dynamic time warping distance is smaller than the first dynamic time warping distance.
Wherein the first parameter selection submodule comprises:
the first counting unit is used for accumulating the count value corresponding to the second dynamic time warping distance when the second dynamic time warping distance is smaller than the first dynamic time warping distance;
and the first selection unit is used for taking the second sliding window size as the sliding window size for identifying the user gesture if the count value corresponding to the second dynamic time warping distance exceeds a first threshold value in a preset time period.
Wherein the second parameter selection submodule comprises:
the second counting unit is used for accumulating the count value corresponding to the third dynamic time warping distance when the third dynamic time warping distance is smaller than the first dynamic time warping distance;
and the second selection unit is used for taking the third sliding window size as the sliding window size for identifying the user gesture if the count value corresponding to the third dynamic time warping distance exceeds a second threshold value in a preset time period.
The technical scheme of the invention has the following beneficial effects:
in the embodiment of the invention, a first similarity parameter between a first gesture video sequence frame and a preset gesture template, which are used for identifying the user gesture by using the first sliding window size, is obtained, a reference similarity parameter between a reference gesture video sequence frame and the preset gesture template, which are obtained from the first gesture video sequence frame by using the reference sliding window size, is obtained, and the reference similarity parameter is compared with the first similarity parameter. And when the reference similarity parameter is smaller than the first similarity parameter, taking the reference sliding window size as a sliding window size for recognizing the user gesture. As can be seen from the above, in the embodiment of the present invention, when performing gesture recognition on a user, the size of the sliding window for gesture recognition may be determined by comparing similarity parameters between different gesture video sequence frames and a preset gesture video sequence. Therefore, compared with the prior art, the scheme of the embodiment of the invention can flexibly adjust and acquire the size of the sliding window in real time according to the change of the movement speed of the hand of the user so as to identify the gesture of the user, thereby improving the accuracy of identifying the gesture of the user.
Drawings
FIG. 1 is a flowchart illustrating a gesture recognition method according to a first embodiment of the present invention;
FIG. 2 is a flowchart illustrating a gesture recognition method according to a second embodiment of the present invention;
FIG. 3 is a diagram illustrating a gesture recognition apparatus according to a third embodiment of the present invention;
FIG. 4 is a diagram of an electronic device according to a fourth embodiment of the invention;
fig. 5 is a schematic diagram of a gesture recognition system according to a fifth embodiment of the invention.
Detailed Description
The following detailed description of embodiments of the present invention will be made with reference to the accompanying drawings and examples. The following examples are intended to illustrate the invention but are not intended to limit the scope of the invention.
As shown in fig. 1, the gesture recognition method according to the first embodiment of the present invention includes:
and step 11, acquiring a first sliding window size, and acquiring at least one gesture video sequence frame according to the first sliding window size.
In a specific application, if the method of the embodiment of the present invention is not performed for the first time, that is, the gesture made by the user is not the initial gesture, the first sliding window size may be obtained by reading the storage unit. That is, in this case, the first sliding window size is the sliding window size for recognizing the gesture of the user determined after the gesture recognition is performed by the method according to the embodiment of the present invention. If the method of the embodiment of the present invention is performed for the first time, that is, the gesture made by the user is an initial gesture, the first sliding window size may be obtained through calculation.
Since the user's gesture has a certain persistence, it may be desirable to obtain more than one frame of image including the user's hand if the user's gesture is to be recognized in a particular application. These images are referred to herein as gesture video sequence frames. After the first sliding window size is obtained, then at least one gesture video sequence frame may be acquired according to the first sliding window size.
And step 12, matching the at least one gesture video sequence frame with a preset gesture template respectively to obtain a first similarity parameter between a first gesture video sequence frame with a recognized user gesture and the preset gesture template.
When the user's gesture is recognized using the first sliding window size, a template matching method is mainly used. Template matching is a widely used method in the field of gesture recognition. In the gesture recognition process, a preset gesture template is matched with a series of video frames within a sliding window to judge whether the video frame sequence in the window is a specific gesture.
Usually, the preset gesture template also consists of a series of video frames, and its window size is obtained from the average speed of a specific gesture. The size of the sliding window used in a particular recognition process, however, depends on the particular user and is therefore not always the same. A Dynamic Time Warping (DTW) method can thus be used to calculate the similarity between a preset gesture template and the series of video frames within a sliding window. When the similarity meets a certain condition, the gesture of the preset gesture template is recognized in the series of video frames within that sliding window.
Two time series to be compared for similarity may differ in length. The DTW method calculates their similarity by stretching and compressing the series along the time axis.
Suppose Q and C represent the gesture template video frame sequence and the user hand motion video frame sequence, respectively, with lengths n and m, where: Q = q_1, q_2, …, q_n; C = c_1, c_2, …, c_m.
In order to align these two sequences, an n × m matrix grid is constructed, where element (i, j) of the grid represents the distance d(q_i, c_j) between the two points q_i and c_j (i.e., the similarity between each point of sequence Q and each point of C; the smaller the distance, the higher the similarity). The distance between two points is usually expressed as the squared Euclidean distance, d(q_i, c_j) = (q_i − c_j)^2. Each matrix element (i, j) represents an alignment of the points q_i and c_j. The dynamic programming algorithm then amounts to finding a path through the grid points, the grid points on the path being the aligned point pairs of the two sequences. The optimal path between Q and C is the one with the smallest warping cost; this computed minimum warping cost is usually taken as the similarity between the two sequences, i.e., the DTW distance.
According to the DTW algorithm, the similarity between the two sequences can be calculated by the following formula (1):

DTW(Q, C) = min{ √(Σ_{k=1}^{K} ω_k) / K }    (1)

where the warping path has the form W = ω_1, ω_2, …, ω_K, with max(|Q|, |C|) ≤ K < |Q| + |C|, and |Q| and |C| are the lengths of the two time series Q and C. When DTW(Q, C) is less than a predetermined value, the sequence of user hand motion video frames can be considered a gesture. Here, the video sequence frames in which the user gesture is recognized are referred to as the first gesture video sequence frames.
In an embodiment of the present invention, the first similarity parameter refers to a first dynamic time warping distance. While recognizing the gesture, a first dynamic time warping distance between the first gesture video sequence frames in which the user gesture is recognized and the preset gesture template can simultaneously be obtained according to formula (1).
And step 13, acquiring the size of a reference sliding window, and acquiring a reference gesture video sequence frame from the first gesture video sequence frame according to the size of the reference sliding window.
In the embodiment of the present invention, the reference sliding window size is arbitrarily set in advance and is different from the first sliding window size. And according to the acquired reference sliding window size, intercepting a video sequence frame with a corresponding length from the first gesture video sequence frame, wherein the video sequence frame is called a reference gesture video sequence frame.
Wherein the number of reference sliding window sizes may be set to 1, the reference size being either greater or smaller than the first sliding window size. Alternatively, to further improve gesture recognition efficiency, the number of reference sliding window sizes may be set to 2, one larger than the first sliding window size and the other smaller.
And step 14, acquiring a reference similarity parameter between the reference gesture video sequence frame and the preset gesture template.
The reference similarity parameter refers to a DTW distance between the reference gesture video sequence frame and the preset gesture template. Also, in this step, a DTW distance between the reference gesture video sequence frame and the preset gesture template may be obtained by using a DTW method.
And step 15, when the reference similarity parameter is smaller than the first similarity parameter, taking the reference sliding window size as the sliding window size for recognizing the user gesture.
In this step, assuming the reference sliding window sizes comprise a second sliding window size (smaller than the first) and a third sliding window size (larger than the first), with corresponding second and third dynamic time warping distances: when the second dynamic time warping distance is smaller than the first dynamic time warping distance, the second sliding window size is taken as the sliding window size for recognizing the user gesture; when the third dynamic time warping distance is smaller than the first dynamic time warping distance, the third sliding window size is taken as the sliding window size for recognizing the user gesture.
As can be seen from the above, in the embodiment of the present invention, when performing gesture recognition on a user, the size of the sliding window for gesture recognition may be determined by comparing similarity parameters between different gesture video sequence frames and a preset gesture video sequence. Therefore, compared with the prior art, the scheme of the embodiment of the invention can flexibly adjust and acquire the size of the sliding window in real time according to the change of the movement speed of the hand of the user so as to identify the gesture of the user, thereby improving the accuracy of identifying the gesture of the user.
In the second embodiment, the implementation process of the embodiment of the present invention is described in detail by taking the first time of performing gesture recognition by using the method of the embodiment of the present invention as an example. As shown in fig. 2, the gesture recognition method according to the second embodiment of the present invention includes:
and step 21, determining the size of the initial sliding window. The method comprises the following steps:
and step 21a, identifying the initial gesture of the user to obtain an initial gesture video sequence frame.
When the gesture made by the user is an initial gesture, the user can complete the initial gesture according to the prompt of the machine. The initial gesture may be a hand waving motion.
When the user starts to make an initial gesture, the camera device is used for tracking the hand of the user to obtain a plurality of continuous video frames comprising the motion of the hand of the user. Here, it is not necessary to actually recognize that the user has made a gesture or whether the gesture is completed, but only to obtain a plurality of consecutive video frames including the motion of the user's hand. The plurality of consecutive video frames comprising the motion of the user's hand is referred to herein as initial gesture video sequence frames. Assume that m represents the number of frames in the initial gesture video sequence frame and m > 0.
And step 21b, acquiring an initial motion speed of the hand of the user in the initial gesture video sequence frame.
In this embodiment, the initial movement speed is calculated according to the following formula (2):

v_user = (f_current / (m − 1)) · Σ_{i=0}^{m−2} ‖P_{i+1} − P_i‖    (2)

wherein v_user represents the initial motion speed, m represents the number of frames in the initial gesture video sequence (m > 0), f_current represents the frame rate, and P_i (i = 0, …, m−2) represents the position of the center of the user's hand in each initial gesture video sequence frame.
And step 21c, calculating the size of the initial sliding window according to the preset movement speed, the size of the preset sliding window and the initial movement speed.
Specifically, the initial sliding window size is calculated by the following formula (3):

size_user = size_common · (v_common / v_user)    (3)

wherein size_user represents the initial sliding window size, v_common represents the preset movement speed, size_common represents the preset sliding window size, and v_user represents the initial movement speed. The preset movement speed and the preset sliding window size can be set in advance.
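The speed and window-size calculations of formulas (2) and (3) can be sketched as follows. Since the patent shows both formulas only as images, this is a reconstruction from the surrounding text: the average hand speed is the total distance between consecutive hand-centre positions divided by the elapsed time (m − 1) / f_current, and the initial window scales inversely with that speed (a faster hand finishes a gesture in fewer frames, so the window shrinks).

```python
def initial_motion_speed(positions, frame_rate):
    """Average hand speed over m tracked frames (reconstruction of formula (2)).

    positions: list of (x, y) hand-centre coordinates, one per frame.
    frame_rate: camera frame rate f_current in frames per second.
    """
    m = len(positions)
    assert m > 1, "need at least two frames to measure motion"
    total = 0.0
    for i in range(m - 1):
        dx = positions[i + 1][0] - positions[i][0]
        dy = positions[i + 1][1] - positions[i][1]
        total += (dx * dx + dy * dy) ** 0.5
    # elapsed time between first and last frame is (m - 1) / frame_rate
    return total * frame_rate / (m - 1)

def first_window_size(v_common, size_common, v_user):
    """Reconstruction of formula (3): scale the preset window inversely
    with the user's speed; rounding to a whole frame count is an assumption."""
    return max(1, round(size_common * v_common / v_user))
```

For example, a hand moving twice as fast as the preset speed halves the window, and half as fast doubles it.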
After the initial sliding window size is determined, if the user has a new gesture, in the embodiment of the present invention, the initial sliding window size may be used for recognition.
Step 22, recognizing the user gesture using the initial sliding window size.
And step 23, acquiring a first DTW distance between the first gesture video sequence frame with the recognized user gesture and a preset gesture template.
Specifically, in this step, the aforementioned DTW method may still be used to identify a new gesture of the user, and obtain the first DTW distance.
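For reference, the DTW distance used throughout these steps can be sketched on 1-D sequences; real gesture recognizers compare per-frame feature vectors, so the scalar cost below is a simplifying assumption:

```python
def dtw_distance(seq_a, seq_b):
    """Classic dynamic time warping distance between two sequences,
    illustrating how a gesture window is scored against a gesture
    template: the cumulative cost of the best monotonic alignment."""
    n, m = len(seq_a), len(seq_b)
    inf = float("inf")
    # d[i][j]: best cost aligning seq_a[:i] with seq_b[:j].
    d = [[inf] * (m + 1) for _ in range(n + 1)]
    d[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost = abs(seq_a[i - 1] - seq_b[j - 1])
            d[i][j] = cost + min(d[i - 1][j], d[i][j - 1], d[i - 1][j - 1])
    return d[n][m]

print(dtw_distance([1, 2, 3], [1, 2, 2, 3]))  # 0.0: the extra 2 is absorbed by warping
```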
And 24, acquiring the size of a reference sliding window, and acquiring a reference gesture video sequence frame from the first gesture video sequence frame according to the size of the reference sliding window.
Here, it is assumed that the reference sliding window size includes a second sliding window size, size_user − ε, and a third sliding window size, size_user + ε, where 0 < ε < size_user and ε may be set based on empirical values.
Here, a first sub-reference gesture video sequence frame is obtained from the first gesture video sequence frame according to the second sliding window size, and a second sub-reference gesture video sequence frame is obtained from the first gesture video sequence frame according to the third sliding window size. I.e. two sequences of video frames of different lengths are obtained here.
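Extracting the two sub-reference sequences is essentially a slicing operation; in this sketch, taking frames from the start of the recognized gesture sequence is an assumption, since the patent does not specify the alignment:

```python
def reference_windows(frames, size_user, eps):
    """Cut two candidate windows out of the recognized gesture frames:
    one of size_user - eps frames (second sliding window size) and one
    of size_user + eps frames (third sliding window size). The longer
    window simply takes as many frames as are available."""
    return frames[: size_user - eps], frames[: size_user + eps]

short, long_ = reference_windows(list(range(40)), size_user=30, eps=5)
print(len(short), len(long_))  # 25 35
```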
The user's hand does not always maintain a constant movement speed during interaction with the computer; it may move faster or slower than it did during the initial gesture. In this case, the sliding window size set in the initial stage should also be updated in time to accommodate such changes. Therefore, the present invention provides an online-learning method that enables the computer to actively learn and adapt to changes in the user's behavior pattern.
In a particular application, steps 22-24 may be repeated for new gestures made by the user at a later time to obtain samples of learned user gestures. In the sample, a predetermined number of first sub-reference gesture video sequence frames, a predetermined number of second sub-reference gesture video sequence frames, and a corresponding predetermined number of first DTW distances are included. Wherein the predetermined number can be arbitrarily set. Here, the obtained samples may be expressed as:
G′ = {g′_1, g′_2, …, g′_n}

G″ = {g″_1, g″_2, …, g″_n}

D = {d_1, d_2, …, d_n}

where g′_i (i = 1, …, n) is each first sub-reference gesture video sequence frame, g″_i (i = 1, …, n) is each second sub-reference gesture video sequence frame, and d_i (i = 1, …, n) is each first DTW distance. The value n represents the predetermined number, which can be set arbitrarily.
And step 25, for the samples, respectively obtaining a second DTW distance between each first sub-reference gesture video sequence frame and the preset gesture template by using a dynamic time warping method. Then n second DTW distances are available here.
And step 26, for the samples, respectively obtaining a third DTW distance between each second sub-reference gesture video sequence frame and the preset gesture template by using a dynamic time warping method. Then n third DTW distances are available here.
And 27, comparing the magnitude of each first DTW distance and the magnitude of each second DTW distance respectively for the n first DTW distances and the n second DTW distances.
And step 28, if the second DTW distance is smaller than the first DTW distance, accumulating the count value corresponding to the second DTW distance; otherwise, returning to step 27 to compare the next pair of distances.
Steps 27 and 28 are repeatedly performed until all of the first and second DTW distances are compared or a preset time period elapses. Wherein the preset time period may be preset.
And step 29, comparing the count value corresponding to the second dynamic time warping distance in the preset time period with the first threshold value.
Step 210, if the count value corresponding to the second dynamic time warping distance exceeds a first threshold value within a preset time period, taking the second sliding window size as the sliding window size for recognizing the user gesture. Otherwise, step 211 is performed.
And step 211, comparing the magnitude of each first DTW distance and the magnitude of each third DTW distance respectively for the n first DTW distances and the n third DTW distances.
And step 212, if the third DTW distance is smaller than the first DTW distance, accumulating the count value corresponding to the third DTW distance; otherwise, returning to step 211 to compare the next pair of distances.
Steps 211 and 212 are repeatedly performed until all of the first DTW distances and the third DTW distances are compared or a preset time period elapses. Wherein the preset time period may be preset.
And step 213, comparing the count value corresponding to the third dynamic time warping distance in the preset time period with a second threshold value.
And 214, if the count value corresponding to the third dynamic time warping distance exceeds a second threshold value within a preset time period, taking the third sliding window size as the sliding window size for recognizing the user gesture. Otherwise, step 215 is performed.
Step 215, the first sliding window size is used as the sliding window size for recognizing the gesture of the user.
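The decision procedure of steps 27 to 215 can be summarized as a voting scheme over the collected samples. The sketch below abstracts away the preset time period and processes one batch of n samples; the function name, argument layout, and batch formulation are assumptions:

```python
def choose_window_size(d1s, d2s, d3s, size_user, eps, threshold1, threshold2):
    """Count how often the shorter window (second size) or the longer
    window (third size) yields a smaller DTW distance than the current
    window over a batch of samples; adopt a new size only when its win
    count exceeds its threshold, as in steps 27-215."""
    # Steps 27-210: does the shorter window beat the current one often enough?
    wins_shorter = sum(1 for d1, d2 in zip(d1s, d2s) if d2 < d1)
    if wins_shorter > threshold1:
        return size_user - eps          # second sliding window size
    # Steps 211-214: does the longer window beat the current one often enough?
    wins_longer = sum(1 for d1, d3 in zip(d1s, d3s) if d3 < d1)
    if wins_longer > threshold2:
        return size_user + eps          # third sliding window size
    return size_user                    # step 215: keep the first size

# The shorter window wins on 4 of 5 samples -> shrink the window.
print(choose_window_size([5, 5, 5, 5, 5], [3, 4, 4, 6, 4], [6, 6, 6, 6, 6],
                         size_user=30, eps=5, threshold1=3, threshold2=3))  # 25
```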
The preset time period can be set arbitrarily, the first threshold and the second threshold can also be set arbitrarily, and the first threshold and the second threshold can be set to be the same or different.
Here, through steps 27 to 215, the size of the sliding window used for recognizing the user's gesture is determined by judging the change in the speed of the user's hand movement over a certain period of time. A sliding window size determined in this way better matches the speed of the user's hand movement, further improving the accuracy of user gesture recognition.
In the foregoing processes, the sequence numbers of the processes do not mean the execution sequence, and the execution sequence of the processes should be determined by the functions and the internal logic of the processes, and should not limit the implementation processes of the embodiments of the present invention.
As can be seen from the above, in the embodiment of the present invention, when performing gesture recognition on a user, the size of the sliding window for gesture recognition may be determined by comparing similarity parameters between different gesture video sequence frames and a preset gesture video sequence. Therefore, compared with the prior art, the scheme of the embodiment of the invention can flexibly adjust and acquire the size of the sliding window in real time according to the change of the movement speed of the hand of the user so as to identify the gesture of the user, thereby improving the accuracy of identifying the gesture of the user.
As shown in fig. 3, a gesture recognition device according to a third embodiment of the present invention includes:
a first video sequence frame obtaining module 31, configured to obtain a first sliding window size, and obtain at least one gesture video sequence frame according to the first sliding window size;
the first parameter obtaining module 32 is configured to match the at least one gesture video sequence frame with a preset gesture template, and obtain a first similarity parameter between the first gesture video sequence frame in which the user gesture is recognized and the preset gesture template;
a second video sequence frame obtaining module 33, configured to obtain a reference sliding window size, and obtain a reference gesture video sequence frame from the first gesture video sequence frame according to the reference sliding window size, where the reference sliding window size is different from the first sliding window size;
a second parameter obtaining module 34, configured to obtain a reference similarity parameter between the reference gesture video sequence frame and the preset gesture template;
a parameter processing module 35, configured to, when the reference similarity parameter is smaller than the first similarity parameter, take the reference sliding window size as a sliding window size for recognizing a user gesture.
Wherein the first video sequence frame acquisition module 31 may include:
the first gesture recognition submodule is used for recognizing the initial gesture of the user when the gesture made by the user is the initial gesture to obtain an initial gesture video sequence frame; the speed acquisition submodule is used for acquiring the initial motion speed of the hand of the user in the initial gesture video sequence frame; the first sliding window size obtaining submodule is used for calculating the first sliding window size according to a preset movement speed, a preset sliding window size and the initial movement speed; and the first video sequence frame acquisition submodule is used for acquiring at least one gesture video sequence frame according to the size of the first sliding window.
Alternatively, the first video sequence frame acquiring module 31 may further include: a second sliding window size obtaining submodule, configured to obtain the first sliding window size stored in the storage unit when the gesture performed by the user is not the initial gesture; and the second video sequence frame acquisition submodule is used for acquiring at least one gesture video sequence frame according to the size of the first sliding window.
In a specific application, the first similarity parameter is a first dynamic time warping distance; the first parameter obtaining module 32 is specifically configured to: and acquiring a first dynamic time warping distance between a first gesture video sequence frame for recognizing the user gesture and a preset gesture template by using a dynamic time warping method.
In a specific application, in order to further improve the accuracy of gesture recognition, the reference sliding window size comprises a second sliding window size and a third sliding window size; wherein the first sliding window size is larger than the second sliding window size and smaller than the third sliding window size. Accordingly, the second video sequence frame acquisition module 33 may include: a first video sequence frame obtaining sub-module, configured to obtain a first sub-reference gesture video sequence frame from the first gesture video sequence frame according to the size of the second sliding window; and the second video sequence frame acquisition submodule is used for acquiring a second sub-reference gesture video sequence frame from the first gesture video sequence frame according to the size of the third sliding window.
At this time, the reference similarity parameter includes a second dynamic time regular distance between the first sub-reference gesture video sequence frame and the preset gesture template, and a third dynamic time regular distance between the second sub-reference gesture video sequence frame and the preset gesture template.
The second parameter obtaining module 34 may include: the first parameter obtaining submodule is used for obtaining a second dynamic time warping distance between the first sub-reference gesture video sequence frame and the preset gesture template by using a dynamic time warping method; and the second parameter obtaining submodule is used for obtaining a third dynamic time warping distance between the second sub-reference gesture video sequence frame and the preset gesture template by using a dynamic time warping method.
Wherein, the parameter processing module 35 may include:
a comparison submodule, configured to compare the second dynamic time warping distance with the first dynamic time warping distance, and compare the third dynamic time warping distance with the first dynamic time warping distance, respectively; a first parameter selection submodule, configured to, when the second dynamic time warping distance is smaller than the first dynamic time warping distance, take the second sliding window size as a sliding window size for recognizing a user gesture; and the second parameter selection submodule is used for taking the third sliding window size as the sliding window size for recognizing the user gesture when the third dynamic time warping distance is smaller than the first dynamic time warping distance.
In order to further improve the accuracy of gesture recognition, the first parameter selection submodule comprises: the first counting unit is used for accumulating the count value corresponding to the second dynamic time warping distance when the second dynamic time warping distance is smaller than the first dynamic time warping distance; and the first selection unit is used for taking the second sliding window size as the sliding window size for identifying the user gesture if the count value corresponding to the second dynamic time warping distance exceeds a first threshold value in a preset time period.
In order to further improve the accuracy of gesture recognition, the second parameter selection submodule includes: the second counting unit is used for accumulating the count value corresponding to the third dynamic time warping distance when the third dynamic time warping distance is smaller than the first dynamic time warping distance; and the second selection unit is used for taking the third sliding window size as the sliding window size for identifying the user gesture if the count value corresponding to the third dynamic time warping distance exceeds a second threshold value in a preset time period.
The working principle of the device according to the invention can be referred to the description of the method embodiment described above.
As can be seen from the above, in the embodiment of the present invention, when performing gesture recognition on a user, the size of the sliding window for gesture recognition may be determined by comparing similarity parameters between different gesture video sequence frames and a preset gesture video sequence. Therefore, compared with the prior art, the scheme of the embodiment of the invention can flexibly adjust and acquire the size of the sliding window in real time according to the change of the movement speed of the hand of the user so as to identify the gesture of the user, thereby improving the accuracy of identifying the gesture of the user.
As shown in fig. 4, a fourth embodiment of the present invention further provides an electronic device, which can implement the processes of the embodiments shown in fig. 1-2 of the present invention. The electronic device can be a Personal Computer (PC), a tablet PC, various smart devices (including a smart phone), and the like. As shown in fig. 4, the electronic device may include: the device comprises a shell 41, a processor 42, a memory 43, a circuit board 44 and a power circuit 45, wherein the circuit board 44 is arranged inside a space enclosed by the shell 41, and the processor 42 and the memory 43 are arranged on the circuit board 44; a power supply circuit 45 for supplying power to each circuit or device of the electronic apparatus; the memory 43 is used for storing executable program code; the processor 42 runs a program corresponding to the executable program code by reading the executable program code stored in the memory 43, for performing the steps of:
acquiring a first sliding window size, and acquiring at least one gesture video sequence frame according to the first sliding window size;
matching the at least one gesture video sequence frame with a preset gesture template respectively to obtain a first similarity parameter between a first gesture video sequence frame with a recognized user gesture and the preset gesture template;
acquiring a reference sliding window size, and acquiring a reference gesture video sequence frame from the first gesture video sequence frame according to the reference sliding window size, wherein the reference sliding window size is different from the first sliding window size;
acquiring a reference similarity parameter between the reference gesture video sequence frame and the preset gesture template;
and when the reference similarity parameter is smaller than the first similarity parameter, taking the reference sliding window size as a sliding window size for recognizing the user gesture.
As can be seen from the above, in the embodiment of the present invention, when performing gesture recognition on a user, the size of the sliding window for gesture recognition may be determined by comparing similarity parameters between different gesture video sequence frames and a preset gesture video sequence. Therefore, compared with the prior art, the scheme of the embodiment of the invention can flexibly adjust and acquire the size of the sliding window in real time according to the change of the movement speed of the hand of the user so as to identify the gesture of the user, thereby improving the accuracy of identifying the gesture of the user.
As shown in fig. 5, a fifth embodiment of the present invention further provides a gesture recognition system, including: a camera 51, an image processing unit 52, a display device 56, a CPU 57, and a RAM (Random-Access Memory) 58. The image processing unit 52 includes an initialization unit 53, a gesture recognition unit 54, and an online learning and updating unit 55. In a specific application, the display device 56 may be a television, a display device composed of a projector and a projection screen, and other display devices. Wherein the image processing unit 52 is adapted to perform the procedures of the aforementioned method embodiments.
While the foregoing is directed to the preferred embodiment of the present invention, it will be understood by those skilled in the art that various changes and modifications may be made without departing from the spirit and scope of the invention as defined in the appended claims.

Claims (12)

1. A gesture recognition method, comprising:
acquiring a first sliding window size, and acquiring at least one gesture video sequence frame according to the first sliding window size;
matching the at least one gesture video sequence frame with a preset gesture template respectively to obtain a first similarity parameter between a first gesture video sequence frame with a recognized user gesture and the preset gesture template;
acquiring a reference sliding window size, and acquiring a reference gesture video sequence frame from the first gesture video sequence frame according to the reference sliding window size, wherein the reference sliding window size is different from the first sliding window size;
acquiring a reference similarity parameter between the reference gesture video sequence frame and the preset gesture template;
and when the reference similarity parameter is smaller than the first similarity parameter, taking the reference sliding window size as a sliding window size for recognizing the user gesture.
2. The method of claim 1, wherein when the gesture made by the user is an initial gesture, the obtaining a first sliding window size comprises:
identifying the initial gesture of the user to obtain an initial gesture video sequence frame;
acquiring an initial motion speed of the user's hand in the initial gesture video sequence frame;
and calculating the size of the first sliding window according to a preset movement speed, the size of a preset sliding window and the initial movement speed.
3. The method of claim 1, wherein when the gesture made by the user is not an initial gesture, the obtaining a first sliding window size comprises: and acquiring the first sliding window size stored in a storage unit.
4. The method of claim 1, wherein the first similarity parameter is a first dynamic time warping distance; the acquiring a first similarity parameter between a first gesture video sequence frame identifying a user gesture and a preset gesture template comprises:
and acquiring a first dynamic time warping distance between a first gesture video sequence frame for recognizing the user gesture and a preset gesture template by using a dynamic time warping method.
5. The method of claim 4, wherein the reference sliding window size comprises a second sliding window size and a third sliding window size; wherein the first sliding window size is greater than the second sliding window size and less than the third sliding window size;
the obtaining a reference sliding window size and obtaining a reference gesture video sequence frame from the first gesture video sequence frame according to the reference sliding window size includes:
obtaining a first sub-reference gesture video sequence frame from the first gesture video sequence frame according to the second sliding window size;
obtaining a second sub-reference gesture video sequence frame from the first gesture video sequence frame according to the third sliding window size.
6. The method of claim 5, wherein the reference similarity parameters comprise a second dynamic time warping distance between the first sub-reference gesture video sequence frame and the preset gesture template, and a third dynamic time warping distance between the second sub-reference gesture video sequence frame and the preset gesture template;
wherein the obtaining a reference similarity parameter between the reference gesture video sequence frame and the preset gesture template comprises:
acquiring a second dynamic time warping distance between the first sub-reference gesture video sequence frame and the preset gesture template by using a dynamic time warping method;
and acquiring a third dynamic time warping distance between the second sub-reference gesture video sequence frame and the preset gesture template by using a dynamic time warping method.
7. The method of claim 6, wherein the taking the reference sliding window size as the sliding window size for recognizing the user gesture when the reference similarity parameter is smaller than the first similarity parameter comprises:
when the second dynamic time warping distance is smaller than the first dynamic time warping distance, taking the second sliding window size as a sliding window size for recognizing a user gesture;
and when the third dynamic time warping distance is smaller than the first dynamic time warping distance, taking the third sliding window size as the sliding window size for recognizing the user gesture.
8. The method of claim 7, wherein the taking the second sliding window size as the sliding window size for recognizing the user gesture when the second dynamic time warping distance is less than the first dynamic time warping distance comprises:
when the second dynamic time warping distance is smaller than the first dynamic time warping distance, accumulating the count value corresponding to the second dynamic time warping distance;
and if the count value corresponding to the second dynamic time warping distance exceeds a first threshold value within a preset time period, taking the second sliding window size as the sliding window size for identifying the user gesture.
9. The method of claim 7, wherein taking the third sliding window size as a sliding window size for recognizing a user gesture when the third dynamic time warping distance is less than the first dynamic time warping distance comprises:
when the third dynamic time warping distance is smaller than the first dynamic time warping distance, accumulating the count value corresponding to the third dynamic time warping distance;
and if the count value corresponding to the third dynamic time warping distance exceeds a second threshold value within a preset time period, taking the third sliding window size as the sliding window size for identifying the user gesture.
10. A gesture recognition apparatus, comprising:
the first video sequence frame acquisition module is used for acquiring the size of a first sliding window and acquiring at least one gesture video sequence frame according to the size of the first sliding window;
the first parameter acquisition module is used for respectively matching the at least one gesture video sequence frame with a preset gesture template to acquire a first similarity parameter between the first gesture video sequence frame with the recognized user gesture and the preset gesture template;
a second video sequence frame obtaining module, configured to obtain a reference sliding window size, and obtain a reference gesture video sequence frame from the first gesture video sequence frame according to the reference sliding window size, where the reference sliding window size is different from the first sliding window size;
the second parameter acquisition module is used for acquiring a reference similarity parameter between the reference gesture video sequence frame and the preset gesture template;
and the parameter processing module is used for taking the reference sliding window size as the sliding window size for recognizing the user gesture when the reference similarity parameter is smaller than the first similarity parameter.
11. The apparatus of claim 10, wherein the first video sequence frame acquisition module comprises:
the first gesture recognition submodule is used for recognizing the initial gesture of the user when the gesture made by the user is the initial gesture to obtain an initial gesture video sequence frame;
the speed acquisition submodule is used for acquiring the initial motion speed of the hand of the user in the initial gesture video sequence frame;
the first sliding window size obtaining submodule is used for calculating the first sliding window size according to a preset movement speed, a preset sliding window size and the initial movement speed;
and the first video sequence frame acquisition submodule is used for acquiring at least one gesture video sequence frame according to the size of the first sliding window.
12. The apparatus of claim 10, wherein the first video sequence frame acquisition module comprises:
a second sliding window size obtaining submodule, configured to obtain the first sliding window size stored in the storage unit when the gesture performed by the user is not the initial gesture;
and the second video sequence frame acquisition submodule is used for acquiring at least one gesture video sequence frame according to the size of the first sliding window.
CN201610316842.7A 2016-05-12 2016-05-12 Gesture recognition method and device Active CN107368181B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201610316842.7A CN107368181B (en) 2016-05-12 2016-05-12 Gesture recognition method and device


Publications (2)

Publication Number Publication Date
CN107368181A CN107368181A (en) 2017-11-21
CN107368181B true CN107368181B (en) 2020-01-14

Family

ID=60304615

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201610316842.7A Active CN107368181B (en) 2016-05-12 2016-05-12 Gesture recognition method and device

Country Status (1)

Country Link
CN (1) CN107368181B (en)

Families Citing this family (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110308786B (en) * 2018-03-20 2023-12-26 厦门歌乐电子企业有限公司 Vehicle-mounted equipment and gesture recognition method thereof
CN110163130B (en) * 2019-05-08 2021-05-28 清华大学 Feature pre-alignment random forest classification system and method for gesture recognition
CN111178308A (en) * 2019-12-31 2020-05-19 北京奇艺世纪科技有限公司 Gesture track recognition method and device
CN112121280B (en) * 2020-08-31 2022-04-01 浙江大学 Control method and control system of heart sound box
JP7264547B1 (en) 2022-03-02 2023-04-25 株式会社ベネモ Motion recognition method and motion recognition system

Citations (4)

Publication number Priority date Publication date Assignee Title
CN1860429A (en) * 2003-09-30 2006-11-08 皇家飞利浦电子股份有限公司 Gesture to define location, size, and/or content of content window on a display
CN103745228A (en) * 2013-12-31 2014-04-23 清华大学 Dynamic gesture identification method on basis of Frechet distance
KR20140076395A (en) * 2012-12-12 2014-06-20 삼성전자주식회사 Display apparatus for excuting applications and method for controlling thereof
US9268457B2 (en) * 2012-07-13 2016-02-23 Google Inc. Touch-based fluid window management

Family Cites Families (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
KR20150053438A (en) * 2013-11-08 2015-05-18 한국전자통신연구원 Stereo matching system and method for generating disparity map using the same


Also Published As

Publication number Publication date
CN107368181A (en) 2017-11-21

Similar Documents

Publication Publication Date Title
CN107368181B (en) Gesture recognition method and device
US10990803B2 (en) Key point positioning method, terminal, and computer storage medium
CN108960163B (en) Gesture recognition method, device, equipment and storage medium
US10043308B2 (en) Image processing method and apparatus for three-dimensional reconstruction
CN104350509B (en) Quick attitude detector
WO2020078017A1 (en) Method and apparatus for recognizing handwriting in air, and device and computer-readable storage medium
WO2019041519A1 (en) Target tracking device and method, and computer-readable storage medium
CN109657533A (en) Pedestrian recognition methods and Related product again
US12008167B2 (en) Action recognition method and device for target object, and electronic apparatus
EP2336949B1 (en) Apparatus and method for registering plurality of facial images for face recognition
US20210027046A1 (en) Method and apparatus for multi-face tracking of a face effect, and electronic device
CN104049760B (en) The acquisition methods and system of a kind of man-machine interaction order
EP4024270A1 (en) Gesture recognition method, electronic device, computer-readable storage medium, and chip
CN112364799A (en) Gesture recognition method and device
CN110427849B (en) Face pose determination method and device, storage medium and electronic equipment
Ruan et al. Dynamic gesture recognition based on improved DTW algorithm
CN112464833A (en) Dynamic gesture recognition method, device, equipment and storage medium based on optical flow
CN110633004A (en) Interaction method, device and system based on human body posture estimation
US20210158031A1 (en) Gesture Recognition Method, and Electronic Device and Storage Medium
CN113343812A (en) Gesture recognition method and device, storage medium and electronic equipment
CN112529939A (en) Target track matching method and device, machine readable medium and equipment
CN111611941B (en) Special effect processing method and related equipment
US20170085784A1 (en) Method for image capturing and an electronic device using the method
CN115439375B (en) Training method and device of image deblurring model and application method and device
JP6397508B2 (en) Method and apparatus for generating a personal input panel

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant