CN107368181B - Gesture recognition method and device - Google Patents


Info

Publication number
CN107368181B
Authority
CN
China
Prior art keywords
gesture
sliding window
window size
video sequence
sequence frame
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN201610316842.7A
Other languages
Chinese (zh)
Other versions
CN107368181A (en)
Inventor
刘丽艳 (Liu Liyan)
赵颖 (Zhao Ying)
梁玲燕 (Liang Lingyan)
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Liguang Co
Original Assignee
Liguang Co
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Liguang Co
Priority to CN201610316842.7A
Publication of CN107368181A
Application granted
Publication of CN107368181B
Legal status: Active

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F3/00Input arrangements for transferring data to be processed into a form capable of being handled by the computer; Output arrangements for transferring data from processing unit to output unit, e.g. interface arrangements
    • G06F3/01Input arrangements or combined input and output arrangements for interaction between user and computer
    • G06F3/017Gesture based interaction, e.g. based on a set of recognized hand gestures
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V40/00Recognition of biometric, human-related or animal-related patterns in image or video data
    • G06V40/20Movements or behaviour, e.g. gesture recognition
    • G06V40/28Recognition of hand or arm movements, e.g. recognition of deaf sign language

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • General Engineering & Computer Science (AREA)
  • Human Computer Interaction (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Health & Medical Sciences (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • General Health & Medical Sciences (AREA)
  • Psychiatry (AREA)
  • Social Psychology (AREA)
  • Multimedia (AREA)
  • Image Analysis (AREA)
  • User Interface Of Digital Computer (AREA)

Abstract

The invention provides a gesture recognition method and device, relates to a man-machine interaction technology, and can improve the accuracy of gesture recognition. The gesture recognition method comprises the following steps: acquiring a first sliding window size, and acquiring at least one gesture video sequence frame according to the first sliding window size; matching at least one gesture video sequence frame with a preset gesture template respectively to obtain a first similarity parameter between a first gesture video sequence frame with a recognized user gesture and the preset gesture template; acquiring the size of a reference sliding window, and acquiring a reference gesture video sequence frame from the first gesture video sequence frame according to the size of the reference sliding window; acquiring a reference similarity parameter between a reference gesture video sequence frame and a preset gesture template; when the reference similarity parameter is smaller than the first similarity parameter, the reference sliding window size is taken as a sliding window size for recognizing the user gesture. The invention is mainly used in the gesture recognition technology.

Description

Gesture recognition method and device
Technical Field
The invention relates to a man-machine interaction technology, in particular to a gesture recognition method and device.
Background
Gesture recognition is a key component of natural human-machine interaction and helps build a more fluid "human-machine conversation" between people and machines. The machine may be a computer, a projector, or one of the wearable devices introduced in recent years.
In existing gesture recognition techniques, a person often has to "learn" how to interact with a machine. For example, the machine prompts the user to "please come a little closer" or "please wave your hand faster," and the user then makes the corresponding gestures according to these prompts. However, such "learning" is not what users want. In a more natural human-computer interaction, users expect the computer to adapt to the person.
Template-matching methods and learning-based methods are the two approaches most widely applied in gesture recognition. Since a gesture takes time to complete, sliding windows are widely used to locate the start and end of a gesture in successive video frames. However, different cameras have different frame rates, and different people may perform the same gesture at different speeds. As a result, the same gesture can span a different number of video frames from one instance to another. If a sliding window with a fixed size is adopted, the accuracy of gesture recognition therefore suffers.
Disclosure of Invention
In view of this, the present invention provides a gesture recognition method and device, which can improve the accuracy of gesture recognition.
In order to solve the above technical problem, the present invention provides a gesture recognition method, including:
acquiring a first sliding window size, and acquiring at least one gesture video sequence frame according to the first sliding window size;
matching the at least one gesture video sequence frame with a preset gesture template respectively to obtain a first similarity parameter between a first gesture video sequence frame with a recognized user gesture and the preset gesture template;
acquiring a reference sliding window size, and acquiring a reference gesture video sequence frame from the first gesture video sequence frame according to the reference sliding window size, wherein the reference sliding window size is different from the first sliding window size;
acquiring a reference similarity parameter between the reference gesture video sequence frame and the preset gesture template;
and when the reference similarity parameter is smaller than the first similarity parameter, taking the reference sliding window size as a sliding window size for recognizing the user gesture.
Wherein, when the gesture made by the user is an initial gesture, the acquiring the first sliding window size includes:
identifying the initial gesture of the user to obtain an initial gesture video sequence frame;
acquiring an initial motion speed of the user's hand in the initial gesture video sequence frame;
and calculating the size of the first sliding window according to a preset movement speed, the size of a preset sliding window and the initial movement speed.
Wherein the initial motion speed of the user's hand in the initial gesture video sequence frames is obtained by the following formula:

v_user = (f_current / (m − 1)) · Σ_{i=0}^{m−2} ‖P_{i+1} − P_i‖

wherein v_user represents the initial motion speed, m represents the number of frames in the initial gesture video sequence (m > 0), f_current represents the frame rate, and P_i (i = 0, …, m−2) represents the position of the center of the user's hand in each initial gesture video sequence frame.
Wherein the first sliding window size is calculated from a preset movement speed, a preset sliding window size and the initial movement speed by the following formula:

size_user = size_common · (v_common / v_user)

wherein size_user represents the first sliding window size, v_common represents the preset movement speed, size_common represents the preset sliding window size, and v_user represents the initial movement speed.
Wherein, when the gesture made by the user is not an initial gesture, the acquiring the first sliding window size includes: and acquiring the first sliding window size stored in a storage unit.
Wherein the first similarity parameter is a first dynamic time warping distance; the acquiring a first similarity parameter between a first gesture video sequence frame identifying a user gesture and a preset gesture template comprises:
and acquiring a first dynamic time warping distance between a first gesture video sequence frame for recognizing the user gesture and a preset gesture template by using a dynamic time warping method.
Wherein the reference sliding window size comprises a second sliding window size and a third sliding window size; wherein the first sliding window size is greater than the second sliding window size and less than the third sliding window size;
the obtaining a reference sliding window size and obtaining a reference gesture video sequence frame from the first gesture video sequence frame according to the reference sliding window size includes:
obtaining a first sub-reference gesture video sequence frame from the first gesture video sequence frame according to the second sliding window size;
obtaining a second sub-reference gesture video sequence frame from the first gesture video sequence frame according to the third sliding window size.
Wherein the reference similarity parameter comprises a second dynamic time warping distance between the first sub-reference gesture video sequence frame and the preset gesture template, and a third dynamic time warping distance between the second sub-reference gesture video sequence frame and the preset gesture template;
the acquiring of the reference similarity parameter between the reference gesture video sequence frame and the preset gesture template comprises:
acquiring a second dynamic time warping distance between the first sub-reference gesture video sequence frame and the preset gesture template by using a dynamic time warping method;
and acquiring a third dynamic time warping distance between the second sub-reference gesture video sequence frame and the preset gesture template by using a dynamic time warping method.
Wherein the taking the reference sliding window size as the sliding window size for recognizing the user gesture when the reference similarity parameter is smaller than the first similarity parameter includes:
when the second dynamic time warping distance is smaller than the first dynamic time warping distance, taking the second sliding window size as a sliding window size for recognizing a user gesture;
and when the third dynamic time warping distance is smaller than the first dynamic time warping distance, taking the third sliding window size as the sliding window size for recognizing the user gesture.
Wherein, when the second dynamic time warping distance is smaller than the first dynamic time warping distance, taking the second sliding window size as a sliding window size for recognizing a user gesture includes:
when the second dynamic time warping distance is smaller than the first dynamic time warping distance, accumulating the count value corresponding to the second dynamic time warping distance;
and if the count value corresponding to the second dynamic time warping distance exceeds a first threshold value within a preset time period, taking the second sliding window size as the sliding window size for identifying the user gesture.
Wherein, when the third dynamic time warping distance is less than the first dynamic time warping distance, taking the third sliding window size as a sliding window size for recognizing a user gesture includes:
when the third dynamic time warping distance is smaller than the first dynamic time warping distance, accumulating the count value corresponding to the third dynamic time warping distance;
and if the count value corresponding to the third dynamic time warping distance exceeds a second threshold value within a preset time period, taking the third sliding window size as the sliding window size for identifying the user gesture.
In a second aspect, the present invention provides a gesture recognition apparatus, including:
the first video sequence frame acquisition module is used for acquiring the size of a first sliding window and acquiring at least one gesture video sequence frame according to the size of the first sliding window;
the first parameter acquisition module is used for respectively matching the at least one gesture video sequence frame with a preset gesture template to acquire a first similarity parameter between the first gesture video sequence frame with the recognized user gesture and the preset gesture template;
a second video sequence frame obtaining module, configured to obtain a reference sliding window size, and obtain a reference gesture video sequence frame from the first gesture video sequence frame according to the reference sliding window size, where the reference sliding window size is different from the first sliding window size;
the second parameter acquisition module is used for acquiring a reference similarity parameter between the reference gesture video sequence frame and the preset gesture template;
and the parameter processing module is used for taking the reference sliding window size as the sliding window size for recognizing the user gesture when the reference similarity parameter is smaller than the first similarity parameter.
Wherein the first video sequence frame acquisition module comprises:
the first gesture recognition submodule is used for recognizing the initial gesture of the user when the gesture made by the user is the initial gesture to obtain an initial gesture video sequence frame;
the speed acquisition submodule is used for acquiring the initial motion speed of the hand of the user in the initial gesture video sequence frame;
the first sliding window size obtaining submodule is used for calculating the first sliding window size according to a preset movement speed, a preset sliding window size and the initial movement speed;
and the first video sequence frame acquisition submodule is used for acquiring at least one gesture video sequence frame according to the size of the first sliding window.
Wherein the first video sequence frame acquisition module comprises:
a second sliding window size obtaining submodule, configured to obtain the first sliding window size stored in the storage unit when the gesture performed by the user is not the initial gesture;
and the second video sequence frame acquisition submodule is used for acquiring at least one gesture video sequence frame according to the size of the first sliding window.
Wherein the first similarity parameter is a first dynamic time warping distance; the first parameter obtaining module is specifically configured to: and acquiring a first dynamic time warping distance between a first gesture video sequence frame for recognizing the user gesture and a preset gesture template by using a dynamic time warping method.
Wherein the reference sliding window size comprises a second sliding window size and a third sliding window size; wherein the first sliding window size is greater than the second sliding window size and less than the third sliding window size;
the second video sequence frame acquisition module comprises:
a third video sequence frame obtaining submodule, configured to obtain a first sub-reference gesture video sequence frame from the first gesture video sequence frame according to the size of the second sliding window;
and the fourth video sequence frame acquisition submodule is used for acquiring a second sub-reference gesture video sequence frame from the first gesture video sequence frame according to the size of the third sliding window.
Wherein the reference similarity parameter comprises a second dynamic time warping distance between the first sub-reference gesture video sequence frame and the preset gesture template, and a third dynamic time warping distance between the second sub-reference gesture video sequence frame and the preset gesture template;
the second parameter obtaining module comprises:
the first parameter obtaining submodule is used for obtaining a second dynamic time warping distance between the first sub-reference gesture video sequence frame and the preset gesture template by using a dynamic time warping method;
and the second parameter obtaining submodule is used for obtaining a third dynamic time warping distance between the second sub-reference gesture video sequence frame and the preset gesture template by using a dynamic time warping method.
Wherein the parameter processing module comprises:
a comparison submodule, configured to compare the second dynamic time warping distance with the first dynamic time warping distance, and compare the third dynamic time warping distance with the first dynamic time warping distance, respectively;
a first parameter selection submodule, configured to, when the second dynamic time warping distance is smaller than the first dynamic time warping distance, take the second sliding window size as a sliding window size for recognizing a user gesture;
and the second parameter selection submodule is used for taking the third sliding window size as the sliding window size for recognizing the user gesture when the third dynamic time warping distance is smaller than the first dynamic time warping distance.
Wherein the first parameter selection submodule comprises:
the first counting unit is used for accumulating the count value corresponding to the second dynamic time warping distance when the second dynamic time warping distance is smaller than the first dynamic time warping distance;
and the first selection unit is used for taking the second sliding window size as the sliding window size for identifying the user gesture if the count value corresponding to the second dynamic time warping distance exceeds a first threshold value in a preset time period.
Wherein the second parameter selection submodule comprises:
the second counting unit is used for accumulating the count value corresponding to the third dynamic time warping distance when the third dynamic time warping distance is smaller than the first dynamic time warping distance;
and the second selection unit is used for taking the third sliding window size as the sliding window size for identifying the user gesture if the count value corresponding to the third dynamic time warping distance exceeds a second threshold value in a preset time period.
The technical scheme of the invention has the following beneficial effects:
in the embodiment of the invention, a first similarity parameter between a first gesture video sequence frame and a preset gesture template, which are used for identifying the user gesture by using the first sliding window size, is obtained, a reference similarity parameter between a reference gesture video sequence frame and the preset gesture template, which are obtained from the first gesture video sequence frame by using the reference sliding window size, is obtained, and the reference similarity parameter is compared with the first similarity parameter. And when the reference similarity parameter is smaller than the first similarity parameter, taking the reference sliding window size as a sliding window size for recognizing the user gesture. As can be seen from the above, in the embodiment of the present invention, when performing gesture recognition on a user, the size of the sliding window for gesture recognition may be determined by comparing similarity parameters between different gesture video sequence frames and a preset gesture video sequence. Therefore, compared with the prior art, the scheme of the embodiment of the invention can flexibly adjust and acquire the size of the sliding window in real time according to the change of the movement speed of the hand of the user so as to identify the gesture of the user, thereby improving the accuracy of identifying the gesture of the user.
Drawings
FIG. 1 is a flowchart illustrating a gesture recognition method according to a first embodiment of the present invention;
FIG. 2 is a flowchart illustrating a gesture recognition method according to a second embodiment of the present invention;
FIG. 3 is a diagram illustrating a gesture recognition apparatus according to a third embodiment of the present invention;
FIG. 4 is a diagram of an electronic device according to a fourth embodiment of the invention;
fig. 5 is a schematic diagram of a gesture recognition system according to a fifth embodiment of the invention.
Detailed Description
The following detailed description of embodiments of the present invention will be made with reference to the accompanying drawings and examples. The following examples are intended to illustrate the invention but are not intended to limit the scope of the invention.
As shown in fig. 1, the gesture recognition method according to the first embodiment of the present invention includes:
and step 11, acquiring a first sliding window size, and acquiring at least one gesture video sequence frame according to the first sliding window size.
In a specific application, if the method of the embodiment of the present invention is not performed for the first time, that is, the gesture made by the user is not the initial gesture, the first sliding window size may be obtained by reading the storage unit. That is, in this case, the first sliding window size is the sliding window size for recognizing the gesture of the user determined after the gesture recognition is performed by the method according to the embodiment of the present invention. If the method of the embodiment of the present invention is performed for the first time, that is, the gesture made by the user is an initial gesture, the first sliding window size may be obtained through calculation.
Since the user's gesture has a certain persistence, it may be desirable to obtain more than one frame of image including the user's hand if the user's gesture is to be recognized in a particular application. These images are referred to herein as gesture video sequence frames. After the first sliding window size is obtained, then at least one gesture video sequence frame may be acquired according to the first sliding window size.
And step 12, matching the at least one gesture video sequence frame with a preset gesture template respectively to obtain a first similarity parameter between a first gesture video sequence frame with a recognized user gesture and the preset gesture template.
When the user's gesture is recognized using the first sliding window size, a template matching method is mainly used. Template matching is a widely used method in the field of gesture recognition. In the gesture recognition process, a preset gesture template is matched with a series of video frames within a sliding window to judge whether the video frame sequence in the window is a specific gesture.
Usually, the preset gesture template also consists of a series of video frames, and its window size is obtained from the average speed of a specific gesture. The size of the sliding window used in a particular recognition process, however, depends on the particular user and is therefore not always the same. A Dynamic Time Warping (DTW) method can thus be used to calculate the similarity between a preset gesture template and the series of video frames within a sliding window. When the similarity meets a certain condition, the gesture of the preset gesture template is recognized in the series of video frames within that sliding window.
Two time series to be compared for similarity may differ in length. The DTW method calculates their similarity by stretching and compressing the series along the time axis.
Suppose Q and C represent the gesture template video frame sequence and the user hand motion video frame sequence, respectively, with lengths n and m, where: Q = q_1, q_2, …, q_n; C = c_1, c_2, …, c_m.
In order to align these two sequences, an n × m matrix grid is constructed, where element (i, j) of the grid represents the distance d(q_i, c_j) between the two points q_i and c_j (i.e., the similarity between each point of sequence Q and each point of C; the smaller the distance, the higher the similarity). The distance between two points is usually expressed as the squared Euclidean distance, d(q_i, c_j) = (q_i − c_j)^2. Each matrix element (i, j) represents an alignment of the points q_i and c_j. The dynamic programming algorithm then amounts to finding a path through the grid points, the grid points on the path being the aligned point pairs of the two sequences. The optimal path between Q and C is the one with the smallest warping cost; this computed minimum warping cost is usually taken as the similarity between the two sequences, i.e., the DTW distance.
According to the DTW algorithm, the similarity between the two sequences can be calculated by the following formula (1):

DTW(Q, C) = min{ √(Σ_{k=1}^{K} ω_k) / K }    (1)

where the warping path has the form W = ω_1, ω_2, …, ω_K, with max(|Q|, |C|) ≤ K < |Q| + |C|, and |Q| and |C| are the lengths of the two time series Q and C. When DTW(Q, C) is less than a predetermined value, the sequence of user hand motion video frames can be considered a gesture. Here, the video sequence frames in which the user gesture is recognized are referred to as the first gesture video sequence frames.
In an embodiment of the present invention, the first similarity parameter refers to a first dynamic time warping distance. While recognizing the gesture, a first dynamic time warping distance between the first gesture video sequence frames in which the user gesture is recognized and the preset gesture template can simultaneously be obtained according to formula (1).
And step 13, acquiring the size of a reference sliding window, and acquiring a reference gesture video sequence frame from the first gesture video sequence frame according to the size of the reference sliding window.
In the embodiment of the present invention, the reference sliding window size is arbitrarily set in advance and is different from the first sliding window size. And according to the acquired reference sliding window size, intercepting a video sequence frame with a corresponding length from the first gesture video sequence frame, wherein the video sequence frame is called a reference gesture video sequence frame.
Wherein the number of reference sliding window sizes may be set to 1, the reference size being either greater or smaller than the first sliding window size. Alternatively, to further improve gesture recognition efficiency, the number of reference sliding window sizes may be set to 2, one larger than the first sliding window size and the other smaller.
And step 14, acquiring a reference similarity parameter between the reference gesture video sequence frame and the preset gesture template.
The reference similarity parameter refers to a DTW distance between the reference gesture video sequence frame and the preset gesture template. Also, in this step, a DTW distance between the reference gesture video sequence frame and the preset gesture template may be obtained by using a DTW method.
And step 15, when the reference similarity parameter is smaller than the first similarity parameter, taking the reference sliding window size as the sliding window size for recognizing the user gesture.
In this step, assuming the reference sliding window sizes comprise a second sliding window size (smaller than the first) and a third sliding window size (larger than the first), with corresponding second and third dynamic time warping distances: when the second dynamic time warping distance is smaller than the first dynamic time warping distance, the second sliding window size is taken as the sliding window size for recognizing the user gesture; when the third dynamic time warping distance is smaller than the first dynamic time warping distance, the third sliding window size is taken as the sliding window size for recognizing the user gesture.
As can be seen from the above, in the embodiment of the present invention, when performing gesture recognition on a user, the size of the sliding window for gesture recognition may be determined by comparing similarity parameters between different gesture video sequence frames and a preset gesture video sequence. Therefore, compared with the prior art, the scheme of the embodiment of the invention can flexibly adjust and acquire the size of the sliding window in real time according to the change of the movement speed of the hand of the user so as to identify the gesture of the user, thereby improving the accuracy of identifying the gesture of the user.
In the second embodiment, the implementation process of the embodiment of the present invention is described in detail by taking the first time of performing gesture recognition by using the method of the embodiment of the present invention as an example. As shown in fig. 2, the gesture recognition method according to the second embodiment of the present invention includes:
and step 21, determining the size of the initial sliding window. The method comprises the following steps:
and step 21a, identifying the initial gesture of the user to obtain an initial gesture video sequence frame.
When the gesture made by the user is an initial gesture, the user can complete the initial gesture according to the prompt of the machine. The initial gesture may be a hand waving motion.
When the user starts to make an initial gesture, the camera device is used for tracking the hand of the user to obtain a plurality of continuous video frames comprising the motion of the hand of the user. Here, it is not necessary to actually recognize that the user has made a gesture or whether the gesture is completed, but only to obtain a plurality of consecutive video frames including the motion of the user's hand. The plurality of consecutive video frames comprising the motion of the user's hand is referred to herein as initial gesture video sequence frames. Assume that m represents the number of frames in the initial gesture video sequence frame and m > 0.
And step 21b, acquiring an initial motion speed of the hand of the user in the initial gesture video sequence frame.
In this embodiment, the initial movement speed is calculated according to the following formula (2):

v_user = (f_current / (m − 1)) · Σ_{i=0}^{m−2} ‖P_{i+1} − P_i‖    (2)

wherein v_user represents the initial motion speed, m represents the number of frames in the initial gesture video sequence (m > 0), f_current represents the frame rate, and P_i (i = 0, …, m−2) represents the position of the center of the user's hand in each initial gesture video sequence frame.
And step 21c, calculating the size of the initial sliding window according to the preset movement speed, the size of the preset sliding window and the initial movement speed.
Specifically, the initial sliding window size is calculated by the following formula (3):

size_user = size_common · (v_common / v_user)    (3)

wherein size_user represents the initial sliding window size, v_common represents the preset movement speed, size_common represents the preset sliding window size, and v_user represents the initial movement speed. The preset movement speed and the preset sliding window size can be set in advance.
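The speed and window-size calculations of formulas (2) and (3) can be sketched as follows. Since the patent shows both formulas only as images, this is a reconstruction from the surrounding text: the average hand speed is the total distance between consecutive hand-centre positions divided by the elapsed time (m − 1) / f_current, and the initial window scales inversely with that speed (a faster hand finishes a gesture in fewer frames, so the window shrinks).

```python
def initial_motion_speed(positions, frame_rate):
    """Average hand speed over m tracked frames (reconstruction of formula (2)).

    positions: list of (x, y) hand-centre coordinates, one per frame.
    frame_rate: camera frame rate f_current in frames per second.
    """
    m = len(positions)
    assert m > 1, "need at least two frames to measure motion"
    total = 0.0
    for i in range(m - 1):
        dx = positions[i + 1][0] - positions[i][0]
        dy = positions[i + 1][1] - positions[i][1]
        total += (dx * dx + dy * dy) ** 0.5
    # elapsed time between first and last frame is (m - 1) / frame_rate
    return total * frame_rate / (m - 1)

def first_window_size(v_common, size_common, v_user):
    """Reconstruction of formula (3): scale the preset window inversely
    with the user's speed; rounding to a whole frame count is an assumption."""
    return max(1, round(size_common * v_common / v_user))
```

For example, a hand moving twice as fast as the preset speed halves the window, and half as fast doubles it.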
After the initial sliding window size is determined, if the user has a new gesture, in the embodiment of the present invention, the initial sliding window size may be used for recognition.
Step 22, recognizing the user gesture using the initial sliding window size.
And step 23, acquiring a first DTW distance between the first gesture video sequence frame with the recognized user gesture and a preset gesture template.
Specifically, in this step, the aforementioned DTW method may still be used to identify a new gesture of the user, and obtain the first DTW distance.
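For reference, the DTW distance used throughout these steps can be sketched on 1-D sequences; real gesture recognizers compare per-frame feature vectors, so the scalar cost below is a simplifying assumption:

```python
def dtw_distance(seq_a, seq_b):
    """Classic dynamic time warping distance between two sequences,
    illustrating how a gesture window is scored against a gesture
    template: the cumulative cost of the best monotonic alignment."""
    n, m = len(seq_a), len(seq_b)
    inf = float("inf")
    # d[i][j]: best cost aligning seq_a[:i] with seq_b[:j].
    d = [[inf] * (m + 1) for _ in range(n + 1)]
    d[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost = abs(seq_a[i - 1] - seq_b[j - 1])
            d[i][j] = cost + min(d[i - 1][j], d[i][j - 1], d[i - 1][j - 1])
    return d[n][m]

print(dtw_distance([1, 2, 3], [1, 2, 2, 3]))  # 0.0: the extra 2 is absorbed by warping
```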
And 24, acquiring the size of a reference sliding window, and acquiring a reference gesture video sequence frame from the first gesture video sequence frame according to the size of the reference sliding window.
Here, it is assumed that the reference sliding window size includes a second sliding window size, size_user − ε, and a third sliding window size, size_user + ε, where 0 < ε < size_user and ε may be set based on empirical values.
Here, a first sub-reference gesture video sequence frame is obtained from the first gesture video sequence frame according to the second sliding window size, and a second sub-reference gesture video sequence frame is obtained from the first gesture video sequence frame according to the third sliding window size. I.e. two sequences of video frames of different lengths are obtained here.
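Extracting the two sub-reference sequences is essentially a slicing operation; in this sketch, taking frames from the start of the recognized gesture sequence is an assumption, since the patent does not specify the alignment:

```python
def reference_windows(frames, size_user, eps):
    """Cut two candidate windows out of the recognized gesture frames:
    one of size_user - eps frames (second sliding window size) and one
    of size_user + eps frames (third sliding window size). The longer
    window simply takes as many frames as are available."""
    return frames[: size_user - eps], frames[: size_user + eps]

short, long_ = reference_windows(list(range(40)), size_user=30, eps=5)
print(len(short), len(long_))  # 25 35
```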
The user's hand does not always maintain a constant movement speed during interaction with the computer; it may move faster or slower than it did during the initial gesture. In this case, the sliding window size set in the initial stage should also be updated in time to accommodate such changes. Therefore, the present invention provides an online-learning method that enables the computer to actively learn and adapt to changes in the user's behavior pattern.
In a particular application, steps 22-24 may be repeated for new gestures made by the user at a later time to obtain samples of learned user gestures. In the sample, a predetermined number of first sub-reference gesture video sequence frames, a predetermined number of second sub-reference gesture video sequence frames, and a corresponding predetermined number of first DTW distances are included. Wherein the predetermined number can be arbitrarily set. Here, the obtained samples may be expressed as:
G′ = {g′_1, g′_2, …, g′_n}

G″ = {g″_1, g″_2, …, g″_n}

D = {d_1, d_2, …, d_n}

where g′_i (i = 1, …, n) is each first sub-reference gesture video sequence frame, g″_i (i = 1, …, n) is each second sub-reference gesture video sequence frame, and d_i (i = 1, …, n) is each first DTW distance. The value n represents the predetermined number, which can be set arbitrarily.
And step 25, for the samples, respectively obtaining a second DTW distance between each first sub-reference gesture video sequence frame and the preset gesture template by using a dynamic time warping method. Then n second DTW distances are available here.
And step 26, for the samples, respectively obtaining a third DTW distance between each second sub-reference gesture video sequence frame and the preset gesture template by using a dynamic time warping method. Then n third DTW distances are available here.
And 27, comparing the magnitude of each first DTW distance and the magnitude of each second DTW distance respectively for the n first DTW distances and the n second DTW distances.
And step 28, if the second DTW distance is smaller than the first DTW distance, accumulating the count value corresponding to the second DTW distance; otherwise, returning to step 27 to compare the next pair of distances.
Steps 27 and 28 are repeatedly performed until all of the first and second DTW distances are compared or a preset time period elapses. Wherein the preset time period may be preset.
And step 29, comparing the count value corresponding to the second dynamic time warping distance in the preset time period with the first threshold value.
Step 210, if the count value corresponding to the second dynamic time warping distance exceeds a first threshold value within a preset time period, taking the second sliding window size as the sliding window size for recognizing the user gesture. Otherwise, step 211 is performed.
And step 211, comparing the magnitude of each first DTW distance and the magnitude of each third DTW distance respectively for the n first DTW distances and the n third DTW distances.
And step 212, if the third DTW distance is smaller than the first DTW distance, accumulating the count value corresponding to the third DTW distance; otherwise, returning to step 211 to compare the next pair of distances.
Steps 211 and 212 are repeatedly performed until all of the first DTW distances and the third DTW distances are compared or a preset time period elapses. Wherein the preset time period may be preset.
And step 213, comparing the count value corresponding to the third dynamic time warping distance in the preset time period with a second threshold value.
And 214, if the count value corresponding to the third dynamic time warping distance exceeds a second threshold value within a preset time period, taking the third sliding window size as the sliding window size for recognizing the user gesture. Otherwise, step 215 is performed.
Step 215, the first sliding window size is used as the sliding window size for recognizing the gesture of the user.
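The decision procedure of steps 27 to 215 can be summarized as a voting scheme over the collected samples. The sketch below abstracts away the preset time period and processes one batch of n samples; the function name, argument layout, and batch formulation are assumptions:

```python
def choose_window_size(d1s, d2s, d3s, size_user, eps, threshold1, threshold2):
    """Count how often the shorter window (second size) or the longer
    window (third size) yields a smaller DTW distance than the current
    window over a batch of samples; adopt a new size only when its win
    count exceeds its threshold, as in steps 27-215."""
    # Steps 27-210: does the shorter window beat the current one often enough?
    wins_shorter = sum(1 for d1, d2 in zip(d1s, d2s) if d2 < d1)
    if wins_shorter > threshold1:
        return size_user - eps          # second sliding window size
    # Steps 211-214: does the longer window beat the current one often enough?
    wins_longer = sum(1 for d1, d3 in zip(d1s, d3s) if d3 < d1)
    if wins_longer > threshold2:
        return size_user + eps          # third sliding window size
    return size_user                    # step 215: keep the first size

# The shorter window wins on 4 of 5 samples -> shrink the window.
print(choose_window_size([5, 5, 5, 5, 5], [3, 4, 4, 6, 4], [6, 6, 6, 6, 6],
                         size_user=30, eps=5, threshold1=3, threshold2=3))  # 25
```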
The preset time period can be set arbitrarily, the first threshold and the second threshold can also be set arbitrarily, and the first threshold and the second threshold can be set to be the same or different.
Here, through steps 27 to 215, the size of the sliding window used for recognizing the user's gesture is determined by judging the change in the speed of the user's hand movement over a certain period of time. A sliding window size determined in this way better matches the speed of the user's hand movement, further improving the accuracy of user gesture recognition.
In the foregoing processes, the sequence numbers of the processes do not mean the execution sequence, and the execution sequence of the processes should be determined by the functions and the internal logic of the processes, and should not limit the implementation processes of the embodiments of the present invention.
As can be seen from the above, in the embodiment of the present invention, when performing gesture recognition on a user, the size of the sliding window for gesture recognition may be determined by comparing similarity parameters between different gesture video sequence frames and a preset gesture video sequence. Therefore, compared with the prior art, the scheme of the embodiment of the invention can flexibly adjust and acquire the size of the sliding window in real time according to the change of the movement speed of the hand of the user so as to identify the gesture of the user, thereby improving the accuracy of identifying the gesture of the user.
As shown in fig. 3, a gesture recognition device according to a third embodiment of the present invention includes:
a first video sequence frame obtaining module 31, configured to obtain a first sliding window size, and obtain at least one gesture video sequence frame according to the first sliding window size;
the first parameter obtaining module 32 is configured to match the at least one gesture video sequence frame with a preset gesture template, and obtain a first similarity parameter between the first gesture video sequence frame in which the user gesture is recognized and the preset gesture template;
a second video sequence frame obtaining module 33, configured to obtain a reference sliding window size, and obtain a reference gesture video sequence frame from the first gesture video sequence frame according to the reference sliding window size, where the reference sliding window size is different from the first sliding window size;
a second parameter obtaining module 34, configured to obtain a reference similarity parameter between the reference gesture video sequence frame and the preset gesture template;
a parameter processing module 35, configured to, when the reference similarity parameter is smaller than the first similarity parameter, take the reference sliding window size as a sliding window size for recognizing a user gesture.
Wherein the first video sequence frame acquisition module 31 may include:
the first gesture recognition submodule is used for recognizing the initial gesture of the user when the gesture made by the user is the initial gesture to obtain an initial gesture video sequence frame; the speed acquisition submodule is used for acquiring the initial motion speed of the hand of the user in the initial gesture video sequence frame; the first sliding window size obtaining submodule is used for calculating the first sliding window size according to a preset movement speed, a preset sliding window size and the initial movement speed; and the first video sequence frame acquisition submodule is used for acquiring at least one gesture video sequence frame according to the size of the first sliding window.
Alternatively, the first video sequence frame acquiring module 31 may further include: a second sliding window size obtaining submodule, configured to obtain the first sliding window size stored in the storage unit when the gesture performed by the user is not the initial gesture; and the second video sequence frame acquisition submodule is used for acquiring at least one gesture video sequence frame according to the size of the first sliding window.
In a specific application, the first similarity parameter is a first dynamic time warping distance; the first parameter obtaining module 32 is specifically configured to: and acquiring a first dynamic time warping distance between a first gesture video sequence frame for recognizing the user gesture and a preset gesture template by using a dynamic time warping method.
In a specific application, in order to further improve the accuracy of gesture recognition, the reference sliding window size comprises a second sliding window size and a third sliding window size; wherein the first sliding window size is larger than the second sliding window size and smaller than the third sliding window size. Accordingly, the second video sequence frame acquisition module 33 may include: a first video sequence frame obtaining sub-module, configured to obtain a first sub-reference gesture video sequence frame from the first gesture video sequence frame according to the size of the second sliding window; and the second video sequence frame acquisition submodule is used for acquiring a second sub-reference gesture video sequence frame from the first gesture video sequence frame according to the size of the third sliding window.
At this time, the reference similarity parameter includes a second dynamic time regular distance between the first sub-reference gesture video sequence frame and the preset gesture template, and a third dynamic time regular distance between the second sub-reference gesture video sequence frame and the preset gesture template.
The second parameter obtaining module 34 may include: the first parameter obtaining submodule is used for obtaining a second dynamic time warping distance between the first sub-reference gesture video sequence frame and the preset gesture template by using a dynamic time warping method; and the second parameter obtaining submodule is used for obtaining a third dynamic time warping distance between the second sub-reference gesture video sequence frame and the preset gesture template by using a dynamic time warping method.
Wherein, the parameter processing module 35 may include:
a comparison submodule, configured to compare the second dynamic time warping distance with the first dynamic time warping distance, and compare the third dynamic time warping distance with the first dynamic time warping distance, respectively; a first parameter selection submodule, configured to, when the second dynamic time warping distance is smaller than the first dynamic time warping distance, take the second sliding window size as a sliding window size for recognizing a user gesture; and the second parameter selection submodule is used for taking the third sliding window size as the sliding window size for recognizing the user gesture when the third dynamic time warping distance is smaller than the first dynamic time warping distance.
In order to further improve the accuracy of gesture recognition, the first parameter selection submodule comprises: the first counting unit is used for accumulating the count value corresponding to the second dynamic time warping distance when the second dynamic time warping distance is smaller than the first dynamic time warping distance; and the first selection unit is used for taking the second sliding window size as the sliding window size for identifying the user gesture if the count value corresponding to the second dynamic time warping distance exceeds a first threshold value in a preset time period.
In order to further improve the accuracy of gesture recognition, the second parameter selection submodule includes: the second counting unit is used for accumulating the count value corresponding to the third dynamic time warping distance when the third dynamic time warping distance is smaller than the first dynamic time warping distance; and the second selection unit is used for taking the third sliding window size as the sliding window size for identifying the user gesture if the count value corresponding to the third dynamic time warping distance exceeds a second threshold value in a preset time period.
The working principle of the device according to the invention can be referred to the description of the method embodiment described above.
As can be seen from the above, in the embodiment of the present invention, when performing gesture recognition on a user, the size of the sliding window for gesture recognition may be determined by comparing similarity parameters between different gesture video sequence frames and a preset gesture video sequence. Therefore, compared with the prior art, the scheme of the embodiment of the invention can flexibly adjust and acquire the size of the sliding window in real time according to the change of the movement speed of the hand of the user so as to identify the gesture of the user, thereby improving the accuracy of identifying the gesture of the user.
As shown in fig. 4, a fourth embodiment of the present invention further provides an electronic device, which can implement the processes of the embodiments shown in fig. 1-2 of the present invention. The electronic device can be a Personal Computer (PC), a tablet PC, various smart devices (including a smart phone), and the like. As shown in fig. 4, the electronic device may include: the device comprises a shell 41, a processor 42, a memory 43, a circuit board 44 and a power circuit 45, wherein the circuit board 44 is arranged inside a space enclosed by the shell 41, and the processor 42 and the memory 43 are arranged on the circuit board 44; a power supply circuit 45 for supplying power to each circuit or device of the electronic apparatus; the memory 43 is used for storing executable program code; the processor 42 runs a program corresponding to the executable program code by reading the executable program code stored in the memory 43, for performing the steps of:
acquiring a first sliding window size, and acquiring at least one gesture video sequence frame according to the first sliding window size;
matching the at least one gesture video sequence frame with a preset gesture template respectively to obtain a first similarity parameter between a first gesture video sequence frame with a recognized user gesture and the preset gesture template;
acquiring a reference sliding window size, and acquiring a reference gesture video sequence frame from the first gesture video sequence frame according to the reference sliding window size, wherein the reference sliding window size is different from the first sliding window size;
acquiring a reference similarity parameter between the reference gesture video sequence frame and the preset gesture template;
and when the reference similarity parameter is smaller than the first similarity parameter, taking the reference sliding window size as a sliding window size for recognizing the user gesture.
As can be seen from the above, in the embodiment of the present invention, when performing gesture recognition on a user, the size of the sliding window for gesture recognition may be determined by comparing similarity parameters between different gesture video sequence frames and a preset gesture video sequence. Therefore, compared with the prior art, the scheme of the embodiment of the invention can flexibly adjust and acquire the size of the sliding window in real time according to the change of the movement speed of the hand of the user so as to identify the gesture of the user, thereby improving the accuracy of identifying the gesture of the user.
As shown in fig. 5, a fifth embodiment of the present invention further provides a gesture recognition system, including: a camera 51, an image processing unit 52, a display device 56, a CPU 57, and a RAM (Random-Access Memory) 58. The image processing unit 52 includes an initialization unit 53, a gesture recognition unit 54, and an online learning and updating unit 55. In a specific application, the display device 56 may be a television, a display device composed of a projector and a projection screen, and other display devices. Wherein the image processing unit 52 is adapted to perform the procedures of the aforementioned method embodiments.
While the foregoing is directed to the preferred embodiment of the present invention, it will be understood by those skilled in the art that various changes and modifications may be made without departing from the spirit and scope of the invention as defined in the appended claims.

Claims (12)

1. A gesture recognition method, comprising:
acquiring a first sliding window size, and acquiring at least one gesture video sequence frame according to the first sliding window size;
matching the at least one gesture video sequence frame with a preset gesture template respectively to obtain a first similarity parameter between a first gesture video sequence frame with a recognized user gesture and the preset gesture template;
acquiring a reference sliding window size, and acquiring a reference gesture video sequence frame from the first gesture video sequence frame according to the reference sliding window size, wherein the reference sliding window size is different from the first sliding window size;
acquiring a reference similarity parameter between the reference gesture video sequence frame and the preset gesture template;
and when the reference similarity parameter is smaller than the first similarity parameter, taking the reference sliding window size as a sliding window size for recognizing the user gesture.
2. The method of claim 1, wherein when the gesture made by the user is an initial gesture, the obtaining a first sliding window size comprises:
identifying the initial gesture of the user to obtain an initial gesture video sequence frame;
acquiring an initial motion speed of the user's hand in the initial gesture video sequence frame;
and calculating the size of the first sliding window according to a preset movement speed, the size of a preset sliding window and the initial movement speed.
3. The method of claim 1, wherein when the gesture made by the user is not an initial gesture, the obtaining a first sliding window size comprises: and acquiring the first sliding window size stored in a storage unit.
4. The method of claim 1, wherein the first similarity parameter is a first dynamic time warping distance; the acquiring a first similarity parameter between a first gesture video sequence frame identifying a user gesture and a preset gesture template comprises:
and acquiring a first dynamic time warping distance between a first gesture video sequence frame for recognizing the user gesture and a preset gesture template by using a dynamic time warping method.
5. The method of claim 4, wherein the reference sliding window size comprises a second sliding window size and a third sliding window size; wherein the first sliding window size is greater than the second sliding window size and less than the third sliding window size;
the obtaining a reference sliding window size and obtaining a reference gesture video sequence frame from the first gesture video sequence frame according to the reference sliding window size includes:
obtaining a first sub-reference gesture video sequence frame from the first gesture video sequence frame according to the second sliding window size;
obtaining a second sub-reference gesture video sequence frame from the first gesture video sequence frame according to the third sliding window size.
6. The method of claim 5, wherein the reference similarity parameters comprise a second dynamic time warping distance between the first sub-reference gesture video sequence frame and the preset gesture template, and a third dynamic time warping distance between the second sub-reference gesture video sequence frame and the preset gesture template;
wherein the obtaining a reference similarity parameter between the reference gesture video sequence frame and the preset gesture template comprises:
acquiring a second dynamic time warping distance between the first sub-reference gesture video sequence frame and the preset gesture template by using a dynamic time warping method;
and acquiring a third dynamic time warping distance between the second sub-reference gesture video sequence frame and the preset gesture template by using a dynamic time warping method.
7. The method of claim 6, wherein the taking the reference sliding window size as the sliding window size for recognizing the user gesture when the reference similarity parameter is smaller than the first similarity parameter comprises:
when the second dynamic time warping distance is smaller than the first dynamic time warping distance, taking the second sliding window size as a sliding window size for recognizing a user gesture;
and when the third dynamic time warping distance is smaller than the first dynamic time warping distance, taking the third sliding window size as the sliding window size for recognizing the user gesture.
8. The method of claim 7, wherein the taking the second sliding window size as the sliding window size for recognizing the user gesture when the second dynamic time warping distance is less than the first dynamic time warping distance comprises:
when the second dynamic time warping distance is smaller than the first dynamic time warping distance, accumulating the count value corresponding to the second dynamic time warping distance;
and if the count value corresponding to the second dynamic time warping distance exceeds a first threshold value within a preset time period, taking the second sliding window size as the sliding window size for identifying the user gesture.
9. The method of claim 7, wherein taking the third sliding window size as a sliding window size for recognizing a user gesture when the third dynamic time warping distance is less than the first dynamic time warping distance comprises:
when the third dynamic time warping distance is smaller than the first dynamic time warping distance, accumulating the count value corresponding to the third dynamic time warping distance;
and if the count value corresponding to the third dynamic time warping distance exceeds a second threshold value within a preset time period, taking the third sliding window size as the sliding window size for identifying the user gesture.
10. A gesture recognition apparatus, comprising:
the first video sequence frame acquisition module is used for acquiring the size of a first sliding window and acquiring at least one gesture video sequence frame according to the size of the first sliding window;
the first parameter acquisition module is used for respectively matching the at least one gesture video sequence frame with a preset gesture template to acquire a first similarity parameter between the first gesture video sequence frame with the recognized user gesture and the preset gesture template;
a second video sequence frame obtaining module, configured to obtain a reference sliding window size, and obtain a reference gesture video sequence frame from the first gesture video sequence frame according to the reference sliding window size, where the reference sliding window size is different from the first sliding window size;
the second parameter acquisition module is used for acquiring a reference similarity parameter between the reference gesture video sequence frame and the preset gesture template;
and the parameter processing module is used for taking the reference sliding window size as the sliding window size for recognizing the user gesture when the reference similarity parameter is smaller than the first similarity parameter.
11. The apparatus of claim 10, wherein the first video sequence frame acquisition module comprises:
the first gesture recognition submodule is used for recognizing the initial gesture of the user when the gesture made by the user is the initial gesture to obtain an initial gesture video sequence frame;
the speed acquisition submodule is used for acquiring the initial motion speed of the hand of the user in the initial gesture video sequence frame;
the first sliding window size obtaining submodule is used for calculating the first sliding window size according to a preset movement speed, a preset sliding window size and the initial movement speed;
and the first video sequence frame acquisition submodule is used for acquiring at least one gesture video sequence frame according to the size of the first sliding window.
12. The apparatus of claim 10, wherein the first video sequence frame acquisition module comprises:
a second sliding window size obtaining submodule, configured to obtain the first sliding window size stored in the storage unit when the gesture performed by the user is not the initial gesture;
and the second video sequence frame acquisition submodule is used for acquiring at least one gesture video sequence frame according to the size of the first sliding window.
CN201610316842.7A 2016-05-12 2016-05-12 Gesture recognition method and device Active CN107368181B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201610316842.7A CN107368181B (en) 2016-05-12 2016-05-12 Gesture recognition method and device


Publications (2)

Publication Number Publication Date
CN107368181A CN107368181A (en) 2017-11-21
CN107368181B true CN107368181B (en) 2020-01-14

Family

ID=60304615

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201610316842.7A Active CN107368181B (en) 2016-05-12 2016-05-12 Gesture recognition method and device

Country Status (1)

Country Link
CN (1) CN107368181B (en)

Families Citing this family (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110308786B (en) * 2018-03-20 2023-12-26 厦门歌乐电子企业有限公司 Vehicle-mounted equipment and gesture recognition method thereof
CN110163130B (en) * 2019-05-08 2021-05-28 清华大学 Feature pre-alignment random forest classification system and method for gesture recognition
CN111178308A (en) * 2019-12-31 2020-05-19 北京奇艺世纪科技有限公司 Gesture track recognition method and device
CN112121280B (en) * 2020-08-31 2022-04-01 浙江大学 Control method and control system of heart sound box
JP7264547B1 (en) 2022-03-02 2023-04-25 株式会社ベネモ Motion recognition method and motion recognition system

Citations (4)

Publication number Priority date Publication date Assignee Title
CN1860429A (en) * 2003-09-30 2006-11-08 皇家飞利浦电子股份有限公司 Gesture to define location, size, and/or content of content window on a display
CN103745228A (en) * 2013-12-31 2014-04-23 清华大学 Dynamic gesture identification method on basis of Frechet distance
KR20140076395A (en) * 2012-12-12 2014-06-20 삼성전자주식회사 Display apparatus for excuting applications and method for controlling thereof
US9268457B2 (en) * 2012-07-13 2016-02-23 Google Inc. Touch-based fluid window management

Family Cites Families (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
KR20150053438A (en) * 2013-11-08 2015-05-18 한국전자통신연구원 Stereo matching system and method for generating disparity map using the same


Also Published As

Publication number Publication date
CN107368181A (en) 2017-11-21

Similar Documents

Publication Publication Date Title
CN107368181B (en) Gesture recognition method and device
US10990803B2 (en) Key point positioning method, terminal, and computer storage medium
CN108960163B (en) Gesture recognition method, device, equipment and storage medium
US10043308B2 (en) Image processing method and apparatus for three-dimensional reconstruction
CN104350509B (en) Quick attitude detector
WO2020078017A1 (en) Method and apparatus for recognizing handwriting in air, and device and computer-readable storage medium
WO2019041519A1 (en) Target tracking device and method, and computer-readable storage medium
CN109657533A (en) Pedestrian recognition methods and Related product again
US12008167B2 (en) Action recognition method and device for target object, and electronic apparatus
EP2336949B1 (en) Apparatus and method for registering plurality of facial images for face recognition
US20210027046A1 (en) Method and apparatus for multi-face tracking of a face effect, and electronic device
CN104049760B (en) The acquisition methods and system of a kind of man-machine interaction order
EP4024270A1 (en) Gesture recognition method, electronic device, computer-readable storage medium, and chip
CN112364799A (en) Gesture recognition method and device
CN110427849B (en) Face pose determination method and device, storage medium and electronic equipment
Ruan et al. Dynamic gesture recognition based on improved DTW algorithm
CN112464833A (en) Dynamic gesture recognition method, device, equipment and storage medium based on optical flow
CN110633004A (en) Interaction method, device and system based on human body posture estimation
US20210158031A1 (en) Gesture Recognition Method, and Electronic Device and Storage Medium
CN113343812A (en) Gesture recognition method and device, storage medium and electronic equipment
CN112529939A (en) Target track matching method and device, machine readable medium and equipment
CN111611941B (en) Special effect processing method and related equipment
US20170085784A1 (en) Method for image capturing and an electronic device using the method
CN115439375B (en) Training method and device of image deblurring model and application method and device
JP6397508B2 (en) Method and apparatus for generating a personal input panel

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant