CN106228121B - Gesture feature recognition method and device


Info

Publication number
CN106228121B
CN106228121B
Authority
CN
China
Prior art keywords
gesture
similarity
depth map
matrix
sequence
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN201610559968.7A
Other languages
Chinese (zh)
Other versions
CN106228121A (en)
Inventor
刘琼
程驰
杨铀
喻莉
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Huazhong University of Science and Technology
Original Assignee
Huazhong University of Science and Technology
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Huazhong University of Science and Technology filed Critical Huazhong University of Science and Technology
Priority to CN201610559968.7A
Publication of CN106228121A
Application granted
Publication of CN106228121B

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V40/00Recognition of biometric, human-related or animal-related patterns in image or video data
    • G06V40/20Movements or behaviour, e.g. gesture recognition
    • G06V40/28Recognition of hand or arm movements, e.g. recognition of deaf sign language

Landscapes

  • Engineering & Computer Science (AREA)
  • Health & Medical Sciences (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • General Health & Medical Sciences (AREA)
  • Psychiatry (AREA)
  • Social Psychology (AREA)
  • Human Computer Interaction (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Multimedia (AREA)
  • Theoretical Computer Science (AREA)
  • Image Analysis (AREA)

Abstract

The invention discloses a gesture feature recognition method and device. The gesture feature recognition method comprises the following steps: calculating the similarity between frames of the gesture depth map sequence to obtain a calculation result; decomposing the gesture depth map sequence into a plurality of subsequences according to the calculation result; extracting a key node of each subsequence in the plurality of subsequences, wherein the key node is a frame meeting a preset condition in each subsequence; forming the key nodes into a key point set of the gesture depth map sequence; and performing gesture recognition according to the key point set to obtain a gesture recognition result. By the method and the device, the problem of low descriptive power of gesture features caused by excessive time-domain redundant information is solved.

Description

Gesture feature recognition method and device
Technical Field
The invention relates to the field of gesture recognition, in particular to a gesture feature recognition method and device.
Background
The dynamic gesture recognition methods based on depth map sequences mainly use the global features of the depth map sequence for gesture classification and recognition. The current classical algorithms mainly comprise a motion recognition algorithm based on depth motion maps and a motion recognition algorithm based on the four-dimensional normal vector histogram.
Motion recognition algorithm based on depth map motion mapping: the depth map sequence is regarded as a whole, and each frame image is mapped in three directions to obtain a front view, a top view, and a side view respectively. Then, for each mapped image, the differences between adjacent frames in the time domain are computed, and whether a pixel point moves is judged from the difference value. The motion statistics yield a motion map for each of the three mapping directions. A gradient histogram is then computed for each motion map, and the gradient histograms obtained in the three mapping directions are combined to form the global feature of the gesture sequence. A classifier is trained with these features to obtain a gesture recognition result.
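To make the depth-motion-map idea concrete, a minimal Python/NumPy sketch follows. It is an illustration, not the exact prior-art formulation: the occupancy-style top and side projections, the depth quantisation, the threshold eps, and all function names are assumptions.

```python
import numpy as np

def project(frame, depth_bins=256):
    """Map one H-by-W depth frame onto front, top, and side planes.
    Top/side views are binary occupancy maps over quantised depth
    (an assumed discretisation)."""
    h, w = frame.shape
    d = np.clip(frame.astype(int), 0, depth_bins - 1)
    front = frame.astype(np.float64)
    top = np.zeros((depth_bins, w))
    side = np.zeros((h, depth_bins))
    top[d, np.arange(w)[None, :].repeat(h, 0)] = 1.0   # (depth, x) plane
    side[np.arange(h)[:, None].repeat(w, 1), d] = 1.0  # (y, depth) plane
    return front, top, side

def depth_motion_maps(frames, eps=10.0):
    """Accumulate thresholded frame-to-frame differences per projection;
    a gradient histogram of each resulting map would then be computed."""
    thr = [eps, 0.0, 0.0]   # occupancy views are binary, so any change counts
    maps = [0.0, 0.0, 0.0]
    prev = project(frames[0])
    for f in frames[1:]:
        cur = project(f)
        maps = [m + (np.abs(c - p) > t)
                for m, p, c, t in zip(maps, prev, cur, thr)]
        prev = cur
    return maps  # motion maps for the front, top, and side directions
```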
Motion recognition algorithm based on the four-dimensional normal vector histogram: the depth map sequence is regarded as a surface in four-dimensional space, with each pixel point corresponding to a point on the surface. The spatio-temporal gradient of each pixel point is computed and converted into a normal vector of the surface according to the four-dimensional surface equation. The four-dimensional normal vectors are then counted through a mapping: the 120 vertex coordinates of the regular 600-cell (a four-dimensional polytope) form a mapping matrix, and the four-dimensional normal vectors are projected onto these 120 directions to obtain a 120-dimensional histogram. In addition, the method partitions the depth map sequence into blocks, calculates a four-dimensional normal vector histogram within each block, and concatenates all the histograms into a feature vector. A classifier is then trained with this feature vector to perform action recognition.
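The following Python sketch illustrates the four-dimensional-normal idea under stated assumptions: proj_matrix is taken to be the 120 × 4 matrix of regular-600-cell vertex coordinates (not constructed here), and the rectified accumulation of projection scores is an assumed binning rule.

```python
import numpy as np

def hon4d_histogram(frames, proj_matrix):
    """Histogram of 4-D normals over a depth sequence d(x, y, t).
    proj_matrix is assumed to hold the 120 vertex coordinates of the
    regular 600-cell as a 120 x 4 array."""
    vol = np.stack(frames).astype(np.float64)      # (t, y, x) volume
    gt, gy, gx = np.gradient(vol)                  # spatio-temporal gradients
    normals = np.stack([-gx, -gy, -gt, np.ones_like(vol)], axis=-1)
    normals /= np.linalg.norm(normals, axis=-1, keepdims=True)
    # Project every unit normal onto the 120 vertex directions and
    # accumulate the rectified scores into a 120-dimensional histogram.
    scores = normals.reshape(-1, 4) @ proj_matrix.T
    hist = np.maximum(scores, 0).sum(axis=0)
    return hist / (hist.sum() + 1e-12)             # normalised histogram
```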
The existing gesture recognition algorithms based on depth map sequences do not fully analyze the characteristics of gesture motion, so the extracted global features suffer from two problems. First, because the frames of a gesture depth map sequence are very similar to each other, a large amount of time-domain redundant information enters the features during global feature extraction, and this redundant information reduces the descriptive power of the gesture features. Second, current depth map gesture feature extraction algorithms do not consider that the same gesture performed at different speeds can differ greatly in its features, which reduces the robustness of the gesture recognition algorithm.
Aiming at the problem in the related art that the descriptive power of gesture features is low due to excessive time-domain redundant information, no effective solution has yet been proposed.
Disclosure of Invention
The invention mainly aims to provide a gesture feature recognition method and device to solve the problem of low descriptive power of gesture features caused by time-domain redundant information.
In order to achieve the above object, according to one aspect of the present invention, a gesture feature recognition method is provided, including: calculating the similarity between frames of the gesture depth map sequence to obtain a calculation result; decomposing the gesture depth map sequence into a plurality of subsequences according to the calculation result; extracting a key node of each subsequence in the plurality of subsequences, wherein the key node is a frame meeting a preset condition in each subsequence; forming the key nodes into a key point set of the gesture depth map sequence; and performing gesture recognition according to the key point set to obtain a gesture recognition result.
Further, calculating the similarity between frames of the sequence of gesture depth maps comprises: carrying out blocking processing on the images of the depth map sequence to obtain a plurality of image blocks; calculating the gradient direction angles of all pixel points in each image block, and taking the maximum value of the gradient direction angles in each image block as the direction angle of each image block; forming a feature vector by taking the direction angles of all image blocks as elements; and calculating the similarity between frames of the gesture depth map sequence according to the feature vectors.
Further, calculating the similarity between frames of the gesture depth map sequence according to the feature vectors comprises: calculating a similarity matrix formed from the feature vectors through a Gaussian similarity function; weighting the similarity matrix to obtain a weighted similarity matrix; and calculating the similarity between frames of the gesture depth map sequence according to the weighted similarity matrix.
Further, decomposing the sequence of gesture depth maps into a plurality of sub-sequences according to the calculation result comprises: establishing an undirected graph model according to a vertex set and an adjacency matrix, wherein the vertex set is the set of all depth maps in the depth map sequence, elements in the adjacency matrix represent the similarity between any two frames of images, and the adjacency matrix is formed from the weighted similarity matrix; and dividing the graph model, wherein dividing the graph model comprises: converting the weighted similarity matrix into a Laplacian matrix, and solving the Laplacian matrix to obtain its eigenvalues and eigenvectors; taking the eigenvector corresponding to the second smallest eigenvalue as the optimal vector; and clustering the points in the optimal vector.
Further, decomposing the sequence of gesture depth maps into a plurality of sub-sequences according to the calculation result further comprises: performing binary iterative clustering on the current depth map sequence, wherein performing binary iterative clustering on the current depth map sequence comprises: clustering the points of the optimal vector of the current sequence into two classes to obtain two subclasses, and determining the condition for stopping the iteration according to the relation between the mean similarity of the current sequence and the mean similarities of the two subclasses.
Further, extracting the key node of each of the plurality of subsequences comprises: calculating Euclidean distances between any two frames in each subsequence to obtain a plurality of Euclidean distances; and taking the frame with the minimum total Euclidean distance to the other frames in each subsequence as the key node of that subsequence.
In order to achieve the above object, according to another aspect of the present invention, there is also provided a gesture feature recognition apparatus, including: the calculation unit is used for calculating the similarity between frames of the gesture depth map sequence to obtain a calculation result; the decomposition unit is used for decomposing the gesture depth map sequence into a plurality of subsequences according to the calculation result; the extraction unit is used for extracting a key node of each subsequence in the plurality of subsequences, wherein the key node is a frame which meets a preset condition in each subsequence; the combination unit is used for combining the key nodes into a key point set of the gesture depth map sequence; and the recognition unit is used for performing gesture recognition according to the key point set to obtain a gesture recognition result.
Further, the calculation unit includes: the decomposition module is used for carrying out block processing on the images of the depth map sequence to obtain a plurality of image blocks; the first calculation module is used for calculating the gradient direction angles of all pixel points in each image block, and taking the maximum value of the gradient direction angles in each image block as the direction angle of that image block; the combination module is used for forming a feature vector with the direction angles of all the image blocks as elements; and the second calculation module is used for calculating the similarity between frames of the gesture depth map sequence according to the feature vector.
Further, the second calculation module includes: the first calculation submodule is used for calculating a similarity matrix formed from the feature vectors through a Gaussian similarity function; the weighting submodule is used for weighting the similarity matrix to obtain a weighted similarity matrix; and the second calculation submodule is used for calculating the similarity between frames of the gesture depth map sequence according to the weighted similarity matrix.
Further, the decomposition unit includes: the establishing module is used for establishing an undirected graph model according to a vertex set and an adjacency matrix, wherein the vertex set is the set of all depth maps in a depth map sequence, elements in the adjacency matrix represent the similarity between any two frames of images, and the adjacency matrix is formed from the weighted similarity matrix; and the dividing module is used for dividing the graph model, wherein dividing the graph model comprises the following steps: converting the weighted similarity matrix into a Laplacian matrix, and solving the Laplacian matrix to obtain its eigenvalues and eigenvectors; taking the eigenvector corresponding to the second smallest eigenvalue as the optimal vector; and clustering the points in the optimal vector.
According to the invention, the similarity between frames of the gesture depth map sequence is calculated to obtain a calculation result; the gesture depth map sequence is decomposed into a plurality of subsequences according to the calculation result; a key node of each subsequence in the plurality of subsequences is extracted, wherein the key node is a frame meeting a preset condition in each subsequence; the key nodes are formed into a key point set of the gesture depth map sequence; and gesture recognition is performed according to the key point set to obtain a gesture recognition result, so that the problem of low descriptive power of gesture features caused by excessive time-domain redundant information is solved and the descriptive power of the gesture features is improved.
drawings
The accompanying drawings, which are incorporated in and constitute a part of this application, illustrate embodiments of the invention and, together with the description, serve to explain the invention and not to limit the invention. In the drawings:
FIG. 1 is a flow diagram of a gesture feature recognition method according to an embodiment of the invention; and
Fig. 2 is a schematic diagram of a gesture feature recognition apparatus according to an embodiment of the present invention.
Detailed Description
It should be noted that the embodiments and features of the embodiments in the present application may be combined with each other without conflict. The present invention will be described in detail below with reference to the embodiments with reference to the attached drawings.
In order to make the technical solutions better understood by those skilled in the art, the technical solutions in the embodiments of the present application will be described clearly and completely below with reference to the drawings in the embodiments of the present application. Obviously, the described embodiments are only some of the embodiments of the present application, not all of them. All other embodiments obtained by a person skilled in the art from the embodiments given herein without creative effort shall fall within the protection scope of the present application.
It should be noted that the terms "first," "second," and the like in the description and claims of this application and in the drawings described above are used for distinguishing between similar elements and not necessarily for describing a particular sequential or chronological order. It should be understood that the data so used may be interchanged under appropriate circumstances, such that the embodiments of the application described herein can be implemented in orders other than those illustrated or described herein. Furthermore, the terms "comprises," "comprising," and "having," and any variations thereof, are intended to cover a non-exclusive inclusion, such that a process, method, system, article, or apparatus that comprises a list of steps or elements is not necessarily limited to those steps or elements expressly listed, but may include other steps or elements not expressly listed or inherent to such a process, method, article, or apparatus.
The embodiment of the invention provides a gesture feature recognition method.
Fig. 1 is a flowchart of a gesture feature recognition method according to an embodiment of the present invention, as shown in fig. 1, the method includes the following steps:
Step S101: the similarity between frames of the gesture depth map sequence is calculated to obtain a calculation result.
The gesture depth map sequence comprises a plurality of frames. Because the frames of the gesture depth map sequence are very similar to each other, a large amount of temporal redundant information exists when global features are extracted from the gesture depth maps; therefore, the similarity between frames of the gesture depth map sequence is calculated first, which can simplify the subsequent calculation. The similarity between frames of the gesture depth map sequence may be calculated by various methods; optionally, it may be calculated as follows: carrying out blocking processing on the images of the depth map sequence to obtain a plurality of image blocks; calculating the gradient direction angles of all pixel points in each image block, and taking the maximum value of the gradient direction angles in each image block as the direction angle of that image block; forming a feature vector with the direction angles of all image blocks as elements; and calculating the similarity between frames of the gesture depth map sequence according to the feature vectors.
Optionally, calculating the similarity between frames of the gesture depth map sequence according to the feature vector comprises: calculating a similarity matrix formed by the characteristic vectors through a Gaussian similarity function; weighting the similar matrix to obtain a weighted similar matrix; and calculating the similarity between frames of the gesture depth map sequence according to the weighted similarity matrix.
Step S102: the gesture depth map sequence is decomposed into a plurality of subsequences according to the calculation result.
Optionally, decomposing the sequence of gesture depth maps into a plurality of sub-sequences according to the calculation result includes: establishing an undirected graph model according to a vertex set and an adjacency matrix, wherein the vertex set is the set of all depth maps in the depth map sequence, elements in the adjacency matrix represent the similarity between any two frames of images, and the adjacency matrix is formed from the weighted similarity matrix; and dividing the graph model, wherein dividing the graph model comprises: converting the weighted similarity matrix into a Laplacian matrix, and solving the Laplacian matrix to obtain its eigenvalues and eigenvectors; taking the eigenvector corresponding to the second smallest eigenvalue as the optimal vector; and clustering the points in the optimal vector.
Optionally, decomposing the sequence of gesture depth maps into a plurality of sub-sequences according to the calculation result further includes: performing binary iterative clustering on the current depth map sequence, wherein performing binary iterative clustering on the current depth map sequence comprises: clustering the points of the optimal vector of the current sequence into two classes to obtain two subclasses, and determining the condition for stopping the iteration according to the relation between the mean similarity of the current sequence and the mean similarities of the two subclasses.
This embodiment decomposes the gesture sequence based on the similarity matrix; as a result of the decomposition, similar frames are divided into the same class and dissimilar frames into different classes. The gesture sequence is thereby decomposed into a plurality of sub-processes, which can overcome the gesture differences caused by different speed changes.
Step S103: key nodes of each of the plurality of subsequences are extracted.
The key node is a frame meeting a preset condition in each subsequence; it may be the most representative frame of the subsequence. Optionally, the key node of each subsequence in the plurality of subsequences may be obtained as follows: calculating the Euclidean distances between any two frames in each subsequence to obtain a plurality of Euclidean distances; and taking the frame with the minimum total Euclidean distance to the other frames in each subsequence as the key node of that subsequence.
Step S104: the key nodes are formed into a key point set of the gesture depth map sequence.
After the key node of each subsequence of the plurality of subsequences is extracted, the key nodes of all subsequences are combined into the key point set of the gesture depth map sequence. Since each key node is the most representative frame of its subsequence, temporal redundant information is removed from the key point set, so the problem of low descriptive power of the gesture features can be overcome.
Step S105: gesture recognition is performed according to the key point set to obtain a gesture recognition result.
After the key point set is formed, gesture recognition is performed on the key point set composed of the key nodes to obtain a gesture recognition result. The gesture recognition on the key point set can be performed by various methods; the invention is not limited to any specific recognition method.
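The five steps can be read together as the following Python sketch. It assumes the helper functions sketched later in the detailed embodiment (direction_feature, weighted_similarity, decompose, key_node — all illustrative names) and an externally trained classifier, which the method deliberately leaves open.

```python
def recognize_gesture(depth_frames, classifier):
    """Steps S101-S105 end to end; helper functions are the sketches
    given in the detailed embodiment below, and `classifier` is any
    trained gesture classifier (the method leaves the choice open)."""
    feats = [direction_feature(f) for f in depth_frames]       # S101
    w = weighted_similarity(feats)                             # S101
    subsequences = decompose(w, depth_frames)                  # S102
    keys = sorted(idx[key_node([feats[i] for i in idx])]       # S103
                  for idx in subsequences)
    key_set = [depth_frames[i] for i in keys]                  # S104
    return classifier(key_set)                                 # S105
```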
In the prior art, a large amount of temporal redundant information exists in the features during global feature extraction, and this redundant information lowers the descriptive power of the gesture features.
In this embodiment, the similarity between frames of the gesture depth map sequence is calculated to obtain a calculation result; the gesture depth map sequence is decomposed into a plurality of subsequences according to the calculation result; a key node of each subsequence in the plurality of subsequences is extracted, wherein the key node is a frame meeting a preset condition in each subsequence; the key nodes are formed into a key point set of the gesture depth map sequence; and gesture recognition is performed according to the key point set to obtain a gesture recognition result, so that the problem of low descriptive power of gesture features caused by excessive time-domain redundant information is solved and the descriptive power of the gesture features is improved.
The gesture feature recognition method of the present invention is further described below with reference to a specific embodiment.
First, the similarity between frames of the gesture depth map sequence is calculated to obtain a calculation result.
In order to quantify the similarity between frames, the embodiment of the invention provides a direction-based gesture feature extraction algorithm, which proceeds as follows:
For a given depth map I, the horizontal gradient Gx and the vertical gradient Gy of each pixel point are obtained with the Sobel operator, as shown in formula (1):

Gx = Sx * I, Gy = Sy * I (1)

where * denotes two-dimensional convolution and Sx, Sy are the templates of the Sobel algorithm:

Sx = [[-1, 0, 1], [-2, 0, 2], [-1, 0, 1]], Sy = [[-1, -2, -1], [0, 0, 0], [1, 2, 1]].
The direction angle of the gradient is then obtained, as shown in formula (2):

θ(x, y) = arctan(Gy(x, y) / Gx(x, y)) (2)
Since the gradient direction can depict the trend of the edges of the depth map, but those edges are not sharp, the embodiment of the present invention further performs blocking processing on the image, dividing it into blocks of size k × k. The gradient direction angles within each block are counted, and the largest direction angle is taken as the angle of the current block. The feature of the final depth map may then be represented as a feature vector F = {d1, d2, ..., dn} whose elements are the direction angles of all the blocks, where d1 to dn represent the direction angles of the first to nth blocks. This is the direction-based gesture feature vector, and it can describe the similarity between frames.
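A minimal Python/NumPy sketch of this direction-based feature follows; the block size k is a free parameter, arctan2 replaces the plain arctan of formula (2) to keep the full angle range, and blocks that do not divide the image evenly are simply dropped — all illustrative choices.

```python
import numpy as np
from scipy import ndimage

def direction_feature(depth, k=8):
    """Direction-based feature of one depth map (formulas (1)-(2) plus
    the k-by-k blocking described above)."""
    g = depth.astype(np.float64)
    gx = ndimage.sobel(g, axis=1)        # formula (1): horizontal gradient
    gy = ndimage.sobel(g, axis=0)        # formula (1): vertical gradient
    theta = np.arctan2(gy, gx)           # formula (2), full-range variant
    h, w = theta.shape
    # Largest direction angle per k-by-k block; trailing partial blocks
    # are dropped for simplicity.
    feats = [theta[i:i + k, j:j + k].max()
             for i in range(0, h - h % k, k)
             for j in range(0, w - w % k, k)]
    return np.array(feats)               # F = {d1, d2, ..., dn}
```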
The similarity between frames is represented by a similarity matrix, where each element represents the feature similarity between the depth maps of any two frames and is obtained by a Gaussian similarity function, as shown in formula (3):

s(i, j) = exp(-||Fi - Fj||² / (2σ²)) (3)

where Fi and Fj are the feature vectors of frames i and j, and σ is the bandwidth of the Gaussian function.
However, the depth map sequence is ordered in the time domain, so to reduce the similarity of images that are far apart in time, the invention further weights the similarity matrix; the weight st(i, j) is obtained by formula (4) and decreases as the temporal distance between frames i and j increases.
The final similarity between frames is denoted w(i, j) = s(i, j) · st(i, j).
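The similarity computation of formulas (3) and (4) can be sketched as follows; since the exact form of the temporal weight of formula (4) is not reproduced above, a Gaussian decay in the frame-index distance |i − j| is assumed here, and the bandwidths sigma and sigma_t are illustrative.

```python
import numpy as np

def weighted_similarity(features, sigma=1.0, sigma_t=10.0):
    """w(i, j) = s(i, j) * st(i, j): Gaussian feature similarity
    (formula (3)) times an assumed Gaussian temporal weight standing
    in for formula (4)."""
    F = np.stack(features)                     # one feature vector per frame
    n = len(F)
    d2 = ((F[:, None, :] - F[None, :, :]) ** 2).sum(-1)
    s = np.exp(-d2 / (2 * sigma ** 2))         # formula (3)
    idx = np.arange(n)
    st = np.exp(-(idx[:, None] - idx[None, :]) ** 2 / (2 * sigma_t ** 2))
    return s * st                              # final inter-frame similarity
```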
Second, the gesture depth map sequence is decomposed into a plurality of subsequences according to the calculation result.
In order to solve the problem of decomposition of a gesture sequence, the embodiment of the invention provides a gesture decomposition method based on spectral clustering, which specifically comprises the following steps:
An undirected graph model G = (V, E) is first constructed, where V = {v1, v2, ..., vn} is the vertex set, with each vertex corresponding to one depth map of the sequence, and E = {s11, s12, ..., sij, ..., snn} is the adjacency matrix, in which each element represents the similarity between any two frame images. The adjacency matrix is thus constituted by the similarity matrix described above. A similarity greater than 0 indicates that the two vertices are connected; otherwise they are not connected.
In order to decompose the depth map sequence, the graph model needs to be partitioned. The spectral clustering algorithm converts the graph partitioning problem into a matrix problem. The specific method is as follows: the similarity matrix is converted into a Laplacian matrix, as shown in formula (5):

L = D - W (5)

where W is the weighted similarity matrix and D is the degree matrix of the graph, with diagonal entries Dii = Σj w(i, j).
The Laplacian matrix is then solved to obtain its eigenvalues and eigenvectors. The eigenvector corresponding to the second smallest eigenvalue, called the Fiedler vector (i.e., the optimal vector), is an approximation of the potential function and has been experimentally shown to represent an optimal solution of the graph partition; therefore the Fiedler vector is selected, and the points in this vector are clustered.
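A sketch of this spectral bipartition step, assuming NumPy; the median split of the Fiedler-vector entries stands in for the clustering of the points in the optimal vector, whose exact rule is not specified above.

```python
import numpy as np

def fiedler_vector(w):
    """Formula (5): L = D - W; return the eigenvector of the second
    smallest eigenvalue (the Fiedler vector)."""
    lap = np.diag(w.sum(axis=1)) - w      # unnormalised graph Laplacian
    vals, vecs = np.linalg.eigh(lap)      # eigh: L is symmetric
    return vecs[:, 1]                     # second smallest eigenvalue

def bipartition(w):
    """Split the frames into two classes by their Fiedler-vector
    entries (median split, an assumed clustering rule)."""
    f = fiedler_vector(w)
    return (f >= np.median(f)).astype(int)
```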
In order to handle the fact that the number of categories is unknown in advance, the embodiment of the invention provides a binary iterative clustering algorithm. The specific method is as follows: the condition for stopping the iteration is set according to the relation between the mean similarity of the current sequence and the mean similarities of its two subclasses. Assuming the current sequence A has mean inter-frame similarity dA, and its two subsequences B and C have mean similarities dB and dC respectively, the condition for stopping the iteration is calculated by formula (6).
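Building on the previous sketch's bipartition, the binary iterative clustering can be outlined as below; because formula (6) is not reproduced above, the stop rule used here — stop when neither subclass is more cohesive than its parent — is only an assumption consistent with the stated relation between dA, dB and dC.

```python
import numpy as np

def decompose(w, frames):
    """Recursive binary clustering of the frame indices; bipartition()
    is the previous sketch."""
    def mean_sim(idx):
        return w[np.ix_(idx, idx)].mean()

    def split(idx):
        if len(idx) < 2:
            return [idx]
        labels = bipartition(w[np.ix_(idx, idx)])
        b = [idx[i] for i in range(len(idx)) if labels[i] == 0]
        c = [idx[i] for i in range(len(idx)) if labels[i] == 1]
        if not b or not c:
            return [idx]
        dA, dB, dC = mean_sim(idx), mean_sim(b), mean_sim(c)
        if min(dB, dC) <= dA:             # assumed stand-in for formula (6)
            return [idx]
        return split(b) + split(c)

    return split(list(range(len(frames))))
```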
Third, the key node of each subsequence in the plurality of subsequences is extracted.
In order to extract key nodes from the decomposed gesture sequence, the embodiment of the invention calculates the Euclidean distance between any two frames in each subsequence and selects the frame whose total Euclidean distance to all other frames is minimal as the key node of the current subsequence. The selection is shown in formula (7):

key = argmin_i Σ_j ||Fi - Fj|| (7)

where i and j range over the frames of the subsequence.
The key point set of the entire gesture sequence is then represented as the collection of the key nodes of all the subsequences, K = {key1, key2, ..., keym}, kept in temporal order.
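A sketch of the key-node selection of formula (7), assuming the per-frame feature vectors from the earlier sketches:

```python
import numpy as np

def key_node(sub_feats):
    """Formula (7): index (within the subsequence) of the frame whose
    total Euclidean distance to all other frames is minimal."""
    F = np.stack(sub_feats)
    d = np.sqrt(((F[:, None, :] - F[None, :, :]) ** 2).sum(-1))
    return int(d.sum(axis=1).argmin())
```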
Fourth, the key nodes are formed into the key point set of the gesture depth map sequence.
Fifth, gesture features are extracted and dynamic gestures are recognized based on the key point set to obtain a gesture recognition result.
The embodiment of the invention provides a dynamic gesture recognition algorithm based on gesture decomposition: the gesture depth map sequence is decomposed into a plurality of independent subsequences according to the similarity between frames, a key node is extracted from each subsequence, and the key nodes are recombined into a new sequence set for gesture recognition. This reduces the time-domain redundant information of the gesture features, improves the robustness of the features to differences in gesture speed, and solves the following technical problems: first, how to quantify the correlation between frames of a gesture depth map sequence; second, how to decompose the gesture sequence according to similarity; and third, how to extract key nodes from the decomposed subsequences.
According to the embodiment of the invention, the gesture sequence is decomposed according to the correlation between the depth maps of the sequence, and key points are extracted from the decomposed subsequences. Compared with other methods, the embodiment of the invention has the following advantages. First, the direction-based gesture features fully describe the correlation between depth map frames, and analyzing this correlation helps remove time-domain redundant information and improves the descriptive power and discriminability of the gesture features. Second, the gesture sequence decomposition algorithm based on spectral clustering decomposes the gesture sequence according to the similarity matrix, with similar frames divided into the same class and dissimilar frames into different classes; the gesture sequence is thereby decomposed into a plurality of sub-processes, which overcomes the gesture differences caused by different speed changes. Third, the key-point extraction method based on the minimum Euclidean distance extracts the most representative image of each subsequence as a key point of the gesture; the key point set removes time-domain redundant information and overcomes the differences that speed changes cause within the same gesture.
Compared with other algorithms, when the gesture feature recognition method provided by the invention performs dynamic gesture recognition with gesture decomposition of the depth map sequence, the recognition rate is significantly improved. The comparison results are shown in Tables 1 and 2.
TABLE 1 Comparison of recognition rates between the method of this embodiment and the prior art on the MSRGesture3D database

Algorithm                                                        Recognition rate on MSRGesture3D (%)
Gesture feature recognition method of this embodiment + HON4D    90.59
Uniform sampling + HON4D                                         87.01
HON4D [reference 1]                                              88.25
Jiang et al. [reference 2]                                       88.50
Yang et al. [reference 3]                                        89.20
Kläser et al. [reference 4]                                      85.23
As shown in Table 1, on the public MSRGesture3D dataset, the recognition rate of the gesture feature recognition method of this embodiment reaches 90.59%, exceeding the recognition rates of the prior-art gesture feature recognition methods of the related documents.
TABLE 2 Comparison of recognition rates between the method of this embodiment and the prior art on a custom dataset
As shown in Table 2, on the custom database, the recognition rate of the gesture feature recognition method of this embodiment reaches 85.76%, higher than that of the prior-art gesture feature recognition methods of the related documents.
The relevant documents cited in the tables above are as follows:
Document 1: suk H I, Sin B K, Lee S W.hand Gesture registration Based on Dynamic Bayesian Network framework, pattern registration, 2010,43(9):3059-3072.
document 2: von willebrand-based gesture recognition research: [ doctor academic thesis ]. Beijing: beijing university of Physician 2015.
document 3: wenjun T, Chengdong W, Shuying Z et al, dynamic Hand Motion Recognition Using Motion Trajectories and Key Frames. in: Proceedings of the 2nd IEEE International Conference On Advanced Computer Control (ICACC),2010,3: 163-.
Document 4: oreifej O, Liu Z.HON4D: Histogram of organized 4D Normals for Activity Recognition from Depth sequences. in: Proceedings of the IEEE Conference On Computer Vision and Pattern Recognition,2013: 716-.
Document 5: yang X, Zhang C, Tian Y L.recording Actions Using Depth Motion Maps-Based histories of organized gradients. in: Proceedings of the 20th ACM International Conference On Multimedia,2012, 1057-.
Document 6: tang S, Wang X, Lv X et al. Histogram of Oriented Normal Vectors for Object Recognition with a Depth sensor. in: Proceedings of Asian Conference on Computer Vision. Springer Berlin Heidelberg,2012, 525-.
It should be noted that the steps illustrated in the flowcharts of the figures may be performed in a computer system, for example as a set of computer-executable instructions, and that, although a logical order is illustrated in the flowcharts, in some cases the steps illustrated or described may be performed in an order different from that presented herein.
The embodiment of the invention provides a gesture feature recognition device, which can be used for executing the gesture feature recognition method of the embodiment of the invention.
Fig. 2 is a schematic diagram of a gesture feature recognition apparatus according to an embodiment of the present invention, as shown in fig. 2, the apparatus includes:
And the calculating unit 10 is used for calculating the similarity between frames of the gesture depth map sequence to obtain a calculation result.
and the decomposition unit 20 is used for decomposing the gesture depth map sequence into a plurality of subsequences according to the calculation result.
The extracting unit 30 is configured to extract a key node of each of the multiple subsequences, where the key node is a frame in each subsequence that meets a preset condition.
And the combining unit 40 is used for combining the key nodes into a key point set of the gesture depth map sequence.
And the recognition unit 50 is configured to perform gesture recognition according to the key point set to obtain a gesture recognition result.
Optionally, the calculation unit 10 comprises: the decomposition module is used for carrying out block processing on the images of the depth map sequence to obtain a plurality of image blocks; the first calculation module is used for calculating the gradient direction angles of all pixel points in each image block, and taking the maximum value of the gradient direction angles in each image block as the direction angle of each image block; the combination module is used for forming a feature vector by taking the direction angles of all the image blocks as elements; and the second calculation module is used for calculating the similarity between frames of the gesture depth map sequence according to the feature vector.
Optionally, the second calculation module comprises: the first calculation submodule is used for calculating a similarity matrix formed from the feature vectors through a Gaussian similarity function; the weighting submodule is used for weighting the similarity matrix to obtain a weighted similarity matrix; and the second calculation submodule is used for calculating the similarity between frames of the gesture depth map sequence according to the weighted similarity matrix.
Optionally, the decomposition unit 20 comprises: the establishing module is used for establishing an undirected graph model according to a vertex set and an adjacency matrix, wherein the vertex set is the set of all depth maps in a depth map sequence, elements in the adjacency matrix represent the similarity between any two frames of images, and the adjacency matrix is formed from the weighted similarity matrix; and the dividing module is used for dividing the graph model, wherein dividing the graph model comprises the following steps: converting the weighted similarity matrix into a Laplacian matrix, and solving the Laplacian matrix to obtain its eigenvalues and eigenvectors; taking the eigenvector corresponding to the second smallest eigenvalue as the optimal vector; and clustering the points in the optimal vector.
In the embodiment, the calculating unit 10 is adopted to calculate the similarity between frames of the gesture depth map sequence to obtain a calculation result; the decomposition unit 20 decomposes the gesture depth map sequence into a plurality of subsequences according to the calculation result; the extracting unit 30 extracts a key node of each of the plurality of subsequences; the combination unit 40 combines the key nodes into a key point set of the gesture depth map sequence; and the recognition unit 50 performs gesture recognition according to the key point set to obtain a gesture recognition result, so that the problem of low descriptive performance of gesture features caused by more time domain redundant information is solved, and the effect of improving the descriptive performance of the gesture features is achieved.
It will be apparent to those skilled in the art that the modules or steps of the present invention described above may be implemented by a general purpose computing device, they may be centralized on a single computing device or distributed across a network of multiple computing devices, and they may alternatively be implemented by program code executable by a computing device, such that they may be stored in a storage device and executed by a computing device, or fabricated separately as individual integrated circuit modules, or fabricated as a single integrated circuit module from multiple modules or steps. Thus, the present invention is not limited to any specific combination of hardware and software.
The above description covers only preferred embodiments of the present invention and is not intended to limit the present invention; various modifications and changes may be made by those skilled in the art. Any modification, equivalent replacement, or improvement made within the spirit and principles of the present invention shall be included in the protection scope of the present invention.

Claims (6)

1. A gesture feature recognition method, comprising:
Calculating the similarity between frames of the gesture depth map sequence to obtain a calculation result;
Decomposing the gesture depth map sequence into a plurality of sub-sequences according to the calculation result;
Extracting a key node of each subsequence of the plurality of subsequences, wherein the key node is a frame meeting a preset condition in each subsequence;
Forming the key nodes into a key point set of the gesture depth map sequence; and
Performing gesture recognition according to the key point set to obtain a gesture recognition result;
Wherein calculating the similarity between frames of the sequence of gesture depth maps comprises: carrying out blocking processing on the images of the depth map sequence to obtain a plurality of image blocks; calculating gradient direction angles of all pixel points in each image block, and taking the maximum value of the gradient direction angles in each image block as the direction angle of each image block; forming a feature vector by taking the direction angles of all image blocks as elements; calculating the similarity between frames of the gesture depth map sequence according to the feature vector;
Wherein calculating the similarity between frames of the sequence of gesture depth maps from the feature vectors comprises: calculating a similarity matrix formed from the feature vectors through a Gaussian similarity function; weighting the similarity matrix to obtain a weighted similarity matrix; and calculating the similarity between frames of the gesture depth map sequence according to the weighted similarity matrix.
2. The method of claim 1, wherein decomposing the sequence of gesture depth maps into a plurality of sub-sequences according to the calculation result comprises:
Establishing an undirected graph model according to a vertex set and an adjacency matrix, wherein the vertex set is the set of all depth maps in the depth map sequence, elements in the adjacency matrix represent the similarity between any two frames of images, and the adjacency matrix is formed from the weighted similarity matrix;
Dividing the graph model, wherein dividing the graph model comprises: converting the weighted similarity matrix into a Laplacian matrix, and solving the Laplacian matrix to obtain its eigenvalues and eigenvectors; taking the eigenvector corresponding to the second smallest eigenvalue as the optimal vector; and clustering the points in the optimal vector.
3. The method of claim 2, wherein decomposing the sequence of gesture depth maps into a plurality of sub-sequences according to the calculation result further comprises:
Performing binary iterative clustering on the current depth map sequence, wherein the performing binary iterative clustering on the current depth map sequence comprises: clustering the points of the optimal vector of the current sequence into two classes to obtain two subclasses, and determining the condition for stopping the iteration according to the relation between the mean similarity of the current sequence and the mean similarities of the two subclasses.
4. The method of claim 1, wherein extracting key nodes of each of the plurality of subsequences comprises:
calculating Euclidean distances between any two frames in each subsequence to obtain a plurality of Euclidean distances;
And taking the frame with the minimum total Euclidean distance to the other frames in each subsequence as the key node of that subsequence.
5. A gesture feature recognition apparatus, comprising:
The calculation unit is used for calculating the similarity between frames of the gesture depth map sequence to obtain a calculation result;
The decomposition unit is used for decomposing the gesture depth map sequence into a plurality of subsequences according to the calculation result;
The extracting unit is used for extracting a key node of each subsequence in the plurality of subsequences, wherein the key node is a frame meeting a preset condition in each subsequence;
The combination unit is used for combining the key nodes into a key point set of the gesture depth map sequence; and
the recognition unit is used for carrying out gesture recognition according to the key point set to obtain a gesture recognition result;
Wherein the calculation unit includes: the decomposition module is used for carrying out block processing on the images of the depth map sequence to obtain a plurality of image blocks; the first calculation module is used for calculating the gradient direction angles of all pixel points in each image block, and taking the maximum value of the gradient direction angles in each image block as the direction angle of each image block; the combination module is used for forming a feature vector by taking the direction angles of all the image blocks as elements; the second calculation module is used for calculating the similarity between frames of the gesture depth map sequence according to the feature vector;
Wherein the second calculation module comprises: the first calculation submodule is used for calculating a similarity matrix formed from the feature vectors through a Gaussian similarity function; the weighting submodule is used for weighting the similarity matrix to obtain a weighted similarity matrix; and the second calculation submodule is used for calculating the similarity between frames of the gesture depth map sequence according to the weighted similarity matrix.
6. the apparatus of claim 5, wherein the decomposition unit comprises:
The establishing module is used for establishing an undirected graph model according to a vertex set and an adjacency matrix, wherein the vertex set is the set of all depth maps in the depth map sequence, elements in the adjacency matrix represent the similarity between any two frames of images, and the adjacency matrix is formed from the weighted similarity matrix;
A partitioning module, configured to partition the graph model, where partitioning the graph model includes:
Converting the weighted similarity matrix into a Laplacian matrix, and solving the Laplacian matrix to obtain its eigenvalues and eigenvectors; taking the eigenvector corresponding to the second smallest eigenvalue as the optimal vector; and clustering the points in the optimal vector.
CN201610559968.7A 2016-07-15 2016-07-15 Gesture feature recognition method and device Active CN106228121B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201610559968.7A CN106228121B (en) 2016-07-15 2016-07-15 Gesture feature recognition method and device

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201610559968.7A CN106228121B (en) 2016-07-15 2016-07-15 Gesture feature recognition method and device

Publications (2)

Publication Number Publication Date
CN106228121A CN106228121A (en) 2016-12-14
CN106228121B (en) 2019-12-06

Family

ID=57520628

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201610559968.7A Active CN106228121B (en) 2016-07-15 2016-07-15 Gesture feature recognition method and device

Country Status (1)

Country Link
CN (1) CN106228121B (en)

Families Citing this family (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN107260412B (en) * 2017-07-11 2019-01-04 雷磊 Folding intelligence stretcher
CN107212971B (en) * 2017-07-11 2018-09-21 薛红 A kind of folding intelligence stretcher
CN107894834B (en) * 2017-11-09 2021-04-02 上海交通大学 Control gesture recognition method and system in augmented reality environment
CN109829422A (en) * 2019-01-28 2019-05-31 哈尔滨工业大学 A kind of video frequency identifying method based on the movement of impulsive neural networks falling over of human body
CN110176968B (en) * 2019-05-20 2021-04-06 桂林理工大学 Jump phenomenon correction method for WiFi human behavior recognition
CN111125437B (en) * 2019-12-24 2023-06-09 四川新网银行股份有限公司 Method for recognizing lip language picture in video
CN112527113A (en) * 2020-12-09 2021-03-19 北京地平线信息技术有限公司 Method and apparatus for training gesture recognition and gesture recognition network, medium, and device
CN113111842B (en) * 2021-04-26 2023-06-27 浙江商汤科技开发有限公司 Action recognition method, device, equipment and computer readable storage medium

Citations (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN103593464A (en) * 2013-11-25 2014-02-19 华中科技大学 Video fingerprint detecting and video sequence matching method and system based on visual features

Family Cites Families (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US10228242B2 (en) * 2013-07-12 2019-03-12 Magic Leap, Inc. Method and system for determining user input based on gesture

Patent Citations (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN103593464A (en) * 2013-11-25 2014-02-19 华中科技大学 Video fingerprint detecting and video sequence matching method and system based on visual features

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
Dynamic gesture recognition based on Kinect sensor; Yu Xu; China Master's Theses Full-text Database, Information Science and Technology; 2014-09-15; chapters 2-4 *

Also Published As

Publication number Publication date
CN106228121A (en) 2016-12-14

Similar Documents

Publication Publication Date Title
CN106228121B (en) Gesture feature recognition method and device
CN111898547B (en) Training method, device, equipment and storage medium of face recognition model
CN107633226B (en) Human body motion tracking feature processing method
Yuan et al. Factorization-based texture segmentation
US20200226415A1 (en) Window grouping and tracking for fast object detection
Lempitsky et al. Learning to count objects in images
KR101640998B1 (en) Image processing apparatus and image processing method
CN108229347B (en) Method and apparatus for deep replacement of quasi-Gibbs structure sampling for human recognition
CN110751027B (en) Pedestrian re-identification method based on deep multi-instance learning
Shapovalov et al. Spatial inference machines
CN106157330B (en) Visual tracking method based on target joint appearance model
CN113361495A (en) Face image similarity calculation method, device, equipment and storage medium
Domadia et al. Comparative analysis of unsupervised and supervised image classification techniques
Lukac et al. Simple comparison of image segmentation algorithms based on evaluation criterion
CN109063716B (en) Image identification method, device, equipment and computer readable storage medium
Wang et al. A novel sparse boosting method for crater detection in the high resolution planetary image
Alsanad et al. Real-time fuel truck detection algorithm based on deep convolutional neural network
CN112329818B (en) Hyperspectral image non-supervision classification method based on graph convolution network embedded characterization
Firouznia et al. Adaptive chaotic sampling particle filter to handle occlusion and fast motion in visual object tracking
CN109165587B (en) Intelligent image information extraction method
Xing et al. Fast cell segmentation using scalable sparse manifold learning and affine transform-approximated active contour
Ghahremannezhad et al. Real-time hysteresis foreground detection in video captured by moving cameras
Hassan et al. Salient object detection based on CNN fusion of two types of saliency models
Taştan et al. Sparsity-aware block diagonal representation for subspace clustering
Edwards et al. Graph-based CNN for human action recognition from 3D pose

Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
SE01 Entry into force of request for substantive examination
SE01 Entry into force of request for substantive examination
GR01 Patent grant
GR01 Patent grant