CN110597446A - Gesture recognition method and electronic equipment - Google Patents

Gesture recognition method and electronic equipment Download PDF

Info

Publication number
CN110597446A
CN110597446A (application CN201910464528.7A)
Authority
CN
China
Prior art keywords
gesture
partition
data
gesture recognition
feature data
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN201910464528.7A
Other languages
Chinese (zh)
Inventor
蔡志博
冯勇强
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Beijing Xiaoniao Tingting Technology Co Ltd
Original Assignee
Beijing Xiaoniao Tingting Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Beijing Xiaoniao Tingting Technology Co Ltd filed Critical Beijing Xiaoniao Tingting Technology Co Ltd
Publication of CN110597446A publication Critical patent/CN110597446A/en
Pending legal-status Critical Current

Links

Classifications

    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06F - ELECTRIC DIGITAL DATA PROCESSING
    • G06F3/00 - Input arrangements for transferring data to be processed into a form capable of being handled by the computer; Output arrangements for transferring data from processing unit to output unit, e.g. interface arrangements
    • G06F3/01 - Input arrangements or combined input and output arrangements for interaction between user and computer
    • G06F3/048 - Interaction techniques based on graphical user interfaces [GUI]
    • G06F3/0487 - Interaction techniques based on graphical user interfaces [GUI] using specific features provided by the input device, e.g. functions controlled by the rotation of a mouse with dual sensing arrangements, or of the nature of the input device, e.g. tap gestures based on pressure sensed by a digitiser
    • G06F3/0488 - Interaction techniques based on graphical user interfaces [GUI] using specific features provided by the input device, e.g. functions controlled by the rotation of a mouse with dual sensing arrangements, or of the nature of the input device, e.g. tap gestures based on pressure sensed by a digitiser, using a touch-screen or digitiser, e.g. input of commands through traced gestures
    • G06F3/04883 - Interaction techniques based on graphical user interfaces [GUI] using a touch-screen or digitiser for inputting data by handwriting, e.g. gesture or text
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06F - ELECTRIC DIGITAL DATA PROCESSING
    • G06F2203/00 - Indexing scheme relating to G06F3/00 - G06F3/048
    • G06F2203/048 - Indexing scheme relating to G06F3/048
    • G06F2203/04808 - Several contacts: gestures triggering a specific function, e.g. scrolling, zooming, right-click, when the user establishes several contacts with the surface simultaneously; e.g. using several fingers or a combination of fingers and pen

Abstract

The present invention relates to gesture recognition technologies, and in particular, to a gesture recognition method and an electronic device. The gesture recognition method comprises the following steps: acquiring detection data from each preset partition of the touch pad, wherein the detection data corresponds to a user gesture and has a plurality of frames, and each frame of detection data corresponds to data respectively detected by each preset partition of the touch pad at a certain moment; acquiring feature data based on the detection data, inputting the feature data into a gesture recognition model, and matching an optimal preset gesture; and executing the operation corresponding to the optimal preset gesture. The gesture recognition scheme provided by the embodiment of the invention can be used for recognizing various touch operations of a user.

Description

Gesture recognition method and electronic equipment
Technical Field
The present invention relates to gesture recognition technologies, and in particular, to a gesture recognition method and an electronic device.
Background
Electronic devices with touch panels currently on the market can generally recognize simple user gestures, for example a single-point touch operation or a sliding touch operation; however, more complex gestures cannot be recognized, and gestures with high similarity cannot be distinguished, so a new gesture recognition method is needed to better recognize the user's various touch operations.
Disclosure of Invention
The invention aims to provide a gesture recognition scheme to better recognize various touch operations of a user.
According to a first aspect of the present invention, there is provided a gesture recognition method, comprising the steps of:
acquiring detection data from each preset partition of a touch pad, wherein the detection data corresponds to a user gesture and has a plurality of frames, and each frame of detection data corresponds to data respectively detected by each preset partition of the touch pad at a certain moment;
acquiring feature data based on the detection data, inputting the feature data into a gesture recognition model, and matching an optimal preset gesture; and,
executing the operation corresponding to the optimal preset gesture.
Optionally, before inputting the feature data to the gesture recognition model, the method further includes a step of intercepting the feature data:
the intercepting the feature data comprises: retaining the feature data during the touch event, of the P frames before the touch event begins, and of the Q frames after the touch event ends; wherein P and Q are positive integers.
Optionally, the feature data is acquired based on the detection data, and the method includes any one or a combination of the following processing modes: median filtering processing, zeroing and denoising processing, maximum value normalization processing and dynamic range control processing.
Optionally, the method further comprises an identification result verification step:
the identification result verifying step includes: determining whether the feature data is misrecognized as the optimal preset gesture.
Optionally, the preset gesture includes: clicking any central partition, double clicking any central partition, triple clicking any central partition, clicking any peripheral partition, sliding clockwise along the peripheral partition, sliding anticlockwise along the peripheral partition, simultaneously touching a plurality of partitions for a single time, simultaneously touching a plurality of partitions for two times, simultaneously touching a plurality of partitions for three times, touching a full partition, sliding along a peripheral partition-a middle partition-another peripheral partition, long-pressing any middle partition, long-pressing any peripheral partition, and simultaneously touching a plurality of partitions for a long time.
Optionally, the gesture recognition model is implemented using a neural network model.
Optionally, the gesture recognition model is obtained through iterative training using a stochastic gradient descent algorithm.
Optionally, the neural network model is a recurrent neural network model.
Optionally, the neural network model is a long short-term memory (LSTM) network model.
Optionally, the gesture recognition model is implemented using a hidden markov-gaussian mixture model (HMM-GMM).
Optionally, the matching of the optimal preset gesture includes:
calculating the probability of each frame of feature data corresponding to each hidden Markov (HMM) state according to the Gaussian mixture model parameters to obtain the HMM state of each frame of feature data;
and inputting the HMM state corresponding to each frame of feature data into a Finite State Transducer (FST) decoding network, and searching for an optimal HMM state sequence path according to the FST decoding network and a Viterbi algorithm to determine an optimal preset gesture.
Optionally, the gesture recognition model is obtained by training using sample feature data, and includes the following steps:
inputting each frame of sample characteristic data into a hidden Markov-Gaussian mixed model training tool;
constructing an FST decoding network according to the correct gesture marks and the state of the minimum recognition unit;
aligning the correct gesture marks with each frame of characteristics to obtain an aligned HMM state sequence; and updating model parameters based on the existing correct gesture recognition marks as supervision.
Optionally, for any preset gesture, training the complete sample feature data of the preset gesture as a minimum independent training unit; the complete sample feature data of the preset gesture comprises sample feature data of each preset partition; or,
and for any preset gesture, training the sample characteristic data of each preset partition as a minimum independent training unit respectively.
Optionally, the gesture recognition model is implemented using a hidden markov-deep neural network (HMM-DNN) model.
Optionally, the matching of the optimal preset gesture includes:
calculating the probability of each frame of feature data corresponding to each HMM state according to the deep neural network model parameters to obtain the HMM state of each frame of feature data;
and inputting the HMM state corresponding to each frame of feature data into the FST decoding network, and searching for an optimal HMM state sequence path according to the FST decoding network and a Viterbi algorithm to determine an optimal preset gesture.
Optionally, the gesture recognition model is obtained by training using sample feature data, and includes the following steps:
inputting each frame of sample characteristic data into a hidden Markov-deep neural network model training tool;
constructing an FST decoding network according to the correct gesture marks and the state of the minimum recognition unit;
aligning the correct gesture marks with each frame of characteristics to obtain an aligned HMM state sequence; and updating model parameters based on the existing correct gesture recognition marks as supervision.
Optionally, for any preset gesture, training the complete sample feature data of the preset gesture as a minimum independent training unit; the complete sample feature data of the preset gesture comprises sample feature data of each preset partition; or,
and for any preset gesture, training the sample characteristic data of each preset partition as a minimum independent training unit respectively.
Optionally, the gesture recognition model is implemented using dynamic time warping method modeling.
Optionally, inputting the feature data into a gesture recognition model, and matching an optimal preset gesture, including:
designating, for each preset gesture, at least one frame of sample feature data as template data of that preset gesture, wherein the template data are used to form the gesture recognition model;
calculating the difference degree between the feature data and each template data in the gesture recognition model by using a dynamic time warping method;
and matching the optimal preset gesture according to the difference degree.
Optionally, the sampling time interval between two adjacent frames of the detection data is 20-40 ms.
Optionally, the method further comprises:
acquiring characteristic data of a user-defined gesture of a user;
sending the characteristic data of the user-defined gesture to a server;
receiving an updated model of the gesture recognition model issued by the server, wherein the updated model is obtained by retraining the server by using the characteristic data of the user-defined gesture;
and replacing the original gesture recognition model with the updated model.
According to a second aspect of the present invention, there is provided an electronic device having a touch pad, a memory for storing computer instructions, and a processor, wherein the computer instructions, when executed by the processor, implement the method according to any one of the first aspect of the present invention.
Optionally, the electronic device is a sound box.
The gesture recognition scheme provided by the embodiment of the invention can be used for recognizing various touch operations of a user.
In order to make the aforementioned and other objects, features and advantages of the present invention comprehensible, preferred embodiments accompanied with figures are described in detail below.
Drawings
In order to more clearly illustrate the technical solutions of the embodiments of the present invention, the drawings required to be used in the embodiments will be briefly described below. It is appreciated that the following drawings depict only certain embodiments of the invention and are therefore not to be considered limiting of its scope. For a person skilled in the art, it is possible to derive other relevant figures from these figures without inventive effort.
FIG. 1 is a flow chart illustrating a recognition process of a gesture recognition model according to an embodiment of the present invention;
FIG. 2 is a schematic view of a partition of a touch pad according to an embodiment of the present invention;
FIG. 3 illustrates a schematic diagram of DRC curves provided by an embodiment of the present invention;
fig. 4 shows a block diagram of an electronic device provided by an embodiment of the invention.
Detailed Description
Various exemplary embodiments of the present invention will now be described in detail with reference to the accompanying drawings. It should be noted that: the relative arrangement of the components and steps, the numerical expressions and numerical values set forth in these embodiments do not limit the scope of the present invention unless specifically stated otherwise.
The following description of at least one exemplary embodiment is merely illustrative in nature and is in no way intended to limit the invention, its application, or uses.
Techniques, methods, and apparatus known to those of ordinary skill in the relevant art may not be discussed in detail but are intended to be part of the specification where appropriate.
In all examples shown and discussed herein, any particular value should be construed as merely illustrative, and not limiting. Thus, other examples of the exemplary embodiments may have different values.
It should be noted that: like reference numbers and letters refer to like items in the following figures, and thus, once an item is defined in one figure, further discussion thereof is not required in subsequent figures.
< gesture recognition method >
The gesture recognition model used in the gesture recognition method provided by the embodiment of the invention can be implemented with a neural network model algorithm or with a hidden Markov-Gaussian mixture model (HMM-GMM) algorithm.
The gesture recognition model implemented by using the hidden markov-gaussian mixture model algorithm is obtained by the following learning and training steps 100-300.
100. Raw detection data (hereinafter also referred to as raw data) of each preset partition of the touch pad is acquired.
Taking a sound box with a capacitive touch screen as an example, referring to fig. 2, a circular touch screen is divided into 15 partitions (partition 0 to partition 14), including 3 central partitions (partitions 12 to 14) located at the center of the circular touch screen and 12 peripheral partitions (partitions 0 to 11) located around the central partitions. The circular touch screen can display the partitions in a distinguishable manner to facilitate the user's touch operations. Alternatively, the circular touch screen may display only partitions 0, 2, 4, 6, 8 and 10.
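For concreteness, the partition layout just described can be captured in a few constants; a minimal Python sketch (indices follow the fig. 2 description; the constant names are illustrative):

```python
# Partition layout of the circular touch pad per fig. 2
# (indices as described above; the constant names are illustrative).
PERIPHERAL_PARTITIONS = list(range(0, 12))  # partitions 0-11 around the edge
CENTRAL_PARTITIONS = list(range(12, 15))    # partitions 12-14 in the center
NUM_PARTITIONS = 15                         # one raw value per partition per frame
```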
The preset gesture comprises: clicking any central partition, double clicking any central partition, triple clicking any central partition, clicking any peripheral partition, sliding clockwise along the peripheral partition, sliding anticlockwise along the peripheral partition, simultaneously touching a plurality of partitions for a single time, simultaneously touching a plurality of partitions for two times, simultaneously touching a plurality of partitions for three times, touching a full partition, sliding along a peripheral partition-a middle partition-another peripheral partition, long-pressing any middle partition, long-pressing any peripheral partition, and simultaneously touching a plurality of partitions for a long time. In another embodiment, for the touch operation "slide along a peripheral partition-a middle partition-another peripheral partition", the touch operation may be further classified into a plurality of preset gestures, for example, the direction of the slide of the touch operation is defined as a transverse direction, a longitudinal direction, or a slant direction, which corresponds to different preset gestures, and even the plurality of preset gestures may be further subdivided and defined according to the slant angle.
As will be readily appreciated by those skilled in the art, for a capacitive touch screen, when a user touches the capacitive touch screen, a change in the capacitance of the corresponding capacitor is caused, and the changed capacitance is converted into a value proportional thereto, referred to herein as raw data.
Of course, in other embodiments, the touch pad may also be a resistive touch screen, and the raw data of each partition of the resistive touch screen is acquired.
Regarding the setting of the sampling rate: the higher the sampling rate, the more information is obtained, which benefits model training, but it also increases the decoding burden and affects real-time decoding performance. The sampling rate used in this embodiment corresponds to a sampling time interval (i.e., sampling period) between two adjacent frames of detection data of 20 to 50 milliseconds (ms); preferably it may be 20 ms, i.e., a sampling rate of 50 Hz. Practice shows that this sampling rate satisfies both the decoding-speed and sampling-precision requirements and achieves a satisfactory effect.
It should be noted that the sampling rate in this embodiment is always fixed, so as to avoid the unexpected result caused by the mismatch between the sampling rates during training and recognition.
200. Preprocess the raw detection data to obtain feature data.
The raw data from one sampling of each preset partition of the touch pad form one frame of data, i.e., the frame length is one sampling period, and the detection data of the 15 partitions at the same sampling moment form a 1×15 vector as one frame of raw data. For example, when a finger touches a middle partition, the raw data of two adjacent frames are: {14639,13692,13692,13995,14093,14566,13908,14111,14154,14764,13950,14252,14317,12927,13904},{14641,13692,13693,13996,14094,14566,13907,14115,14153,14765,13973,14295,14465,12964,13916}.
The characteristic data obtained after pretreatment are as follows: {0,0,0,0,0,0,0,0,0,0,0,0,0,0,0},{0,0,0,0,0,0,0,0,0,0,0,33.1,147.9,38.5,0}.
The specific preprocessing comprises performing one or more of the following on the raw data: median filtering, signal amplitude estimation, zeroing and background-noise removal, amplitude clipping, normalization, and DRC (Dynamic Range Control) processing.
Median filtering of the raw data smooths several consecutive raw samples of the same partition, removing data spikes and reducing noise interference.
In the embodiment of the present invention, each partition has a difference value, i.e., a signal amplitude, which represents the magnitude of the capacitance change of the corresponding partition caused by the touch. The signal amplitude can be determined using the reference-capacitance raw-value tracking algorithm provided by the capacitive panel manufacturer, or by setting a specific reference-capacitance raw-value tracking algorithm according to actual requirements.
Zeroing and background-noise removal is applied to estimated small-amplitude signals, i.e., signal data smaller than a lower threshold (for example, 30) are assigned zero.
Clipping is applied to estimated large-amplitude data, i.e., signal data larger than an upper threshold (e.g., 600) are assigned the upper threshold.
The signal may also be multiplied by a scaling factor so that the values mostly fall between 0 and 1, i.e., normalization.
DRC processing is applied to the signal. Fig. 3 shows the DRC curve used in the embodiment of the invention, where the abscissa is the original signal value and the ordinate is the signal value output after DRC processing. Referring to fig. 3: when the original signal amplitude is very small, e.g., less than 30, it can be regarded as noise and is useless, so the gain is left unchanged; when the original signal amplitude is small, e.g., between 30 and 150, the signal is amplified; when the original signal amplitude is large, e.g., between 200 and 600, the signal is attenuated. Through this DRC nonlinear gain processing, the distribution of signal amplitudes is transformed from relatively sparse to relatively concentrated.
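A minimal Python sketch of this preprocessing chain; the lower/upper thresholds (30 and 600) and the DRC breakpoints follow the values above, while the median-filter kernel size and the exact gain factors are assumptions, since fig. 3 is not reproduced here:

```python
import numpy as np
from scipy.signal import medfilt

def preprocess(frames: np.ndarray) -> np.ndarray:
    """frames: (num_frames, 15) array of per-partition signal amplitudes."""
    # Median filtering over consecutive samples of each partition
    # to remove spikes (kernel size 3 is an illustrative choice).
    x = np.apply_along_axis(lambda col: medfilt(col, kernel_size=3), 0,
                            frames.astype(float))
    # Zeroing / background-noise removal: values below the lower threshold -> 0.
    x[x < 30] = 0.0
    # Clipping: values above the upper threshold -> upper threshold.
    x = np.minimum(x, 600.0)
    # DRC-style nonlinear gain: leave noise-level values alone, boost small
    # amplitudes, attenuate large ones (breakpoints per the description; the
    # gain factors themselves are assumptions).
    boost = (x >= 30) & (x < 150)
    cut = (x >= 200)
    x[boost] *= 1.5
    x[cut] *= 0.7
    # Normalization: scale so most values fall between 0 and 1.
    return x / 600.0
```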
Different users differ in touch force and touch habits, so the detection data for the same gesture can vary greatly. This preprocessing step improves the recognition effect, prevents the model from overfitting to limited training data, and improves the model's generalization ability. Next, the trigger state and the start and end times of a touch event are determined, and multi-frame feature data are intercepted:
in one example, the characteristic data is compared with a preset threshold value, so as to determine whether the corresponding partition is triggered. In an embodiment of the invention, the value of the feature data obtained by preprocessing the original data is between 0 and 300, if the value of the feature data of a certain partition is greater than 90, the partition is considered to be triggered, and if any partition is triggered, the touch pad is in a triggering state; and if the values of the characteristic data of all the partitions are less than 90, namely all the partitions are not triggered, the touch pad is in a non-triggering state.
In another example, the signal amplitude (the difference between the original value of the capacitance corresponding to each partition of the touch screen and the original value of the reference capacitance) may be directly compared with a preset threshold to determine whether the corresponding partition is triggered. If any partition is triggered, the touch pad is in a trigger state; if all the partitions are not triggered, the touch pad is in a non-triggered state.
The flag for touch event start is: after the previous touch event has ended, the touch pad changes from the non-trigger state to the trigger state. The flag for touch event end is: the touch pad changes from the trigger state to the non-trigger state and remains in the non-trigger state for a preset time, indicating that the touch event has ended.
The touch pad outputs a plurality of frames of feature data during the period from the start of the touch event to its end. In addition, the feature data of the P frames before the touch event starts and the Q frames after it ends also contain some useful information. P and Q are preset positive integers whose values can be chosen for the specific application scenario; in one embodiment of the invention, P and Q are 6 and 4, respectively. In the model training stage, the multi-frame continuous feature data intercepted in this time section, together with the corresponding gesture label data, serve as training data; in real-time recognition, they serve as the model's input feature data.
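A minimal sketch of the trigger detection and P/Q-frame interception just described, with trigger threshold 90 and P=6, Q=4 per the embodiment; the buffering scheme is an illustrative assumption:

```python
from typing import Optional
import numpy as np

TRIGGER_THRESHOLD = 90.0  # per the embodiment: a partition is triggered above 90
P, Q = 6, 4               # frames retained before event start / after event end

def is_triggered(frame: np.ndarray) -> bool:
    # The touch pad is in the trigger state if any partition is triggered.
    return bool((frame > TRIGGER_THRESHOLD).any())

def intercept_event(feature_frames: np.ndarray) -> Optional[np.ndarray]:
    """Return frames from P before the event start to Q after the event end."""
    flags = [is_triggered(f) for f in feature_frames]
    if not any(flags):
        return None                                   # no touch event in buffer
    start = flags.index(True)                         # first triggered frame
    end = len(flags) - 1 - flags[::-1].index(True)    # last triggered frame
    lo = max(0, start - P)
    hi = min(len(feature_frames), end + 1 + Q)
    return feature_frames[lo:hi]
```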
300. Input all preprocessed frames of feature data into a hidden Markov-Gaussian mixture model (HMM-GMM) training tool for training; the specific training process is shown in steps 301 to 304. The feature data used in the training process are sample feature data whose correct gesture labels are known.
The training process uses the Viterbi algorithm; using the Viterbi path in training instead of accumulating over all state paths greatly reduces the computation. In Viterbi training, the feature data are first force-aligned according to the model parameters to obtain the HMM state corresponding to each frame's features, forming a state sequence corresponding to the feature sequence. From the feature sequence and the corresponding state sequence, the HMM parameter, i.e., the transition probability matrix, can be updated by simple statistics.
Each HMM state corresponds to a Gaussian mixture model probability density function. Once the feature sequence and the alignment sequence are known, all observations corresponding to each HMM state are collected, and the Gaussian probability density function corresponding to each HMM state is obtained.
In one embodiment of the present invention, each partition is taken as a minimum recognition unit, and the number of states of the minimum recognition unit is set to 3. In another embodiment of the invention, a preset gesture is used as the minimum recognition unit, and the number of states of the minimum recognition unit is set according to the complexity of the gesture: the more complex the gesture, the larger the number of states. For example, the gesture "sliding along a peripheral partition-a middle partition-another peripheral partition" corresponds to a larger number of states, and the gesture "clicking any central partition" corresponds to a smaller number of states.
It should be noted that, in other embodiments, the gesture recognition model may also be implemented with a hidden Markov-deep neural network (HMM-DNN) model. Both the HMM-GMM model and the HMM-DNN model form a state sequence based on an HMM; the difference is that the HMM-GMM model computes the observation probability of each frame of feature data in a given state with a GMM, while the HMM-DNN model computes it with a DNN. The steps of the two methods are substantially the same; the following mainly describes the HMM-GMM model, and it should be appreciated that the following steps 301 to 405 can likewise be used to explain the corresponding steps of the HMM-DNN model.
301. Initialize the hidden Markov-Gaussian mixture model, comprising steps 3011 to 3013.
3011. The state number of the HMM state corresponding to each minimum recognition unit is set. In one embodiment of the present invention, there are 15 partitions in total, each partition is used as a minimum recognition unit, and the number of states of the minimum recognition unit is set to 3, that is, each minimum recognition unit corresponds to 3 HMM states.
3012. For each HMM state, a gaussian probability density function with only one component is created.
3013. Create an initial transition probability matrix according to the number of states of the minimum recognition unit, and write the initial transition probability matrix and the Gaussian probability density functions corresponding to the HMM states into the initial training model.
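Taken together, steps 3011 to 3013 can be sketched as follows; a minimal Python sketch, where the left-to-right topology and the placeholder Gaussian parameters are illustrative assumptions:

```python
import numpy as np

NUM_UNITS, STATES_PER_UNIT = 15, 3  # 3011: 3 HMM states per minimum unit

def init_hmm_gmm(feature_dim: int = 15):
    n_states = NUM_UNITS * STATES_PER_UNIT
    # 3012: one single-component Gaussian per HMM state
    # (mean vector and diagonal variance; the values here are placeholders).
    gaussians = [
        {"weight": np.array([1.0]),
         "mean": np.zeros((1, feature_dim)),
         "var": np.ones((1, feature_dim))}
        for _ in range(n_states)
    ]
    # 3013: initial transition matrix; a simple left-to-right topology
    # with equal self-loop / forward probabilities as a starting point.
    trans = np.zeros((n_states, n_states))
    for s in range(n_states):
        trans[s, s] = 0.5
        trans[s, min(s + 1, n_states - 1)] += 0.5
    return {"transitions": trans, "gaussians": gaussians}
```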
302. Construct an FST (Finite State Transducer) decoding network from the existing correct gesture labels and the states of the minimum recognition units, for use in gesture recognition. A network is first formed by connecting one or more HMM models according to constraints such as context and grammar, and is then expanded to the state layer within each model, yielding a network in which transitions or jumps between states are possible. In addition, each state has an observation probability for outputting each frame of feature data; together these form the FST network.
303. Perform a first alignment of the correct gesture labels with each frame's features to obtain an aligned HMM state sequence. In one embodiment of the invention, the simplest uniform alignment may be employed. Model parameters are then updated using the existing correct gesture labels as supervision, including the transition probability matrix and the GMM (Gaussian mixture model) parameters corresponding to each state.
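The uniform alignment mentioned in step 303 simply splits a gesture's frames as evenly as possible across its HMM states; a minimal sketch (the state-indexing convention is an assumption):

```python
def uniform_alignment(num_frames: int, state_ids: list[int]) -> list[int]:
    """Assign each frame to one of the gesture's HMM states, in order,
    splitting the frames as evenly as possible among the states."""
    n = len(state_ids)
    return [state_ids[min(i * n // num_frames, n - 1)]
            for i in range(num_frames)]

# e.g. 10 frames over states [s0, s1, s2]
# -> [s0, s0, s0, s0, s1, s1, s1, s2, s2, s2]
```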
304. Enter the main loop of model training: for the specified number of alignment rounds, align the correct gesture labels with each frame's features and update the HMM state sequence. In each round, the statistics for the model parameters are computed, and the model parameters are then updated, including the transition probability matrix and the GMM parameters corresponding to each state.
The trained gesture recognition model comprises the following information: the number of states of the minimum identification unit, a transition probability matrix and a GMM parameter corresponding to each state. Wherein the GMM parameters comprise a Gaussian probability density function.
Referring to fig. 1, a process of performing gesture recognition by using a trained gesture recognition model according to an embodiment of the present invention is described:
401. Perform the same preprocessing on the raw data to obtain feature data. Once a touch event is judged to have started, each frame of feature data, beginning from the P-th frame before the touch event starts, is preprocessed in the same way as in the training process.
402. Starting from the P-th frame before the touch event begins, input the feature data frame by frame into the trained gesture recognition model.
403. Calculate the observation probability or likelihood of each frame of feature data in each HMM state according to the GMM parameters.
404. Input the HMM state probabilities corresponding to each frame of detection data into the FST decoding network, and search for the optimal HMM state sequence path in the FST decoding network with the Viterbi algorithm, i.e., determine the optimal gesture. For the observed gesture feature data, the state sequence that generates the feature data with maximum probability is found in the network; the gesture corresponding to the HMM model to which that state sequence belongs is the recognition result. Finding the optimal sequence in the state-space network is generally called decoding. Finding the optimal sequence or path is in essence a dynamic programming problem, and there are many search methods; this example describes the decoding process implemented with the commonly used Viterbi algorithm.
For the first frame of gesture feature data X_1, compute the product V_1 of the transition probability P(S_1|S_0) of jumping from the initial state S_0 to the current state S_1 and the observation probability P(X_1|S_1) of outputting the feature data in that state. When the second frame of feature data X_2 is input, likewise compute the transition probability P(S_2|S_1) of jumping from the previous state S_1 to the current state S_2 and the observation probability P(X_2|S_2), and multiply them by V_1 to obtain V_2; for each current state, only the single backtracking path node with the maximum cumulative probability is kept, i.e., one of the possible previous states S_1. Repeat until the last frame of data has been input, compute the cumulative probability V_n of each path, select the maximum cumulative probability, and obtain the optimal path by backtracking through the nodes. The gesture corresponding to the optimal path, i.e., to the HMM model to which the state sequence belongs, is the recognition result.
The above description of finding the best sequence based on observation probabilities also applies to finding the best sequence based on likelihoods; the two are similar.
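A minimal sketch of this Viterbi recursion, computed in the log domain for numerical stability (the log-domain formulation and the dense transition matrix are illustrative choices, not specifics of the patent):

```python
import numpy as np

def viterbi(log_obs: np.ndarray, log_trans: np.ndarray,
            log_init: np.ndarray) -> list[int]:
    """log_obs: (T, S) log observation probabilities log P(X_t | s);
    log_trans: (S, S) log transition probabilities log P(s' | s);
    log_init: (S,) log probabilities of starting in each state."""
    T, S = log_obs.shape
    V = log_init + log_obs[0]           # V_1 = P(S_1|S_0) * P(X_1|S_1), in logs
    back = np.zeros((T, S), dtype=int)  # best predecessor per state per frame
    for t in range(1, T):
        # scores[i, j] = V_{t-1}(i) + log P(j | i) + log P(X_t | j)
        scores = V[:, None] + log_trans + log_obs[t][None, :]
        back[t] = scores.argmax(axis=0)
        V = scores.max(axis=0)
    # Backtrack from the state with the maximum cumulative probability.
    path = [int(V.argmax())]
    for t in range(T - 1, 0, -1):
        path.append(int(back[t, path[-1]]))
    return path[::-1]
```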
405. Respond according to the determined optimal preset gesture.
For example, if the preset operation corresponding to the gesture of "sliding clockwise along the peripheral partition" is to increase the volume, when the determined optimal gesture is "sliding clockwise along the peripheral partition", the volume of the electronic device is increased.
In one embodiment, a recognition result verification step may also be included before the response. In this step, it is determined whether the feature data have been misrecognized as the optimal preset gesture. The specific operation is as follows: the feature data are input into a binary classifier (e.g., an SVM classifier) that separates correctly recognized from misrecognized feature data for the optimal preset gesture. If the feature data belong to the misrecognized feature data category for the optimal preset gesture, the response is abandoned; if they belong to the correctly recognized feature data category, step 405 is executed to respond according to the optimal preset gesture.
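A minimal sketch of this verification step, using scikit-learn's SVC as the per-gesture correct/misrecognized binary classifier; flattening the multi-frame features into a fixed-length vector is an assumption of the sketch:

```python
import numpy as np
from sklearn.svm import SVC

# One binary classifier per preset gesture, trained offline on feature data
# labeled as correctly recognized (1) or misrecognized (0) for that gesture.
verifiers: dict[str, SVC] = {}

def train_verifier(gesture: str, X: np.ndarray, y: np.ndarray) -> None:
    clf = SVC(kernel="rbf")  # kernel choice is an assumption
    clf.fit(X, y)            # X: (n_samples, n_features), y in {0, 1}
    verifiers[gesture] = clf

def verify(gesture: str, features: np.ndarray) -> bool:
    """Return True if the features look correctly recognized; if False,
    the response to the recognized gesture is abandoned."""
    vec = features.reshape(1, -1)  # flatten multi-frame features (assumption)
    return bool(verifiers[gesture].predict(vec)[0] == 1)
```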
In one embodiment, the preset gesture comprises an invalid gesture, and the HMM state sequence path comprises an invalid gesture state sequence path. That is, various invalid operations are added to the preset gesture as invalid gestures, and when the determined preset gesture is the "invalid gesture", no response is made.
The training process can be implemented with Matlab or Python, or directly with open-source tools such as HTK and Kaldi.
In another embodiment, the feature data may instead be input into a neural network training model for neural network training, yielding a gesture recognition model based on a neural network algorithm, which is then used to perform gesture recognition. The gesture recognition model of the embodiments of the invention is not limited to the hidden Markov-Gaussian mixture model or the neural network model; other suitable model types may also be adopted.
The training of the neural network training model comprises the following steps:
s1, the preprocessing part is consistent with the hidden Markov-Gaussian mixture model. Inputting the preprocessed detection data serving as feature data into a neural network, and performing frame-level random initialization on the input feature data.
S2. Initialize the neural network model. At initialization, the neural network model contains only one hidden layer, and each neuron's connection weights to the next layer and activation threshold are initialized. Hidden layers are then added gradually during training; in the embodiment of the invention, 2 to 4 hidden layers may be added, each with 128 neurons.
S3. Train the neural network. Iterative training is based on the stochastic gradient descent algorithm, and cross-validation is used during training to prevent overfitting. Supervised training is performed using the aligned feature labels, and the objective function on the training and validation data is computed at each iteration. The objective function reflects how much the parameters of each layer change and how much each layer's changes contribute to the change in the training-data objective function.
S4. Sequence-discriminative training, which enables the neural network model to classify the entire gesture correctly, uses a lattice framework to represent possible hypotheses, maximizes the accuracy of the state labels, and updates per gesture with stochastic gradient descent.
The trained gesture recognition model comprises the following information: the number of layers of the neural network, the number of neurons in each layer, the connection weight between the neurons, and the activation threshold.
After the gesture recognition model is trained, the feature data are input into the gesture recognition model, and the optimal preset gesture is recognized and responded to (i.e., the operation corresponding to the optimal preset gesture is executed). During recognition, the feature data undergo the same data preprocessing steps as in the training process.
Specifically, the neural network model here may be a Recurrent Neural Network (RNN) or a Long Short-Term Memory (LSTM) network. Both of these neural networks have memory and handle sequential data well.
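A minimal PyTorch sketch of such an LSTM-based gesture classifier; the 15 inputs per frame follow the partition layout and the 128-unit hidden layers follow step S2 above, while classifying from the final hidden state is an assumption of the sketch:

```python
import torch
import torch.nn as nn

class GestureLSTM(nn.Module):
    def __init__(self, num_gestures: int, input_size: int = 15,
                 hidden_size: int = 128, num_layers: int = 2):
        super().__init__()
        self.lstm = nn.LSTM(input_size, hidden_size,
                            num_layers=num_layers, batch_first=True)
        self.head = nn.Linear(hidden_size, num_gestures)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch, num_frames, 15) preprocessed feature frames
        _, (h_n, _) = self.lstm(x)
        return self.head(h_n[-1])  # logits from the last layer's final state

# e.g. the 14 preset gestures listed earlier; trained with stochastic
# gradient descent per the description.
model = GestureLSTM(num_gestures=14)
optimizer = torch.optim.SGD(model.parameters(), lr=0.01)
loss_fn = nn.CrossEntropyLoss()
```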
In other embodiments, besides the HMM-GMM model and the neural network model described above, gesture recognition may also be implemented with a Dynamic Time Warping (DTW) method. DTW is a method of calculating the degree of difference (or similarity) between two time series: given two time series, one a reference series used as a template and the other an observation series for comparison, their durations may be the same or different. Under certain constraints, DTW warps and aligns the sample points of the two sequences in the time dimension, computes the distance between each pair of aligned sample points, and selects the minimum cumulative distance among the possible warping alignments as the measure of difference between the observed and reference sequences. In one example, an implementation with DTW as the modeling method includes a training phase and a recognition phase.
In the training phase of the DTW method in this embodiment, a template for each gesture is generated from that gesture's data samples. The feature data samples may be the sample data used in step 300; for each preset gesture, the most typical sample may be selected as the template. One way to do this is to traverse the sample set of a given gesture, compute the distance between each sample and all other samples, use the average distance to represent the sample's typicality, and finally select the sample with the smallest average distance as the gesture's template.
In the recognition stage of the DTW method in this embodiment, the degree of difference between the feature data and each feature template is calculated, and the gesture corresponding to the feature template with the minimum degree of difference from the feature data is the recognized gesture. The recognition process is as follows:
given characteristic data X to be recognized1,2,…,mWherein m represents m frames of the feature data, m is a positive integer, and Xi={xi1,xi2…xi15Is the value of the 15-segment feature data of the ith frame, and the feature template of the g-th gestureWherein G is 1,2, … G, G preset gestures, ngRepresents the g-th gesture template frame number,is the value of the 15 partition characteristic data of the j-th frame. Calculating XiAndthe distance between them, for example, the Euclidean distance (Euclidean distance),where 1. ltoreq. k.ltoreq.15, w denotes regular path node coordinates, w ═ i, j, 0. ltoreq. i.ltoreq.m, 0. ltoreq. j.ltoreq.ng. I is more than or equal to 0 and less than or equal to m, j is more than or equal to 0 and less than or equal to ngMeanwhile, (i, j) finds a regular path W ═ W { W } under the conditions of continuity, monotonicity and the like1,w2…wp}={(1,1),…(i,j),…(m,ng) H, where the path length is P, so that the sum of distances of nodes on the path is the minimum, i.e. the g-th gesture Dg=min∑wdwThis value represents the degree of difference between the feature data and the g-th template. Calculating the difference degree between the characteristic data and all templates according to the method, wherein r is argming(Dg) The gesture r, i.e. the gesture corresponding to the template with the minimum difference degree, is the recognition result.
In one embodiment, a recognition result verification step may also be included before the response. In this step, it is determined whether the feature data have been misrecognized as the optimal preset gesture. The specific operation is as follows: the feature data are input into a binary classifier (e.g., an SVM classifier) that separates correctly recognized from misrecognized feature data for the optimal preset gesture. If the feature data belong to the misrecognized feature data category for the optimal preset gesture, the response is abandoned; if they belong to the correctly recognized feature data category, the response is made according to the optimal preset gesture.
In one embodiment, the gesture recognition scheme has adaptive learning capabilities that allow the user to incorporate custom gestures during use. The self-adaptive learning capability can be realized by implanting a learning module in the user-side electronic equipment, or the server can automatically update the model according to the collected data and send the model to the user-side electronic equipment.
In one example, the adaptive learning process is implemented by:
firstly, feature data of a user-defined gesture of a user is obtained. The custom gesture here is a gesture other than the preset gesture input by the user.
Secondly, the electronic equipment at the user side uploads the characteristic data of the user-defined gesture to the server. The uploading may be periodically uploading, or uploading after a certain condition is met, for example, uploading after a certain number of custom gestures is reached.
And thirdly, the server marks the received user-defined gesture again, and adds the marked new data into the original training process to retrain the model to obtain an updated model of the gesture recognition model.
And finally, the server issues the updated model of the gesture recognition model to the user-side electronic equipment, and the user-side electronic equipment receives the updated model and replaces the original gesture recognition model with the updated recognition model.
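A minimal sketch of the client side of this adaptive-learning loop; the endpoint URLs, payload format, and versioning scheme are hypothetical illustrations, not part of the patent:

```python
from typing import Optional
import requests

SERVER = "https://example.com/gesture-api"  # hypothetical server address

def upload_custom_gestures(feature_batches: list) -> None:
    # Second step: upload accumulated custom-gesture feature data.
    requests.post(f"{SERVER}/custom-gestures", json={"features": feature_batches})

def fetch_updated_model(current_version: str) -> Optional[bytes]:
    # Final step: receive the retrained model issued by the server.
    resp = requests.get(f"{SERVER}/model", params={"since": current_version})
    if resp.status_code == 200:
        return resp.content   # caller replaces the original recognition model
    return None               # no update available yet
```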
According to the gesture recognition scheme provided by the embodiment of the invention, the operation of the user on the touch pad is recognized by using the trained gesture recognition model, and the gesture recognition scheme can be used for recognizing various gesture operations of the user.
The gesture recognition scheme provided by the embodiment of the invention has the function of verifying the recognition result and improves the accuracy of gesture recognition.
The gesture recognition scheme provided by the embodiment of the invention trains the gesture recognition model on a statistical basis of big data from different users, suits the operation habits of different users, and addresses the problem of operation differences between users. For sliding-type preset gestures, the model can be learned and trained from the results of big-data statistics, accommodating different users' sliding angle ranges, sliding speeds, numbers of fingers, and so on, to adapt to their differing operations.
According to the gesture recognition scheme provided by the embodiment of the invention, the definable preset gestures can be diverse, their number can exceed the number of partitions, and two or more similar gestures can be defined. Because the gesture recognition model is obtained through data-modeling machine learning, the scheme has adaptive learning capability, and even two similar preset gestures can be recognized and distinguished by the gesture model.
The gesture recognition scheme provided by the embodiment of the invention has adaptive learning capability. The sample data used in the training process can come from the touch pads of different electronic devices and the operations of different users, compensating for differences in hardware, in users' finger conditions, and in operation habits, so the solution is universally applicable.
The gesture recognition scheme provided by the embodiment of the invention can be used for updating the gesture recognition model algorithm by the server, and the subsequent upgrading maintenance optimization space is larger. The server can upgrade the gesture recognition model algorithm to add a new preset gesture, and push the upgraded gesture recognition model algorithm to the user-side electronic device with the touch pad.
According to the gesture recognition scheme provided by the embodiment of the invention, the electronic equipment at the user side can upload the collected characteristic data and the recognition result to the server, the server iteratively learns and updates the gesture recognition model algorithm, and the updated gesture recognition model algorithm is pushed to the user, so that the use experience of the user is optimized.
The gesture recognition scheme provided by the embodiment of the invention can also incorporate adaptive training, allowing user-defined gestures to be trained and offering more personalized choices. For example, a user executes a custom gesture on the touch pad and assigns a corresponding gesture label; the gesture recognition model acquires the feature data of the custom gesture and the corresponding label for training, learning, and adding gestures. Given user-defined gestures, different users can be distinguished by their custom gestures, realizing the function of identifying a specific user's identity.
< electronic apparatus >
Based on the same inventive concept, referring to fig. 4, an embodiment of the present invention further provides an electronic device. The electronic apparatus 3000 has a touch pad 3050, a memory 3020, and a processor 3010. The memory 3020 is used to store computer instructions that, when executed by the processor 3010, implement the gesture recognition methods described above. The computer instructions, when executed by the processor 3010, may also implement the training learning method of the gesture recognition model described previously.
The electronic device 3000 may further include an interface 3030, a communication device 3040, an input device 3060, a speaker 3070, a microphone 3080, and the like.
The processor 3010 may be, for example, a central processing unit CPU, a microprocessor MCU, or the like. The memory 3020 includes, for example, a ROM (read only memory), a RAM (random access memory), a nonvolatile memory such as a hard disk, and the like. The interface device 3030 includes, for example, a USB interface, a headphone interface, and the like. The communication device 3040 can perform wired or wireless communication, for example. The input device 3060 may include, for example, keys, a keyboard, and the like. A user can input/output voice information through the speaker 3070 and the microphone 3080. The touch pad 3050 may be a capacitive touch screen or a resistive touch screen.
Optionally, the electronic device is a sound box.
The electronic device may also be other devices having a touch pad, for example the electronic device may be a headset.
The electronic device shown in fig. 4 is merely illustrative and is in no way intended to limit the present invention, its application, or uses. It will be appreciated by those skilled in the art that although a plurality of devices are shown in fig. 4, the present invention may relate to only some of the devices therein. Those skilled in the art can design instructions according to the disclosed aspects, and how the instructions control the operation of the processor is well known in the art, and therefore, will not be described in detail herein.
It should be noted that, in the present specification, the embodiments are all described in a progressive manner, each embodiment focuses on differences from other embodiments, and the same and similar parts among the embodiments may be referred to each other. It will be apparent to those skilled in the art that the above embodiments may be used alone or in combination with each other as desired.
In addition, for the device embodiment, since it corresponds to the method embodiment, the description is relatively simple, and for relevant points, refer to the description of the corresponding parts of the method embodiment. The system embodiments described above are merely illustrative, in that modules illustrated as separate components may or may not be physically separate.
In addition, the flowchart and block diagrams in the figures illustrate the architecture, functionality, and operation of possible implementations of systems, methods and computer program products according to various embodiments of the present invention. In this regard, each block in the flowchart or block diagrams may represent a module, segment, or portion of code, which comprises one or more executable instructions for implementing the specified logical function(s). It should also be noted that, in some alternative implementations, the functions noted in the block may occur out of the order noted in the figures. For example, two blocks shown in succession may, in fact, be executed substantially concurrently, or the blocks may sometimes be executed in the reverse order, depending upon the functionality involved. It will also be noted that each block of the block diagrams and/or flowchart illustration, and combinations of blocks in the block diagrams and/or flowchart illustration, can be implemented by special purpose hardware-based systems which perform the specified functions or acts, or combinations of special purpose hardware and computer instructions.
The computer program product provided in the embodiment of the present invention includes a computer-readable storage medium storing a program code, where instructions included in the program code may be used to execute the method described in the foregoing method embodiment, and specific implementation may refer to the method embodiment, which is not described herein again.
It is clear to those skilled in the art that, for convenience and brevity of description, the specific working processes of the above-described systems, apparatuses and units may refer to the corresponding processes in the foregoing method embodiments, and are not described herein again.
In the several embodiments provided in the present application, it should be understood that the disclosed system, apparatus and method may be implemented in other ways. The above-described embodiments of the apparatus are merely illustrative, and for example, the division of the units is only one logical division, and there may be other divisions when actually implemented, and for example, a plurality of units or components may be combined or integrated into another system, or some features may be omitted, or not executed. In addition, the shown or discussed mutual coupling or direct coupling or communication connection may be an indirect coupling or communication connection of devices or units through some communication interfaces, and may be in an electrical, mechanical or other form.
The units described as separate parts may or may not be physically separate, and parts displayed as units may or may not be physical units, may be located in one place, or may be distributed on a plurality of network units. Some or all of the units can be selected according to actual needs to achieve the purpose of the solution of the embodiment.
In addition, functional units in the embodiments of the present invention may be integrated into one processing unit, or each unit may exist alone physically, or two or more units are integrated into one unit.
The functions, if implemented in the form of software functional units and sold or used as a stand-alone product, may be stored in a computer readable storage medium. Based on such understanding, the technical solution of the present invention may be embodied in the form of a software product, which is stored in a storage medium and includes instructions for causing a computer device (which may be a personal computer, a server, or a network device) to execute all or part of the steps of the method according to the embodiments of the present invention. And the aforementioned storage medium includes: a U-disk, a removable hard disk, a Read-Only Memory (ROM), a Random Access Memory (RAM), a magnetic disk or an optical disk, and other various media capable of storing program codes.
It is noted that, herein, relational terms such as first and second, and the like may be used solely to distinguish one entity or action from another entity or action without necessarily requiring or implying any actual such relationship or order between such entities or actions. Also, the terms "comprises," "comprising," or any other variation thereof, are intended to cover a non-exclusive inclusion, such that a process, method, article, or apparatus that comprises a list of elements does not include only those elements but may include other elements not expressly listed or inherent to such process, method, article, or apparatus. Without further limitation, an element defined by the phrase "comprising an … …" does not exclude the presence of other identical elements in a process, method, article, or apparatus that comprises the element.
The above description is only a preferred embodiment of the present invention and is not intended to limit the present invention, and various modifications and changes may be made by those skilled in the art. Any modification, equivalent replacement, or improvement made within the spirit and principle of the present invention should be included in the protection scope of the present invention. It should be noted that: like reference numbers and letters refer to like items in the following figures, and thus, once an item is defined in one figure, it need not be further defined and explained in subsequent figures.
Although some specific embodiments of the present invention have been described in detail by way of examples, it should be understood by those skilled in the art that the above examples are for illustrative purposes only and are not intended to limit the scope of the present invention. It will be appreciated by those skilled in the art that modifications may be made to the above embodiments without departing from the scope of the invention. The scope of the invention is defined by the appended claims.

Claims (10)

1. A gesture recognition method is characterized by comprising the following steps:
acquiring detection data from each preset partition of a touch pad, wherein the detection data corresponds to a user gesture and has a plurality of frames, and each frame of detection data corresponds to data respectively detected by each preset partition of the touch pad at a certain moment;
acquiring feature data based on the detection data, inputting the feature data into a gesture recognition model, and matching an optimal preset gesture; and
executing the operation corresponding to the optimal preset gesture.
2. The method of claim 1, further comprising, prior to inputting the feature data into the gesture recognition model, a step of intercepting the feature data:
the intercepting of the feature data comprises: retaining the feature data during the touch event, for the P frames before the touch event begins, and for the Q frames after the touch event ends, wherein P and Q are positive integers.
3. The method according to claim 1 or 2, wherein acquiring the feature data based on the detection data comprises any one or a combination of the following processing modes: median filtering, zeroing denoising, maximum-value normalization, and dynamic range control.
4. The method according to any one of claims 1-3, characterized in that the method further comprises a recognition result verification step:
the recognition result verification step comprises: determining whether the feature data has been misrecognized as the optimal preset gesture.
5. The method according to any one of claims 1-4, wherein the preset gestures comprise: single-clicking any central partition, double-clicking any central partition, triple-clicking any central partition, clicking any peripheral partition, sliding clockwise along the peripheral partitions, sliding counterclockwise along the peripheral partitions, a single simultaneous touch of a plurality of partitions, a double simultaneous touch of a plurality of partitions, a triple simultaneous touch of a plurality of partitions, touching all partitions, sliding along a peripheral partition, a central partition, and another peripheral partition in sequence, long-pressing any central partition, long-pressing any peripheral partition, and a long simultaneous touch of a plurality of partitions.
6. The method of any of claims 1-5, wherein the gesture recognition model is implemented using a neural network model.
7. The method of claim 6, wherein the gesture recognition model is obtained by iterative training using a stochastic gradient descent algorithm.
8. The method of claim 6, wherein the neural network model is a recurrent neural network model.
9. An electronic device having a touch pad, a memory for storing computer instructions, and a processor, wherein the computer instructions, when executed by the processor, implement the method of any of claims 1-8.
10. The electronic device of claim 9, wherein the electronic device is a speaker.
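
For illustration only, and not as part of the claimed subject matter, the overall flow of claim 1 might be sketched in Python as follows; the frame layout and the pluggable feature extractor, model, and action table are assumptions of this sketch rather than details taken from the disclosure:

```python
import numpy as np

def recognize_gesture(frames, extract_features, model, actions):
    """Sketch of claim 1: acquire multi-frame detection data, derive
    feature data, match the best preset gesture, and run its operation.

    frames: array of shape [T, P]; row t holds the values detected by
    each of the P preset partitions of the touch pad at instant t.
    extract_features, model, actions: caller-supplied (hypothetical).
    """
    features = extract_features(np.asarray(frames, dtype=np.float32))
    scores = model(features)                 # one score per preset gesture
    gesture_id = int(np.argmax(scores))      # the optimal preset gesture
    actions[gesture_id]()                    # execute the bound operation
```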
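The interception step of claim 2 reduces to index arithmetic along the frame axis. A minimal sketch, assuming a per-frame boolean touch indicator is available alongside the feature data:

```python
import numpy as np

def intercept_features(features, touch_active, p, q):
    """Retain the feature frames spanning the touch event, plus the P
    frames before it begins and the Q frames after it ends (claim 2).

    features: array [T, D]; touch_active: boolean array of length T
    that is True while the touch event is in progress.
    """
    idx = np.flatnonzero(touch_active)
    if idx.size == 0:
        return features[:0]                         # no touch event found
    start = max(int(idx[0]) - p, 0)                 # P frames of lead-in
    end = min(int(idx[-1]) + q + 1, len(features))  # Q frames of tail
    return features[start:end]
```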
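Claim 3 permits any one or a combination of four processing modes. The sketch below shows one such combination; the kernel size, noise floor, and compression exponent are illustrative guesses, not values from the disclosure:

```python
import numpy as np
from scipy.signal import medfilt

def extract_features(frames, noise_floor=0.02, gamma=0.5):
    """One possible combination of the claim-3 processing modes.
    frames: float array [T, P] of per-partition detection data."""
    x = medfilt(frames, kernel_size=(3, 1))        # median filtering along time
    x = np.where(np.abs(x) < noise_floor, 0.0, x)  # zeroing denoising
    peak = np.max(np.abs(x))
    if peak > 0.0:
        x = x / peak                               # maximum-value normalization
    x = np.sign(x) * np.abs(x) ** gamma            # dynamic range control
    return x
```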
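Claims 6-8 call for a recurrent neural network obtained by iterative stochastic gradient descent training. A PyTorch sketch under assumed sizes (eight partitions, the fourteen preset gestures enumerated in claim 5, and a GRU cell, none of which are fixed by the claims):

```python
import torch
import torch.nn as nn

class GestureRNN(nn.Module):
    """Recurrent gesture recognition model in the spirit of claims 6 and 8."""
    def __init__(self, num_partitions=8, hidden=64, num_gestures=14):
        super().__init__()
        self.rnn = nn.GRU(num_partitions, hidden, batch_first=True)
        self.head = nn.Linear(hidden, num_gestures)

    def forward(self, x):          # x: [batch, frames, partitions]
        _, h = self.rnn(x)         # final hidden state summarizes the gesture
        return self.head(h[-1])    # one logit per preset gesture

model = GestureRNN()
optimizer = torch.optim.SGD(model.parameters(), lr=0.01)  # claim 7: SGD
loss_fn = nn.CrossEntropyLoss()

def train_step(batch_x, batch_y):
    """One iteration of stochastic-gradient-descent training."""
    optimizer.zero_grad()
    loss = loss_fn(model(batch_x), batch_y)
    loss.backward()
    optimizer.step()
    return loss.item()
```

Calling train_step repeatedly over shuffled mini-batches of labeled gesture recordings would constitute the iterative training of claim 7.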
CN201910464528.7A 2018-06-13 2019-05-30 Gesture recognition method and electronic equipment Pending CN110597446A (en)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN201810608822 2018-06-13
CN2018106088226 2018-06-13

Publications (1)

Publication Number Publication Date
CN110597446A true CN110597446A (en) 2019-12-20

Family

ID=68852586

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201910464528.7A Pending CN110597446A (en) 2018-06-13 2019-05-30 Gesture recognition method and electronic equipment

Country Status (1)

Country Link
CN (1) CN110597446A (en)

Patent Citations (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN102272699A (en) * 2008-12-29 2011-12-07 惠普开发有限公司 Gesture detection zones
CN103593680A (en) * 2013-11-19 2014-02-19 南京大学 Dynamic hand gesture recognition method based on self incremental learning of hidden Markov model
CN103679154A (en) * 2013-12-26 2014-03-26 中国科学院自动化研究所 Three-dimensional gesture action recognition method based on depth images
US20160307469A1 (en) * 2015-04-16 2016-10-20 Robert Bosch Gmbh System and Method For Automated Sign Language Recognition
CN107450714A (en) * 2016-05-31 2017-12-08 大唐电信科技股份有限公司 Man-machine interaction support test system based on augmented reality and image recognition
CN106502570A (en) * 2016-10-25 2017-03-15 科世达(上海)管理有限公司 A kind of method of gesture identification, device and onboard system
CN107608510A (en) * 2017-09-13 2018-01-19 华中师范大学 Method for building up, device and the electronic equipment in gesture model storehouse

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
ZHOU Hang: "Research on Gesture Recognition System Based on Computer Vision", China Doctoral Dissertations Full-text Database, Information Science and Technology Series *

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN115079882A (en) * 2022-06-16 2022-09-20 广州国威文化科技有限公司 Human-computer interaction processing method and system based on virtual reality
CN115079882B (en) * 2022-06-16 2024-04-05 广州国威文化科技有限公司 Human-computer interaction processing method and system based on virtual reality

Similar Documents

Publication Publication Date Title
CN110998716B (en) Domain adaptation in speech recognition via teacher-student learning
CN107578775B (en) Multi-classification voice method based on deep neural network
Kadous Temporal classification: Extending the classification paradigm to multivariate time series
CN108182937B (en) Keyword recognition method, device, equipment and storage medium
Monaco et al. The partially observable hidden Markov model and its application to keystroke dynamics
US11049495B2 (en) Method and device for automatically learning relevance of words in a speech recognition system
CN108694951B (en) Speaker identification method based on multi-stream hierarchical fusion transformation characteristics and long-and-short time memory network
JP2021527840A (en) Voiceprint identification methods, model training methods, servers, and computer programs
WO2021051577A1 (en) Speech emotion recognition method, apparatus, device, and storage medium
CN110838295B (en) Model generation method, voiceprint recognition method and corresponding device
EP2965268A2 (en) Conservatively adapting a deep neural network in a recognition system
EP3215981B1 (en) Nonparametric model for detection of spatially diverse temporal patterns
CN110853654A (en) Model generation method, voiceprint recognition method and corresponding device
CN109190521B (en) Construction method and application of face recognition model based on knowledge purification
Ali et al. Keystroke biometric user verification using Hidden Markov Model
CN110796231B (en) Data processing method, data processing device, computer equipment and storage medium
CN111613231A (en) Voice data processing method and device, computer equipment and storage medium
Akila et al. Slope finder—A distance measure for DTW based isolated word speech recognition
JP6749874B2 (en) Program, system, device and method for determining sound wave type from sound wave signal
CN110597446A (en) Gesture recognition method and electronic equipment
KR20140073294A (en) Apparatus and method for real-time emotion recognition using pulse rate change
Schafer et al. Noise-robust speech recognition through auditory feature detection and spike sequence decoding
CN112669836B (en) Command recognition method and device and computer readable storage medium
JP7347750B2 (en) Verification device, learning device, method, and program
JP2007018530A (en) Forgetting histogram calculation device, and outlier degree calculation device using same

Legal Events

Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
RJ01 Rejection of invention patent application after publication (Application publication date: 20191220)