CN110222645B - Gesture misidentification feature discovery method

Gesture misidentification feature discovery method

Info

Publication number
CN110222645B
CN110222645B (application CN201910496416.XA)
Authority
CN
China
Prior art keywords
matrix
dimension
gesture
misidentification
frequency
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Expired - Fee Related
Application number
CN201910496416.XA
Other languages
Chinese (zh)
Other versions
CN110222645A (en)
Inventor
孙元功
孙凯云
冯志全
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
University of Jinan
Original Assignee
University of Jinan
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by University of Jinan filed Critical University of Jinan
Priority to CN201910496416.XA priority Critical patent/CN110222645B/en
Publication of CN110222645A publication Critical patent/CN110222645A/en
Application granted granted Critical
Publication of CN110222645B publication Critical patent/CN110222645B/en
Expired - Fee Related legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/24 Classification techniques
    • G06F18/241 Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches
    • G06F18/2413 Classification techniques relating to the classification model based on distances to training or reference patterns
    • G06F18/24147 Distances to closest patterns, e.g. nearest neighbour classification
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V40/00 Recognition of biometric, human-related or animal-related patterns in image or video data
    • G06V40/10 Human or animal bodies, e.g. vehicle occupants or pedestrians; Body parts, e.g. hands
    • G06V40/107 Static hand or arm
    • G06V40/20 Movements or behaviour, e.g. gesture recognition
    • G06V40/28 Recognition of hand or arm movements, e.g. recognition of deaf sign language

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • General Physics & Mathematics (AREA)
  • Physics & Mathematics (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • General Engineering & Computer Science (AREA)
  • Human Computer Interaction (AREA)
  • Evolutionary Computation (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Artificial Intelligence (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Evolutionary Biology (AREA)
  • Multimedia (AREA)
  • Health & Medical Sciences (AREA)
  • Social Psychology (AREA)
  • Psychiatry (AREA)
  • General Health & Medical Sciences (AREA)
  • Image Analysis (AREA)

Abstract

The invention provides a gesture misrecognition feature matrix discovery method. Suppose m index-finger gesture pictures correctly recognized by a convolutional neural network form a set A, and n index-finger gesture pictures incorrectly recognized by the network form a set B. The feature values of the 7th fully-connected layer are extracted through a Python interface and stored in a matrix V. For any two input pictures i and j, where i ∈ A and j ∈ B, the misrecognition feature matrix Q_i is computed by the following steps: a. extract the feature values of i and j at the 7th fully-connected layer and store them in the matrix V; b. compute the values Z_i and Z_j obtained by feeding i and j into the Softmax function; c. sort the data in ascending order, describe Z_i and Z_j with curves, find the sharply changing features and their corresponding original dimensions from the trend of the curves, and collect these dimensions into a set C; d. loop from C_1 to C_4096, counting the occurrence count and frequency of each dimension; e. store the dimensions whose frequency exceeds 90% in a matrix Q; f. end. The invention can effectively extract the misrecognition feature matrix.

Description

Gesture misidentification feature discovery method
Technical Field
The invention relates to the technical field of image recognition, in particular to dynamic gesture recognition, and specifically to a method for discovering gesture misrecognition features.
Background
Dynamic gesture recognition has been widely studied for decades because it can provide a high level of human-computer interaction, but few researchers have focused on the recognition of similar gestures. In 2006, a dynamic Bayesian classifier was proposed to recognize similar gestures by combining motion-based and posture-based features, achieving a competitive classification rate [1]. Elmezain et al. [2] proposed a real-time recognition system covering 36 gestures that improves the accuracy on similar gestures mainly by building a combined feature set of location, direction and speed features. Ding [3] observed that a database of the digits 0-9 and letters A-Z contains many gestures of similar shape, such as S and 5 or Z and 2, which have a low recognition rate, and therefore proposed a new way of distinguishing similar gestures: the motion trajectory information in three-dimensional space is captured first and quantized into motion features, after which the gestures are modeled and classified with a Hidden Markov Model (HMM). Experimental results show that this method achieves a high recognition rate and generalizes well. In summary, research on similar gestures is still limited, and most existing methods reduce similarity as far as possible by combining multiple gesture features or multiple classifiers, which improves the recognition rate to some extent but does not solve the similarity problem at its root.
At present, many gesture recognition methods have been studied, but whether they are based on geometric features or on machine learning, few works examine the error mechanism behind misrecognized gestures, and methods for automatically detecting and correcting erroneous gestures are therefore lacking. Correcting misrecognition is not only an effective way to improve the recognition rate but also a breakthrough point for revealing perceptual intelligence. The invention therefore proposes a gesture misrecognition feature discovery method built on convolutional-neural-network gesture recognition, so that the mechanism of erroneous gestures can be studied further on this basis.
The gesture recognition method of the invention trains its model on the AlexNet network. The AlexNet network model and the gesture model training method are introduced as follows:
AlexNet is the network structure introduced by Alex Krizhevsky of the University of Toronto in the paper "ImageNet Classification with Deep Convolutional Neural Networks". AlexNet was the first network to successfully apply tricks such as ReLU, Dropout and LRN in a CNN, and it also uses GPUs to accelerate computation. The method adopts the AlexNet network model and performs a limited number of training iterations with optimized solver parameters, while ensuring that the loss value decreases as the number of iterations increases. The optimal test network model is selected according to two quantities recorded at each iteration: the accuracy and the loss value. The convolution and fully-connected pipeline is shown as C1-FC7 in FIG. 1. The input image in FIG. 1 has size 227 × 227 × 3 (width × height × number of channels). The convolution kernels are drawn in dark color; their size includes width, height and thickness, the thickness being equal to the number of channels of the image being convolved, and the number of kernels equal to the number of channels output by the convolution. The figure makes clear that the image size changes after each convolution: for example, after the original image passes through C1 the size becomes 55 × 55 × 96, where 96 is the number of convolution kernels; the computation is split across two graphics cards, with 48 feature maps on each card. Assuming the image is an N × N matrix, the convolution kernel a K × K matrix, the edge-pixel padding P and the convolution stride S, the width and height M × M of the image after one such convolution layer is calculated according to equation (1):
$$M = \frac{N - K + 2P}{S} + 1 \qquad (1)$$
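As a quick check of equation (1), the following minimal Python sketch reproduces the 55 × 55 output size quoted above for C1; the stride 4 and padding 0 are assumptions taken from the standard AlexNet configuration, since Table 1 below does not list them:

```python
def conv_output_size(n, k, p, s):
    """Equation (1): output width/height M for an N x N input,
    a K x K kernel, padding P and stride S."""
    return (n - k + 2 * p) // s + 1

# C1: 227 x 227 input, 11 x 11 kernel (Table 1); stride 4 and
# padding 0 are assumed from the standard AlexNet configuration.
print(conv_output_size(n=227, k=11, p=0, s=4))  # 55 -> 55 x 55 x 96
```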
The convolutional layers are C1 … C5 in FIG. 1, five layers in total; their attributes are listed in Table 1. Every position of the same feature map is produced by the same convolution kernel, so weight sharing reduces the number of parameters to be trained and saves time. C5 is followed by the fully-connected layer FC6: the kernel size of the full convolution is 13 × 13 × 256, 4096 full-convolution operations are performed on the input by these kernels, and the result is a column vector of 4096 values. FC8 has 7 neurons, one for each of the 7 classes in the label set; each output value lies in the interval [0, 1] and represents the probability of the corresponding class, and the class with the maximum probability is taken as the final recognition result. The convolution kernel parameters adopted in FIG. 1 are shown in Table 1.
TABLE 1
Convolutional layer | Number of convolution kernels | Width | Height | Thickness
C1 | 96 | 11 | 11 | 3
C2 | 256 | 5 | 5 | 48
C3 | 384 | 3 | 3 | 256
C4 | 384 | 3 | 3 | 192
C5 | 256 | 3 | 3 | 192
In the invention, each class of training samples numbers 20k, and the validation set numbers 2k. The base learning rate is 0.01. Every 500 training iterations, the accuracy on the validation set is tested with the current parameters. Finally, the optimal model is selected according to the loss value and the accuracy. The above is the gesture model training process of the invention.
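The model-selection rule just described can be sketched as follows; the record format is hypothetical, since the patent only states that the optimal model is chosen from the accuracy and loss of the periodic validation tests:

```python
# Hypothetical records of (iteration, validation_accuracy, loss),
# one per 500-iteration validation test.
checkpoints = [
    (500, 0.91, 0.42),
    (1000, 0.94, 0.31),
    (1500, 0.94, 0.28),
]

# Select the highest accuracy, breaking ties by the lowest loss.
best = max(checkpoints, key=lambda c: (c[1], -c[2]))
print(best)  # (1500, 0.94, 0.28)
```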
In neural networks, the softmax function is mainly used in multi-class classification: it maps the outputs of multiple neurons into the interval [0, 1]. Suppose we extract the fully-connected data of an image at the last layer and represent it by a one-dimensional matrix V, where V_l denotes the value of the l-th element of V, the range of l is determined by the number of labels of the model, and W_l denotes the corresponding weight parameter in softmax. Softmax is then expressed by the following equation:
$$S_l = \frac{e^{Z_l}}{\sum_{l'} e^{Z_{l'}}} \qquad (2)$$
$$Z_l = \sum V_l \cdot W_l \qquad (3)$$
the Softmax classification result is to select the class with the maximum probability, namely the maximum value S in the formula (2) l The corresponding l. From this formula, S l Value of and Z l In a direct proportional relationship. Because the value on the denominator is invariant and exp () is an increasing function, Z l The size of the value determines the result of the final classification, corresponding to Z l The largest is the final classification result of the image. Wherein Z l Can be calculated by the formula (3), and W is known from the above definition l Is a weight parameter, V l Is a set of input feature data and a more intuitive classification process is shown in fig. 2.
According to the method, the same gesture data set is first divided into a correct class and an error class according to the output of the gesture recognition model. Based on the above analysis, the value of Z depends on V, so the feature value V of the 7th fully-connected layer of each image is extracted before classification, and its distribution is represented by a curve-fitting function. By observing the curve distributions of the two recognition outcomes for the same gesture class, the invention finds that V contains common feature dimensions T that strongly influence the recognition result, and defines the matrix formed by those dimensions whose frequency among these influential dimensions exceeds 90% as the misrecognition feature matrix Q.
How to obtain the misrecognition feature matrix Q is the technical problem to be solved by the invention.
Disclosure of Invention
Addressing the above technical problem, the invention provides a gesture misrecognition feature matrix discovery method by which the feature dimensions that strongly influence gesture misrecognition can be extracted.
The invention is realized by the following technical scheme. A gesture misrecognition feature matrix discovery method is provided: suppose m index-finger gesture pictures correctly recognized by a convolutional neural network form a set A and n index-finger gesture pictures incorrectly recognized by the network form a set B; the feature values of the 7th fully-connected layer are extracted through a Python interface and stored in a matrix V; for any two input pictures i and j, where i ∈ A and j ∈ B, the misrecognition feature matrix Q_i is computed by the following steps (see the sketch after this list):
a. extract the feature values of i and j at the 7th fully-connected layer, and store them in the matrix V;
b. compute the values Z_i and Z_j obtained by feeding i and j into the Softmax function;
c. sort the data in ascending order, describe Z_i and Z_j with curves, find the sharply changing features and their corresponding original dimensions from the trend of the curves, and collect these dimensions into a set C;
d. loop from C_1 to C_4096, counting the occurrence count and frequency of each dimension;
e. store the dimensions whose frequency exceeds 90% in the matrix Q;
f. end.
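For concreteness, a minimal Python/numpy sketch of steps a-e is given below. The FC7 feature extraction is stubbed out because it depends on the trained network and its Python interface; the "sharply changing" criterion is an assumption, implemented here as the first and last k positions of the sorted curve (k = 100, following the first-100/last-100 observation in the detailed description); function and variable names are illustrative:

```python
import numpy as np

def extract_fc7(picture):
    """Step a: return the 4096-dim FC7 feature vector of a picture.
    Stubbed out -- in the patent this uses the Python interface of
    the trained AlexNet model."""
    raise NotImplementedError

def z_values(v, w_label):
    """Step b: per-dimension contributions V_l * W_l for one label;
    their sum is Z in equation (3)."""
    return v * w_label

def sharp_dims(z, k=100):
    """Step c: sort ascending and return the original dimensions of
    the k smallest and k largest values, where the curve changes
    sharply (k = 100 is assumed from the embodiment)."""
    order = np.argsort(z)
    return set(order[:k].tolist()) | set(order[-k:].tolist())

def misrecognition_matrix(dim_sets, thresh=0.9):
    """Steps d-e: count how often each of the 4096 dimensions occurs
    across all sets C and keep those whose frequency exceeds 90%."""
    counts = np.zeros(4096)
    for c in dim_sets:
        counts[list(c)] += 1
    return np.flatnonzero(counts / len(dim_sets) > thresh)  # matrix Q
```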
Preferably, the value of m is 999, and the value of n is 999.
In conclusion, the method effectively extracts the misrecognition feature matrix, from which the feature dimensions that strongly influence the recognition result can be studied further.
Drawings
FIG. 1 is a schematic diagram of the AlexNet network structure of the present invention;
FIG. 2 is a schematic diagram of the softmax classification process of the present invention;
FIG. 3 is a diagram illustrating a variation curve of the misrecognition feature corresponding to the set A in the present invention;
FIG. 4 is a diagram illustrating a variation curve of the misrecognized feature corresponding to the set B in the present invention.
Detailed Description
In order to clearly illustrate the technical features of the present invention, the present invention is further illustrated by the following detailed description with reference to the accompanying drawings.
A gesture misrecognition feature matrix discovery method: suppose m index-finger gesture pictures correctly recognized by a convolutional neural network form a set A and n index-finger gesture pictures incorrectly recognized by the network form a set B; the feature values of the 7th fully-connected layer are extracted through a Python interface and stored in a matrix V; any two pictures i and j are input, where i ∈ A and j ∈ B, and the misrecognition feature matrix Q_i is computed by the following steps:
a. extract the feature values of i and j at the 7th fully-connected layer, and store them in the matrix V;
b. compute the values Z_i and Z_j obtained by feeding i and j into the Softmax function;
c. sort the data in ascending order, describe Z_i and Z_j with curves, find the sharply changing features and their corresponding original dimensions from the trend of the curves, and collect these dimensions into a set C;
d. loop from C_1 to C_4096, counting the occurrence count and frequency of each dimension;
e. store the dimensions whose frequency exceeds 90% in the matrix Q;
f. end.
In this embodiment, m = 999 and n = 999; that is, 999 pictures correctly recognized by the CNN are selected to form the set A, and 999 pictures mistakenly recognized by the CNN as a thumb gesture are selected to form the set B. To elaborate: first, Z_i is computed for each picture in the sets A and B, where the label i takes the value 2, representing the index finger. The Z_i of each set is stored in a 999 × 4096 matrix, in which a row indexes the picture and the columns hold that picture's Z_i values, 4096 data in total. To observe how the feature values change, the 4096 data are sorted by size and the change is described by curve fitting. FIGS. 3 and 4 show the feature variation derived from the pictures in the sets A and B respectively, with the horizontal axis the data index and the vertical axis the Z_i value; the regions where the slope of the curve is large are marked in dark color for both the A and B sets.
Since the sum of these values determines whether the result is the index finger (or another class), the larger the values, the greater the probability of a correct final recognition. Comparing FIG. 3 with FIG. 4 shows that the curvature of the curve changes dramatically in the first 100 and last 100 dimensions, while the values in the middle are relatively stable. Because the sharply changing values affect the final recognition result, the invention calls these dimensions the features with large influence factors. Finally, a 999 × 200 matrix is used to represent, for each picture, the original dimensions corresponding to these 200 positions. Although the 200 dimensions of each picture are not exactly the same, common dimensions exist, so the invention counts the dimensions whose frequency exceeds 90%, as shown in Table 2 below. The feature values at these dimensions influence the correctness of the final classification, so the invention calls the array formed by these dimensions the misrecognition feature matrix.
TABLE 2
(Table 2, listing the dimensions whose frequency exceeds 90%, is reproduced only as an image in the original document; its entries are not recoverable here.)
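To illustrate the counting step of this embodiment, the following snippet builds the 999 × 200 dimension matrix from randomly generated data (purely illustrative; with real features the per-picture sets share many dimensions) and extracts the dimensions with frequency above 90%:

```python
import numpy as np

rng = np.random.default_rng(1)
# 999 pictures, each contributing the 200 original dimensions found
# at the head and tail of its sorted curve (random placeholder data).
dim_matrix = np.stack([rng.choice(4096, size=200, replace=False)
                       for _ in range(999)])

counts = np.bincount(dim_matrix.ravel(), minlength=4096)
Q = np.flatnonzero(counts / 999 > 0.9)  # misrecognition feature matrix Q
print(Q)  # empty for random data; real pictures share common dimensions
```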
Finally, it should be noted that the invention is not limited to the above embodiments; technical features of the invention that are not described in detail may be implemented with the prior art and are not repeated here. The above embodiments and drawings are intended only to illustrate the technical solutions of the invention, not to limit it. Although the invention has been described in detail with reference to preferred embodiments, those skilled in the art should understand that changes, modifications, additions or substitutions made within the spirit and scope of the invention also fall within the scope of its claims. It should also be noted that the invention cites other gesture recognition methods; the cited documents are as follows:
[1] Aviles-Arriaga H H, Sucar L E, Mendoza C E. Visual recognition of similar gestures[C]. 18th International Conference on Pattern Recognition (ICPR'06). IEEE, 2006, 1: 1100-1103.
[2] Elmezain M, Al-Hamadi A, Michaelis B. Hand gesture recognition based on combined features extraction[J]. Journal of World Academy of Science, Engineering and Technology, 2009, 60: 395.
[3] Ding Z, Chen Y, Chen Y L, et al. Similar hand gesture recognition by automatically extracting distinctive features[J]. International Journal of Control, Automation and Systems, 2017, 15(4): 1770-1778.

Claims (2)

1. A gesture misrecognition feature discovery method, characterized in that: suppose m index-finger gesture pictures correctly recognized by a convolutional neural network form a set A and n index-finger gesture pictures incorrectly recognized by the network form a set B; the feature values of the 7th fully-connected layer are extracted through a Python interface and stored in a matrix V; any two pictures i and j are input, where i ∈ A and j ∈ B, and the misrecognition feature matrix Q_i is computed by the following steps:
a. extract the feature values of i and j at the 7th fully-connected layer, and store them in the matrix V;
b. compute the values Z_i and Z_j obtained by feeding i and j into the Softmax function;
c. sort the data in ascending order, describe Z_i and Z_j with curves, find the sharply changing features and their corresponding original dimensions from the trend of the curves, and collect these dimensions into a set C;
d. loop from C_1 to C_4096, counting the occurrence count and frequency of each dimension;
e. store the dimensions whose frequency exceeds 90% in the matrix Q;
f. end.
2. The method according to claim 1, wherein m is 999 and n is 999.
CN201910496416.XA 2019-06-10 2019-06-10 Gesture misidentification feature discovery method Expired - Fee Related CN110222645B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201910496416.XA CN110222645B (en) 2019-06-10 2019-06-10 Gesture misidentification feature discovery method

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201910496416.XA CN110222645B (en) 2019-06-10 2019-06-10 Gesture misidentification feature discovery method

Publications (2)

Publication Number Publication Date
CN110222645A CN110222645A (en) 2019-09-10
CN110222645B true CN110222645B (en) 2022-09-27

Family

ID=67816148

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201910496416.XA Expired - Fee Related CN110222645B (en) 2019-06-10 2019-06-10 Gesture misidentification feature discovery method

Country Status (1)

Country Link
CN (1) CN110222645B (en)

Families Citing this family (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112101236A (en) * 2020-09-17 2020-12-18 济南大学 Intelligent error correction method and system for elderly accompanying robot
CN112100075B (en) * 2020-09-24 2024-03-15 腾讯科技(深圳)有限公司 User interface playback method, device, equipment and storage medium

Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN106529470A (en) * 2016-11-09 2017-03-22 济南大学 Gesture recognition method based on multistage depth convolution neural network
CN109190443A (en) * 2018-06-27 2019-01-11 济南大学 It is a kind of accidentally to know gestures detection and error correction method
WO2019080203A1 (en) * 2017-10-25 2019-05-02 南京阿凡达机器人科技有限公司 Gesture recognition method and system for robot, and robot

Patent Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN106529470A (en) * 2016-11-09 2017-03-22 济南大学 Gesture recognition method based on multistage depth convolution neural network
WO2019080203A1 (en) * 2017-10-25 2019-05-02 南京阿凡达机器人科技有限公司 Gesture recognition method and system for robot, and robot
CN109190443A (en) * 2018-06-27 2019-01-11 济南大学 It is a kind of accidentally to know gestures detection and error correction method

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
Gesture recognition method based on convolutional neural networks; 杨文斌 et al.; Journal of Anhui Polytechnic University (《安徽工程大学学报》); 2018-02-15 (No. 01); full text *
Gesture recognition based on multi-column deep 3D convolutional neural networks; 易生 et al.; Computer Engineering (《计算机工程》); 2017-08-15 (No. 08); full text *

Also Published As

Publication number Publication date
CN110222645A (en) 2019-09-10

Similar Documents

Publication Publication Date Title
CN113378632B (en) Pseudo-label optimization-based unsupervised domain adaptive pedestrian re-identification method
US10289897B2 (en) Method and a system for face verification
CN106682598B (en) Multi-pose face feature point detection method based on cascade regression
Kim et al. Fusing aligned and non-aligned face information for automatic affect recognition in the wild: a deep learning approach
CN110321967B (en) Image classification improvement method based on convolutional neural network
CN102147858B (en) License plate character identification method
CN108664975B (en) Uyghur handwritten letter recognition method and system and electronic equipment
WO2014205231A1 (en) Deep learning framework for generic object detection
Zhang et al. Data driven feature selection for machine learning algorithms in computer vision
Richarz et al. Semi-supervised learning for character recognition in historical archive documents
CN105894050A (en) Multi-task learning based method for recognizing race and gender through human face image
CN109086660A (en) Training method, equipment and the storage medium of multi-task learning depth network
CN107704859A (en) A kind of character recognition method based on deep learning training framework
Escalera et al. Boosted Landmarks of Contextual Descriptors and Forest-ECOC: A novel framework to detect and classify objects in cluttered scenes
Dai Nguyen et al. Recognition of online handwritten math symbols using deep neural networks
CN110222645B (en) Gesture misidentification feature discovery method
CN112651323B (en) Chinese handwriting recognition method and system based on text line detection
EP2486518A1 (en) Method of computing global-to-local metrics for recognition
Korichi et al. Off-line Arabic handwriting recognition system based on ML-LPQ and classifiers combination
CN113420983B (en) Writing evaluation method, device, equipment and storage medium
Prasad et al. Multiple hidden Markov model post processed with support vector machine to recognize English handwritten numerals
Zhao Handwritten digit recognition and classification using machine learning
Elmezain et al. Posture and gesture recognition for human-computer interaction
Fan Efficient multiclass object detection by a hierarchy of classifiers
Rouabhi et al. Optimizing Handwritten Arabic Character Recognition: Feature Extraction, Concatenation, and PSO-Based Feature Selection.

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant
CF01 Termination of patent right due to non-payment of annual fee

Granted publication date: 20220927