CN112883922B - Sign language identification method based on CNN-BiGRU neural network fusion


Publication number
CN112883922B
Authority
CN
China
Prior art keywords
sign language
bigru
data set
cnn
palm
Prior art date
Legal status
Active
Application number
CN202110304616.8A
Other languages
Chinese (zh)
Other versions
CN112883922A (en)
Inventor
李桢旻
祝东疆
苏彦博
贺子珊
鲁杰
彭靖宇
杜高明
王晓蕾
Current Assignee
Hefei University of Technology
Original Assignee
Hefei University of Technology
Priority date
Filing date
Publication date
Application filed by Hefei University of Technology
Priority to CN202110304616.8A
Publication of CN112883922A
Application granted
Publication of CN112883922B
Legal status: Active

Classifications

    • G PHYSICS
      • G06 COMPUTING; CALCULATING OR COUNTING
        • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
          • G06V 40/00 Recognition of biometric, human-related or animal-related patterns in image or video data
            • G06V 40/20 Movements or behaviour, e.g. gesture recognition
              • G06V 40/28 Recognition of hand or arm movements, e.g. recognition of deaf sign language
        • G06F ELECTRIC DIGITAL DATA PROCESSING
          • G06F 18/00 Pattern recognition
            • G06F 18/20 Analysing
              • G06F 18/21 Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
                • G06F 18/214 Generating training patterns; Bootstrap methods, e.g. bagging or boosting
              • G06F 18/25 Fusion techniques
        • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
          • G06N 3/00 Computing arrangements based on biological models
            • G06N 3/02 Neural networks
              • G06N 3/04 Architecture, e.g. interconnection topology
                • G06N 3/045 Combinations of networks
              • G06N 3/08 Learning methods
                • G06N 3/084 Backpropagation, e.g. using gradient descent

Abstract

The invention discloses a sign language recognition method based on CNN-BiGRU neural network fusion, which comprises the following steps: 1, collecting sign language data and adding labels to build a sign language data set; 2, preprocessing the sign language data set; 3, dividing the augmented feature data into a training data set, a validation data set and a test data set; 4, establishing a CNN-BiGRU deep neural network model that fuses a one-dimensional CNN with a BiGRU; and 5, collecting sign language data in real time, preprocessing it, and inputting it into the final model to obtain the sign language classification result. The invention makes full use of the spatio-temporal information of the sign language feature sequence and improves the recognition accuracy of the overall model, so that sign language recognition and classification can be realized effectively and accurately.

Description

Sign language identification method based on CNN-BiGRU neural network fusion
Technical Field
The invention relates to the field of sign language identification, in particular to a sign language identification method based on CNN-BiGRU neural network fusion.
Background
Against the background of increasingly diverse information transmission modes in intelligent human-computer interaction, sign language recognition, which reconstructs semantics from the dynamic spatial information of sign language interaction, addresses a pressing practical need. China has more than twenty million people who are deaf or speech-impaired, a large population with a comparatively low literacy rate, for whom sign language is the communication mode of daily life, so it is widely used and the demand for translation is large. However, the sign language translation industry has developed slowly, social training capacity is weak and infrastructure is lacking, so the supply of highly skilled sign language interpreters is scarce. In addition, online sign language translation is costly to operate and difficult to popularize. Automatic and accurate sign language recognition is therefore of great significance.
Traditional stand-alone means of capturing sign language state include skin-surface electromyography (EMG) sensors, wearable data gloves and ordinary cameras. EMG sensing devices collect the weak neural currents produced by muscle movement and then recognize and feed back the corresponding sign language actions through a pattern recognition algorithm. Recognition algorithms based on an ordinary camera mainly obtain low-dimensional sign language data by segmenting the target from the background and constructing local descriptors, which are then used for classification; increasing the number of cameras helps to further extract the spatial trajectory of the hand. However, complex background information in the captured images makes it harder to recognize hand postures and positions, and it is difficult to extract sufficient depth information from a single image, which limits recognition accuracy. EMG sensors and wearable data gloves must be worn, which is inconvenient, raises hygiene concerns for shared use in public places during an epidemic, and considerably limits their practical adoption.
At present, with the rise of artificial intelligence, deep learning has gradually penetrated various fields, and sign language recognition has gradually turned to deep learning with good results; nevertheless, recognition techniques for sign language remain few and immature. Traditional deep learning methods for sign language recognition mainly rely on the convolutional neural network (CNN) and the long short-term memory network (LSTM). CNN-based sign language recognition systems are limited to local features and cannot learn deeply from the pooled features, while the LSTM network considers only past feature sequences, ignores future time sequence information, and has a complex network structure that is difficult to train.
Disclosure of Invention
To address the problem of sign language recognition and overcome the shortcomings of the prior art, the invention provides a sign language recognition method based on CNN-BiGRU neural network fusion, so that the spatio-temporal information of the sign language feature sequence can be fully utilized, the recognition accuracy of the recognition model is improved, and sign language recognition and classification can be realized effectively and accurately.
In order to achieve the purpose, the invention adopts the following technical scheme:
the invention discloses a sign language identification method based on CNN-BiGRU neural network fusion, which is characterized by comprising the following steps:
step 1: capturing by thumb position coordinates F with a depth camera device 1 Index finger position coordinate F 2 Middle finger position coordinate F 3 Position coordinates F of ring finger 4 Position coordinates of little finger F 5 Palm center position P C Palm stable position P S Palm Pitch angle Pitch, palm Yaw angle Yaw, palm Roll angle Roll, hand grip radius r, palm width P W Various sign language data composed of palm center velocity v; each sign language data is provided with a corresponding category label; forming a sign language data set by a plurality of sign language data and category labels thereof;
step 2: carrying out data preprocessing on the sign language data set;
step 2.1, calculating the relative hand-held sphere radius S, the fingertip-to-palm-center distance D, the relative palm-center velocity V, the absolute three-dimensional palm position deviation P and the inter-finger distance L from the sign language data set, to be used as the feature data set;
step 2.2, standardizing the feature data set with a zero-mean normalization method to obtain a preprocessed feature data set;
step 2.3, performing data augmentation on the preprocessed feature data set to obtain an augmented feature data set;
step 3: dividing the augmented feature data set into a training data set, a validation data set and a test data set;
step 4: establishing a CNN-BiGRU deep neural network model that fuses a one-dimensional CNN with a BiGRU; the CNN-BiGRU deep neural network model comprises: a SpatialDropout layer, a one-dimensional CNN network, a BiGRU network and a fully connected layer; the one-dimensional CNN network consists of a one-dimensional convolutional layer and a one-dimensional max pooling layer; the BiGRU network is formed by combining a forward-propagating GRU unit and a backward-propagating GRU unit;
step 4.1, setting hyperparameters and initializing the parameters of the CNN-BiGRU deep neural network model, so as to obtain the current network model;
step 4.2, inputting the training data set into the SpatialDropout layer of the current network model, and obtaining primary sign language features through the convolution and max pooling operations of the one-dimensional CNN network; after the primary sign language features pass through the BiGRU network, the time sequence information of the sign language features is obtained and the probability of each sign language category is output through the fully connected layer;
step 4.3, back-propagating the error of the sign language category probabilities through the current network model with an optimization algorithm, thereby updating the parameters of each layer and obtaining an updated CNN-BiGRU deep neural network model;
step 4.4, verifying the accuracy of the updated CNN-BiGRU deep neural network model on the validation data set to judge whether it has converged; if so, taking the updated model as the optimal sign language classification model under the current hyperparameter setting; otherwise, taking the updated model as the current network model and returning to step 4.2;
and step 4.5, obtaining the optimal sign language classification models under different hyperparameters according to the process of steps 4.1 to 4.4, and comparing their accuracy on the test data set, so that the optimal sign language classification model with the highest accuracy is selected as the model finally used for recognizing sign language.
The sign language identification method based on the CNN-BiGRU neural network fusion is also characterized in that the step 2.1 comprises the following steps:
step 2.1.1, obtaining the relative hand-held sphere radius S using formula (1):
[Formula (1) is given as an image in the original publication.]
step 2.1.2, obtaining the fingertip-to-palm-center distance D using formula (2):
D_i = sqrt((F_ix - P_Cx)^2 + (F_iy - P_Cy)^2 + (F_iz - P_Cz)^2)   (2)
In formula (2), F_ix, F_iy and F_iz denote the x, y and z coordinates of the i-th finger position, P_Cx, P_Cy and P_Cz denote the x, y and z coordinates of the palm center position, and i = 1, 2, 3, 4, 5 denotes the thumb, index finger, middle finger, ring finger and little finger, respectively;
step 2.1.3, obtaining the relative palm-center velocity V_x in the x direction, V_y in the y direction and V_z in the z direction using formulas (3), (4) and (5), respectively:
[Formulas (3), (4) and (5) are given as images in the original publication.]
In formulas (3) to (5), v_k denotes the palm velocity in the k direction;
step 2.1.4, obtaining the absolute three-dimensional palm position deviation P using formula (6):
P = P_C - P_S   (6)
step 2.1.5, obtaining the inter-finger (fingertip-to-fingertip) distance L using formula (7):
L_ij = sqrt((F_ix - F_jx)^2 + (F_iy - F_jy)^2 + (F_iz - F_jz)^2),  i ≠ j   (7)
In formula (7), F_jx, F_jy and F_jz denote the x, y and z coordinates of the j-th finger position, j = 1, 2, 3, 4, 5 denotes the thumb, index finger, middle finger, ring finger and little finger, respectively, and i ≠ j.
Compared with the prior art, the invention has the beneficial effects that:
1. The invention accurately obtains the features of the palm and fingers with the Leap Motion depth camera and fuses a one-dimensional CNN network with a BiGRU network, thereby achieving sign language recognition with high accuracy;
2. The invention accurately obtains the spatio-temporal features of the palm and fingers with the Leap Motion depth camera in a contact-free, marker-free setting. Compared with a traditional glove-based recognition system, no special wearable equipment is needed to obtain the features, which reduces equipment cost; compared with an ordinary two-dimensional camera, the Leap Motion depth camera captures more spatial features of the sign language, which improves recognition accuracy;
3. The CNN-BiGRU deep neural network model uses the one-dimensional CNN network to perform primary feature extraction on the feature quantities through convolution, extracting the spatial features of the sign language; this greatly reduces the number of network parameters while keeping the feature vector at each moment complete, avoids mutual interference among parameters thanks to the sparsity of convolution, and improves the effectiveness of feature extraction;
4. The CNN-BiGRU deep neural network model adopts a BiGRU network that extracts the time sequence information of the sign language features produced by the one-dimensional CNN network in both the forward and backward directions along the time dimension, thereby obtaining more complete temporal feature information and improving the accuracy of sign language recognition.
drawings
FIG. 1 is a flow chart of a sign language recognition method based on fusion according to the present invention;
FIG. 2a is a diagram illustrating finger position, palm position and palm velocity in the sign language data feature of the present invention;
FIG. 2b is a schematic diagram of a palm angle in the sign language data feature of the present invention;
FIG. 2c is a schematic diagram of the hand-held sphere radius and the palm width in the sign language data feature of the present invention;
FIG. 3 is a schematic structural diagram of a CNN-BiGRU combined network model implemented by the present invention.
Detailed Description
In this embodiment, a sign language identification method based on a CNN-BiGRU neural network fusion algorithm, as shown in FIG. 1, includes the following steps:
step 1: obtaining thumb position coordinates F using a depth camera device Leap Motion 1 Index finger position coordinate F 2 Middle finger position coordinates F 3 Position coordinates of ring finger F 4 Position coordinates of little finger F 5 Palm center position P C Palm stable position P S Palm center velocity v, palm Pitch angle Pitch, palm Yaw Angle Yaw, palm Roll Angle Roll, hand grip radius r, palm Width P W The formed various sign language data; each sign language data is provided with a corresponding category label; forming a sign language data set by a plurality of sign language data and category labels thereof;
in specific implementation, the operation of acquiring the sign language data set specifically includes: 10 collection subjects, 15 sign language actions, each recording 20 sets of sign language data sets per sign language action. When recording, the palm faces to the Leap Motion device, appointed sign language actions are respectively completed in the visual field range of about 25mm to 600mm above the palm, and original characteristic data describing the palm position and the finger state are obtained.
As shown in FIG. 2a, the thumb position coordinates F_1, index finger position coordinates F_2, middle finger position coordinates F_3, ring finger position coordinates F_4, little finger position coordinates F_5, palm center position P_C and palm velocity v in the sign language data set are obtained;
as shown in FIG. 2b, the palm Pitch angle Pitch, palm Yaw angle Yaw and palm Roll angle Roll in the sign language data set are obtained;
as shown in FIG. 2c, the hand-held sphere radius r and the palm width P_W in the sign language data set are obtained.
The 10 subjects are volunteers selected from the recruited candidates with individual variation factors such as gender, age and handedness taken into account; all of them have some sign language background and take part in the data set recording after a short period of learning.
Step 2: carrying out data preprocessing on the sign language data set;
step 2.1, calculating the relative hand-held sphere radius S, the fingertip-to-palm-center distance D, the relative palm-center velocity V, the absolute three-dimensional palm position deviation P and the inter-finger distance L from the sign language data set, to be used as the feature data;
step 2.1.1, obtaining the relative hand-held sphere radius S using formula (1):
[Formula (1) is given as an image in the original publication.]
step 2.1.2, obtaining the fingertip-to-palm-center distance D using formula (2):
D_i = sqrt((F_ix - P_Cx)^2 + (F_iy - P_Cy)^2 + (F_iz - P_Cz)^2)   (2)
In formula (2), F_ix, F_iy and F_iz denote the x, y and z coordinates of the i-th finger position, P_Cx, P_Cy and P_Cz denote the x, y and z coordinates of the palm center position, and i = 1, 2, 3, 4, 5 denotes the thumb, index finger, middle finger, ring finger and little finger, respectively;
step 2.1.3, obtaining the relative palm-center velocity V_x in the x direction, V_y in the y direction and V_z in the z direction using formulas (3), (4) and (5), respectively:
[Formulas (3), (4) and (5) are given as images in the original publication.]
In formulas (3) to (5), v_k denotes the palm velocity in the k direction;
step 2.1.4, obtaining the absolute three-dimensional palm position deviation P using formula (6):
P = P_C - P_S   (6)
step 2.1.5, obtaining the inter-finger (fingertip-to-fingertip) distance L using formula (7):
L_ij = sqrt((F_ix - F_jx)^2 + (F_iy - F_jy)^2 + (F_iz - F_jz)^2),  i ≠ j   (7)
In formula (7), F_jx, F_jy and F_jz denote the x, y and z coordinates of the j-th finger position, j = 1, 2, 3, 4, 5 denotes the thumb, index finger, middle finger, ring finger and little finger, respectively, and i ≠ j, i.e. the distance from a fingertip to itself is not calculated.
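For illustration only, the per-frame geometry of step 2.1 can be computed with NumPy as in the following sketch. The helper name frame_features and the array layout (a (5, 3) matrix of fingertip positions plus two 3-vectors for P_C and P_S) are assumptions of this sketch; only D (formula (2)), L (formula (7)) and P (formula (6)) are computed, since the exact expressions for S and the relative velocities are given only as images in the original.

```python
import numpy as np

def frame_features(fingers, palm_center, palm_stable):
    """Per-frame geometric features from the captured hand data.

    fingers:      (5, 3) array of fingertip positions F_1..F_5 (thumb..little finger)
    palm_center:  (3,) palm center position P_C
    palm_stable:  (3,) palm stable position P_S
    """
    # Fingertip-to-palm-center distances D_i, formula (2).
    D = np.linalg.norm(fingers - palm_center, axis=1)
    # Pairwise fingertip distances L_ij with i != j, formula (7); upper triangle only.
    iu, ju = np.triu_indices(5, k=1)
    L = np.linalg.norm(fingers[iu] - fingers[ju], axis=1)
    # Absolute three-dimensional palm position deviation P, formula (6).
    P = palm_center - palm_stable
    return np.concatenate([D, L, P])
```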
Step 2.2, standardizing the feature data with a zero-mean normalization method to eliminate the influence of differences in dimension and value range among the features, and obtaining the preprocessed feature data set;
the zero-mean normalization algorithm is embodied as follows:
Figure BDA0002987590640000063
in the formula (8), x represents characteristic data,
Figure BDA0002987590640000064
represents the mean of the characteristic data, sigma represents the standard deviation of the characteristic data, x * The resulting normalized preprocessed feature data is represented.
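As a small illustration of formula (8), the normalization statistics can be fitted on the training portion and reused for the validation, test and real-time data. The helper name zscore_fit_transform and the (samples, timesteps, features) array layout are assumptions of this sketch, not details given in the text.

```python
import numpy as np

def zscore_fit_transform(train_features):
    """Zero-mean normalization, formula (8): x* = (x - mean) / std."""
    mean = train_features.mean(axis=(0, 1))          # per-feature mean over samples and frames
    std = train_features.std(axis=(0, 1)) + 1e-8     # small epsilon avoids division by zero
    return (train_features - mean) / std, mean, std
```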
Step 2.3, performing data augmentation on the preprocessed feature data set. The time series of the data samples are transformed in the time domain using a center-averaging method based on weighted dynamic time warping. Specifically: an initial time series is randomly selected from the data set and given a weight of 0.5, serving as the initial series of the center-averaging technique; the dynamic time warping distances between this initial series and the other samples are computed, and the 5 time series with the shortest distances are found; two time series are randomly selected from these 5 nearest neighbours and each given a weight of 0.15; the remaining sequences equally share the remaining weight of 0.2. This yields the augmented feature data set;
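The sketch below illustrates this augmentation step under stated assumptions: each sample is a NumPy array of shape (frames, features), the DTW distance and alignment path are computed with a plain dynamic-programming routine, and the weighted combination aligns each neighbour onto the reference timeline in a single pass, which approximates weighted DTW barycentric averaging rather than reproducing the exact procedure of the embodiment. All function names are illustrative.

```python
import numpy as np

def dtw(a, b):
    """DTW distance between two (T, F) series plus the optimal alignment path."""
    n, m = len(a), len(b)
    cost = np.full((n + 1, m + 1), np.inf)
    cost[0, 0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = np.linalg.norm(a[i - 1] - b[j - 1])
            cost[i, j] = d + min(cost[i - 1, j], cost[i, j - 1], cost[i - 1, j - 1])
    path, i, j = [], n, m                      # backtrack the warping path
    while i > 0 and j > 0:
        path.append((i - 1, j - 1))
        step = int(np.argmin([cost[i - 1, j - 1], cost[i - 1, j], cost[i, j - 1]]))
        i, j = (i - 1, j - 1) if step == 0 else (i - 1, j) if step == 1 else (i, j - 1)
    return cost[n, m], path[::-1]

def augment_sample(samples, rng=None):
    """Create one synthetic sample by a weighted, DTW-aligned average."""
    rng = rng if rng is not None else np.random.default_rng()
    ref_idx = int(rng.integers(len(samples)))
    ref = samples[ref_idx]                                       # reference series, weight 0.5
    results = {k: dtw(ref, samples[k]) for k in range(len(samples)) if k != ref_idx}
    nearest = sorted(results, key=lambda k: results[k][0])[:5]   # 5 closest series
    weights = {k: 0.2 / 3 for k in nearest}                      # three of them share 0.2
    for k in rng.choice(nearest, size=2, replace=False):         # two of them get 0.15 each
        weights[int(k)] = 0.15
    out = 0.5 * ref
    for k in nearest:
        aligned, counts = np.zeros_like(ref), np.zeros(len(ref))
        for i, j in results[k][1]:             # map neighbour frames onto the reference timeline
            aligned[i] += samples[k][j]
            counts[i] += 1
        out = out + weights[k] * aligned / counts[:, None]
    return out
```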
Step 3: dividing the augmented feature data set into a training data set, a validation data set and a test data set in a ratio of 6 : 2 : 2;
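A minimal sketch of the 6 : 2 : 2 split; the helper name split_6_2_2, the shuffling and the fixed seed are assumptions of this illustration.

```python
import numpy as np

def split_6_2_2(features, labels, seed=0):
    """Shuffle and split into training / validation / test sets at a 6:2:2 ratio."""
    idx = np.random.default_rng(seed).permutation(len(features))
    n_train, n_val = int(0.6 * len(idx)), int(0.2 * len(idx))
    train, val, test = np.split(idx, [n_train, n_train + n_val])
    return ((features[train], labels[train]),
            (features[val], labels[val]),
            (features[test], labels[test]))
```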
Step 4: as shown in FIG. 3, a CNN-BiGRU deep neural network model fusing a one-dimensional CNN with a BiGRU is established. The CNN-BiGRU deep neural network model comprises: a SpatialDropout layer, a one-dimensional CNN network, a BiGRU network and a fully connected layer. The SpatialDropout layer randomly zeroes part of the input sign language feature map, which improves the generalization ability of the model. The one-dimensional CNN network consists of a one-dimensional convolutional layer and a max pooling layer: the one-dimensional convolutional layer obtains global information by comprehensively learning the local features of the sign language sequence through convolution, and the max pooling layer down-samples the input feature map to eliminate part of the redundant information. The BiGRU network is formed by combining a forward-propagating GRU unit and a backward-propagating GRU unit: the forward GRU unit processes the input sequence data in chronological order and the backward GRU unit processes it in reverse chronological order, turning the unidirectional structure into a bidirectional one, so that context information can be fully utilized, feature information ignored by a unidirectional GRU can be captured, redundant information can be further eliminated, and the spatial and temporal feature information of the original sign language sequence is finally obtained. The fully connected layer re-integrates the input data and maps it to the sample label space.
Step 4.1, setting the hyperparameters and initializing the parameters of the CNN-BiGRU deep neural network model, so as to obtain the current network model. The hyperparameter settings are as follows: the activation function of the one-dimensional convolutional layer is ReLU, the filter size is 5, the number of filters is 64, and L2 regularization with a coefficient of 0.001 is used to avoid overfitting; the pooling window size of the pooling layer is 4. In the BiGRU layer, the dropout ratio of the input units is 0.1, the dropout ratio of the recurrent units is 0.1, and L2 regularization with a coefficient of 0.001 is used to avoid overfitting. The fully connected layer uses a Softmax activation function.
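A minimal Keras sketch of a model with this layer order and the hyperparameters listed above. The number of GRU units, the SpatialDropout rate and the builder name build_cnn_bigru are not specified in the text and are assumptions of this illustration.

```python
from tensorflow import keras
from tensorflow.keras import layers, regularizers

def build_cnn_bigru(timesteps, n_features, n_classes):
    """One-dimensional CNN + BiGRU classifier following the layer order above."""
    return keras.Sequential([
        keras.Input(shape=(timesteps, n_features)),
        layers.SpatialDropout1D(0.1),                        # rate assumed; zeroes whole feature channels
        layers.Conv1D(64, kernel_size=5, activation="relu",  # 64 filters of size 5, ReLU
                      kernel_regularizer=regularizers.l2(0.001)),
        layers.MaxPooling1D(pool_size=4),                    # pooling window of 4
        layers.Bidirectional(layers.GRU(64,                  # unit count assumed
                                        dropout=0.1, recurrent_dropout=0.1,
                                        kernel_regularizer=regularizers.l2(0.001))),
        layers.Dense(n_classes, activation="softmax"),       # fully connected Softmax output
    ])
```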
Step 4.2, inputting the training data set into the current network model: the SpatialDropout layer randomly zeroes part of the input sign language feature map to improve the generalization ability of the model; the convolution operation of the one-dimensional CNN network extracts primary sign language features, which contain spatial information such as positions, and its max pooling operation eliminates redundant information. After the primary sign language features pass through the BiGRU network, the time sequence information of the sign language features is obtained as high-level features, and finally the probability of each sign language category is output through the fully connected layer.
Step 4.3, back-propagating the error of the sign language category probabilities through the current network model with an optimization algorithm, thereby updating the parameters of each layer and obtaining an updated CNN-BiGRU deep neural network model. Adam is selected as the optimizer, cross entropy is used as the loss function, accuracy is selected as the evaluation metric, the number of epochs is 15 and the batch_size is 16.
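The training settings of steps 4.3 and 4.4 might be expressed as in the following sketch, which reuses the build_cnn_bigru sketch above and assumes one-hot encoded labels (otherwise sparse_categorical_crossentropy would be used); the EarlyStopping callback stands in for the convergence check on the validation set.

```python
from tensorflow import keras

def train(model, x_train, y_train, x_val, y_val):
    """Compile and fit with the Adam / cross-entropy / accuracy settings above."""
    model.compile(optimizer="adam",
                  loss="categorical_crossentropy",
                  metrics=["accuracy"])
    stop = keras.callbacks.EarlyStopping(monitor="val_accuracy", patience=3,
                                         restore_best_weights=True)
    return model.fit(x_train, y_train,
                     validation_data=(x_val, y_val),
                     epochs=15, batch_size=16,
                     callbacks=[stop])
```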
Step 4.4, verifying the accuracy of the updated CNN-BiGRU deep neural network model on the validation data set to judge whether it has converged; if so, the updated model is taken as the optimal sign language classification model under the current hyperparameter setting; otherwise, the updated model is taken as the current network model and the procedure returns to step 4.2.
Step 4.5, according to the process of steps 4.1 to 4.4, cross validation is performed with several sets of hyperparameters to obtain the optimal sign language classification models under the different hyperparameters, and their accuracy on the test data set is compared, so that the optimal sign language classification model with the highest accuracy is selected as the model finally used for recognizing sign language.
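Steps 4.1 to 4.5 can then be tied together as below, reusing the build_cnn_bigru and train sketches above; passing one builder function per hyperparameter setting is an assumption of this illustration.

```python
def select_best_model(build_fns, data):
    """Train one model per hyperparameter configuration and keep the one with
    the highest accuracy on the test data set."""
    (x_train, y_train), (x_val, y_val), (x_test, y_test) = data
    best_model, best_acc = None, 0.0
    for build_fn in build_fns:   # each builder encodes one hyperparameter setting
        model = build_fn(x_train.shape[1], x_train.shape[2], y_train.shape[1])
        train(model, x_train, y_train, x_val, y_val)
        _, acc = model.evaluate(x_test, y_test, verbose=0)
        if acc > best_acc:
            best_model, best_acc = model, acc
    return best_model, best_acc
```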
And 5: sign language data are collected in real time, preprocessed and input into a final model to obtain sign language classification results.

Claims (2)

1. A sign language identification method based on CNN-BiGRU neural network fusion is characterized by comprising the following steps:
step 1: obtaining by thumb position coordinates F with a depth camera device 1 Index finger position coordinate F 2 Middle finger position coordinate F 3 Position coordinates F of ring finger 4 Position coordinates of little finger F 5 Palm center position P C Palm stable position P S Palm Pitch angle Pitch, palm Yaw angle Yaw, palm Roll angle Roll, hand grip radius r, palm width P W Various sign language data composed of palm center velocity v; each sign language data is provided with a corresponding category label; forming a sign language data set by a plurality of sign language data and category labels thereof;
and 2, step: carrying out data preprocessing on the sign language data set;
step 2.1, calculating a relative radius S of a hand-held ball, a fingertip palm center distance D, a palm center relative speed V, an absolute three-dimensional palm position standard deviation P and an inter-finger distance L according to the sign language data set to be used as a feature data set;
obtaining the relative hand-held sphere radius S using formula (1):
[Formula (1) is given as an image in the original publication.]
obtaining the relative palm-center velocity V_x in the x direction, V_y in the y direction and V_z in the z direction using formulas (2), (3) and (4), respectively:
[Formulas (2), (3) and (4) are given as images in the original publication.]
In formulas (2) to (4), v_k denotes the palm velocity in the k direction;
step 2.1.4, obtaining the absolute three-dimensional palm position deviation P using formula (5):
P = P_C - P_S   (5)
step 2.2, standardizing the feature data set with a zero-mean normalization method to obtain a preprocessed feature data set;
step 2.3, performing data augmentation on the preprocessed feature data set to obtain an augmented feature data set;
step 3: dividing the augmented feature data set into a training data set, a validation data set and a test data set;
step 4: establishing a CNN-BiGRU deep neural network model that fuses a one-dimensional CNN with a BiGRU; the CNN-BiGRU deep neural network model comprises: a SpatialDropout layer, a one-dimensional CNN network, a BiGRU network and a fully connected layer; the one-dimensional CNN network consists of a one-dimensional convolutional layer and a one-dimensional max pooling layer; the BiGRU network is formed by combining a forward-propagating GRU unit and a backward-propagating GRU unit;
step 4.1, setting hyperparameters and initializing the parameters of the CNN-BiGRU deep neural network model, so as to obtain the current network model;
step 4.2, inputting the training data set into the SpatialDropout layer of the current network model, and obtaining primary sign language features through the convolution and max pooling operations of the one-dimensional CNN network; after the primary sign language features pass through the BiGRU network, the time sequence information of the sign language features is obtained and the probability of each sign language category is output through the fully connected layer;
step 4.3, back-propagating the error of the sign language category probabilities through the current network model with an optimization algorithm, thereby updating the parameters of each layer and obtaining an updated CNN-BiGRU deep neural network model;
step 4.4, verifying the accuracy of the updated CNN-BiGRU deep neural network model on the validation data set to judge whether it has converged; if so, taking the updated model as the optimal sign language classification model under the current hyperparameter setting; otherwise, taking the updated model as the current network model and returning to step 4.2;
and step 4.5, obtaining the optimal sign language classification models under different hyperparameters according to the process of steps 4.1 to 4.4, and comparing their accuracy on the test data set, so that the optimal sign language classification model with the highest accuracy is selected as the model finally used for recognizing sign language.
2. The method for sign language recognition based on CNN-BiGRU neural network fusion as claimed in claim 1, wherein the step 2.1 comprises:
step 2.1.1, obtaining the fingertip-to-palm-center distance D using formula (6):
D_i = sqrt((F_ix - P_Cx)^2 + (F_iy - P_Cy)^2 + (F_iz - P_Cz)^2)   (6)
In formula (6), F_ix, F_iy and F_iz denote the x, y and z coordinates of the i-th finger position, P_Cx, P_Cy and P_Cz denote the x, y and z coordinates of the palm center position, and i = 1, 2, 3, 4, 5 denotes the thumb, index finger, middle finger, ring finger and little finger, respectively;
step 2.1.2, obtaining the inter-finger distance L using formula (7):
L_ij = sqrt((F_ix - F_jx)^2 + (F_iy - F_jy)^2 + (F_iz - F_jz)^2),  i ≠ j   (7)
In formula (7), F_jx, F_jy and F_jz denote the x, y and z coordinates of the j-th finger position, j = 1, 2, 3, 4, 5 denotes the thumb, index finger, middle finger, ring finger and little finger, respectively, and i ≠ j.
CN202110304616.8A 2021-03-23 2021-03-23 Sign language identification method based on CNN-BiGRU neural network fusion Active CN112883922B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202110304616.8A CN112883922B (en) 2021-03-23 2021-03-23 Sign language identification method based on CNN-BiGRU neural network fusion

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202110304616.8A CN112883922B (en) 2021-03-23 2021-03-23 Sign language identification method based on CNN-BiGRU neural network fusion

Publications (2)

Publication Number Publication Date
CN112883922A CN112883922A (en) 2021-06-01
CN112883922B true CN112883922B (en) 2022-08-30

Family

ID=76041665

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202110304616.8A Active CN112883922B (en) 2021-03-23 2021-03-23 Sign language identification method based on CNN-BiGRU neural network fusion

Country Status (1)

Country Link
CN (1) CN112883922B (en)

Families Citing this family (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113542241B (en) * 2021-06-30 2023-05-09 杭州电子科技大学 Intrusion detection method and device based on CNN-BiGRU hybrid model

Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US9779296B1 (en) * 2016-04-01 2017-10-03 Kofax, Inc. Content-based detection and three dimensional geometric reconstruction of objects in image and video data
CN107563286A (en) * 2017-07-28 2018-01-09 南京邮电大学 A kind of dynamic gesture identification method based on Kinect depth information

Family Cites Families (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US9672418B2 (en) * 2015-02-06 2017-06-06 King Fahd University Of Petroleum And Minerals Arabic sign language recognition using multi-sensor data fusion

Patent Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US9779296B1 (en) * 2016-04-01 2017-10-03 Kofax, Inc. Content-based detection and three dimensional geometric reconstruction of objects in image and video data
CN107563286A (en) * 2017-07-28 2018-01-09 南京邮电大学 A kind of dynamic gesture identification method based on Kinect depth information

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
"American sign language recognition and training method with recurrent neural network";C.K.M. Lee等;《Expert Systems with Applications》;20201203;第167卷;全文 *
"融合关节旋转特征和指尖距离特征的手势识别";缪永伟等;《计算机学报》;20200131;第43卷(第01期);全文 *

Also Published As

Publication number Publication date
CN112883922A (en) 2021-06-01

Similar Documents

Publication Publication Date Title
Yu et al. Exploration of Chinese sign language recognition using wearable sensors based on deep belief net
CN107908288A (en) A kind of quick human motion recognition method towards human-computer interaction
CN111476161A (en) Somatosensory dynamic gesture recognition method fusing image and physiological signal dual channels
Wu et al. A Visual-Based Gesture Prediction Framework Applied in Social Robots.
Alrubayi et al. A pattern recognition model for static gestures in malaysian sign language based on machine learning techniques
CN112148128B (en) Real-time gesture recognition method and device and man-machine interaction system
CN111444488A (en) Identity authentication method based on dynamic gesture
CN106502390B (en) A kind of visual human's interactive system and method based on dynamic 3D Handwritten Digit Recognition
Bu Human motion gesture recognition algorithm in video based on convolutional neural features of training images
CN109325408A (en) A kind of gesture judging method and storage medium
CN108960171B (en) Method for converting gesture recognition into identity recognition based on feature transfer learning
CN109765996A (en) Insensitive gesture detection system and method are deviated to wearing position based on FMG armband
Wang et al. Research on gesture image recognition method based on transfer learning
CN112883922B (en) Sign language identification method based on CNN-BiGRU neural network fusion
CN114255508A (en) OpenPose-based student posture detection analysis and efficiency evaluation method
Ma et al. Difference-guided representation learning network for multivariate time-series classification
CN111382699A (en) Dynamic gesture recognition method based on particle swarm optimization LSTM algorithm
Chen et al. Unsupervised sim-to-real adaptation for environmental recognition in assistive walking
Zhou et al. Intelligent recognition of medical motion image combining convolutional neural network with Internet of Things
CN109993116B (en) Pedestrian re-identification method based on mutual learning of human bones
Alhersh et al. Learning human activity from visual data using deep learning
Lu et al. Pose-guided model for driving behavior recognition using keypoint action learning
Savio et al. Image processing for face recognition using HAAR, HOG, and SVM algorithms
Deng et al. Attention based visual analysis for fast grasp planning with a multi-fingered robotic hand
Bilang et al. Cactaceae detection using MobileNet architecture

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant