CN105956529A

CN105956529A - Chinese sign language identification method based on LSTM type RNN

Info

Publication number: CN105956529A
Application number: CN201610260747.XA
Authority: CN
Inventors: 程树英; 林鹏程; 吴丽君
Original assignee: Fuzhou University
Current assignee: Fuzhou University
Priority date: 2016-04-25
Filing date: 2016-04-25
Publication date: 2016-09-21

Abstract

The invention relates to a Chinese Sign Language recognition method based on an LSTM-type RNN. According to the characteristics of Chinese Sign Language, multiple groups of sign language features are collected. The collected features are labeled according to the linguistic meaning corresponding to each feature vector to form training data. The training data is used as the input of the LSTM-type RNN for model training, and the optimal network model parameters obtained are used as the final recognition model. The trained model is then used to recognize the sign language to be identified: the character sequence with the maximum probability at the output layer is calculated as the decoding result and converted into the corresponding acoustic sequence, and this result is the recognized sign language feature. The method can link to distant states, prevents the decline in a later state's ability to perceive earlier states, and improves the accuracy of continuous Chinese Sign Language recognition.

Description

A Chinese Sign Language recognition method based on an LSTM-type RNN

Technical field

The present invention relates to the field of Chinese Sign Language recognition, and in particular to a Chinese Sign Language recognition method based on an LSTM-type RNN.

Background technology

Sign language recognition is a technology that converts sign language information into speech or text so that it can be read aloud or displayed. In the field of sign language recognition, continuous sign language recognition is the key problem; the challenge of improving sign language recognition is therefore how to improve the accuracy of continuous sign language recognition.

In the prior art, the main methods of continuous sign language recognition are as follows:

First, continuous sign language recognition generally uses an HMM (Hidden Markov Model). This method introduces into the model the influence of the previous state on the current state, and recognizes sign language by maximizing the computed output probability;

Second, continuous sign language recognition may also use a CRF (Conditional Random Field). This method introduces context information into the model, which requires extending the training features to the left and right and introducing hand-crafted feature templates for training. In the traditional approach, the sign language models are trained separately, and the sign language to be identified is then recognized by step-by-step prediction.

However, the above two methods have the following main problems:

1. Although left-right extension can introduce the association between preceding and following states to some degree, the extension size is very limited in order to reduce model scale and complexity. The distance over which context can be linked therefore cannot be too large, so the current time step's perception of earlier positions declines;

2. With step-by-step prediction, an error made at one step is propagated to later steps and degrades the final result.

Summary of the invention

In view of this, the purpose of the present invention is to propose a Chinese Sign Language recognition method based on an LSTM-type RNN that overcomes the decline in the current time node's perception of earlier positions.

The present invention is realized by the following scheme: a Chinese Sign Language recognition method based on an LSTM-type RNN, comprising the following steps:

Step S1: collect multiple groups of sign language features;

Step S2: label the collected sign language features according to their corresponding linguistic meaning to form training data, wherein the training data is used for training the neural network;

Step S3: use the training data as the input of the LSTM-type RNN to train the model and obtain the optimal network model parameters, which serve as the final recognition model;

Step S4: collect features from the sign language to be identified and use them as the input of the LSTM-type RNN model; calculate the character sequence with the maximum probability at the output layer and take it as the decoding result, the result being the recognized sign language feature.
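The decoding in step S4 can be illustrated with a minimal sketch. It assumes that the output layer yields one probability distribution per output step and that "maximum probability" is read as a per-step argmax; the function name `greedy_decode`, the vocabulary, and the probability values are hypothetical and are not taken from the patent.

```python
import numpy as np

def greedy_decode(probs, vocab):
    """Return the most probable label at every output step (greedy argmax)."""
    probs = np.asarray(probs, dtype=float)
    if probs.ndim != 2 or probs.shape[1] != len(vocab):
        raise ValueError("probs must have shape (steps, len(vocab))")
    best = probs.argmax(axis=1)          # index of the largest probability per step
    return [vocab[i] for i in best]

# Hypothetical vocabulary and output-layer probabilities, for illustration only.
vocab = ["ni", "hao", "xie"]
probs = [[0.7, 0.2, 0.1],
         [0.1, 0.8, 0.1]]
print(greedy_decode(probs, vocab))  # ['ni', 'hao']
```

A per-step argmax is the simplest reading of the text; a search over whole sequences would be a different design and is not sketched here.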

Further, step S1 is specifically: sign language features are acquired using a data glove, the data glove comprising flex sensors, a nine-axis sensor, and a microprocessor for processing, storing, and sending data.

Further, step S2 is specifically: the collected sign language features are classified according to the linguistic meaning they express; for each linguistic meaning, a certain number of feature groups are randomly selected and each is labeled with its linguistic meaning; the labeled groups are organized in matrix form to constitute the training data.
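As a rough sketch of step S2, the following code draws a fixed number of feature groups per meaning at random, labels them, and stacks them into a matrix. It assumes, for simplicity, that each feature group is a fixed-length vector; the function name `build_training_set`, the meanings, and the values are hypothetical.

```python
import random
import numpy as np

def build_training_set(samples_by_meaning, per_class, seed=0):
    """Randomly pick `per_class` feature groups per meaning and label them.

    samples_by_meaning maps a meaning (str) to a list of equal-length feature vectors.
    Returns the feature matrix X, the integer label vector y, and the label names.
    """
    rng = random.Random(seed)
    labels = sorted(samples_by_meaning)          # fixed class order
    rows, targets = [], []
    for class_id, meaning in enumerate(labels):
        pool = samples_by_meaning[meaning]
        if len(pool) < per_class:
            raise ValueError(f"not enough samples for {meaning!r}")
        for vec in rng.sample(pool, per_class):  # random selection without replacement
            rows.append(vec)
            targets.append(class_id)
    return np.asarray(rows, dtype=float), np.asarray(targets, dtype=int), labels

# Hypothetical toy data: two meanings, three 2-dimensional feature groups each.
data = {"hello": [[0.1, 0.2], [0.3, 0.4], [0.5, 0.6]],
        "thanks": [[0.9, 0.8], [0.7, 0.6], [0.5, 0.4]]}
X, y, labels = build_training_set(data, per_class=2)
print(X.shape, y.tolist(), labels)  # (4, 2) [0, 0, 1, 1] ['hello', 'thanks']
```

Real glove recordings are sequences over time, so a practical version would likely keep a time axis as well; that extension is left out of this sketch.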

Further, step S3 is specifically: a corresponding LSTM-type RNN model is constructed according to the sign language features, that is, explicitly modeled; the sign language features and labels of the training data from step S2 are used as input to train the constructed LSTM-type RNN, so as to obtain the weight parameters corresponding to the different sign language features.

Further, the LSTM-type RNN includes an input layer, an output layer, and a hidden layer; the input of the input layer is the sign language feature value sequence O₁O₂...O_T, the output of the output layer is the acoustic sequence S₁S₂...S_L corresponding to the input, and the hidden layer includes multiple LSTM units; wherein T is the number of time steps and L is the length of the acoustic sequence.

Further, the LSTM unit includes three control gates, which control the association among the input, the output, and the unit's own internal state across time steps.

Further, step S4 is specifically: the final LSTM-type RNN recognition model generated in step S3 is used to recognize the sign language to be identified. The features of the sign language to be identified are first abstracted to extract feature vectors; the sign language is then predicted according to the LSTM-type RNN model, and acoustic prediction is further performed to generate an acoustic parameter sequence, from which a speech synthesis result is generated.

Further, the LSTM-type RNN in step S4 uses the following formulas to control the flow of information:

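I_t = \sigma(W_{ix} I_t + W_{im} m_{t-1} + W_{ic} c_{t-1} + b_i);

F_t = \sigma(W_{Fx} I_t + W_{Fm} m_{t-1} + W_{Fc} c_{t-1} + b_F);

c_t = F_t \odot c_{t-1} + I_t \odot g(W_{cx} I_t + W_{cm} m_{t-1} + b_c);

O_t = \sigma(W_{Ox} I_t + W_{Om} m_{t-1} + W_{Oc} c_{t-1} + b_O);

m_t = O_t \odot h(c_t);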

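Wherein, given the input sequence I=(I₁,I₂...I_T), T is the length of the input sequence and I_t is the input at time t; W denotes a weight matrix and b a bias; I, F, O, c, and m denote the input gate, forget gate, output gate, cell state, and output of the LSTM structure, respectively. The symbol I_t is thus used in two senses: where it is multiplied by a weight matrix it is the input at time t, and elsewhere it is the input gate activation;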

Wherein, σ is the activation function of the three control gates, given by:

\sigma(x) = \frac{1}{1 + e^{-x}};

Wherein, h is the activation function of the state, given by:

h(x) = \tanh(x) = \frac{e^{x} - e^{-x}}{e^{x} + e^{-x}}.
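The gate formulas above can be sketched in NumPy as a single time step. Several points are assumptions for illustration only: `x_t` stands for the input the text writes as I_t, the activation g (not defined in the text) is taken to be tanh, the weights W_ic, W_Fc, and W_Oc are full matrices, the parameters are randomly initialized rather than trained, and the dimensions are hypothetical. As in the formulas, the output gate reads c_{t-1}.

```python
import numpy as np

def sigmoid(x):
    """Logistic function 1 / (1 + e^{-x}), used by the three gates."""
    return 1.0 / (1.0 + np.exp(-x))

def lstm_step(x_t, m_prev, c_prev, p, g=np.tanh, h=np.tanh):
    """One time step; p maps names such as 'W_ix' or 'b_F' to arrays."""
    i_t = sigmoid(p["W_ix"] @ x_t + p["W_im"] @ m_prev + p["W_ic"] @ c_prev + p["b_i"])
    f_t = sigmoid(p["W_Fx"] @ x_t + p["W_Fm"] @ m_prev + p["W_Fc"] @ c_prev + p["b_F"])
    c_t = f_t * c_prev + i_t * g(p["W_cx"] @ x_t + p["W_cm"] @ m_prev + p["b_c"])
    o_t = sigmoid(p["W_Ox"] @ x_t + p["W_Om"] @ m_prev + p["W_Oc"] @ c_prev + p["b_O"])
    m_t = o_t * h(c_t)                      # output of the LSTM structure
    return m_t, c_t

def init_params(n_in, n_cell, seed=0):
    """Small random weights and zero biases (illustrative, not trained values)."""
    rng = np.random.default_rng(seed)
    p = {}
    for gate in ("i", "F", "c", "O"):
        p[f"W_{gate}x"] = rng.normal(0.0, 0.1, (n_cell, n_in))
        p[f"W_{gate}m"] = rng.normal(0.0, 0.1, (n_cell, n_cell))
        p[f"b_{gate}"] = np.zeros(n_cell)
        if gate != "c":                     # the cell update has no W_cc term
            p[f"W_{gate}c"] = rng.normal(0.0, 0.1, (n_cell, n_cell))
    return p

# Run a hypothetical 5-step sequence of 14-dimensional features through 8 cells.
T, n_in, n_cell = 5, 14, 8
params = init_params(n_in, n_cell)
m, c = np.zeros(n_cell), np.zeros(n_cell)
for x_t in np.random.default_rng(1).normal(size=(T, n_in)):
    m, c = lstm_step(x_t, m, c, params)
print(m.shape)  # (8,)
```

The state pair (m, c) is carried from one step to the next, which is how earlier inputs can still influence later outputs.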

Compared with the prior art, the present invention has the following beneficial effects: feature vectors are extracted from the sign language to be predicted; a pre-trained LSTM-type RNN performs linguistic prediction on the feature vectors to generate a linguistic parameter sequence; and a generation module generates a speech synthesis result from the linguistic parameter sequence. Training with the LSTM-type RNN network structure thus improves the accuracy of continuous sign language recognition.

Brief description of the drawings

Fig. 1 is a schematic flow chart of the method of the present invention.

Fig. 2 is a schematic diagram of the basic principle of the LSTM-type RNN in the embodiment of the present invention.

Detailed description of the invention

The present invention is further described below with reference to the accompanying drawings and an embodiment.

As shown in Fig. 1, this embodiment provides a Chinese Sign Language recognition method based on an LSTM-type RNN, comprising the following steps:

Step S1: collect multiple groups of sign language features;

In this embodiment, step S1 is specifically: sign language features are acquired using a data glove, the data glove comprising flex sensors, a nine-axis sensor, and a microprocessor for processing, storing, and sending data.
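One possible way to package a single glove reading into a feature vector is sketched below. The patent does not give the sensor layout, so everything concrete here is an assumption: five flex sensors per glove, a nine-axis sensor read as a three-axis accelerometer, gyroscope, and magnetometer, and the class name `GloveFrame` with its field names.

```python
from dataclasses import dataclass
from typing import List
import numpy as np

@dataclass
class GloveFrame:
    """One sampled reading from the data glove (hypothetical layout)."""
    flex: List[float]    # one value per flex sensor; five is an assumed count
    accel: List[float]   # 3-axis accelerometer
    gyro: List[float]    # 3-axis gyroscope
    mag: List[float]     # 3-axis magnetometer

    def to_vector(self) -> np.ndarray:
        """Concatenate all readings into a single feature vector."""
        if not (len(self.accel) == len(self.gyro) == len(self.mag) == 3):
            raise ValueError("each nine-axis component must have 3 values")
        return np.asarray([*self.flex, *self.accel, *self.gyro, *self.mag], dtype=float)

# Illustrative values only.
frame = GloveFrame(flex=[0.1] * 5, accel=[0.0, 0.0, 9.8],
                   gyro=[0.0, 0.0, 0.0], mag=[0.3, 0.0, 0.5])
print(frame.to_vector().shape)  # (14,)
```

A sequence of such vectors, one per sampling instant, would then form one group of sign language features.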

In this embodiment, step S2 is specifically: the collected sign language features are classified according to the linguistic meaning they express; for each linguistic meaning, a certain number of feature groups are randomly selected and each is labeled with its linguistic meaning; the labeled groups are organized in matrix form to constitute the training data.

In this embodiment, step S3 is specifically: a corresponding LSTM-type RNN model is constructed according to the sign language features, that is, explicitly modeled; the sign language features and labels of the training data from step S2 are used as input to train the constructed LSTM-type RNN, so as to obtain the weight parameters corresponding to the different sign language features.

In this embodiment, the LSTM-type RNN includes an input layer, an output layer, and a hidden layer; the input of the input layer is the sign language feature value sequence O₁O₂...O_T, the output of the output layer is the acoustic sequence S₁S₂...S_L corresponding to the input, and the hidden layer includes multiple LSTM units; wherein T is the number of time steps and L is the length of the acoustic sequence.

In this embodiment, the LSTM unit includes three control gates, which control the association among the input, the output, and the unit's own internal state across time steps.

In this embodiment, step S4 is specifically: the final LSTM-type RNN recognition model generated in step S3 is used to recognize the sign language to be identified. The features of the sign language to be identified are first abstracted to extract feature vectors; the sign language is then predicted according to the LSTM-type RNN model, and acoustic prediction is further performed to generate an acoustic parameter sequence, from which a speech synthesis result is generated.

As shown in Fig. 2, the basic idea of the LSTM-type RNN is to control the flow of information through different types of structures: the Input Gate, the Output Gate, and the Forget Gate. In this embodiment, the LSTM-type RNN in step S4 uses the following formulas to control the flow of information:

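I_t = \sigma(W_{ix} I_t + W_{im} m_{t-1} + W_{ic} c_{t-1} + b_i);

F_t = \sigma(W_{Fx} I_t + W_{Fm} m_{t-1} + W_{Fc} c_{t-1} + b_F);

c_t = F_t \odot c_{t-1} + I_t \odot g(W_{cx} I_t + W_{cm} m_{t-1} + b_c);

O_t = \sigma(W_{Ox} I_t + W_{Om} m_{t-1} + W_{Oc} c_{t-1} + b_O);

m_t = O_t \odot h(c_t);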

Wherein, given the input sequence I=(I₁,I₂...I_T), T is the length of the input sequence and I_t is the input at time t; W denotes a weight matrix and b a bias; I, F, O, c, and m denote the input gate, forget gate, output gate, cell state, and output of the LSTM structure, respectively;

Wherein, σ is the activation function of the three control gates, given by:

\sigma(x) = \frac{1}{1 + e^{-x}};

Wherein, h is the activation function of the state, given by:

h(x) = \tanh(x) = \frac{e^{x} - e^{-x}}{e^{x} + e^{-x}}.

It can be seen from the structure and the formulas that the LSTM-type RNN caches historical state information and maintains that information through its gates, thereby extending the influence of long-range context on the current information and improving the accuracy of continuous sign language recognition.
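The caching effect can be made explicit by unrolling the cell-state formula over k steps. In the sketch below, u_t is shorthand introduced here (it does not appear in the original text) for the gated candidate term, and an empty product of forget gates is taken to be a vector of ones.

```latex
\begin{aligned}
c_t &= F_t \odot c_{t-1} + u_t,
\qquad u_t = I_t \odot g(W_{cx} I_t + W_{cm} m_{t-1} + b_c) \\
c_t &= \Big(\bigodot_{j=t-k+1}^{t} F_j\Big) \odot c_{t-k}
      + \sum_{s=t-k+1}^{t} \Big(\bigodot_{j=s+1}^{t} F_j\Big) \odot u_s
\end{aligned}
```

The contribution of a state k steps back is therefore scaled by the product of the intervening forget gates. As a numerical illustration, if every forget gate stays near 0.99, a state from 50 steps earlier keeps a weight of about 0.99^50 ≈ 0.61, whereas at 0.5 the weight falls to about 9 × 10^-16.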

The foregoing is only a preferred embodiment of the present invention; all equivalent changes and modifications made within the scope of the claims of the present invention shall fall within the scope of the present invention.

Claims

1. A Chinese Sign Language recognition method based on an LSTM-type RNN, characterized by comprising the following steps:

Step S1: collect multiple groups of sign language features;

2. The Chinese Sign Language recognition method based on an LSTM-type RNN according to claim 1, characterized in that: step S1 is specifically: sign language features are acquired using a data glove, the data glove comprising flex sensors, a nine-axis sensor, and a microprocessor for processing, storing, and sending data.

3. The Chinese Sign Language recognition method based on an LSTM-type RNN according to claim 1, characterized in that: step S2 is specifically: the collected sign language features are classified according to the linguistic meaning they express; for each linguistic meaning, a certain number of feature groups are randomly selected and each is labeled with its linguistic meaning; the labeled groups are organized in matrix form to constitute the training data.

4. The Chinese Sign Language recognition method based on an LSTM-type RNN according to claim 1, characterized in that: step S3 is specifically: a corresponding LSTM-type RNN model is constructed according to the sign language features, that is, explicitly modeled; the sign language features and labels of the training data from step S2 are used as input to train the constructed LSTM-type RNN, so as to obtain the weight parameters corresponding to the different sign language features.

5. The Chinese Sign Language recognition method based on an LSTM-type RNN according to claim 4, characterized in that: the LSTM-type RNN includes an input layer, an output layer, and a hidden layer; the input of the input layer is the sign language feature value sequence O₁O₂...O_T, the output of the output layer is the acoustic sequence S₁S₂...S_L corresponding to the input, and the hidden layer includes multiple LSTM units; wherein T is the number of time steps and L is the length of the acoustic sequence.

6. The Chinese Sign Language recognition method based on an LSTM-type RNN according to claim 5, characterized in that: the LSTM unit includes three control gates, which control the association among the input, the output, and the unit's own internal state across time steps.

7. The Chinese Sign Language recognition method based on an LSTM-type RNN according to claim 1, characterized in that: step S4 is specifically: the final LSTM-type RNN recognition model generated in step S3 is used to recognize the sign language to be identified; the features of the sign language to be identified are first abstracted to extract feature vectors; the sign language is then predicted according to the LSTM-type RNN model, and acoustic prediction is further performed to generate an acoustic parameter sequence, from which a speech synthesis result is generated.

8. The Chinese Sign Language recognition method based on an LSTM-type RNN according to claim 1, characterized in that: the LSTM-type RNN in step S4 uses the following formulas to control the flow of information:

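I_t = \sigma(W_{ix} I_t + W_{im} m_{t-1} + W_{ic} c_{t-1} + b_i);

F_t = \sigma(W_{Fx} I_t + W_{Fm} m_{t-1} + W_{Fc} c_{t-1} + b_F);

c_t = F_t \odot c_{t-1} + I_t \odot g(W_{cx} I_t + W_{cm} m_{t-1} + b_c);

O_t = \sigma(W_{Ox} I_t + W_{Om} m_{t-1} + W_{Oc} c_{t-1} + b_O);

m_t = O_t \odot h(c_t);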

Wherein, σ is the activation function of the three control gates, given by:

\sigma(x) = \frac{1}{1 + e^{-x}};

Wherein, h is the activation function of the state, given by:

h(x) = \tanh(x) = \frac{e^{x} - e^{-x}}{e^{x} + e^{-x}}