CN111428871A - Sign language translation method based on BP neural network - Google Patents

Sign language translation method based on BP neural network

Info

Publication number
CN111428871A
CN111428871A (application CN202010243856.7A / CN202010243856A)
Authority
CN
China
Prior art keywords
sign language
neural network
voltage signals
words
gesture
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN202010243856.7A
Other languages
Chinese (zh)
Other versions
CN111428871B (en)
Inventor
谢张宁
朱惠臣
孙晓光
吴俊杰
李智玮
傅云霞
雷李华
孔明
管钰晴
刘娜
王道档
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
China Jiliang University
Shanghai Institute of Measurement and Testing Technology
Original Assignee
China Jiliang University
Shanghai Institute of Measurement and Testing Technology
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by China Jiliang University, Shanghai Institute of Measurement and Testing Technology filed Critical China Jiliang University
Priority to CN202010243856.7A
Publication of CN111428871A
Application granted
Publication of CN111428871B
Legal status: Active
Anticipated expiration


Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 3/00 Computing arrangements based on biological models
    • G06N 3/02 Neural networks
    • G06N 3/08 Learning methods
    • G06N 3/084 Backpropagation, e.g. using gradient descent
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F 3/00 Input arrangements for transferring data to be processed into a form capable of being handled by the computer; Output arrangements for transferring data from processing unit to output unit, e.g. interface arrangements
    • G06F 3/01 Input arrangements or combined input and output arrangements for interaction between user and computer
    • G06F 3/011 Arrangements for interaction with the human body, e.g. for user immersion in virtual reality
    • G06F 3/014 Hand-worn input/output arrangements, e.g. data gloves
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 3/00 Computing arrangements based on biological models
    • G06N 3/02 Neural networks
    • G06N 3/06 Physical realisation, i.e. hardware implementation of neural networks, neurons or parts of neurons
    • G06N 3/063 Physical realisation, i.e. hardware implementation of neural networks, neurons or parts of neurons using electronic means
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V 40/00 Recognition of biometric, human-related or animal-related patterns in image or video data
    • G06V 40/20 Movements or behaviour, e.g. gesture recognition
    • G06V 40/28 Recognition of hand or arm movements, e.g. recognition of deaf sign language

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Health & Medical Sciences (AREA)
  • General Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Health & Medical Sciences (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Biomedical Technology (AREA)
  • Biophysics (AREA)
  • Computing Systems (AREA)
  • Artificial Intelligence (AREA)
  • Evolutionary Computation (AREA)
  • Data Mining & Analysis (AREA)
  • Computational Linguistics (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • Molecular Biology (AREA)
  • Human Computer Interaction (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Psychiatry (AREA)
  • Social Psychology (AREA)
  • Neurology (AREA)
  • Multimedia (AREA)
  • Machine Translation (AREA)
  • User Interface Of Digital Computer (AREA)

Abstract

The invention relates to a sign language translation method based on a BP neural network, characterized by comprising the following steps: 1. collect gesture voltage signals with a Raspberry Pi 3B through a wearable data glove; 2. compile the sign language words and common sign language sentences corresponding to each group of gesture voltage signals into a sign language sentence library using a signal screening program; 3. write a neural network classification program comprising a BP neural network structural framework model, a data transmission module and a storage module, the BP neural network structural framework model adopting a three-layer neural network consisting of an input layer, a hidden layer and an output layer; 4. convert each received group of gesture voltage signals into sign language words through the BP neural network framework model; 5. convert the sign language words obtained in step 4 over a period of time into sign language word groups, match them against the sign language sentence library, and associate and fill them to form the output sentence. The invention combines a neural network with sensing technology to achieve automatic real-time translation and recognition of sign language.

Description

Sign language translation method based on BP neural network
Technical Field
The invention relates to a sign language translation method, and in particular to a sign language translation method based on a BP neural network that combines a neural network with sensing technology to realize automatic translation and recognition of sign language.
Background
In present-day society, because most hearing people cannot understand sign language, a gap exists between deaf-mute people and hearing people: the communication circle of deaf-mute people is limited, which greatly restricts their living and development space. Two kinds of assistive devices for deaf-mute people exist on the market. One is the electrolarynx, dating from the 1950s, which is held against the throat, senses the vibration of the vocal cords and amplifies it to assist speech; however, the device is expensive, and disabled people without social security coverage cannot afford it. The other is sign language translation equipment based on computer vision, which has appeared in recent years; such equipment is not expensive, but limb-motion recognition technology is still in its infancy, and the image processing imposes strict requirements on the acquisition environment.
A neural network is a computational model formed by a large number of interconnected nodes (or neurons). Each node represents a particular output function, called an activation function. Each connection between two nodes carries a weighted value, called a weight, applied to the signal passing through the connection; in this way the neural network simulates human memory. The output of the network depends on its structure, the way it is connected, the weights and the activation functions, and the weights of each layer constitute the model to be stored. Neural networks are widely applied in machine learning, for example in function approximation, pattern recognition, classification, data compression and data mining. It is therefore a good approach to use a neural network to construct a nonlinear data classification model.
Disclosure of Invention
The invention aims to overcome the defects of the prior art by designing a sign language translation method based on a BP (back propagation) neural network. Gesture voltage signals are collected with sensing technology and used as the input of the BP neural network, and the input voltage signals are converted into the corresponding semantic output through computation and weighting among the neurons. This provides a feasible basis for building a gesture-voltage-based sign language translation system on a BP neural network, facilitates communication between disabled and hearing people and shortens the distance between them, so that deaf-mute people can better integrate into society.
The invention is realized as follows: a sign language translation method based on a BP neural network, characterized by comprising the following steps.
Step 1: A Raspberry Pi 3B single-board computer collects gesture voltage signals through the flexible sensors and acceleration sensors arranged on a wearable data glove; after filtering and amplification, the gesture voltage signals are transmitted through the integrated Bluetooth module to a storage device for storage.
The flexible sensors arranged on the wearable data glove in step 1 are strain gauges fixed at the positions of the 10 fingers. The gesture voltage signals are represented by the degree to which the strain gauges bend with the fingers and by the relative positions of two triaxial acceleration sensors fixed on the backs of the left and right hands, so acquiring the gesture voltage signals means acquiring 10 finger-bending signals and 6 gesture-orientation signals, i.e. 16 signals in total.
Step 2: The sign language words and common sign language sentences corresponding to each group of signals are compiled, via a signal screening program, into a sign language library to build the sign language sentence library, and the gesture voltage signals collected over many acquisitions, together with the corresponding sign language words, are divided into a training set and a test set at a ratio of 7:3.
The sign language library in step 2 comprises a sign language word library and a sign language sentence library; it is built by first recording the 16 currently received gesture voltage signals in Excel, normalizing them, and storing them in an Access database.
Step 3: A program for establishing the BP neural network structural framework model is written; it mainly comprises three modules: the neural network structural framework model, a data transmission module and a storage module. The BP neural network structural framework model is trained with the training set from step 2, the trained model is then run on the test set, and once the test result meets the preset requirement the model is stored in the storage module. The BP neural network structural framework model adopts a three-layer neural network consisting of an input layer, a hidden layer and an output layer.
The three-layer neural network of the BP neural network structural framework model in step 3 has 16 neurons in the input layer, 64 neurons in the middle (hidden) layer and 18 neurons in the output layer. Transmission is divided into two segments of 8 voltage signals each, 16 voltage signals in total. The output numbers of the 18 output-layer neurons are 0 to 17 and correspond in order to 18 common phrases, which can be combined at will into 53 common expressions.
Step 4: Convert each group of collected gesture voltage signals into sign language words through the BP neural network framework model, comprising the following steps:
Step 4.1: receive the gesture voltage signals of the wearable data glove and screen out complete signal groups with the signal screening program;
Step 4.2: convert the gesture voltage signals into sign language words through the trained BP neural network structural framework model.
Step 5: Convert the sign language words obtained from the gesture voltage signals in step 4 over a period of time into sign language word groups, match them against the sign language sentence library, and associate and fill them to form the output sentence, comprising the following steps:
Step 5.1: segment the sentences in the sign language sentence library, or the collected word groups, into words and count them; words that appear frequently and carry symbolic meaning in a sentence are defined as element 1, and the remaining words as element 0;
Step 5.2: put all common sign language sentences in the sign language sentence library into the word-frequency vector format specified in step 5.1, generating the corresponding word-frequency vectors;
Step 5.3: convert the sign language words obtained from the gesture voltage signals collected in step 4 over a period of time into sign language word groups, and convert the resulting word groups into the corresponding word-frequency vectors according to the format specified in step 5.1;
Step 5.4: calculate the cosine similarity between the word-frequency vector obtained in step 5.3 and the word-frequency vectors in the sign language sentence library from step 5.2, and select the sign language words in the sign language sentence library with the largest cosine similarity as the output words;
Step 5.5: match the output words obtained in step 5.4 to the corresponding written-language sentences according to the index of the common sign language sentences in the sign language sentence library, and take the matched written-language sentences as the final output results.
The beneficial effects of the invention are as follows. According to statistics, more than 20 million people nationwide have hearing and speech disabilities; they have difficulty communicating with hearing people, and this communication barrier is an important reason why deaf-mute people cannot work normally. In the method of the invention, the Raspberry Pi 3B single-board computer is externally connected to an audio and/or video playback device, so gestures can be converted into normal sentences and output as audio or video; the translation is accurate and the response is fast, which greatly facilitates communication between disabled and hearing people and meets the urgent need of deaf-mute people to communicate with hearing people. The hardware used by the method is inexpensive, can quickly and effectively remove the communication barriers encountered by deaf-mute people, opens up more employment opportunities for them, helps them integrate better into society, and improves their standard of living.
Drawings
FIG. 1 is a schematic block diagram of the flow of the working steps of the method of the present invention.
FIG. 2 is a simplified schematic diagram of a single wearable data glove configuration for acquiring gesture voltage signals according to the method of the present invention.
FIG. 3 is a schematic diagram of the working principle and the working flow of the signal screening program of the method of the present invention.
FIG. 4 is a schematic diagram of a BP neural network structural framework model of the method of the present invention.
FIG. 5 is a schematic diagram of the working flow from the words generated after conversion by the BP neural network to matching, association, filling and sentence output in the method of the present invention.
Detailed Description
As shown in FIG. 1, the sign language translation method based on a BP neural network according to the invention comprises the following steps.
Step 1: A Raspberry Pi 3B single-board computer collects gesture voltage signals through the flexible sensors and acceleration sensors arranged on a wearable data glove; after filtering and amplification, the gesture voltage signals are transmitted through the integrated Bluetooth module to a storage device for storage.
Step 2: The sign language words and common sign language sentences corresponding to each group of signals are compiled, via a signal screening program, into a sign language library to build the sign language sentence library, and the gesture voltage signals collected over many acquisitions, together with the corresponding sign language words, are divided into a training set and a test set at a ratio of 7:3.
Step 2.1: Sign language library recording software is written in C#; the software records each received gesture voltage signal and the semantics represented by the signal in Excel.
Step 2.2: The signal screening program checks whether each received signal group is complete; complete signals are recorded in an Excel table and incomplete ones are rejected.
Step 2.3: The gesture voltage signals collected over many acquisitions and the corresponding sign language words are divided into a training set and a test set at a ratio of 7:3.
Step 2.4: The sign language words and common sign language sentences corresponding to the gesture voltage signals recorded in the Excel table are normalized and then imported into the Access database to build the sign language sentence library.
Step 3: A program for establishing the BP neural network structural framework model is written; it mainly comprises three modules: the neural network structural framework model, a data transmission module and a storage module. The BP neural network structural framework model is trained with the training set from step 2, the trained model is then run on the test set, and once the test result meets the preset requirement the model is stored in the storage module. The BP neural network structural framework model adopts a three-layer neural network consisting of an input layer, a hidden layer and an output layer.
Step 4: Convert each group of collected gesture voltage signals into sign language words through the BP neural network framework model.
Step 4.1: Receive the gesture voltage signals of the wearable data glove and screen out complete signal groups with the signal screening program.
Step 4.2: Convert the gesture voltage signals into sign language words through the trained BP neural network structural framework model.
Step 5: Convert the sign language words obtained from the gesture voltage signals in step 4 over a period of time into sign language word groups, match them against the sign language sentence library, and associate and fill them to form the output sentence.
Step 5.1: Segment the sentences in the sign language sentence library, or the collected word groups, into words and count them; words that appear frequently and carry symbolic meaning in a sentence are defined as element 1, and the remaining words as element 0.
Step 5.2: Put all common sign language sentences in the sign language sentence library into the word-frequency vector format specified in step 5.1, generating the corresponding word-frequency vectors.
Step 5.3: Convert the sign language words obtained from the gesture voltage signals collected in step 4 over a period of time into sign language word groups, and convert the resulting word groups into the corresponding word-frequency vectors according to the format specified in step 5.1.
Step 5.4: Calculate the cosine similarity between the word-frequency vector obtained in step 5.3 and the word-frequency vectors in the sign language sentence library from step 5.2, and select the sign language words in the sign language sentence library with the largest cosine similarity as the output words.
Step 5.5: Match the output words obtained in step 5.4 to the corresponding written-language sentences according to the index of the common sign language sentences in the sign language sentence library, and take the matched written-language sentences as the final output results.
The invention is described in further detail below with reference to the figures and specific examples.
The specific working steps of the sign language translation method based on the BP neural network are as follows.
Step 1: The flexible sensors and acceleration sensors on the wearable data glove are connected in series with fixed resistors to the Raspberry Pi 3B single-board computer; the flexible sensors and acceleration sensors change their voltages according to the bending of the fingers and the relative motion of the hands, and the collected gesture voltage signals are filtered, amplified and then transmitted through the Bluetooth module of the Raspberry Pi 3B single-board computer to a storage device for storage.
As shown in FIG. 2, the flexible sensors arranged on the wearable data glove in step 1 are strain gauges fixed at the positions of the 10 fingers, and the acceleration sensors are two triaxial acceleration sensors fixed on the backs of the left and right hands. Each strain gauge outputs one voltage value, and each triaxial acceleration sensor outputs three voltage values for x, y and z. Every transmission additionally contains a start signal Num_i and an end signal Num_o, giving 18 signals in total, in order: Num_i, X_1, X_2, …, X_16, Num_o, where X_1, X_2, …, X_16 represent the gesture voltage values.
Step 2: The sign language words and common sign language sentences corresponding to each group of collected gesture voltage signals are compiled into a sign language library to build the sign language sentence library. The sign language sentence library is divided into a training set and a test set at a ratio of 7:3.
Step 2.1: The sign language library recording software is written in C#; it records each received gesture voltage signal and the semantics represented by the signal in an Excel table.
Step 2.2: FIG. 3 illustrates the working principle and working steps of the signal screening program, which checks whether each received signal group is complete; complete signals are recorded in the Excel table and incomplete ones are rejected. While receiving gesture voltage signals, the program starts counting when it encounters the start signal Num_i and stops counting when it encounters the end signal Num_o. If the count K equals 16, the transmission is complete and the data are recorded in the Excel table; if K is not 16, counting restarts.
Step 2.3: the gesture voltage signals and corresponding sign language words collected for multiple times are divided into a training set and a data set in a 7:3 ratio.
Step 2.4: and (4) normalizing the gesture voltage signals, the corresponding sign language words and the common sign language sentences in the Excel table, and importing the normalized sign language words and the common sign language sentences into an Access database to prepare a database.
Step 3: A program for establishing the BP neural network structural framework model is written; it mainly comprises three modules: the neural network structural framework model, a data transmission module and a storage module. The BP neural network structural framework model is trained with the training set from step 2, the trained model is then run on the test set, and once the test result meets the preset requirement the model is stored in the storage module. The BP neural network structural framework model adopts a three-layer neural network consisting of an input layer, a hidden layer and an output layer.
Each time a gesture is made, the wearable data glove with the flexible sensors transmits 16 sensor values from which a model is to be built. One possible method is to build an index library that associates different sets of voltage values with gestures. However, Chinese sign language contains many gestures, and establishing such a library takes a great deal of time. Moreover, the voltage value sets produced by different people making the same gesture are not identical, and as the number of users grows, the set of voltage groups corresponding to each gesture becomes larger and larger, so the lookup time increases. In addition, because of the limited sensitivity of the device, the voltage value sets of different gestures do not differ greatly, which imposes many limitations on accurate recognition. To achieve fast and accurate recognition, the invention uses a BP neural network from machine learning to build a classification model of the voltage groups.
The neural network classification program for gesture recognition is composed of three parts: a BP neural network algorithm part, a model prediction part and a data transmission part.
The BP neural network algorithm part comprises forward propagation, backward propagation, model training and evaluation, and model storage. The gesture recognition neural network is written in Python; a neural network built in Python has the advantages that the number of neural units, the number of layers and the activation function can be modified conveniently and large amounts of data can be computed quickly. The programming environment is PyCharm, and the libraries used are NumPy, Pandas and SciPy: NumPy is used to write the BP neural network algorithm, Pandas is used for data import, and SciPy is used to store the output model.
The framework of the BP neural network algorithm part is a BP neural network structural framework model consisting of three layers: an input layer, a hidden layer and an output layer. The input layer has 16 input units and the hidden layer has 64 units. The output layer corresponds to 18 common phrases, with output numbers 0 to 17 corresponding to the 18 phrases in order; these common phrases can be freely combined into 50 common expressions. The correspondence between the common phrases and their numbers is given in the following table.
Table 1: Correspondence between some common phrases and their numbers (the table is provided as an image in the original publication)
Because there are 16 sensor values as input, 16 input neuron units are set. Each group of data X_1, X_2, …, X_16 is normalized, and the normalized data x_1, x_2, …, x_16 are used as the input layer. Let W^(1)_{ij} denote the weight parameter on the connection between the j-th neuron unit of the first layer and the i-th neuron unit of the second layer. Since there are many samples in the sign language library, 64 neuron units are provided in the hidden layer to avoid overfitting, so W^(1) ∈ R^(64×16); the bias term of the i-th unit of the second layer is b^(2)_i.

The node activation function is the Sigmoid function:

    g(z) = 1 / (1 + e^(-z))

The output of each neuron node of the second layer is then:

    a^(2)_1 = g(W^(1)_{1,1} x_1 + W^(1)_{1,2} x_2 + … + W^(1)_{1,16} x_16 + b^(2)_1)
    a^(2)_2 = g(W^(1)_{2,1} x_1 + W^(1)_{2,2} x_2 + … + W^(1)_{2,16} x_16 + b^(2)_2)
    ⋮
    a^(2)_64 = g(W^(1)_{64,1} x_1 + W^(1)_{64,2} x_2 + … + W^(1)_{64,16} x_16 + b^(2)_64)

Let the weighted input sum of the i-th unit of the second layer be

    z^(2)_i = Σ_{j=1}^{16} W^(1)_{ij} x_j + b^(2)_i

Then

    a^(2)_i = g(z^(2)_i),  i.e.  a^(2) = g(z^(2))

Likewise, let W^(2)_{ij} be the weight parameter on the connection between the j-th neuron unit of the second layer and the i-th neuron unit of the third layer, and let b^(3)_i be the bias term of the i-th unit of the third layer. Since the output layer has 18 outputs, it is easy to obtain W^(2) ∈ R^(18×64). The third layer output is a^(3), the node activation function is again the Sigmoid function, and the BP neural network output is

    h_{W,b}(x) = a^(3) = g(z^(3))

where, in the same way,

    z^(3)_i = Σ_{j=1}^{64} W^(2)_{ij} a^(2)_j + b^(3)_i
    a^(3)_i = g(z^(3)_i)

Each forward propagation yields the third-layer output a^(3), which then needs to be corrected. Since the activation function is the Sigmoid function, its derivative g'(z) = g(z)(1 - g(z)) is used, and the difference between a^(3)_i and the target Y_i is substituted into the error term of the output layer (i.e. the third layer):

    δ^(3)_i = (a^(3)_i - Y_i) · a^(3)_i (1 - a^(3)_i)

and into the error term of each neural unit of the hidden layer (i.e. the second layer):

    δ^(2)_i = ( Σ_{j=1}^{18} W^(2)_{ji} δ^(3)_j ) · a^(2)_i (1 - a^(2)_i)

Finally, the weight of each connection is updated. With η the learning-rate constant, the connection weights W^(2)_{ij} between the second and third layers and the connection weights W^(1)_{ij} between the first and second layers are updated as follows:

    W^(2)_{ij} := W^(2)_{ij} - η · δ^(3)_i · a^(2)_j
    W^(1)_{ij} := W^(1)_{ij} - η · δ^(2)_i · x_j
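Read together, the forward pass, error terms and update rules above translate almost line for line into NumPy. The following is a minimal per-sample sketch under the stated shapes (16 inputs, 64 hidden units, 18 outputs); the random initialization scale and the function names are assumptions for illustration, not the patent's actual code, and bias updates are included as the usual companion rule even though the text only writes out the weight updates.

```python
import numpy as np

def sigmoid(z):
    """Node activation function g(z) = 1 / (1 + e^(-z))."""
    return 1.0 / (1.0 + np.exp(-z))

# Shapes from the description: W1 in R^(64x16), W2 in R^(18x64).
rng = np.random.default_rng(0)
W1, b2 = rng.normal(scale=0.1, size=(64, 16)), np.zeros(64)
W2, b3 = rng.normal(scale=0.1, size=(18, 64)), np.zeros(18)

def train_step(x, y, eta=0.1):
    """One forward and one backward pass for a single normalized sample.

    x: (16,) normalized voltages; y: (18,) one-hot target; eta: learning-rate constant.
    Updates the module-level weights and biases in place.
    """
    global W1, b2, W2, b3
    # Forward propagation: a2 = g(z2), a3 = g(z3)
    z2 = W1 @ x + b2;  a2 = sigmoid(z2)
    z3 = W2 @ a2 + b3; a3 = sigmoid(z3)
    # Error terms (the Sigmoid derivative is a * (1 - a))
    d3 = (a3 - y) * a3 * (1.0 - a3)          # output-layer error
    d2 = (W2.T @ d3) * a2 * (1.0 - a2)       # hidden-layer error
    # Updates W := W - eta * delta * activation
    W2 -= eta * np.outer(d3, a2); b3 -= eta * d3
    W1 -= eta * np.outer(d2, x);  b2 -= eta * d2
    return a3
```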
the structural framework model of the BP neural network of the invention is shown in figure 4.
After one forward propagation and one backward propagation have updated the weights, one training iteration of the neural network is complete. One iteration is of course not enough; hundreds or thousands are needed, and the maximum number of training iterations of the neural network is 30,000. The training samples of the invention are imported with the read_excel function of the Pandas library from the sign language library stored as an Excel file. The phrase library is an Excel file in which columns 1 to 16 are the voltage values and column 17 is the corresponding gesture phrase.
Model evaluation is performed after training, using the mean square error. As the glove module is used more often and by more people, the number of samples in the sign language library grows. The voltage values of the same gesture made by different people differ slightly, which can cause the neural network to overfit; to prevent overfitting, the samples are divided into a training set and a test set. The training set is used for training and the test set is used to test the prediction accuracy; with BP neural network prediction the accuracy of the method reaches 80%.
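As a rough sketch, the data import with read_excel, the 7:3 split, the cap of 30,000 iterations and the mean-square-error check described above could be combined as follows; it reuses sigmoid and train_step from the previous sketch, and the file name, the min-max normalization and the stopping threshold are assumptions rather than details fixed by the patent.

```python
import numpy as np
import pandas as pd

# Columns 1-16 are voltage values, column 17 the gesture phrase (per the text);
# the file name is a placeholder.
data = pd.read_excel("sign_language_library.xlsx")
X = data.iloc[:, :16].to_numpy(dtype=float)
labels = data.iloc[:, 16].to_numpy()

# Min-max normalization per channel (the normalization method is assumed).
x_min, x_max = X.min(axis=0), X.max(axis=0)
X = (X - x_min) / (x_max - x_min + 1e-9)

classes, class_idx = np.unique(labels, return_inverse=True)  # phrases -> numbers 0..17
Y = np.eye(len(classes))[class_idx]                          # one-hot targets

# 7:3 split into training and test sets
idx = np.random.permutation(len(X))
cut = int(0.7 * len(X))
train_idx, test_idx = idx[:cut], idx[cut:]

def forward(x):
    """Forward pass using sigmoid, W1, b2, W2, b3 from the sketch above."""
    return sigmoid(W2 @ sigmoid(W1 @ x + b2) + b3)

for epoch in range(30000):                       # at most 30,000 training iterations
    for i in train_idx:
        train_step(X[i], Y[i])
    # Mean-square-error evaluation on the test set
    mse = np.mean([(forward(X[i]) - Y[i]) ** 2 for i in test_idx])
    if mse < 0.01:                               # stopping threshold is an assumption
        break
```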
After training, the model needs to be stored. The trained neural network model consists of the weight of each connection W and the bias term of each neural node b, which in the invention are W^(1), b^(2), W^(2) and b^(3). The savemat function of the SciPy library in Python stores the trained parameter matrices as a data file in Mat format, and the loadmat function imports the data in the Mat file back into the program to be called for prediction.
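Storing and reloading these parameter matrices with SciPy's savemat and loadmat could look like the following; the file name and dictionary keys are illustrative.

```python
from scipy.io import savemat, loadmat

# Save the trained parameters as a Mat-format data file.
savemat("bp_model.mat", {"W1": W1, "b2": b2, "W2": W2, "b3": b3})

# Load them back later for prediction; loadmat returns 2-D arrays,
# so the bias vectors are flattened again.
params = loadmat("bp_model.mat")
W1, b2 = params["W1"], params["b2"].ravel()
W2, b3 = params["W2"], params["b3"].ravel()
```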
The model prediction part differs from the training and evaluation in the neural network algorithm part in that its input data are the gesture voltage values transmitted back in real time from the mobile phone end. After the voltage values transmitted back in real time from the mobile phone end are received, the regrouped voltage values are first normalized, the stored model with the highest accuracy from training and evaluation (i.e. the weights W and bias terms b) is called, and one forward propagation is performed. In the resulting output a^(3), the index of the maximum value is the gesture number obtained, and the corresponding gesture is the predicted result.
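Prediction then reduces to normalizing the 16 returned voltage values, one forward pass with the loaded parameters, and an argmax over the 18 outputs. A sketch, assuming the per-channel normalization bounds x_min and x_max were saved at training time and reusing forward from the training sketch:

```python
import numpy as np

def predict_gesture(raw_voltages, x_min, x_max, phrases):
    """raw_voltages: 16 values returned in real time for one gesture;
    x_min, x_max: per-channel bounds saved from training;
    phrases: the 18 common phrases indexed by output number 0..17."""
    x = (np.asarray(raw_voltages, dtype=float) - x_min) / (x_max - x_min + 1e-9)
    a3 = forward(x)                        # one forward propagation
    return phrases[int(np.argmax(a3))]     # index of the maximum output = gesture number
```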
Step 4: Convert each group of collected gesture voltage signals into sign language words through the BP neural network framework model.
Step 4.1: Receive the gesture voltage signals of the wearable data glove and screen out complete signal groups with the signal screening program.
Step 4.2: Convert the gesture voltage signals into sign language words with the trained BP neural network framework model.
The specific operation is as follows: the gesture voltage signals of the current 2 seconds are acquired and converted into sign language words through the BP neural network framework model.
Step 5: Convert the sign language words obtained from the gesture voltage signals in step 4 over a period of time into sign language word groups, then match, associate and fill them into sentences. The specific working flow from the sign language words generated after conversion by the BP neural network framework model to the matched, associated and filled sentence output is shown in FIG. 5.
Step 5.1: Segment the sentences in the sign language sentence library, or the collected word groups, into words and count them; words that appear frequently and carry symbolic meaning in a sentence are defined as element 1, and the remaining words as element 0. For example, segmenting the sentences in the sign language sentence library ("I received it", "She is beautiful", "Your clothes are poor") yields the words appearing in the library, each with frequency 1, and the word-frequency vector format is specified as [I, you, her, clothes, poor, beautiful, received].
Step 5.2: Put all common sign language sentences in the sign language sentence library into the word-frequency vector format specified in step 5.1 and generate the corresponding word-frequency vectors; for example, the word-frequency vector corresponding to the sentence "I received it" is [1,0,0,0,0,0,1].
Step 5.3: Convert the sign language words obtained from the gesture voltage signals collected in step 4 over a period of time into sign language word groups, and convert the resulting word groups into the corresponding word-frequency vectors according to the format specified in step 5.1; for example, the word-frequency vector corresponding to the word group "she is beautiful" is [0,0,1,0,0,1,0].
Step 5.4: Calculate the cosine similarity between the word-frequency vector obtained in step 5.3 and the word-frequency vectors in the sign language sentence library from step 5.2, and select the sign language words in the sign language sentence library with the largest cosine similarity as the output words.
Step 5.5: Match the output words obtained in step 5.4 to the corresponding written-language sentences according to the index of the common sign language sentences in the sign language sentence library, and take the matched written-language sentences as the final output results.
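A compact sketch of steps 5.1 to 5.5: build word-frequency vectors over the key vocabulary, pick the library entry with the largest cosine similarity, and return its written form. The helper names and the toy sentence library below are illustrative only.

```python
import numpy as np

def to_vector(words, vocabulary):
    """Word-frequency vector: element 1 for each key vocabulary word present, else 0."""
    return np.array([1.0 if v in words else 0.0 for v in vocabulary])

def cosine_similarity(u, v):
    denom = np.linalg.norm(u) * np.linalg.norm(v)
    return float(u @ v / denom) if denom else 0.0

def match_sentence(sign_words, vocabulary, library):
    """library: list of (key_words, written_sentence) pairs from the sentence library."""
    query = to_vector(sign_words, vocabulary)
    best = max(library,
               key=lambda entry: cosine_similarity(query, to_vector(entry[0], vocabulary)))
    return best[1]

# Toy example following the vocabulary used in the description above.
vocabulary = ["I", "you", "her", "clothes", "poor", "beautiful", "received"]
library = [(["I", "received"], "I have received it."),
           (["her", "beautiful"], "She is beautiful.")]
print(match_sentence(["her", "beautiful"], vocabulary, library))  # -> She is beautiful.
```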
The output result can be delivered through hardware, i.e. an audio and/or video playback device externally connected to the Raspberry Pi 3B single-board computer, so that the translation result is output in the form of sentence audio or video.

Claims (6)

1. A sign language translation method based on a BP neural network, characterized by comprising the following steps:
step (1): a Raspberry Pi 3B single-board computer collects gesture voltage signals through flexible sensors and acceleration sensors arranged on a wearable data glove, and after filtering and amplification the gesture voltage signals are transmitted through its integrated Bluetooth module to a storage device for storage;
step (2): the sign language words and common sign language sentences corresponding to each group of signals are compiled, via a signal screening program, into a sign language library to build the sign language sentence library, and the gesture voltage signals collected over many acquisitions, together with the corresponding sign language words, are divided into a training set and a test set at a ratio of 7:3;
step (3): a program for establishing the BP neural network structural framework model is written, mainly comprising three modules: the neural network structural framework model, a data transmission module and a storage module; the BP neural network structural framework model is trained with the training set of step (2), the trained model is then run on the test set, and after the test result meets the preset requirement the model is stored in the storage module; the BP neural network structural framework model adopts a three-layer neural network consisting of an input layer, a hidden layer and an output layer;
step (4): each group of collected gesture voltage signals is converted into sign language words through the BP neural network framework model;
step (5): the sign language words obtained from the gesture voltage signal conversion in step (4) over a period of time are converted into sign language word groups, matched against the sign language sentence library, and associated and filled to form the output sentence.
2. The sign language translation method based on the BP neural network as claimed in claim 1, wherein: the flexible sensors arranged on the wearable data glove in step (1) are strain gauges fixed at the positions of the 10 fingers; the gesture voltage signals are represented by the degree to which the strain gauges bend with the fingers and by the relative positions of two triaxial acceleration sensors fixed on the backs of the left and right hands; acquiring the gesture voltage signals therefore means acquiring 10 finger-bending signals and 6 gesture-orientation signals, i.e. 16 signals in total.
3. The sign language translation method based on the BP neural network as claimed in claim 1, wherein: the sign language library in step (2) comprises a sign language word library and a sign language sentence library, built by first recording the 16 currently received gesture voltage signals in Excel, normalizing them and storing them in an Access database.
4. The sign language translation method based on the BP neural network as claimed in claim 1, wherein: the three-layer neural network of the BP neural network structural framework model in step (3) has 16 neurons in the input layer, 64 neurons in the middle layer and 18 neurons in the output layer; transmission is divided into two segments of 8 voltage signals each, 16 voltage signals in total; the output numbers of the 18 output-layer neurons are 0 to 17 and correspond in order to 18 common phrases, which can be combined at will into 53 common expressions.
5. The sign language translation method based on the BP neural network as claimed in claim 1, wherein the step (4) of converting the gesture voltage signals into sign language words through the BP neural network framework model comprises the following steps:
step (4.1): receiving the gesture voltage signals of the wearable data glove and screening out complete signal groups with the signal screening program;
step (4.2): converting the gesture voltage signals into sign language words through the trained BP neural network structural framework model.
6. The sign language translation method based on the BP neural network as claimed in claim 1, wherein the step (5) of matching against the sign language sentence library after conversion by the BP neural network framework model and associating and filling to form the output sentence comprises the following steps:
step (5.1): segmenting the sentences in the sign language sentence library, or the collected word groups, into words and counting them, words that appear frequently and carry symbolic meaning in a sentence being defined as element 1 and the remaining words as element 0;
step (5.2): putting all common sign language sentences in the sign language sentence library into the word-frequency vector format specified in step (5.1) and generating the corresponding word-frequency vectors;
step (5.3): converting the sign language words obtained from the gesture voltage signals collected in step (4) over a period of time into sign language word groups, and converting the resulting word groups into the corresponding word-frequency vectors according to the format specified in step (5.1);
step (5.4): calculating the cosine similarity between the word-frequency vector obtained in step (5.3) and the word-frequency vectors in the sign language sentence library from step (5.2), and selecting the sign language words in the sign language sentence library with the largest cosine similarity as the output words;
step (5.5): matching the output words obtained in step (5.4) to the corresponding written-language sentences according to the index of the common sign language sentences in the sign language sentence library, and taking the matched written-language sentences as the final output results.
CN202010243856.7A 2020-03-31 2020-03-31 Sign language translation method based on BP neural network Active CN111428871B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202010243856.7A CN111428871B (en) 2020-03-31 2020-03-31 Sign language translation method based on BP neural network

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202010243856.7A CN111428871B (en) 2020-03-31 2020-03-31 Sign language translation method based on BP neural network

Publications (2)

Publication Number Publication Date
CN111428871A true CN111428871A (en) 2020-07-17
CN111428871B CN111428871B (en) 2023-02-24

Family

ID=71556171

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202010243856.7A Active CN111428871B (en) 2020-03-31 2020-03-31 Sign language translation method based on BP neural network

Country Status (1)

Country Link
CN (1) CN111428871B (en)

Cited By (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112149540A (en) * 2020-09-14 2020-12-29 东北大学 Yoov 3-based end-to-end sign language recognition technology
CN113081703A (en) * 2021-03-10 2021-07-09 上海理工大学 Method and device for distinguishing direction intention of user of walking aid
CN113111156A (en) * 2021-03-15 2021-07-13 天津理工大学 System for intelligent hearing-impaired people and healthy people to perform man-machine interaction and working method thereof

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN101539994A (en) * 2009-04-16 2009-09-23 西安交通大学 Mutually translating system and method of sign language and speech
US20100023314A1 (en) * 2006-08-13 2010-01-28 Jose Hernandez-Rebollar ASL Glove with 3-Axis Accelerometers
CN110363077A (en) * 2019-06-05 2019-10-22 平安科技(深圳)有限公司 Sign Language Recognition Method, device, computer installation and storage medium
CN110532912A (en) * 2019-08-19 2019-12-03 合肥学院 A kind of sign language interpreter implementation method and device

Patent Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20100023314A1 (en) * 2006-08-13 2010-01-28 Jose Hernandez-Rebollar ASL Glove with 3-Axis Accelerometers
CN101539994A (en) * 2009-04-16 2009-09-23 西安交通大学 Mutually translating system and method of sign language and speech
CN110363077A (en) * 2019-06-05 2019-10-22 平安科技(深圳)有限公司 Sign Language Recognition Method, device, computer installation and storage medium
CN110532912A (en) * 2019-08-19 2019-12-03 合肥学院 A kind of sign language interpreter implementation method and device

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
刘婉婉 et al.: "Research on Mongolian-Chinese machine translation based on part-of-speech tagging with gated recurrent neural networks", 《中文信息学报》 (Journal of Chinese Information Processing) *

Cited By (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112149540A (en) * 2020-09-14 2020-12-29 东北大学 Yoov 3-based end-to-end sign language recognition technology
CN113081703A (en) * 2021-03-10 2021-07-09 上海理工大学 Method and device for distinguishing direction intention of user of walking aid
CN113111156A (en) * 2021-03-15 2021-07-13 天津理工大学 System for intelligent hearing-impaired people and healthy people to perform man-machine interaction and working method thereof
CN113111156B (en) * 2021-03-15 2022-05-13 天津理工大学 System for intelligent hearing-impaired people and healthy people to perform man-machine interaction and working method thereof

Also Published As

Publication number Publication date
CN111428871B (en) 2023-02-24

Similar Documents

Publication Publication Date Title
CN111428871B (en) Sign language translation method based on BP neural network
CN110992987B (en) Parallel feature extraction system and method for general specific voice in voice signal
CN110634491B (en) Series connection feature extraction system and method for general voice task in voice signal
CN112818861B (en) Emotion classification method and system based on multi-mode context semantic features
JP3168779B2 (en) Speech recognition device and method
CN111026847B (en) Text emotion recognition method based on attention network and long-short term memory network
CN112216307B (en) Speech emotion recognition method and device
CN111400461B (en) Intelligent customer service problem matching method and device
CN113723166A (en) Content identification method and device, computer equipment and storage medium
Gorin et al. An experiment in spoken language acquisition
CN112151030A (en) Multi-mode-based complex scene voice recognition method and device
CN113837299B (en) Network training method and device based on artificial intelligence and electronic equipment
CN112115687A (en) Problem generation method combining triples and entity types in knowledge base
CN115862684A (en) Audio-based depression state auxiliary detection method for dual-mode fusion type neural network
CN114724224A (en) Multi-mode emotion recognition method for medical care robot
CN114153942B (en) Event time sequence relation extraction method based on dynamic attention mechanism
CN113948157A (en) Chemical reaction classification method, device, electronic equipment and storage medium
CN116110565A (en) Method for auxiliary detection of crowd depression state based on multi-modal deep neural network
CN117877660A (en) Medical report acquisition method and system based on voice recognition
CN117672268A (en) Multi-mode voice emotion recognition method based on relative entropy alignment fusion
CN116521872A (en) Combined recognition method and system for cognition and emotion and electronic equipment
Anindya et al. Development of Indonesian speech recognition with deep neural network for robotic command
CN114882888A (en) Voiceprint recognition method and system based on variational self-coding and countermeasure generation network
CN111428802B (en) Sign language translation method based on support vector machine
CN112951270B (en) Voice fluency detection method and device and electronic equipment

Legal Events

Date Code Title Description
PB01 Publication
PB01 Publication
SE01 Entry into force of request for substantive examination
SE01 Entry into force of request for substantive examination
GR01 Patent grant
GR01 Patent grant