CN111914724A - Continuous Chinese sign language identification method and system based on sliding window segmentation - Google Patents
Continuous Chinese sign language identification method and system based on sliding window segmentation
- Publication number
- CN111914724A (application CN202010734304.6A)
- Authority
- CN
- China
- Prior art keywords
- sign language
- data
- segmentation
- sliding window
- continuous
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Granted
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V40/00—Recognition of biometric, human-related or animal-related patterns in image or video data
- G06V40/20—Movements or behaviour, e.g. gesture recognition
- G06V40/28—Recognition of hand or arm movements, e.g. recognition of deaf sign language
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F18/00—Pattern recognition
- G06F18/20—Analysing
- G06F18/21—Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
- G06F18/214—Generating training patterns; Bootstrap methods, e.g. bagging or boosting
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F18/00—Pattern recognition
- G06F18/20—Analysing
- G06F18/24—Classification techniques
- G06F18/241—Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F18/00—Pattern recognition
- G06F18/20—Analysing
- G06F18/24—Classification techniques
- G06F18/241—Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches
- G06F18/2415—Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches based on parametric or probabilistic models, e.g. based on likelihood ratio or false acceptance rate versus a false rejection rate
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/04—Architecture, e.g. interconnection topology
- G06N3/044—Recurrent networks, e.g. Hopfield networks
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/04—Architecture, e.g. interconnection topology
- G06N3/045—Combinations of networks
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/04—Architecture, e.g. interconnection topology
- G06N3/049—Temporal neural networks, e.g. delay elements, oscillating neurons or pulsed inputs
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/08—Learning methods
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V10/00—Arrangements for image or video recognition or understanding
- G06V10/20—Image preprocessing
- G06V10/26—Segmentation of patterns in the image field; Cutting or merging of image elements to establish the pattern region, e.g. clustering-based techniques; Detection of occlusion
- G06V10/267—Segmentation of patterns in the image field; Cutting or merging of image elements to establish the pattern region, e.g. clustering-based techniques; Detection of occlusion by performing operations on regions, e.g. growing, shrinking or watersheds
-
- Y—GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
- Y02—TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
- Y02D—CLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
- Y02D10/00—Energy efficient computing, e.g. low power processors, power management or thermal management
Abstract
The invention discloses a continuous Chinese sign language recognition method based on sliding window segmentation, which comprises the following steps: S1: collecting sEMG and IMU data of the arm through an arm ring; S2: preprocessing the data collected in step S1; S3: performing feature extraction on the preprocessed data using a sliding window, including dividing the continuous sign language into single sign language words through the sliding window and performing average segmentation and recombination on each divided piece of data to obtain a plurality of new pieces of data; S4: inputting the obtained new data into an LSTM neural network for training to obtain sign language word predicted values; S5: judging and analyzing the plurality of sign language word predicted values using a threshold-based multi-voting strategy to obtain a recognition result. A continuous Chinese sign language recognition system based on sliding window segmentation is also disclosed. The average accuracy of the sign language recognition system provided by the invention reaches 83.8%, an improvement of 18.6% over a plain LSTM model.
Description
Technical Field
The invention relates to the technical field of sign language recognition, in particular to a continuous Chinese sign language recognition method and system based on sliding window segmentation.
Background
Communication is a basic need of all human beings, and people with hearing impairment (deaf-mutes) are no exception. Communication among deaf-mutes, and between deaf-mutes and hearing people, is generally carried out in sign language. Deaf-mutes can communicate among themselves conveniently using natural sign language, while communication between deaf-mutes and hearing people relies on grammatical sign language (sign language for short). However, a hearing person who cannot read sign language faces a great obstacle when communicating with a deaf-mute. Sign Language Recognition (SLR) systems therefore play a very important role in communication between deaf-mutes and hearing people.
Mainstream sign language recognition methods can be divided into two categories according to the sensing medium: computer-vision-based methods and sensor-based methods. Koller et al. proposed to improve label-to-image alignment in real time with weak supervision, embedding Convolutional Neural Networks (CNNs) into Hidden Markov Models (HMMs) to correct frame labels, improving accuracy by 10%. Cui et al. proposed a weakly supervised framework with a deep neural network that solves the video-clip-to-gloss mapping problem by introducing a convolutional neural network for spatio-temporal feature extraction and sequence learning, achieving results comparable to the state of the art at that time without additional supervision. Yang et al. constructed a continuous SLR model based on a fast HMM, embedding the HMM into a Level Building algorithm based on Dynamic Time Warping (LB-DTW) to improve sentence recognition accuracy. In addition, they proposed a coarse segmentation method to provide the maximum number of levels, using both grammatical and sign-length constraints to improve recognition accuracy by reducing insertion, deletion and substitution errors; this achieves higher recognition performance and lower computational cost than other prior techniques. Huang et al. proposed a video-based sign language recognition method that uses a new dual-stream 3D-CNN to extract global and local spatio-temporal features from sign videos. Experiments were performed on the RWTH-PHOENIX-Weather dataset, which contains 7,000 weather-forecast sentences from 9 signers, and on a publicly available dataset collected with Kinect, with recognition accuracies of 61.7% and 82.7%, respectively. Guo et al. proposed a hierarchical LSTM for sign language translation, addressing the difficulty that traditional HMM and Connectionist Temporal Classification (CTC) approaches may fail to resolve word orders that conflict with the visual content of a sentence during recognition. The inventors previously proposed a continuous sign language recognition method consisting of offline training and online recognition: transition motion is judged using a threshold matrix and a speed threshold in the offline training stage, and the end point of each candidate sign is determined in the online recognition stage using coarse segmentation based on the threshold matrix and fine segmentation based on DTW and a Length-Root method. That method resolves some errors caused by transitional gesture motion and was validated on a Kinect-based dataset. However, the accuracy of computer-vision-based sign language recognition is affected by factors such as lighting changes and clothing occlusion.
The other line of work is sensor-based sign language recognition, with sensors including data gloves, arm rings, smart watches and the like. Li et al. used HMM techniques to provide a continuous SLR model framework that is easy to extend, resistant to interference and easy to port, solving part of the scalability problem in continuous SLR systems; tested on 1,024 sentences over a 510-word vocabulary collected from five signers wearing data gloves, it reaches a word accuracy of 87.4%. Bukhari et al. designed a real-time ASL recognition glove equipped with a series of sensors and used a Principal Component Analysis (PCA) algorithm to recognize 23 classes with 92% accuracy. It is undeniable that data gloves can accurately capture the motion data of hand movements and thereby achieve high-accuracy recognition, but they fall short in portability.
Sensor-based sign language recognition methods mainly use devices such as data gloves, smart watches and arm rings to collect sign language gesture data, build a model and recognize sign language. Earlier work proposed a SOFM/SRN/HMM model that uses an improved Simple Recurrent Network (SRN), segments continuous sign language according to the transformed SOFM representation, uses the SRN output as HMM states, and searches for the best matching word sequence with a lattice Viterbi algorithm. That system achieves 82.9% recognition accuracy on a 5113-sign vocabulary collected with data gloves, and 86.3% accuracy for signer-independent continuous sign language recognition. Benalcazar et al., collecting surface electromyography (sEMG) signals with an arm-ring sensor, presented a new real-time gesture recognition model based on k-Nearest Neighbor (KNN) and DTW algorithms, recognizing five gesture signals with 86% accuracy. Yang et al. proposed continuous SLR based on an optimized tree-structure framework over a combination of sEMG, accelerometer (ACC) and gyroscope (GYRO) sensors; the algorithm classifies continuous Chinese Sign Language (CSL) subwords with an optimized tree structure according to the direction and amplitude components of one or two hands. Experimental results on their 150-subword dataset showed an accuracy of 94.31% in the user-specific test and 87.02% in the user-independent test. Engin Kaya et al. proposed a new gesture recognition method that uses sEMG signals collected from an acquisition armband, extracting seven different time-domain features from the raw EMG signal with a sliding window method; comparing KNN, Support Vector Machine (SVM) and artificial neural network classifiers, the SVM-based system was found to be the most accurate. Manor et al. proposed new sEMG and ACC acquisition positions, capturing hand-motion signals from the right forearm, wrist and back of the hand to recognize 18 CSL gestures; the sEMG and ACC data are divided with a sliding window, features are extracted and combined into a feature vector, and Linear Discriminant Analysis (LDA) is used for the decision, with experiments showing a recognition accuracy of 91.4%. Deep Belief Networks (DBNs) have also been applied to wearable-sensor-based Chinese sign language recognition; to obtain the optimal network structure, three different sensor fusion strategies were explored, namely data fusion, feature fusion and decision fusion, with the best recognition accuracy in the experiments being 95.1% for the user-dependent test and 88.2% for the user-independent test.
However, continuous sign language recognition with sensors remains relatively rare. Mittal et al. proposed an improved Long Short-Term Memory (LSTM) model, tested on 942 Indian Sign Language sentences built from 35 different sign words, with an average accuracy of 72.3%. Gupta et al. proposed an ensemble of classifiers based on windows of different lengths for feature extraction, achieving real-time classification of continuous sign language sentences with a multi-modal wearable sensor to improve the accuracy of continuous sign language recognition. That study shows the proposed ensemble achieves higher sentence-classification accuracy than a single classifier trained on features extracted from a fixed-duration window.
To the inventors' knowledge, no existing work has used an arm ring for continuous sentence recognition of Chinese sign language [11]. This problem currently faces two major challenges: first, how to segment sign language words accurately; second, since adjacent words in continuous sign language overlap and deform one another, how to improve the accuracy of word recognition. In view of these two challenges, it is desirable to provide a new sign language recognition system that solves the above problems.
Disclosure of Invention
The invention aims to solve the technical problem of providing a continuous Chinese sign language recognition method and system based on sliding window segmentation, which can significantly improve the recognition accuracy of continuous Chinese sign language.
In order to solve the technical problems, the invention adopts a technical scheme that: the continuous Chinese sign language recognition method based on sliding window segmentation comprises the following steps:
s1: collecting sEMG and IMU data of the arm through an arm ring;
s2: preprocessing the data collected in the step S1;
s3: performing feature extraction on the preprocessed data by using a sliding window, wherein the feature extraction comprises the steps of dividing single sign language words of continuous sign languages through the sliding window, and performing average segmentation and recombination on each divided data to obtain a plurality of new data;
s4: inputting the new data obtained in the step S3 into an LSTM neural network for training to obtain a sign language word predicted value;
s5: and judging and analyzing the predicted values of the plurality of sign language words by using a multi-voting strategy based on a threshold value to obtain a recognition result.
In a preferred embodiment of the present invention, in step S1, when the sEMG and IMU data of the arm are collected, the arm ring is worn on the forearm, and the sEMG sensor mounted on the arm ring is located at the front end of the forearm, aligned with the direction of the middle finger.
In a preferred embodiment of the present invention, in step S2, the preprocessing process includes:
s201: screening effective information in the signals by using data normalization and the starting time and the ending time of the normalized signals;
s202: filtering the screened effective information through a Butterworth filter.
In a preferred embodiment of the present invention, in step S3, the method for dividing the continuous sign language into single sign language words through the sliding window is as follows:
and selecting the sliding length of the sliding window based on the average length of a single sign language word, and evenly dividing the gesture signal into a plurality of single sign language words, taking one gesture per second as the unit.
In a preferred embodiment of the present invention, in step S3, the step of performing average segmentation and reassembly on each of the divided data includes:
equally dividing a sign language word into n groups of data, deleting one group at a time, and forming a new piece of data as an input from the remaining n-1 groups in their original order, so that one divided gesture generates n different pieces of data as n different inputs.
In a preferred embodiment of the present invention, in step S4, the LSTM neural network consists of two fully-connected layers, two LSTM layers and one fully-connected output layer; the fully-connected layers comprise a first fully-connected layer of 512 neurons and a second fully-connected layer of 256 neurons, the LSTM layers are bidirectional, and each LSTM layer contains 256 units.
In a preferred embodiment of the present invention, the step S5 includes the following steps:
s501: for the data in each sliding window, the n sign language word predicted values obtained through LSTM neural network recognition are denoted s1, s2, s3, ..., si, ..., sn, with corresponding probabilities p1, p2, p3, ..., pi, ..., pn;
s502: setting a threshold D; when the probability pi ≥ D, the vote for si is regarded as a valid vote; when pi < D, the vote for si is regarded as an invalid vote;
s503: let the number of valid votes be c_all; then
(1) if the number of votes for a single result is greater than half of the valid votes, c_all/2, that result is taken as the output result of the window;
(2) if two results receive equal numbers of votes, each equal to c_all/2, the result with the highest pi is taken as the output result of the window;
(3) if neither case (1) nor case (2) holds, the window is regarded as having no valid output information.
In order to solve the technical problem, the invention adopts another technical scheme that: the continuous Chinese sign language recognition system based on sliding window segmentation is provided, and comprises:
the data acquisition module is used for collecting sEMG and IMU data of the arm through the arm ring;
the data preprocessing module is used for preprocessing the sEMG and IMU data of the arm acquired by the data acquisition module;
the data segmentation and recombination module is used for extracting the characteristics of the data preprocessed by the data preprocessing module by using a sliding window, and comprises the steps of carrying out single sign language word division on continuous sign languages through the sliding window, and carrying out average segmentation and recombination on each divided data to obtain a plurality of new data;
the LSTM-based neural network structure is used for training the new data obtained by the data segmentation and recombination module to obtain a plurality of sign language word predicted values;
and the threshold-based multi-voting decision module judges and analyzes the plurality of sign language word predicted values identified by the LSTM-based neural network structure by using a threshold-based multi-voting strategy to obtain an identification result.
In a preferred embodiment of the present invention, the data preprocessing module includes an information filtering unit and a filtering unit;
the information screening unit is used for screening effective information in the signals by utilizing data normalization and the starting time and the ending time of the normalized signals;
and the filtering unit is used for filtering the screened effective information by adopting a Butterworth filter.
In a preferred embodiment of the present invention, the data segmentation and recombination module comprises a continuous sign language segmentation unit and a single sign language segmentation and recombination unit;
the continuous sign language segmentation unit is used for carrying out single sign language word segmentation on the continuous sign language through a sliding window;
and the single sign language segmentation and recombination unit is used for carrying out average segmentation and recombination on the data segmented by each continuous sign language segmentation unit to obtain a plurality of new data.
The invention has the beneficial effects that:
(1) the method uses sign language data acquired by the arm ring to recognize sign language sentences, and solves the continuous sign language segmentation problem with a sliding window method based on average word length; the idea of segmentation is also applied to single sign language recognition: a single sign language word is obtained through sliding window division, and different parts of it are taken for recognition, so that each sign language word can be recognized multiple times, which improves the recognition accuracy;
(2) the continuous Chinese sign language recognition system based on sliding window segmentation has an average accuracy of 83.8%, an improvement of 18.6% over a plain LSTM model (a method that feeds features extracted through a sliding window directly into an LSTM neural network).
Drawings
FIG. 1 is a flow chart of a continuous Chinese sign language recognition method based on sliding window segmentation according to the present invention;
FIG. 2 is a schematic view of the manner in which the arm ring is worn;
FIG. 3 is a schematic diagram of sentence recognition accuracy when n has different values;
FIG. 4 is a schematic diagram of the LSTM neural network;
FIG. 5 is a data histogram of sentence recognition results using SSW and LSTM models, respectively;
FIG. 6 is a schematic representation of the stability of the SSW system of the present invention;
FIG. 7 is a data histogram of SSW sentence recognition accuracy as a function of the number of segmentation groups n;
fig. 8 is a block diagram of the continuous chinese sign language recognition system based on sliding window segmentation.
Detailed Description
The following detailed description of the preferred embodiments of the present invention, taken in conjunction with the accompanying drawings, will make the advantages and features of the invention easier for those skilled in the art to understand, and thereby more clearly define the scope of the invention.
Referring to fig. 1, an embodiment of the present invention includes:
a continuous Chinese sign language identification method based on Sliding Window segmentation (SSW) comprises the following steps:
s1: collecting sEMG and IMU data of the arm through an arm ring;
in this embodiment, the device used for data acquisition is a commercial Myo arm ring, which is worn on the forearm as shown in fig. 2, and the sEMG sensor with a badge is located at the front end of the forearm to aim at the middle finger direction of the finger. It has 8 sEMG sensors and an IMU unit for a total of 18 bits of data.
S2: preprocessing the data collected in the step S1; the pretreatment process comprises the following steps:
s201: screening effective information in the signals by using data normalization and the starting time and the ending time of the normalized signals;
s202: filtering the screened effective information through a Butterworth filter.
Specifically, let the signal of a whole sentence obtained by the arm ring be
E(x) = (E1(x), E2(x), ..., E17(x), E18(x)),  (1)
where Ei(x) (1 ≤ i ≤ 18) denotes the i-th signal channel from the arm ring.
A complete sentence collected by the arm ring necessarily contains a starting pause and an ending pause, so the valid information in the signal should be screened out to reduce computational overhead and facilitate subsequent training. A sliding window and a sliding length are set, and the average absolute value (MVA) within the current sliding window is calculated; if the MVA falls below an empirical value, this indicates that the sentence starts or ends there. In this embodiment, the sEMG acquisition frequency is 200 Hz, the sliding window size is 50 data points, and the sliding length is 5 data points, with the MVA computed as the mean absolute value of the signal over the 50 points in the current window:
MVA = (1/50) Σ_{j=1}^{50} |E(x_j)|.  (2)
From general experience, if the MVA is less than 10, the segment is regarded as invalid information and is deleted. The data are then mapped into the [0,1] interval by taking absolute values and normalizing, and are denoted W(x):
W(x) = (W1(x), W2(x), ..., W17(x), W18(x)),  (3)
where Wi(x) (1 ≤ i ≤ 18) denotes Ei(x) after normalization and taking the absolute value.
While the arm ring acquires the sensor information, the signal contains some noise in addition to the useful data. Denoising is performed by filtering the input signal; the filtering must not remove important information, yet should preserve some of the variation between sensor signals, which keeps the model robust and improves accuracy. The invention therefore performs the filtering operation with a Butterworth filter. The Butterworth filter is a digital filter whose frequency response is maximally flat in the passband, so a low-pass design can eliminate high-frequency noise interference; having fewer parameters, it is also computationally cheaper than other filters. Filtering is carried out with a third-order Butterworth filter with a cut-off frequency of 10 Hz, and the result is recorded as A(x).
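For illustration, the preprocessing pipeline described above (MVA-based trimming, normalization, Butterworth filtering) can be sketched in Python as follows; this is a minimal sketch assuming NumPy/SciPy, with the constants taken from the embodiment and all function names illustrative:

```python
import numpy as np
from scipy.signal import butter, filtfilt

FS = 200            # sEMG sampling rate (Hz)
WIN, STEP = 50, 5   # MVA window size and sliding length (data points)
MVA_MIN = 10        # empirical threshold below which a segment is invalid

def trim_invalid(E):
    """Keep the valid span of a sentence signal E of shape (T, 18),
    dropping leading/trailing segments whose MVA falls below MVA_MIN."""
    mva = np.array([np.mean(np.abs(E[t:t + WIN]))
                    for t in range(0, len(E) - WIN + 1, STEP)])
    valid = np.flatnonzero(mva >= MVA_MIN)
    if valid.size == 0:
        return E[:0]
    return E[valid[0] * STEP : valid[-1] * STEP + WIN]

def normalize(E):
    """Take absolute values and min-max normalize each channel to [0, 1]."""
    W = np.abs(E)
    lo, hi = W.min(axis=0), W.max(axis=0)
    return (W - lo) / np.where(hi > lo, hi - lo, 1)

def lowpass(W, cutoff=10, order=3):
    """Third-order Butterworth low-pass filter at 10 Hz, per channel."""
    b, a = butter(order, cutoff, btype="low", fs=FS)
    return filtfilt(b, a, W, axis=0)

# A = lowpass(normalize(trim_invalid(E)))   # the A(x) used below
```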
S3: performing feature extraction on the preprocessed data by using a sliding window, wherein the feature extraction comprises the steps of dividing single sign language words of continuous sign languages through the sliding window, and performing average segmentation and recombination on each divided data to obtain a plurality of new data;
by observing sign language, the average time of a sign language word is about one second, so that A (x) is divided by one gesture for one second (200 data points). According to the Wahid et al study, the present example selects a sliding window with a sliding length of 40 (covering 80%). Dividing the data to obtain 18-dimensional data.
After the single sign language words are divided through the sliding window, each divided piece of data is further segmented. Assume one divided piece of data is S = {s1, s2, ..., s199, s200}, where si = {s_i1, s_i2, ..., s_i17, s_i18} (1 ≤ i ≤ 200). The signal S is equally divided into n groups of data, recorded as S1, S2, ..., S_{n-1}, S_n. One group is deleted in turn, and the remaining n-1 groups, in their original order, form a new piece of data as an input, so that one divided gesture can generate n different pieces of data as n different inputs.
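A sketch of this leave-one-group-out recombination, assuming NumPy and one 200 × 18 window as input:

```python
import numpy as np

def segment_and_recombine(S, n=5):
    """Split one 200x18 window S into n equal groups S1..Sn and, for
    each k, rebuild a sample from the other n-1 groups in their
    original order; with n = 5 each sample has shape (160, 18)."""
    groups = np.array_split(S, n)
    return [np.concatenate(groups[:k] + groups[k + 1:]) for k in range(n)]
```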
In this embodiment, ten volunteers were invited to collect 100 sign language words and 20 different sign language sentences without interference. After a sign language sentence is divided by the sliding window, each sign language word is equally divided into n groups of data; one group is deleted in turn, and the remaining n-1 groups, in their original order, form a new piece of data that is input into the LSTM neural network. The experimental results for different numbers of segmentation groups n are shown in FIG. 3. As FIG. 3 shows, when n ≥ 2 the accuracy rises rapidly and exceeds that obtained without this preprocessing, and when n ≥ 6 it is basically stable while the runtime overhead of the algorithm grows considerably. Finally, n is fixed at 5, which improves the accuracy while avoiding high runtime overhead.
S4: inputting the new data obtained in the step S3 into an LSTM neural network for training to obtain a sign language word predicted value;
the neural network consists of two fully connected layers, two LSTM models and one fully connected layer, as shown in fig. 4. After the data processing, the original 200 × 18 matrix is changed to 160 × 18 matrix and input. The input data first passes through a full connection layer of 512 neurons and a full connection layer of 256 neurons. The outputs of the two fully-connected layers are used as inputs to bi-directional LSTMs, each layer of LSTM containing 256 cells. And then, inputting the output to a full connection layer, and judging to obtain a predicted value. The predicted value is a predicted result of gesture information of the data obtained by inputting the data after being divided and combined into a neural network.
S5: and judging and analyzing the predicted values of the plurality of sign language words by using a multi-voting strategy based on a threshold value to obtain a recognition result.
For the 18-dimensional data in each sliding window, five results are obtained through neural network recognition, denoted s1, s2, s3, s4, s5, with corresponding probabilities p1, p2, p3, p4, p5. A threshold D is set; when the probability pi ≥ D, the vote for si is regarded as a valid vote, and when pi < D, the vote for si is regarded as an invalid vote. Let the number of valid votes be c_all. The specific operation is as follows:
(1) If the number of votes for a single result is greater than half of the valid votes, c_all/2, that result is taken as the output result of the window.
(2) If two results receive equal numbers of votes, each equal to c_all/2, the result with the highest pi is taken as the output result of the window.
(3) If neither of the two cases holds, the window is regarded as having no valid output information.
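As an illustration, the threshold-based multi-voting decision can be sketched as follows; the value of the threshold D is an assumption (the embodiment leaves it open), and the function name is illustrative:

```python
from collections import Counter

def vote(preds, probs, D=0.5):
    """Threshold-based multi-voting over one window's n predictions.
    preds are the labels s1..sn, probs the probabilities p1..pn.
    Returns the winning label, or None when the window has no valid
    output information."""
    valid = [(s, p) for s, p in zip(preds, probs) if p >= D]
    if not valid:
        return None                                  # every vote invalid
    c_all = len(valid)
    counts = Counter(s for s, _ in valid)
    top, top_votes = counts.most_common(1)[0]
    if top_votes > c_all / 2:                        # case (1): clear majority
        return top
    tied = [s for s, c in counts.items() if c == top_votes]
    if len(tied) == 2 and top_votes == c_all / 2:    # case (2): two-way tie
        return max((sp for sp in valid if sp[0] in tied),
                   key=lambda sp: sp[1])[0]
    return None                                      # case (3): no valid output
```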
Based on the method, the invention also provides a continuous Chinese sign language recognition system based on sliding window segmentation, which refers to fig. 8 and comprises the following modules:
the data acquisition module is used for collecting sEMG and IMU data of the arm through the arm ring;
the data preprocessing module is used for preprocessing the sEMG and IMU data of the arm acquired by the data acquisition module;
the data segmentation and recombination module is used for extracting the characteristics of the data preprocessed by the data preprocessing module by using a sliding window, and comprises the steps of carrying out single sign language word division on continuous sign languages through the sliding window, and carrying out average segmentation and recombination on each divided data to obtain a plurality of new data;
the LSTM-based neural network structure is used for training the new data obtained by the data segmentation and recombination module to obtain a plurality of sign language word predicted values;
and the threshold-based multi-voting decision module judges and analyzes the plurality of sign language word predicted values identified by the LSTM-based neural network structure by using a threshold-based multi-voting strategy to obtain an identification result.
In a preferred embodiment of the present invention, the data preprocessing module includes an information filtering unit and a filtering unit;
the information screening unit is used for screening effective information in the signals by utilizing data normalization and the starting time and the ending time of the normalized signals;
and the filtering unit is used for filtering the screened effective information by adopting a Butterworth filter.
In a preferred embodiment of the present invention, the data segmentation and reassembly module comprises a continuous sign language segmentation unit, a single sign language segmentation and reassembly unit;
the continuous sign language segmentation unit is used for carrying out single sign language word segmentation on the continuous sign language through a sliding window;
and the single sign language segmentation and recombination unit is used for carrying out average segmentation and recombination on the data segmented by each continuous sign language segmentation unit to obtain a plurality of new data.
The invention evaluates the performance of the sign language sentence recognition system proposed herein through experiments. The experiments use the Myo arm ring as the acquisition device. The arm ring carries 8 sEMG sensors, 1 triaxial accelerometer and 1 triaxial gyroscope; it can acquire arm electromyographic signals at a frequency of 200 Hz and send them to a computer. The computer is equipped with an Intel Core i7-8700 processor, 16 GB of memory, an Nvidia GTX 1080 graphics card with 8 GB of video memory, and the Windows 10 operating system.
In the present embodiment, the sentence recognition accuracy is defined as in [13], taken here as the standard word-level accuracy Acc = (N - S - I - D) / N, where S, I, D represent the numbers of substituted, inserted and deleted words, respectively, and N represents the number of sign language words recognized in testing continuous sign language.
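Under the common reading that S + I + D is obtained from a Levenshtein alignment between the reference and recognized word sequences, the metric can be computed as follows (a sketch; names are illustrative):

```python
def sentence_accuracy(ref, hyp):
    """Word-level accuracy Acc = (N - S - I - D) / N, where S + I + D
    is the Levenshtein distance between the reference word sequence
    ref and the recognized sequence hyp."""
    n, m = len(ref), len(hyp)
    d = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(n + 1):
        d[i][0] = i                       # i deletions
    for j in range(m + 1):
        d[0][j] = j                       # j insertions
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d[i][j] = min(d[i - 1][j - 1] + (ref[i - 1] != hyp[j - 1]),  # S
                          d[i - 1][j] + 1,                               # D
                          d[i][j - 1] + 1)                               # I
    return (n - d[n][m]) / n

# e.g. sentence_accuracy(["I", "have", "a", "fever"],
#                        ["I", "a", "fever"])  # -> 0.75
```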
(1) SSW vs LSTM
Ten sign language sentences were randomly selected and recognized with the SSW system, and the results were compared with the method of directly using the LSTM neural network after sliding-window feature extraction; the experimental results are shown in FIG. 5. As FIG. 5 shows, the recognition accuracy of the SSW system is much higher than that of the LSTM model: the average accuracy of the SSW system is 83.8%, versus 65.2% for the LSTM model. The reason is that the SSW system trains on multiple recombinations of each sliding-window partition and applies the threshold-based multi-voting strategy, which reduces the influence of deviations in the gesture signal, such as gesture transitions and incomplete sign language actions, and thereby improves the accuracy of word recognition within sentences.
(2) SSW stability
The purpose of the SSW system is to allow most sign language sentences of deaf-mutes to be translated into text, so it should be able to accurately recognize the gestures of any deaf-mute. To verify the stability of the SSW system, 10 volunteers of the same age group whose gestures had not been collected were invited to perform sign language recognition in this embodiment; the experimental results are shown in FIG. 6. As FIG. 6 shows, the average accuracy for volunteers whose gestures were not collected was 79.84%, slightly lower than the 83.8% for volunteers whose gestures were collected, but still within an acceptable range.
(3) Influence of the number of segmentation groups n on the accuracy of SSW recognition
In this experiment, gesture information of 10 volunteers was collected with the arm ring: 100 gestures and 20 different sign language sentences in total, with 50 samples per sentence. Three sentences were randomly selected, the number of segmentation groups n was varied, and the change in the average sign language accuracy of the SSW system was observed. The experimental results are shown in FIG. 7, which takes "I have a fever" as an example of how the SSW sentence recognition accuracy changes with the number of segmentation groups n. As the figure shows, when the number of segmentation groups n is 5, the accuracy improves to 85% compared with smaller group numbers; when the number of groups continues to increase, the computation grows while the recognition accuracy shows no obvious improvement, and for some sentences it even decreases.
The invention provides SSW, a continuous Chinese sign language recognition system based on sliding window segmentation. The system applies the idea of segmentation: a sliding window, sized by the average length of a single sign language word, first segments the continuous sign language; each gesture signal is then equally divided into n groups, n-1 of which are taken at a time and recombined in their original order into new data, so that recognition is performed multiple times, improving the sentence recognition rate. Meanwhile, SSW processes the recognition results with a threshold-based multi-voting strategy: a result is counted as a valid vote only if its recognition probability exceeds the threshold, which reduces the influence of strongly deviating parts of the gesture signal and makes the result more credible. Although SSW increases the amount of computation to a certain extent, it greatly improves the sign language accuracy. Gesture collection and testing with 10 volunteers show that the accuracy reaches 83.8%.
The above description is only an embodiment of the present invention, and not intended to limit the scope of the present invention, and all modifications of equivalent structures and equivalent processes performed by the present specification and drawings, or directly or indirectly applied to other related technical fields, are included in the scope of the present invention.
Claims (9)
1. A continuous Chinese sign language identification method based on sliding window segmentation comprises the following steps:
s1: collecting sEMG and IMU data of the arm through an arm ring;
s2: preprocessing the data collected in the step S1;
s3: performing feature extraction on the preprocessed data by using a sliding window, wherein the feature extraction comprises the steps of dividing single sign language words of continuous sign languages through the sliding window, and performing average segmentation and recombination on each divided data to obtain a plurality of new data;
s4: inputting the new data obtained in the step S3 into an LSTM neural network for training to obtain a sign language word predicted value;
s5: and judging and analyzing the predicted values of the plurality of sign language words by using a multi-voting strategy based on a threshold value to obtain a recognition result.
2. The continuous Chinese sign language recognition method based on sliding window segmentation recited in claim 1, wherein in step S1, when collecting the sEMG and IMU data of the arm, the arm ring is worn on the forearm, and the sEMG sensor mounted on the arm ring is located at the front end of the forearm, aligned with the direction of the middle finger.
3. The continuous Chinese sign language recognition method based on sliding window segmentation as claimed in claim 1, wherein in step S2, the preprocessing process comprises:
s201: screening effective information in the signals by using data normalization and the starting time and the ending time of the normalized signals;
s202: filtering the screened effective information through a Butterworth filter.
4. The continuous Chinese sign language recognition method based on sliding window segmentation as claimed in claim 1, wherein in step S3, the method for dividing the continuous sign language into single sign language words through the sliding window is:
and selecting the sliding length of the sliding window based on the average length of the single sign language of the words, and averagely dividing a gesture signal into a plurality of groups of single sign language words by taking a gesture one second as a unit.
5. The continuous Chinese sign language recognition method of claim 1, wherein in step S3, the step of performing average segmentation and recombination on each divided piece of data includes:
equally dividing a sign language word into n groups of data, sequentially deleting one group of data, forming a new data as an input by the remaining n-1 groups of data according to the original sequence, and generating n different data as n different inputs by a dividing gesture.
6. The continuous Chinese sign language recognition method based on sliding window segmentation as claimed in claim 1, wherein the specific steps of step S5 include:
s501: for the data in each sliding window, the n sign language word predicted values obtained through LSTM neural network recognition are denoted s1, s2, s3, ..., si, ..., sn, with corresponding probabilities p1, p2, p3, ..., pi, ..., pn;
s502: setting a threshold D; when the probability pi ≥ D, the vote for si is regarded as a valid vote; when pi < D, the vote for si is regarded as an invalid vote;
s503: let the number of valid votes be c_all; then
(1) if the number of votes for a single result is greater than half of the valid votes, c_all/2, that result is taken as the output result of the window;
(2) if two results receive equal numbers of votes, each equal to c_all/2, the result with the highest pi is taken as the output result of the window;
(3) if neither case (1) nor case (2) holds, the window is regarded as having no valid output information.
7. A continuous Chinese sign language recognition system based on sliding window segmentation is characterized by comprising:
the data acquisition module is used for collecting sEMG and IMU data of the arm through the arm ring;
the data preprocessing module is used for preprocessing the sEMG and IMU data of the arm acquired by the data acquisition module;
the data segmentation and recombination module is used for extracting the characteristics of the data preprocessed by the data preprocessing module by using a sliding window, and comprises the steps of carrying out single sign language word division on continuous sign languages through the sliding window, and carrying out average segmentation and recombination on each divided data to obtain a plurality of new data;
the LSTM-based neural network structure is used for training the new data obtained by the data segmentation and recombination module to obtain a plurality of sign language word predicted values;
and the threshold-based multi-voting decision module judges and analyzes the plurality of sign language word predicted values identified by the LSTM-based neural network structure by using a threshold-based multi-voting strategy to obtain an identification result.
8. The continuous Chinese sign language recognition system based on sliding window segmentation as claimed in claim 7, wherein the data preprocessing module comprises an information filtering unit and a filtering unit;
the information screening unit is used for screening effective information in the signals by utilizing data normalization and the starting time and the ending time of the normalized signals;
and the filtering unit is used for filtering the screened effective information by adopting a Butterworth filter.
9. The continuous Chinese sign language recognition system based on sliding window segmentation as claimed in claim 7, wherein the data segmentation and recombination module comprises a continuous sign language segmentation unit and a single sign language segmentation and recombination unit;
the continuous sign language segmentation unit is used for carrying out single sign language word segmentation on the continuous sign language through a sliding window;
and the single sign language segmentation and recombination unit is used for carrying out average segmentation and recombination on the data segmented by each continuous sign language segmentation unit to obtain a plurality of new data.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202010734304.6A CN111914724B (en) | 2020-07-27 | 2020-07-27 | Continuous Chinese sign language identification method and system based on sliding window segmentation |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202010734304.6A CN111914724B (en) | 2020-07-27 | 2020-07-27 | Continuous Chinese sign language identification method and system based on sliding window segmentation |
Publications (2)
Publication Number | Publication Date |
---|---|
CN111914724A true CN111914724A (en) | 2020-11-10 |
CN111914724B CN111914724B (en) | 2023-10-27 |
Family
ID=73281861
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202010734304.6A Active CN111914724B (en) | 2020-07-27 | 2020-07-27 | Continuous Chinese sign language identification method and system based on sliding window segmentation |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN111914724B (en) |
-
2020
- 2020-07-27 CN CN202010734304.6A patent/CN111914724B/en active Active
Patent Citations (7)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US5699441A (en) * | 1992-03-10 | 1997-12-16 | Hitachi, Ltd. | Continuous sign-language recognition apparatus and input apparatus |
KR20030030232A (en) * | 2001-10-09 | 2003-04-18 | 한국과학기술원 | Method and System for recognizing continuous sign language based on computer vision |
US20170300124A1 (en) * | 2017-03-06 | 2017-10-19 | Microsoft Technology Licensing, Llc | Ultrasonic based gesture recognition |
CN109271901A (en) * | 2018-08-31 | 2019-01-25 | 武汉大学 | A kind of sign Language Recognition Method based on Multi-source Information Fusion |
CN109902554A (en) * | 2019-01-09 | 2019-06-18 | 天津大学 | A kind of recognition methods of the sign language based on commercial Wi-Fi |
CN110286774A (en) * | 2019-07-03 | 2019-09-27 | 中国科学技术大学 | A kind of sign Language Recognition Method based on Wrist-sport sensor |
CN111340005A (en) * | 2020-04-16 | 2020-06-26 | 深圳市康鸿泰科技有限公司 | Sign language identification method and system |
Non-Patent Citations (4)
Title |
---|
- Liu Xiao; Yuan Guan; Zhang Yanmei; Yan Qiuyan; Wang Zhixiao: "Gesture Recognition Based on Adaptive Multi-Classifier Fusion", Computer Science, no. 07 *
- Zhang Xiaobing; Gong Haigang; Yang Fan; Dai Xili: "Research on End-to-End Sentence-Level Chinese Lip Language Recognition", Journal of Software, no. 06 *
- Wang Wenhui; Chen Xiang; Yang Ping; Li Yun; Yang Jihai: "Research on Chinese Sign Language Recognition Based on Multi-Sensor Information Detection and Fusion", Chinese Journal of Biomedical Engineering, no. 05 *
- Wang Chunli, Gao Wen, Ma Jiyong, Gao Xiujuan: "A Root-Based Method for Chinese Sign Language Recognition", Journal of Computer Research and Development, no. 02 *
Cited By (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN112906498A (en) * | 2021-01-29 | 2021-06-04 | 中国科学技术大学 | Sign language action recognition method and device |
CN114115531A (en) * | 2021-11-11 | 2022-03-01 | 合肥工业大学 | End-to-end sign language identification method based on attention mechanism |
Also Published As
Publication number | Publication date |
---|---|
CN111914724B (en) | 2023-10-27 |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | |
SE01 | Entry into force of request for substantive examination | |
GR01 | Patent grant | |