CN111914724B - Continuous Chinese sign language identification method and system based on sliding window segmentation - Google Patents

Info

Publication number
CN111914724B
Authority
CN
China
Prior art keywords
sign language
data
segmentation
sliding window
continuous
Prior art date
Legal status
Active
Application number
CN202010734304.6A
Other languages
Chinese (zh)
Other versions
CN111914724A (en)
Inventor
王青山
王鑫炎
马晓迪
郑志文
朱钰
张江涛
王琦
Current Assignee
Hefei University of Technology
Original Assignee
Hefei University of Technology
Priority date
Filing date
Publication date
Application filed by Hefei University of Technology
Priority to CN202010734304.6A
Publication of CN111914724A
Application granted
Publication of CN111914724B

Classifications

    • G06V40/28 Recognition of hand or arm movements, e.g. recognition of deaf sign language
    • G06F18/214 Generating training patterns; Bootstrap methods, e.g. bagging or boosting
    • G06F18/241 Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches
    • G06F18/2415 Classification techniques based on parametric or probabilistic models, e.g. based on likelihood ratio or false acceptance rate versus a false rejection rate
    • G06N3/044 Recurrent networks, e.g. Hopfield networks
    • G06N3/045 Combinations of networks
    • G06N3/049 Temporal neural networks, e.g. delay elements, oscillating neurons or pulsed inputs
    • G06N3/08 Learning methods
    • G06V10/267 Segmentation of patterns in the image field by performing operations on regions, e.g. growing, shrinking or watersheds
    • Y02D10/00 Energy efficient computing, e.g. low power processors, power management or thermal management

Abstract

The invention discloses a continuous Chinese sign language identification method based on sliding window segmentation, which comprises the following steps: S1: collecting sEMG and IMU data of the arm through an arm ring; S2: preprocessing the data acquired in step S1; S3: performing feature extraction on the preprocessed data by using a sliding window, which comprises dividing continuous sign language into single sign language words through the sliding window, and performing average segmentation and recombination on each piece of divided data to obtain a plurality of new data; S4: inputting the obtained new data into an LSTM neural network for training to obtain sign language word predicted values; S5: judging and analyzing the plurality of recognized sign language word predicted values by applying a threshold-based multi-voting strategy to obtain an identification result. A continuous Chinese sign language recognition system based on sliding window segmentation is also disclosed. The average accuracy of the sign language recognition system provided by the invention reaches 83.8%, 18.6 percentage points higher than that of an LSTM model.

Description

Continuous Chinese sign language identification method and system based on sliding window segmentation
Technical Field
The invention relates to the technical field of sign language recognition, in particular to a continuous Chinese sign language recognition method and system based on sliding window segmentation.
Background
Communication is a fundamental need of all humans, and people with hearing impairments (the deaf-mute) are no exception. Sign language is commonly used for communication between the deaf-mute and hearing people. Natural sign language is convenient for communication among the deaf-mute, while communication between the deaf-mute and hearing people relies more on grammatical sign language (referred to simply as sign language). However, hearing people who cannot read sign language encounter great obstacles when communicating with the deaf-mute. Sign language recognition (Sign Language Recognition, SLR) systems therefore play a very important role in communication between the deaf-mute and hearing people.
The current mainstream sign language recognition methods can be divided into computer-vision-based methods and sensor-based methods. Koller et al. propose improving label-to-image alignment under weak supervision by embedding convolutional neural networks (Convolutional Neural Network, CNN) into hidden Markov models (Hidden Markov Model, HMM) to correct frame labels, improving accuracy by 10%. Cui et al. propose a weakly supervised framework with deep neural networks, which solves the problem of mapping video clips to glosses by introducing a convolutional neural network for spatio-temporal feature extraction and sequence learning; without additional supervision, the method achieved results comparable to the state of the art at the time. Yang et al. built a continuous SLR model based on fast HMMs, which embeds HMMs into a level building algorithm based on dynamic time warping (Dynamic Time Warping based Level Building, LB-DTW) to improve sentence recognition accuracy; they also propose a coarse segmentation method to provide the maximum number of levels, and use grammar constraints and sign length constraints to reduce insertion, deletion and substitution errors, achieving higher recognition performance and lower computational cost than other prior art. Huang et al. propose a video-based sign language recognition method that uses a new two-stream 3D-CNN to extract global-local spatio-temporal features from sign videos; experiments on the RWTH-PHOENIX-Weather dataset, which contains 7000 weather-forecast sentences from 9 signers, and on a publicly available dataset acquired with Kinect gave recognition accuracies of 61.7% and 82.7%, respectively. Guo et al. propose a hierarchical LSTM for sign language translation, addressing the difficulty that conventional HMMs and CTC (Connectionist Temporal Classification) may fail to resolve the confusable word order corresponding to the visual content of sentences during recognition. Another work proposes a continuous sign language recognition method consisting of offline training and online recognition: it uses a threshold matrix and a rate threshold to detect transitional motion in the offline training stage, and uses coarse segmentation based on the threshold matrix together with fine segmentation based on DTW and a Length-Root method to determine the endpoints of each candidate sign in the online recognition stage; the method resolves some of the errors caused by gesture transition motion, and its validity was verified on Kinect-based datasets. However, the accuracy of computer-vision-based sign language recognition methods is affected by factors such as illumination changes and clothing occlusion.
Another category of methods is sensor-based, using devices such as data gloves, armbands and smart watches. Li et al. use HMM techniques to provide a continuous SLR model framework that is easy to extend, strongly resistant to interference and easy to port, alleviating part of the scalability problem in continuous SLR systems; tested on a vocabulary of 510 words and 1024 test sentences collected with data gloves from five signers, the word accuracy reaches 87.4%. Bukhari et al. designed a real-time ASL recognition glove equipped with a series of sensors, which uses a principal component analysis (Principal Component Analysis, PCA) algorithm to recognize a 23-class dataset, yielding 92% accuracy. Wearing data gloves allows the motion data of hand movements to be read accurately, enabling high-accuracy recognition, but lacks portability.
Sensor-based sign language recognition methods mainly adopt devices such as data gloves, smart watches and armbands to collect sign language gesture data, build a model, and recognize the sign language. Gao et al. propose a SOFM/SRN/HMM model that employs a modified simple recurrent network (Simple Recurrent Network, SRN), segments continuous sign language according to the transformed SOFM representation, uses the SRN output as HMM states, and searches for the best matching word sequence with the Viterbi algorithm; the system achieves 82.9% recognition accuracy on a 5113-sign vocabulary collected with data gloves, and 86.3% accuracy for signer-independent continuous sign language recognition. Benalcazar et al. collect surface electromyography (surface Electromyography, sEMG) signals with an armband sensor and propose a new real-time gesture recognition model based on the k-Nearest Neighbor (KNN) and DTW algorithms, recognizing five gesture signals with 86% accuracy. Yang et al. propose continuous SLR based on an optimized tree-structure framework combining sEMG, accelerometer (ACC) and gyroscope (GYRO) sensors; the algorithm classifies continuous Chinese sign language (Chinese Sign Language, CSL) subwords using an optimized tree structure based on the direction and magnitude components of one or both hands, and experimental results on their 150-subword dataset show an accuracy of 94.31% in user-specific tests and 87.02% in user-independent tests. Engin Kaya et al. propose a new gesture recognition method that uses sEMG signals collected from an information-acquisition armband and extracts seven different time-domain features from the raw EMG signals with a sliding-window method; comparing KNN, support vector machines and artificial neural networks, they found that the system based on the support vector machine classifier achieves the highest accuracy. Bang et al. propose new sEMG and ACC signal acquisition positions, using sEMG and ACC to acquire hand motion signals from the right forearm, wrist and back of the hand, thereby obtaining 18 CSL features; the method divides the sEMG and ACC data with a sliding window, extracts features, combines them into feature vectors, and classifies with linear discriminant analysis (Linear Discriminant Analysis, LDA), with experiments showing a recognition accuracy of 91.4%. Yi et al. apply deep belief networks (Deep Belief Networks, DBN) to wearable-sensor-based Chinese sign language recognition; to obtain the best network structure, three different sensor fusion strategies were explored, including data fusion, feature fusion and decision fusion, with a best recognition accuracy of 95.1% for user-dependent tests and 88.2% for user-independent tests.
However, there is less work on using sensors for continuous sign language recognition. Mittal et al. propose an improved Long Short-Term Memory (LSTM) model, tested on 942 Indian sign language sentences built from 35 different sign words, with an average accuracy of 72.3%. Gupta et al. propose ensembles of classifiers based on features extracted from windows of different lengths, and realize real-time classification of continuous sign language sentences with multi-modal wearable sensors to improve the accuracy of continuous sign language recognition; this study shows that the proposed ensemble method classifies sentences more accurately than a single classifier learned from features extracted with a fixed-duration window.
According to the inventors' research, no existing work uses an armband for continuous Chinese sign language sentence recognition [11]. This problem currently faces two main challenges: how to segment sign language words accurately, and how to improve the accuracy of word recognition given the overlapping deformation between adjacent sign language words in continuous sign language. In view of these two challenges, it is desirable to provide a new sign language recognition system to solve the above problems.
Disclosure of Invention
The invention aims to solve the technical problem of providing a continuous Chinese sign language recognition method and a system based on sliding window segmentation, which can remarkably improve the recognition accuracy of continuous Chinese sign language.
In order to solve the technical problems, the invention adopts a technical scheme that: the continuous Chinese sign language identification method based on sliding window segmentation comprises the following steps:
s1: collecting sEMG and IMU data of the arm through an arm ring;
s2: preprocessing the data acquired in the step S1;
s3: performing feature extraction on the preprocessed data by using a sliding window, which comprises dividing continuous sign language into single sign language words through the sliding window, and performing average segmentation and recombination on each piece of divided data to obtain a plurality of new data;
s4: inputting the new data obtained in the step S3 into an LSTM neural network for training to obtain a sign language word predicted value;
s5: judging and analyzing the plurality of recognized sign language word predicted values by applying a threshold-based multi-voting strategy to obtain a recognition result.
In a preferred embodiment of the present invention, in step S1, when the sEMG and IMU data of the arm are collected, the arm ring is worn on the forearm, and the sEMG sensor provided on the arm ring is located at the front end of the forearm, aligned with the middle finger direction.
In a preferred embodiment of the present invention, in step S2, the preprocessing process includes:
s201: screening out the valid information in the signals by using data normalization and by regularizing the start and end times of the signals;
s202: filtering the screened valid information through a Butterworth filter.
In a preferred embodiment of the present invention, in step S3, the method for dividing the continuous sign language into single sign language words through the sliding window is as follows:
the sliding length of the sliding window is selected based on the average length of single sign language of the words, and one gesture signal is equally divided into a plurality of groups of single sign language words in a unit of one gesture second.
In a preferred embodiment of the present invention, in step S3, the step of performing average segmentation and reassembly on each of the divided data includes:
A sign language word is divided into n groups of data on average; one group of data is deleted in turn, and the remaining n-1 groups are combined into a new piece of data in the original order as input, so that one divided gesture generates n different pieces of data as n different inputs.
In a preferred embodiment of the present invention, in step S4, the LSTM neural network is composed of two fully connected layers, two LSTM models and one fully connected layer, the fully connected layers include a first fully connected layer of 512 neurons and a second fully connected layer of 256 neurons, the LSTM models are bidirectional LSTMs, and each LSTM includes 256 units.
In a preferred embodiment of the present invention, the specific steps of step S5 include:
S501: for the data in each sliding window, denote the n sign language word predicted values obtained through LSTM neural network recognition as s_1, s_2, ..., s_i, ..., s_n, with corresponding probabilities p_1, p_2, ..., p_i, ..., p_n;
S502: set the threshold as D; when the probability p_i ≥ D, s_i is counted as a valid vote; when p_i < D, s_i is counted as an invalid vote;
S503: let the number of valid votes be c_all, then:
(1) if the vote count of a single result is greater than half of the valid votes, c_all/2, that result is taken as the output result of the window;
(2) if two results have equal vote counts and each equals c_all/2, the result with the highest p_i is taken as the output result of the window;
(3) if neither case (1) nor case (2) holds, the window is regarded as having no valid output information.
In order to solve the technical problems, the invention adopts another technical scheme that: provided is a continuous Chinese sign language recognition system based on sliding window segmentation, comprising:
the data acquisition module is used for collecting sEMG and IMU data of the arm through the arm ring;
the data preprocessing module is used for preprocessing sEMG and IMU data of the arm acquired by the data acquisition module;
the data segmentation and recombination module is used for extracting characteristics of the data preprocessed by the data preprocessing module, and the data segmentation and recombination module comprises the steps of dividing single sign language words of continuous sign language through a sliding window, and carrying out average segmentation and recombination on each divided data to obtain a plurality of new data;
the LSTM-based neural network structure is used for training the new data obtained by the data segmentation and recombination module to obtain a plurality of sign word predicted values;
and the threshold-based multi-voting decision module is used for judging and analyzing a plurality of sign language word predicted values recognized by the LSTM-based neural network structure by applying a threshold-based multi-voting strategy to obtain a recognition result.
In a preferred embodiment of the present invention, the data preprocessing module includes an information screening unit and a filtering unit;
the information screening unit is used for screening effective information in the signals by utilizing the data normalization and the start time and the end time of the regular signals;
and the filtering unit is used for filtering the screened effective information by adopting a Butterworth filter.
In a preferred embodiment of the present invention, the data segmentation and reorganization module includes a continuous sign language segmentation unit and a single sign language segmentation reorganization unit;
the continuous sign language segmentation unit is used for dividing single sign language words of continuous sign language through a sliding window;
the single sign language segmentation and recombination unit is used for carrying out average segmentation and recombination on the data divided by each continuous sign language segmentation unit to obtain a plurality of new data.
The beneficial effects of the invention are as follows:
(1) The invention uses sign language data collected by the arm ring to recognize sign language sentences, and uses a sliding window method based on the average word length to solve the problem of continuous sign language segmentation; the idea of segmentation is also used in single sign word recognition: a single sign word is obtained through sliding window division, and parts of it are taken for recognition, so that each single sign word can be recognized multiple times, improving the recognition accuracy;
(2) The average accuracy of the continuous Chinese sign language recognition system based on sliding window segmentation provided by the invention reaches 83.8%, which is 18.6 percentage points higher than that of an LSTM model (a method that directly uses an LSTM neural network after sliding-window feature extraction).
Drawings
FIG. 1 is a flow chart of the continuous Chinese sign language recognition method based on sliding window segmentation of the present invention;
FIG. 2 is a schematic illustration of how the arm ring is worn;
FIG. 3 is a schematic diagram of sentence recognition accuracy for different values of n;
FIG. 4 is a schematic diagram of the LSTM neural network;
FIG. 5 is a data histogram of sentence recognition results using the SSW system and the LSTM model, respectively;
FIG. 6 is a schematic diagram of the stability of the SSW system of the present invention;
FIG. 7 is a data histogram of SSW sentence recognition accuracy as a function of the number of segmentation groups n;
FIG. 8 is a block diagram of the continuous Chinese sign language recognition system based on sliding window segmentation.
Detailed Description
The preferred embodiments of the present invention are described in detail below with reference to the accompanying drawings, so that the advantages and features of the invention can be more easily understood by those skilled in the art and the scope of the invention is thereby clearly defined.
Referring to fig. 1, an embodiment of the present invention includes:
a continuous chinese sign language recognition method based on sliding window segmentation (Split Sliding Window, SSW), comprising the steps of:
s1: collecting sEMG and IMU data of the arm through an arm ring;
in this embodiment, the data acquisition device is a commercial Myo arm ring, the wearing mode is shown in fig. 2, the data acquisition device is worn on the forearm, and the sEMG sensor with badge is located at the front end of the forearm and aligned with the middle finger direction. It has 8 sEMG sensors and an IMU unit, together with 18 bits of data.
S2: preprocessing the data acquired in the step S1; the pretreatment process comprises the following steps:
s201: screening out the valid information in the signals by using data normalization and by regularizing the start and end times of the signals;
s202: filtering the screened valid information through a Butterworth filter.
Specifically, the signal of a whole sentence obtained through the arm ring is

E(x) = (E_1(x), E_2(x), ..., E_17(x), E_18(x)),  (1)

where E_i(x) (1 ≤ i ≤ 18) denotes the i-th signal channel transmitted from the arm ring.
The complete sentence collected by the arm ring contains a starting pause and an ending pause, so the valid information in the signal should be screened out to reduce computational cost and facilitate subsequent training. A sliding window and a sliding length are set, and the mean absolute value (Mean Absolute Value, MVA) within the current sliding window is calculated; if it is smaller than a certain empirical value, the sentence is judged to be starting or ending. In this example, the sEMG signal acquisition frequency is 200 Hz, with 50 data points as the sliding window size and 5 data points as the sliding length, and the MVA is calculated as

MVA = (1/50) Σ_{i=1}^{50} |x_i|,  (2)

where x_i denotes the i-th data point in the current window. From general experience, if the MVA is less than 10, the segment is treated as invalid information and deleted. The data is then converted into the [0,1] interval using normalization and absolute-value operations and denoted W(x):
W(x) = (W_1(x), W_2(x), ..., W_17(x), W_18(x)),  (3)

where W_i(x) (1 ≤ i ≤ 18) denotes E_i(x) after normalization and taking absolute values.
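To make these screening and normalization steps concrete, the following is a minimal Python sketch; the function names, array shapes and NumPy-based implementation are assumptions for illustration, while the 50-point window, 5-point sliding length, empirical MVA threshold of 10 and the [0,1] normalization follow the description above.

```python
import numpy as np

def screen_valid(E, win=50, step=5, threshold=10.0):
    # E: raw arm-ring signal of shape (T, 18). Keep only the segments whose
    # mean absolute value (MVA) within the sliding window reaches the
    # empirical threshold, discarding the start and end pauses.
    keep = np.zeros(len(E), dtype=bool)
    for start in range(0, len(E) - win + 1, step):
        if np.abs(E[start:start + win]).mean() >= threshold:  # MVA, Eq. (2)
            keep[start:start + win] = True
    return E[keep]

def normalize(E):
    # Take absolute values and rescale each channel into [0, 1],
    # producing W(x) as in Eq. (3).
    A = np.abs(E)
    rng = A.max(axis=0) - A.min(axis=0)
    return (A - A.min(axis=0)) / np.where(rng == 0, 1, rng)
```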
The sensor information acquired by the arm ring contains some noise in addition to the necessary data. Denoising is performed by filtering the input signal; the filtering must not remove important information, but should preserve partial differences between sensor signals, ensuring the robustness of the model and improving accuracy. The invention therefore filters with a Butterworth filter. The Butterworth filter is a digital filter whose frequency response is maximally flat in the passband, so a low-pass design can eliminate high-frequency noise interference. Having fewer parameters, the Butterworth filter is computationally cheaper than other filters. Filtering is performed with a third-order Butterworth filter with a cutoff frequency of 10 Hz, and the result is denoted A(x).
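The filtering step can be sketched with SciPy's standard Butterworth design; the third order, 10 Hz cutoff and 200 Hz sampling rate come from the text, while the zero-phase filtfilt call (rather than a causal filter) is an assumption of this sketch.

```python
from scipy.signal import butter, filtfilt

def butterworth_denoise(W, fs=200.0, cutoff=10.0, order=3):
    # W: normalized signal of shape (T, 18); returns the denoised A(x).
    b, a = butter(order, cutoff / (fs / 2), btype='low')
    return filtfilt(b, a, W, axis=0)  # low-pass filtering along the time axis
```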
S3: performing feature extraction on the preprocessed data by using a sliding window, which comprises dividing continuous sign language into single sign language words through the sliding window, and performing average segmentation and recombination on each piece of divided data to obtain a plurality of new data;
by observing the sign language, the average time of one sign language word can be obtained about one second, so that the data of A (x) is divided by taking one gesture as a unit of one second (200 data points). According to the study by Wahid et al, the present example selects the sliding window to have a sliding length of 40 (coverage 80%). Dividing the data to obtain an 18-dimensional data.
After the single sign language words are divided by the sliding window, each piece of divided data is further segmented. Assume that the data of a certain division is S = {s_1, s_2, ..., s_199, s_200}, where s_i = {s_i1, s_i2, ..., s_i17, s_i18} (1 ≤ i ≤ 200). The signal S is equally divided into n groups, denoted S_1, S_2, ..., S_{n-1}, S_n. One group of data is deleted in turn, and the remaining n-1 groups are combined into a new piece of data in the original order as input, so that one divided gesture can generate n different pieces of data as n different inputs.
In this embodiment, ten volunteers were invited to collect 100 sign language words and 20 different sign language sentences under interference-free conditions. After a sign language sentence is divided by the sliding window, each sign language word is divided into n groups of data on average; one group is deleted in turn and the remaining n-1 groups are combined into new data in the original order. Inputting these new data into the LSTM neural network gives the experimental results for different numbers of segmentation groups n, shown in fig. 3. As can be seen from fig. 3, when n ≥ 2 the accuracy rises quickly and exceeds that obtained without this preprocessing; when n ≥ 6 it is basically stable, but the cost of running the algorithm increases greatly. Finally n = 5 is taken, which improves the accuracy while preventing excessive overhead during operation.
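The window division and the n-group recombination can be sketched as follows; the helper names are assumptions, while the 200-point word window, the sliding length of 40 and n = 5 groups (so each recombined input has shape 160×18) follow the text.

```python
import numpy as np

def sliding_windows(A, win=200, stride=40):
    # Divide the preprocessed sentence A (shape (T, 18)) into candidate
    # one-second word windows (80% coverage between adjacent windows).
    return [A[s:s + win] for s in range(0, len(A) - win + 1, stride)]

def split_recombine(S, n=5):
    # S: one word window of shape (200, 18). Split it into n equal groups,
    # delete one group in turn, and concatenate the remaining n-1 groups in
    # the original order, yielding n variants of shape (160, 18).
    groups = np.array_split(S, n, axis=0)
    return [np.concatenate(groups[:i] + groups[i + 1:], axis=0)
            for i in range(n)]
```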
S4: inputting the new data obtained in the step S3 into an LSTM neural network for training to obtain a sign language word predicted value;
the neural network is composed of two fully connected layers, two LSTM models, and one fully connected layer, as shown in fig. 4. After the data processing, the original matrix of 200×18 is changed into a matrix of 160×18 for input. Input data first passes through the fully connected layers of 512 neurons and the fully connected layers of 256 neurons. The output of the two fully connected layers serves as the input to the bi-directional LSTMs, each layer LSTM containing 256 cells. And then, inputting the output to the full-connection layer, and judging to obtain a predicted value. The predicted value is a predicted result of gesture information of the data obtained by inputting the data subjected to segmentation and combination into a neural network.
S5: judging and analyzing the plurality of recognized sign language word predicted values by applying a threshold-based multi-voting strategy to obtain a recognition result.
For the 18-dimensional data in each sliding window, five results are obtained through neural network recognition (n = 5 in this embodiment); denote the results s_1, s_2, s_3, s_4, s_5 with corresponding probabilities p_1, p_2, p_3, p_4, p_5. Set the threshold as D: when the probability p_i ≥ D, s_i is counted as a valid vote; when p_i < D, s_i is counted as an invalid vote. Let the number of valid votes be c_all. The specific operation is as follows:
(1) If the vote count of a single result is greater than half of the valid votes, c_all/2, that result is taken as the output result of the window.
(2) If two results have equal vote counts and each equals c_all/2, the result with the highest p_i is taken as the output result of the window.
(3) If neither case exists, the window is regarded as having no valid output information.
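Voting rules (1)-(3) can be sketched as follows; the function signature and the example value of the threshold D are assumptions, since the text leaves the concrete value open.

```python
from collections import Counter

def threshold_vote(labels, probs, D=0.5):
    # labels: the n predicted sign words for one window; probs: their
    # probabilities. Returns the window's output word, or None when the
    # window has no valid output (case (3)).
    valid = [(s, p) for s, p in zip(labels, probs) if p >= D]
    c_all = len(valid)
    if c_all == 0:
        return None
    counts = Counter(s for s, _ in valid)
    top, top_count = counts.most_common(1)[0]
    if top_count > c_all / 2:                       # case (1): majority
        return top
    tied = [s for s, c in counts.items() if c == c_all / 2]
    if len(tied) == 2:                              # case (2): two-way tie
        return max((sp for sp in valid if sp[0] in tied),
                   key=lambda sp: sp[1])[0]
    return None                                     # case (3)
```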
Based on the method, the invention also provides a continuous Chinese sign language recognition system based on sliding window segmentation, referring to fig. 8, comprising the following modules:
the data acquisition module is used for collecting sEMG and IMU data of the arm through the arm ring;
the data preprocessing module is used for preprocessing sEMG and IMU data of the arm acquired by the data acquisition module;
the data segmentation and recombination module is used for extracting characteristics of the data preprocessed by the data preprocessing module, and the data segmentation and recombination module comprises the steps of dividing single sign language words of continuous sign language through a sliding window, and carrying out average segmentation and recombination on each divided data to obtain a plurality of new data;
the LSTM-based neural network structure is used for training the new data obtained by the data segmentation and recombination module to obtain a plurality of sign word predicted values;
and the threshold-based multi-voting decision module is used for judging and analyzing a plurality of sign language word predicted values recognized by the LSTM-based neural network structure by applying a threshold-based multi-voting strategy to obtain a recognition result.
In a preferred embodiment of the present invention, the data preprocessing module includes an information screening unit and a filtering unit;
the information screening unit is used for screening effective information in the signals by utilizing the data normalization and the start time and the end time of the regular signals;
and the filtering unit is used for filtering the screened effective information by adopting a Butterworth filter.
In a preferred embodiment of the present invention, the data segmentation and reorganization module includes a continuous sign language segmentation unit and a single sign language segmentation reorganization unit;
the continuous sign language segmentation unit is used for dividing single sign language words of continuous sign language through a sliding window;
the single sign language segmentation and recombination unit is used for carrying out average segmentation and recombination on the data divided by each continuous sign language segmentation unit to obtain a plurality of new data.
The invention evaluates the performance of the proposed sign language sentence recognition system through experiments. The experiments use the Myo arm ring as the information collection device. The arm ring has 8 sEMG sensors, 1 triaxial accelerometer and 1 triaxial gyroscope; it can collect arm electromyographic signals at a frequency of 200 Hz and send them to a computer. The computer is equipped with an Intel Core i7-8700 processor, 16GB of memory, an Nvidia GTX 1080 graphics card with 8GB of video memory, and the Windows 10 operating system.
In the present embodiment, the sentence recognition accuracy is defined as [13]

Accuracy = (N − S − I − D) / N × 100%,  (4)

where S, I and D represent the number of substituted words, inserted words and deleted words, respectively, and N represents the number of sign language words in the tested continuous sign language.
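As a quick illustration of the metric, it can be computed directly from the four counts; the helper name is an assumption.

```python
def sentence_accuracy(S, I, D, N):
    # Substitutions S, insertions I and deletions D against N reference words.
    return (N - S - I - D) / N * 100.0

print(sentence_accuracy(S=1, I=0, D=0, N=5))  # one substitution in five words -> 80.0
```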
(1) SSW to LSTM comparison
10 sign language sentences were randomly selected and recognized using the SSW system, compared against the method of directly using an LSTM neural network after sliding-window feature extraction; the experimental results are shown in fig. 5. As can be seen from fig. 5, the accuracy of SSW recognition is far higher than that of the LSTM model: the average accuracy of the SSW system is 83.8%, while that of the LSTM model is 65.2%. The reason is that the SSW system trains each sliding-window division section multiple times and applies the threshold-based multi-voting strategy, which reduces the influence of deviations such as gesture transitions and incomplete sign language actions in the gesture signal and improves the accuracy of word recognition within sentences.
(2) SSW stability
The purpose of the SSW system is to translate the sign language sentences of most deaf-mutes into text, so it should be able to accurately recognize the gestures of any deaf-mute. To verify the stability of the SSW system, 10 volunteers of the same age group whose gestures had not been collected were invited for sign language recognition in this embodiment; the experimental results are shown in fig. 6. As can be seen from fig. 6, the average accuracy for volunteers whose gestures were not collected is 79.84%, slightly lower than the 83.8% for volunteers whose gestures were collected, but still within an acceptable range.
(3) Influence of the number n of segmentation groups on the accuracy of SSW recognition
In this experiment, gesture information from 10 volunteers was collected using the arm ring: 100 gestures and 20 different sign language sentences in total, with 50 samples each. 3 sentences were randomly selected and the number of segmentation groups n was varied to observe the change in the average sign language accuracy of the SSW system; the experimental results are shown in fig. 7, which takes "I have a fever" as an example of how SSW sentence recognition accuracy changes with the number of segmentation groups n. As can be seen from the figure, when the number of segmentation groups n = 5, the accuracy improves to 85% compared with smaller group numbers; when the number of groups continues to increase, the computation grows while the recognition accuracy does not improve significantly, and for some sentences it even decreases.
The invention provides a continuous Chinese sign language recognition system, SSW, based on sliding window segmentation. The system uses the idea of division: it first uses a sliding window, based on the average length of a single sign language word, to divide continuous sign language; it then equally divides each gesture signal into n groups, takes out n-1 groups at a time, combines them into new data in the original order, and performs recognition multiple times, improving the sentence recognition rate. Meanwhile, SSW adopts a threshold-based multi-voting strategy to process the recognition results: a result is counted as a valid vote only if its recognition probability is greater than the threshold, which reduces the influence of strongly deviating parts of the gesture signal and makes the result more credible. SSW increases the computation to some extent but greatly improves the sign language accuracy. Gesture collection and testing on 10 volunteers show that the accuracy reaches 83.8%.
The foregoing description is only illustrative of the present invention and is not intended to limit the scope of the invention, and all equivalent structures or equivalent processes or direct or indirect application in other related technical fields are included in the scope of the present invention.

Claims (7)

1. A continuous Chinese sign language identification method based on sliding window segmentation comprises the following steps:
s1: collecting sEMG and IMU data of the arm through an arm ring;
s2: preprocessing the data acquired in the step S1;
s3: performing feature extraction on the preprocessed data by using a sliding window, which comprises dividing continuous sign language into single sign language words through the sliding window, and performing average segmentation and recombination on each piece of divided data to obtain a plurality of new data;
the method for dividing continuous sign language into single sign language words through the sliding window is as follows:
selecting the sliding length of the sliding window based on the average duration of a single sign language word, and dividing the continuous gesture signal into a plurality of single sign language words in units of one second per gesture;
the step of performing average segmentation and recombination on each piece of divided data comprises:
dividing a sign language word into n groups of data on average, deleting one group of data in turn, combining the remaining n-1 groups into a new piece of data in the original order as input, so that one divided gesture generates n different pieces of data as n different inputs;
s4: inputting the new data obtained in the step S3 into an LSTM neural network for training to obtain a sign language word predicted value;
s5: judging and analyzing the plurality of recognized sign language word predicted values by applying a threshold-based multi-voting strategy to obtain a recognition result.
2. The continuous Chinese sign language recognition method based on sliding window segmentation according to claim 1, wherein in step S1, when the sEMG and IMU data of the arm are collected, the arm ring is worn on the forearm, and the sEMG sensor provided on the arm ring is positioned at the front end of the forearm, aligned with the middle finger direction.
3. The continuous chinese sign language recognition method based on sliding window segmentation according to claim 1, wherein in step S2, the preprocessing process comprises:
s201: screening out the valid information in the signals by using data normalization and by regularizing the start and end times of the signals;
s202: filtering the screened valid information through a Butterworth filter.
4. The continuous chinese sign language recognition method based on sliding window segmentation as set forth in claim 1, wherein the specific step of step S5 comprises:
S501: for the data in each sliding window, denote the n sign language word predicted values obtained through LSTM neural network recognition as s_1, s_2, ..., s_i, ..., s_n, with corresponding probabilities p_1, p_2, ..., p_i, ..., p_n;
S502: set the threshold as D; when the probability p_i ≥ D, s_i is counted as a valid vote; when p_i < D, s_i is counted as an invalid vote;
S503: let the number of valid votes be c_all, then:
(1) if the vote count of a single result is greater than half of the valid votes, c_all/2, that result is taken as the output result of the window;
(2) if two results have equal vote counts and each equals c_all/2, the result with the highest p_i is taken as the output result of the window;
(3) if neither case (1) nor case (2) holds, the window is regarded as having no valid output information.
5. A continuous chinese sign language recognition system based on sliding window segmentation applying the recognition method according to any one of claims 1 to 4, comprising:
the data acquisition module is used for collecting sEMG and IMU data of the arm through the arm ring;
the data preprocessing module is used for preprocessing sEMG and IMU data of the arm acquired by the data acquisition module;
the data segmentation and recombination module is used for extracting characteristics of the data preprocessed by the data preprocessing module, and the data segmentation and recombination module comprises the steps of dividing single sign language words of continuous sign language through a sliding window, and carrying out average segmentation and recombination on each divided data to obtain a plurality of new data;
the LSTM-based neural network structure is used for training the new data obtained by the data segmentation and recombination module to obtain a plurality of sign word predicted values;
and the threshold-based multi-voting decision module is used for judging and analyzing a plurality of sign language word predicted values recognized by the LSTM-based neural network structure by applying a threshold-based multi-voting strategy to obtain a recognition result.
6. The continuous Chinese sign language identification system based on sliding window segmentation according to claim 5, wherein the data preprocessing module comprises an information screening unit and a filtering unit;
the information screening unit is used for screening effective information in the signals by utilizing the data normalization and the start time and the end time of the regular signals;
and the filtering unit is used for filtering the screened effective information by adopting a Butterworth filter.
7. The continuous chinese sign language recognition system based on sliding window segmentation of claim 5, wherein the data segmentation and reassembly module comprises a continuous sign language segmentation unit, a single sign language segmentation reassembly unit;
the continuous sign language segmentation unit is used for dividing single sign language words of continuous sign language through a sliding window;
the single sign language segmentation and recombination unit is used for carrying out average segmentation and recombination on the data divided by each continuous sign language segmentation unit to obtain a plurality of new data.
CN202010734304.6A 2020-07-27 2020-07-27 Continuous Chinese sign language identification method and system based on sliding window segmentation Active CN111914724B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202010734304.6A CN111914724B (en) 2020-07-27 2020-07-27 Continuous Chinese sign language identification method and system based on sliding window segmentation

Publications (2)

Publication Number Publication Date
CN111914724A CN111914724A (en) 2020-11-10
CN111914724B (en) 2023-10-27

Family

ID=73281861

Country Status (1)

Country Link
CN (1) CN111914724B (en)

Families Citing this family (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112906498A (en) * 2021-01-29 2021-06-04 中国科学技术大学 Sign language action recognition method and device
CN114115531B (en) * 2021-11-11 2022-09-30 合肥工业大学 End-to-end sign language recognition method based on attention mechanism

Family Cites Families (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US10528147B2 (en) * 2017-03-06 2020-01-07 Microsoft Technology Licensing, Llc Ultrasonic based gesture recognition

Patent Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US5699441A (en) * 1992-03-10 1997-12-16 Hitachi, Ltd. Continuous sign-language recognition apparatus and input apparatus
KR20030030232A (en) * 2001-10-09 2003-04-18 한국과학기술원 Method and System for recognizing continuous sign language based on computer vision
CN109271901A (en) * 2018-08-31 2019-01-25 武汉大学 A kind of sign Language Recognition Method based on Multi-source Information Fusion
CN109902554A (en) * 2019-01-09 2019-06-18 天津大学 A kind of recognition methods of the sign language based on commercial Wi-Fi
CN110286774A (en) * 2019-07-03 2019-09-27 中国科学技术大学 A kind of sign Language Recognition Method based on Wrist-sport sensor
CN111340005A (en) * 2020-04-16 2020-06-26 深圳市康鸿泰科技有限公司 Sign language identification method and system

Non-Patent Citations (4)

* Cited by examiner, † Cited by third party
Title
Research on Chinese Sign Language Recognition Based on Multi-sensor Information Detection and Fusion; 王文会, 陈香, 阳平, 李云, 杨基海; Chinese Journal of Biomedical Engineering (No. 05), full text *
Research on End-to-End Sentence-Level Chinese Lip-Reading Recognition; 张晓冰, 龚海刚, 杨帆, 戴锡笠; Journal of Software (No. 06), full text *
Gesture Recognition Based on Adaptive Multi-classifier Fusion; 刘肖, 袁冠, 张艳梅, 闫秋艳, 王志晓; Computer Science (No. 07), full text *
A Root-Based Chinese Sign Language Recognition Method; 王春立, 高文, 马继勇, 高秀娟; Journal of Computer Research and Development (No. 02), full text *

Also Published As

Publication number Publication date
CN111914724A (en) 2020-11-10

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant