CN111914724A - Continuous Chinese sign language identification method and system based on sliding window segmentation - Google Patents
Continuous Chinese sign language identification method and system based on sliding window segmentation
- Publication number
- CN111914724A (application CN202010734304.6A)
- Authority
- CN
- China
- Prior art keywords
- sign language
- data
- segmentation
- sliding window
- continuous
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Granted
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V40/00—Recognition of biometric, human-related or animal-related patterns in image or video data
- G06V40/20—Movements or behaviour, e.g. gesture recognition
- G06V40/28—Recognition of hand or arm movements, e.g. recognition of deaf sign language
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F18/00—Pattern recognition
- G06F18/20—Analysing
- G06F18/21—Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
- G06F18/214—Generating training patterns; Bootstrap methods, e.g. bagging or boosting
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F18/00—Pattern recognition
- G06F18/20—Analysing
- G06F18/24—Classification techniques
- G06F18/241—Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F18/00—Pattern recognition
- G06F18/20—Analysing
- G06F18/24—Classification techniques
- G06F18/241—Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches
- G06F18/2415—Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches based on parametric or probabilistic models, e.g. based on likelihood ratio or false acceptance rate versus a false rejection rate
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/04—Architecture, e.g. interconnection topology
- G06N3/044—Recurrent networks, e.g. Hopfield networks
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/04—Architecture, e.g. interconnection topology
- G06N3/045—Combinations of networks
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/04—Architecture, e.g. interconnection topology
- G06N3/049—Temporal neural networks, e.g. delay elements, oscillating neurons or pulsed inputs
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/08—Learning methods
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V10/00—Arrangements for image or video recognition or understanding
- G06V10/20—Image preprocessing
- G06V10/26—Segmentation of patterns in the image field; Cutting or merging of image elements to establish the pattern region, e.g. clustering-based techniques; Detection of occlusion
- G06V10/267—Segmentation of patterns in the image field; Cutting or merging of image elements to establish the pattern region, e.g. clustering-based techniques; Detection of occlusion by performing operations on regions, e.g. growing, shrinking or watersheds
-
- Y—GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
- Y02—TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
- Y02D—CLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
- Y02D10/00—Energy efficient computing, e.g. low power processors, power management or thermal management
Abstract
The invention discloses a continuous Chinese sign language recognition method based on sliding window segmentation, which comprises the following steps: S1: collecting sEMG and IMU data of the arm through an arm ring; S2: preprocessing the data collected in step S1; S3: performing feature extraction on the preprocessed data using a sliding window, including dividing the continuous sign language into single sign language words through the sliding window and performing average segmentation and recombination on each divided piece of data to obtain a plurality of new pieces of data; S4: inputting the obtained new data into an LSTM neural network for training to obtain sign language word predicted values; S5: judging and analyzing the plurality of sign language word predicted values using a threshold-based multi-voting strategy to obtain a recognition result. A continuous Chinese sign language recognition system based on sliding window segmentation is also disclosed. The average accuracy of the sign language recognition system provided by the invention reaches 83.8%, an improvement of 18.6% over a plain LSTM model.
Description
Technical Field
The invention relates to the technical field of sign language recognition, in particular to a continuous Chinese sign language recognition method and system based on sliding window segmentation.
Background
Communication is a basic need of all human beings, and people with hearing impairment (deaf-mutes) are no exception. Communication among deaf-mutes, and between deaf-mutes and hearing people, is generally carried out in sign language. Deaf-mutes can communicate among themselves conveniently using natural sign language, while communication between deaf-mutes and hearing people relies on grammatical sign language (sign language for short). However, a hearing person who cannot read sign language faces a great obstacle when communicating with a deaf-mute. Sign Language Recognition (SLR) systems therefore play a very important role in communication between deaf-mutes and hearing people.
Mainstream sign language recognition methods can be divided into two categories according to the sensing medium: computer-vision-based methods and sensor-based methods. Koller et al. proposed to improve label-to-image alignment in real time with weak supervision, embedding Convolutional Neural Networks (CNNs) into Hidden Markov Models (HMMs) to correct frame labels, improving accuracy by 10%. Cui et al. proposed a weakly supervised framework with a deep neural network that solves the video-clip-to-gloss mapping problem by introducing a convolutional neural network for spatio-temporal feature extraction and sequence learning, achieving results comparable to the state of the art at that time without additional supervision. Yang et al. constructed a continuous SLR model based on a fast HMM, embedding the HMM into a Level Building algorithm based on Dynamic Time Warping (LB-DTW) to improve sentence recognition accuracy. In addition, they proposed a coarse segmentation method to provide the maximum number of levels, using both grammatical and sign-length constraints to improve recognition accuracy by reducing insertion, deletion and substitution errors; this achieves higher recognition performance and lower computational cost than other prior techniques. Huang et al. proposed a video-based sign language recognition method that uses a new dual-stream 3D-CNN to extract global and local spatio-temporal features from sign videos. Experiments were performed on the RWTH-PHOENIX-Weather dataset, which contains 7,000 weather-forecast sentences from 9 signers, and on a publicly available dataset collected with Kinect, with recognition accuracies of 61.7% and 82.7%, respectively. Guo et al. proposed a hierarchical LSTM for sign language translation, addressing the difficulty that traditional HMM and Connectionist Temporal Classification (CTC) approaches may fail to resolve word orders that conflict with the visual content of a sentence during recognition. The inventors previously proposed a continuous sign language recognition method consisting of offline training and online recognition: transition motion is judged using a threshold matrix and a speed threshold in the offline training stage, and the end point of each candidate sign is determined in the online recognition stage using coarse segmentation based on the threshold matrix and fine segmentation based on DTW and a Length-Root method. That method resolves some errors caused by transitional gesture motion and was validated on a Kinect-based dataset. However, the accuracy of computer-vision-based sign language recognition is affected by factors such as lighting changes and clothing occlusion.
The other line of work is sensor-based sign language recognition, with sensors including data gloves, arm rings, smart watches and the like. Li et al. used HMM techniques to provide a continuous SLR model framework that is easy to extend, resistant to interference and easy to port, solving part of the scalability problem in continuous SLR systems; tested on 1,024 sentences over a 510-word vocabulary collected from five signers wearing data gloves, it reaches a word accuracy of 87.4%. Bukhari et al. designed a real-time ASL recognition glove equipped with a series of sensors and used a Principal Component Analysis (PCA) algorithm to recognize 23 classes with 92% accuracy. It is undeniable that data gloves can accurately capture the motion data of hand movements and thereby achieve high-accuracy recognition, but they fall short in portability.
Sensor-based sign language recognition methods mainly use devices such as data gloves, smart watches and arm rings to collect sign language gesture data, build a model and recognize sign language. Earlier work proposed a SOFM/SRN/HMM model that uses an improved Simple Recurrent Network (SRN), segments continuous sign language according to the transformed SOFM representation, uses the SRN output as HMM states, and searches for the best matching word sequence with a lattice Viterbi algorithm. That system achieves 82.9% recognition accuracy on a 5113-sign vocabulary collected with data gloves, and 86.3% accuracy for signer-independent continuous sign language recognition. Benalcazar et al., collecting surface electromyography (sEMG) signals with an arm-ring sensor, presented a new real-time gesture recognition model based on k-Nearest Neighbor (KNN) and DTW algorithms, recognizing five gesture signals with 86% accuracy. Yang et al. proposed continuous SLR based on an optimized tree-structure framework over a combination of sEMG, accelerometer (ACC) and gyroscope (GYRO) sensors; the algorithm classifies continuous Chinese Sign Language (CSL) subwords with an optimized tree structure according to the direction and amplitude components of one or two hands. Experimental results on their 150-subword dataset showed an accuracy of 94.31% in the user-specific test and 87.02% in the user-independent test. Engin Kaya et al. proposed a new gesture recognition method that uses sEMG signals collected from an acquisition armband, extracting seven different time-domain features from the raw EMG signal with a sliding window method; comparing KNN, Support Vector Machine (SVM) and artificial neural network classifiers, the SVM-based system was found to be the most accurate. Manor et al. proposed new sEMG and ACC acquisition positions, capturing hand-motion signals from the right forearm, wrist and back of the hand to recognize 18 CSL gestures; the sEMG and ACC data are divided with a sliding window, features are extracted and combined into a feature vector, and Linear Discriminant Analysis (LDA) is used for the decision, with experiments showing a recognition accuracy of 91.4%. Deep Belief Networks (DBNs) have also been applied to wearable-sensor-based Chinese sign language recognition; to obtain the optimal network structure, three different sensor fusion strategies were explored, namely data fusion, feature fusion and decision fusion, with the best recognition accuracy in the experiments being 95.1% for the user-dependent test and 88.2% for the user-independent test.
However, continuous sign language recognition with sensors remains relatively rare. Mittal et al. proposed an improved Long Short-Term Memory (LSTM) model, tested on 942 Indian Sign Language sentences built from 35 different sign words, with an average accuracy of 72.3%. Gupta et al. proposed an ensemble of classifiers based on windows of different lengths for feature extraction, achieving real-time classification of continuous sign language sentences with a multi-modal wearable sensor to improve the accuracy of continuous sign language recognition. That study shows the proposed ensemble achieves higher sentence-classification accuracy than a single classifier trained on features extracted from a fixed-duration window.
To the inventors' knowledge, no existing work has used an arm ring for continuous sentence recognition of Chinese sign language [11]. This problem currently faces two major challenges: first, how to segment sign language words accurately; second, since adjacent words in continuous sign language overlap and deform one another, how to improve the accuracy of word recognition. In view of these two challenges, it is desirable to provide a new sign language recognition system that solves the above problems.
Disclosure of Invention
The invention aims to solve the technical problem of providing a continuous Chinese sign language recognition method and system based on sliding window segmentation, which can significantly improve the recognition accuracy of continuous Chinese sign language.
In order to solve the technical problems, the invention adopts a technical scheme that: the continuous Chinese sign language recognition method based on sliding window segmentation comprises the following steps:
s1: collecting sEMG and IMU data of the arm through an arm ring;
s2: preprocessing the data collected in the step S1;
s3: performing feature extraction on the preprocessed data by using a sliding window, wherein the feature extraction comprises the steps of dividing single sign language words of continuous sign languages through the sliding window, and performing average segmentation and recombination on each divided data to obtain a plurality of new data;
s4: inputting the new data obtained in the step S3 into an LSTM neural network for training to obtain a sign language word predicted value;
s5: and judging and analyzing the predicted values of the plurality of sign language words by using a multi-voting strategy based on a threshold value to obtain a recognition result.
In a preferred embodiment of the present invention, in step S1, when the sEMG and IMU data of the arm are collected, the arm ring is worn on the forearm, and the sEMG sensor mounted on the arm ring is located at the front end of the forearm, aligned with the direction of the middle finger.
In a preferred embodiment of the present invention, in step S2, the preprocessing process includes:
s201: screening effective information in the signals by using data normalization and the starting time and the ending time of the normalized signals;
s202: filtering the screened effective information through a Butterworth filter.
In a preferred embodiment of the present invention, in step S3, the method for dividing the continuous sign language into single sign language words through the sliding window is as follows:
and selecting the sliding length of the sliding window based on the average length of a single sign language word, and evenly dividing the gesture signal into a plurality of single sign language words, taking one gesture per second as the unit.
In a preferred embodiment of the present invention, in step S3, the step of performing average segmentation and reassembly on each of the divided data includes:
equally dividing a sign language word into n groups of data, deleting one group at a time, and forming a new piece of data as an input from the remaining n-1 groups in their original order, so that one divided gesture generates n different pieces of data as n different inputs.
In a preferred embodiment of the present invention, in step S4, the LSTM neural network consists of two fully-connected layers, two LSTM layers and one fully-connected output layer; the fully-connected layers comprise a first fully-connected layer of 512 neurons and a second fully-connected layer of 256 neurons, the LSTM layers are bidirectional, and each LSTM layer contains 256 units.
In a preferred embodiment of the present invention, the step S5 includes the following steps:
s501: for the data in each sliding window, the n sign language word predicted values obtained through LSTM neural network recognition are denoted s1, s2, s3, ..., si, ..., sn, with corresponding probabilities p1, p2, p3, ..., pi, ..., pn;
s502: setting a threshold D; when the probability pi ≥ D, the vote for si is regarded as a valid vote; when pi < D, the vote for si is regarded as an invalid vote;
s503: let the number of valid votes be c_all; then
(1) if the number of votes for a single result is greater than half of the valid votes, c_all/2, that result is taken as the output result of the window;
(2) if two results receive equal numbers of votes, each equal to c_all/2, the result with the highest pi is taken as the output result of the window;
(3) if neither case (1) nor case (2) holds, the window is regarded as having no valid output information.
In order to solve the technical problem, the invention adopts another technical scheme that: the continuous Chinese sign language recognition system based on sliding window segmentation is provided, and comprises:
the data acquisition module is used for collecting sEMG and IMU data of the arm through the arm ring;
the data preprocessing module is used for preprocessing the sEMG and IMU data of the arm acquired by the data acquisition module;
the data segmentation and recombination module is used for extracting the characteristics of the data preprocessed by the data preprocessing module by using a sliding window, and comprises the steps of carrying out single sign language word division on continuous sign languages through the sliding window, and carrying out average segmentation and recombination on each divided data to obtain a plurality of new data;
the LSTM-based neural network structure is used for training the new data obtained by the data segmentation and recombination module to obtain a plurality of sign language word predicted values;
and the threshold-based multi-voting decision module judges and analyzes the plurality of sign language word predicted values identified by the LSTM-based neural network structure by using a threshold-based multi-voting strategy to obtain an identification result.
In a preferred embodiment of the present invention, the data preprocessing module includes an information filtering unit and a filtering unit;
the information screening unit is used for screening effective information in the signals by utilizing data normalization and the starting time and the ending time of the normalized signals;
and the filtering unit is used for filtering the screened effective information by adopting a Butterworth filter.
In a preferred embodiment of the present invention, the data segmentation and recombination module comprises a continuous sign language segmentation unit and a single sign language segmentation and recombination unit;
the continuous sign language segmentation unit is used for carrying out single sign language word segmentation on the continuous sign language through a sliding window;
and the single sign language segmentation and recombination unit is used for carrying out average segmentation and recombination on the data segmented by each continuous sign language segmentation unit to obtain a plurality of new data.
The invention has the beneficial effects that:
(1) the method uses sign language data acquired by the arm ring to recognize sign language sentences, and solves the continuous sign language segmentation problem with a sliding window method based on average word length; the idea of segmentation is also applied to single sign language recognition: a single sign language word is obtained through sliding window division, and different parts of it are taken for recognition, so that each sign language word can be recognized multiple times, which improves the recognition accuracy;
(2) the continuous Chinese sign language recognition system based on sliding window segmentation has an average accuracy of 83.8%, an improvement of 18.6% over a plain LSTM model (a method that feeds features extracted through a sliding window directly into an LSTM neural network).
Drawings
FIG. 1 is a flow chart of a continuous Chinese sign language recognition method based on sliding window segmentation according to the present invention;
FIG. 2 is a schematic view of the manner in which the arm ring is worn;
FIG. 3 is a schematic diagram of sentence recognition accuracy when n has different values;
FIG. 4 is a schematic diagram of the LSTM neural network;
FIG. 5 is a data histogram of sentence recognition results using SSW and LSTM models, respectively;
FIG. 6 is a schematic representation of the stability of the SSW system of the present invention;
FIG. 7 is a data histogram of SSW sentence recognition accuracy as a function of the number of segmentation groups n;
fig. 8 is a block diagram of the continuous chinese sign language recognition system based on sliding window segmentation.
Detailed Description
The following detailed description of the preferred embodiments of the present invention, taken in conjunction with the accompanying drawings, will make the advantages and features of the invention easier for those skilled in the art to understand, and thereby more clearly define the scope of the invention.
Referring to fig. 1, an embodiment of the present invention includes:
a continuous Chinese sign language identification method based on Sliding Window segmentation (SSW) comprises the following steps:
s1: collecting sEMG and IMU data of the arm through an arm ring;
in this embodiment, the device used for data acquisition is a commercial Myo arm ring, which is worn on the forearm as shown in fig. 2, and the sEMG sensor with a badge is located at the front end of the forearm to aim at the middle finger direction of the finger. It has 8 sEMG sensors and an IMU unit for a total of 18 bits of data.
S2: preprocessing the data collected in the step S1; the pretreatment process comprises the following steps:
s201: screening effective information in the signals by using data normalization and the starting time and the ending time of the normalized signals;
s202: filtering the screened effective information through a Butterworth filter.
Specifically, let the signal of a whole sentence obtained by the arm ring be
E(x) = (E1(x), E2(x), ..., E17(x), E18(x)),  (1)
where Ei(x) (1 ≤ i ≤ 18) denotes the i-th signal channel from the arm ring.
A complete sentence collected by the arm ring necessarily contains a starting pause and an ending pause, so the valid information in the signal should be screened out to reduce computational overhead and facilitate subsequent training. A sliding window and a sliding length are set, and the average absolute value (MVA) within the current sliding window is calculated; if the MVA falls below an empirical value, this indicates that the sentence starts or ends there. In this embodiment, the sEMG acquisition frequency is 200 Hz, the sliding window size is 50 data points, and the sliding length is 5 data points, with the MVA computed as the mean absolute value of the signal over the 50 points in the current window:
MVA = (1/50) Σ_{j=1}^{50} |E(x_j)|.  (2)
From general experience, if the MVA is less than 10, the segment is regarded as invalid information and is deleted. The data are then mapped into the [0,1] interval by taking absolute values and normalizing, and are denoted W(x):
W(x) = (W1(x), W2(x), ..., W17(x), W18(x)),  (3)
where Wi(x) (1 ≤ i ≤ 18) denotes Ei(x) after normalization and taking the absolute value.
While the arm ring acquires the sensor information, the signal contains some noise in addition to the useful data. Denoising is performed by filtering the input signal; the filtering must not remove important information, yet should preserve some of the variation between sensor signals, which keeps the model robust and improves accuracy. The invention therefore performs the filtering operation with a Butterworth filter. The Butterworth filter is a digital filter whose frequency response is maximally flat in the passband, so a low-pass design can eliminate high-frequency noise interference; having fewer parameters, it is also computationally cheaper than other filters. Filtering is carried out with a third-order Butterworth filter with a cut-off frequency of 10 Hz, and the result is recorded as A(x).
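For illustration, the preprocessing pipeline described above (MVA-based trimming, normalization, Butterworth filtering) can be sketched in Python as follows; this is a minimal sketch assuming NumPy/SciPy, with the constants taken from the embodiment and all function names illustrative:

```python
import numpy as np
from scipy.signal import butter, filtfilt

FS = 200            # sEMG sampling rate (Hz)
WIN, STEP = 50, 5   # MVA window size and sliding length (data points)
MVA_MIN = 10        # empirical threshold below which a segment is invalid

def trim_invalid(E):
    """Keep the valid span of a sentence signal E of shape (T, 18),
    dropping leading/trailing segments whose MVA falls below MVA_MIN."""
    mva = np.array([np.mean(np.abs(E[t:t + WIN]))
                    for t in range(0, len(E) - WIN + 1, STEP)])
    valid = np.flatnonzero(mva >= MVA_MIN)
    if valid.size == 0:
        return E[:0]
    return E[valid[0] * STEP : valid[-1] * STEP + WIN]

def normalize(E):
    """Take absolute values and min-max normalize each channel to [0, 1]."""
    W = np.abs(E)
    lo, hi = W.min(axis=0), W.max(axis=0)
    return (W - lo) / np.where(hi > lo, hi - lo, 1)

def lowpass(W, cutoff=10, order=3):
    """Third-order Butterworth low-pass filter at 10 Hz, per channel."""
    b, a = butter(order, cutoff, btype="low", fs=FS)
    return filtfilt(b, a, W, axis=0)

# A = lowpass(normalize(trim_invalid(E)))   # the A(x) used below
```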
S3: performing feature extraction on the preprocessed data by using a sliding window, wherein the feature extraction comprises the steps of dividing single sign language words of continuous sign languages through the sliding window, and performing average segmentation and recombination on each divided data to obtain a plurality of new data;
by observing sign language, the average time of a sign language word is about one second, so that A (x) is divided by one gesture for one second (200 data points). According to the Wahid et al study, the present example selects a sliding window with a sliding length of 40 (covering 80%). Dividing the data to obtain 18-dimensional data.
After the single sign language words are divided through the sliding window, each divided piece of data is further segmented. Assume one divided piece of data is S = {s1, s2, ..., s199, s200}, where si = {s_i1, s_i2, ..., s_i17, s_i18} (1 ≤ i ≤ 200). The signal S is equally divided into n groups of data, recorded as S1, S2, ..., S_{n-1}, S_n. One group is deleted in turn, and the remaining n-1 groups, in their original order, form a new piece of data as an input, so that one divided gesture can generate n different pieces of data as n different inputs.
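A sketch of this leave-one-group-out recombination, assuming NumPy and one 200 × 18 window as input:

```python
import numpy as np

def segment_and_recombine(S, n=5):
    """Split one 200x18 window S into n equal groups S1..Sn and, for
    each k, rebuild a sample from the other n-1 groups in their
    original order; with n = 5 each sample has shape (160, 18)."""
    groups = np.array_split(S, n)
    return [np.concatenate(groups[:k] + groups[k + 1:]) for k in range(n)]
```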
In this embodiment, ten volunteers were invited to collect 100 sign language words and 20 different sign language sentences without interference. After a sign language sentence is divided by the sliding window, each sign language word is equally divided into n groups of data; one group is deleted in turn, and the remaining n-1 groups, in their original order, form a new piece of data that is input into the LSTM neural network. The experimental results for different numbers of segmentation groups n are shown in FIG. 3. As FIG. 3 shows, when n ≥ 2 the accuracy rises rapidly and exceeds that obtained without this preprocessing, and when n ≥ 6 it is basically stable while the runtime overhead of the algorithm grows considerably. Finally, n is fixed at 5, which improves the accuracy while avoiding high runtime overhead.
S4: inputting the new data obtained in the step S3 into an LSTM neural network for training to obtain a sign language word predicted value;
the neural network consists of two fully connected layers, two LSTM models and one fully connected layer, as shown in fig. 4. After the data processing, the original 200 × 18 matrix is changed to 160 × 18 matrix and input. The input data first passes through a full connection layer of 512 neurons and a full connection layer of 256 neurons. The outputs of the two fully-connected layers are used as inputs to bi-directional LSTMs, each layer of LSTM containing 256 cells. And then, inputting the output to a full connection layer, and judging to obtain a predicted value. The predicted value is a predicted result of gesture information of the data obtained by inputting the data after being divided and combined into a neural network.
S5: and judging and analyzing the predicted values of the plurality of sign language words by using a multi-voting strategy based on a threshold value to obtain a recognition result.
For the 18-dimensional data in each sliding window, five results are obtained through neural network recognition, denoted s1, s2, s3, s4, s5, with corresponding probabilities p1, p2, p3, p4, p5. A threshold D is set; when the probability pi ≥ D, the vote for si is regarded as a valid vote, and when pi < D, the vote for si is regarded as an invalid vote. Let the number of valid votes be c_all. The specific operation is as follows:
(1) If the number of votes for a single result is greater than half of the valid votes, c_all/2, that result is taken as the output result of the window.
(2) If two results receive equal numbers of votes, each equal to c_all/2, the result with the highest pi is taken as the output result of the window.
(3) If neither of the two cases holds, the window is regarded as having no valid output information.
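As an illustration, the threshold-based multi-voting decision can be sketched as follows; the value of the threshold D is an assumption (the embodiment leaves it open), and the function name is illustrative:

```python
from collections import Counter

def vote(preds, probs, D=0.5):
    """Threshold-based multi-voting over one window's n predictions.
    preds are the labels s1..sn, probs the probabilities p1..pn.
    Returns the winning label, or None when the window has no valid
    output information."""
    valid = [(s, p) for s, p in zip(preds, probs) if p >= D]
    if not valid:
        return None                                  # every vote invalid
    c_all = len(valid)
    counts = Counter(s for s, _ in valid)
    top, top_votes = counts.most_common(1)[0]
    if top_votes > c_all / 2:                        # case (1): clear majority
        return top
    tied = [s for s, c in counts.items() if c == top_votes]
    if len(tied) == 2 and top_votes == c_all / 2:    # case (2): two-way tie
        return max((sp for sp in valid if sp[0] in tied),
                   key=lambda sp: sp[1])[0]
    return None                                      # case (3): no valid output
```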
Based on the method, the invention also provides a continuous Chinese sign language recognition system based on sliding window segmentation, which refers to fig. 8 and comprises the following modules:
the data acquisition module is used for collecting sEMG and IMU data of the arm through the arm ring;
the data preprocessing module is used for preprocessing the sEMG and IMU data of the arm acquired by the data acquisition module;
the data segmentation and recombination module is used for extracting the characteristics of the data preprocessed by the data preprocessing module by using a sliding window, and comprises the steps of carrying out single sign language word division on continuous sign languages through the sliding window, and carrying out average segmentation and recombination on each divided data to obtain a plurality of new data;
the LSTM-based neural network structure is used for training the new data obtained by the data segmentation and recombination module to obtain a plurality of sign language word predicted values;
and the threshold-based multi-voting decision module judges and analyzes the plurality of sign language word predicted values identified by the LSTM-based neural network structure by using a threshold-based multi-voting strategy to obtain an identification result.
In a preferred embodiment of the present invention, the data preprocessing module includes an information filtering unit and a filtering unit;
the information screening unit is used for screening effective information in the signals by utilizing data normalization and the starting time and the ending time of the normalized signals;
and the filtering unit is used for filtering the screened effective information by adopting a Butterworth filter.
In a preferred embodiment of the present invention, the data segmentation and reassembly module comprises a continuous sign language segmentation unit, a single sign language segmentation and reassembly unit;
the continuous sign language segmentation unit is used for carrying out single sign language word segmentation on the continuous sign language through a sliding window;
and the single sign language segmentation and recombination unit is used for carrying out average segmentation and recombination on the data segmented by each continuous sign language segmentation unit to obtain a plurality of new data.
The invention evaluates the performance of the sign language sentence recognition system proposed herein through experiments. The experiments use the Myo arm ring as the acquisition device. The arm ring carries 8 sEMG sensors, 1 triaxial accelerometer and 1 triaxial gyroscope; it can acquire arm electromyographic signals at a frequency of 200 Hz and send them to a computer. The computer is equipped with an Intel Core i7-8700 processor, 16 GB of memory, an Nvidia GTX 1080 graphics card with 8 GB of video memory, and the Windows 10 operating system.
In the present embodiment, the sentence recognition accuracy is defined as in [13], taken here as the standard word-level accuracy Acc = (N - S - I - D) / N, where S, I, D represent the numbers of substituted, inserted and deleted words, respectively, and N represents the number of sign language words recognized in testing continuous sign language.
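Under the common reading that S + I + D is obtained from a Levenshtein alignment between the reference and recognized word sequences, the metric can be computed as follows (a sketch; names are illustrative):

```python
def sentence_accuracy(ref, hyp):
    """Word-level accuracy Acc = (N - S - I - D) / N, where S + I + D
    is the Levenshtein distance between the reference word sequence
    ref and the recognized sequence hyp."""
    n, m = len(ref), len(hyp)
    d = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(n + 1):
        d[i][0] = i                       # i deletions
    for j in range(m + 1):
        d[0][j] = j                       # j insertions
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d[i][j] = min(d[i - 1][j - 1] + (ref[i - 1] != hyp[j - 1]),  # S
                          d[i - 1][j] + 1,                               # D
                          d[i][j - 1] + 1)                               # I
    return (n - d[n][m]) / n

# e.g. sentence_accuracy(["I", "have", "a", "fever"],
#                        ["I", "a", "fever"])  # -> 0.75
```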
(1) SSW vs LSTM
Ten sign language sentences were randomly selected and recognized with the SSW system, and the results were compared with the method of directly using the LSTM neural network after sliding-window feature extraction; the experimental results are shown in FIG. 5. As FIG. 5 shows, the recognition accuracy of the SSW system is much higher than that of the LSTM model: the average accuracy of the SSW system is 83.8%, versus 65.2% for the LSTM model. The reason is that the SSW system trains on multiple recombinations of each sliding-window partition and applies the threshold-based multi-voting strategy, which reduces the influence of deviations in the gesture signal, such as gesture transitions and incomplete sign language actions, and thereby improves the accuracy of word recognition within sentences.
(2) SSW stability
The purpose of the SSW system is to allow most sign language sentences of deaf-mutes to be translated into text, so it should be able to accurately recognize the gestures of any deaf-mute. To verify the stability of the SSW system, 10 volunteers of the same age group whose gestures had not been collected were invited to perform sign language recognition in this embodiment; the experimental results are shown in FIG. 6. As FIG. 6 shows, the average accuracy for volunteers whose gestures were not collected was 79.84%, slightly lower than the 83.8% for volunteers whose gestures were collected, but still within an acceptable range.
(3) Influence of the number of segmentation groups n on the accuracy of SSW recognition
In this experiment, gesture information of 10 volunteers was collected with the arm ring: 100 gestures and 20 different sign language sentences in total, with 50 samples per sentence. Three sentences were randomly selected, the number of segmentation groups n was varied, and the change in the average sign language accuracy of the SSW system was observed. The experimental results are shown in FIG. 7, which takes "I have a fever" as an example of how the SSW sentence recognition accuracy changes with the number of segmentation groups n. As the figure shows, when the number of segmentation groups n is 5, the accuracy improves to 85% compared with smaller group numbers; when the number of groups continues to increase, the computation grows while the recognition accuracy shows no obvious improvement, and for some sentences it even decreases.
The invention provides SSW, a continuous Chinese sign language recognition system based on sliding window segmentation. The system applies the idea of segmentation: a sliding window, sized by the average length of a single sign language word, first segments the continuous sign language; each gesture signal is then equally divided into n groups, n-1 of which are taken at a time and recombined in their original order into new data, so that recognition is performed multiple times, improving the sentence recognition rate. Meanwhile, SSW processes the recognition results with a threshold-based multi-voting strategy: a result is counted as a valid vote only if its recognition probability exceeds the threshold, which reduces the influence of strongly deviating parts of the gesture signal and makes the result more credible. Although SSW increases the amount of computation to a certain extent, it greatly improves the sign language accuracy. Gesture collection and testing with 10 volunteers show that the accuracy reaches 83.8%.
The above description is only an embodiment of the present invention, and not intended to limit the scope of the present invention, and all modifications of equivalent structures and equivalent processes performed by the present specification and drawings, or directly or indirectly applied to other related technical fields, are included in the scope of the present invention.
Claims (9)
1. A continuous Chinese sign language identification method based on sliding window segmentation comprises the following steps:
s1: collecting sEMG and IMU data of the arm through an arm ring;
s2: preprocessing the data collected in the step S1;
s3: performing feature extraction on the preprocessed data by using a sliding window, wherein the feature extraction comprises the steps of dividing single sign language words of continuous sign languages through the sliding window, and performing average segmentation and recombination on each divided data to obtain a plurality of new data;
s4: inputting the new data obtained in the step S3 into an LSTM neural network for training to obtain a sign language word predicted value;
s5: and judging and analyzing the predicted values of the plurality of sign language words by using a multi-voting strategy based on a threshold value to obtain a recognition result.
2. The continuous Chinese sign language recognition method based on sliding window segmentation recited in claim 1, wherein in step S1, when collecting the sEMG and IMU data of the arm, the arm ring is worn on the forearm, and the sEMG sensor mounted on the arm ring is located at the front end of the forearm, aligned with the direction of the middle finger.
3. The continuous Chinese sign language recognition method based on sliding window segmentation as claimed in claim 1, wherein in step S2, the preprocessing process comprises:
s201: screening effective information in the signals by using data normalization and the starting time and the ending time of the normalized signals;
s202: filtering the screened effective information through a Butterworth filter.
4. The continuous Chinese sign language recognition method based on sliding window segmentation as claimed in claim 1, wherein in step S3, the method for dividing the continuous sign language into single sign language words through the sliding window is:
and selecting the sliding length of the sliding window based on the average length of the single sign language of the words, and averagely dividing a gesture signal into a plurality of groups of single sign language words by taking a gesture one second as a unit.
5. The continuous Chinese sign language recognition method of claim 1, wherein in step S3, the step of performing average segmentation and recombination on each divided piece of data includes:
equally dividing a sign language word into n groups of data, sequentially deleting one group of data, forming a new data as an input by the remaining n-1 groups of data according to the original sequence, and generating n different data as n different inputs by a dividing gesture.
6. The continuous Chinese sign language recognition method based on sliding window segmentation as claimed in claim 1, wherein the specific steps of step S5 include:
s501: for the data in each sliding window, the n sign language word predicted values obtained through LSTM neural network recognition are denoted s1, s2, s3, ..., si, ..., sn, with corresponding probabilities p1, p2, p3, ..., pi, ..., pn;
s502: setting a threshold D; when the probability pi ≥ D, the vote for si is regarded as a valid vote; when pi < D, the vote for si is regarded as an invalid vote;
s503: let the number of valid votes be c_all; then
(1) if the number of votes for a single result is greater than half of the valid votes, c_all/2, that result is taken as the output result of the window;
(2) if two results receive equal numbers of votes, each equal to c_all/2, the result with the highest pi is taken as the output result of the window;
(3) if neither case (1) nor case (2) holds, the window is regarded as having no valid output information.
7. A continuous Chinese sign language recognition system based on sliding window segmentation is characterized by comprising:
the data acquisition module is used for collecting sEMG and IMU data of the arm through the arm ring;
the data preprocessing module is used for preprocessing the sEMG and IMU data of the arm acquired by the data acquisition module;
the data segmentation and recombination module is used for extracting the characteristics of the data preprocessed by the data preprocessing module by using a sliding window, and comprises the steps of carrying out single sign language word division on continuous sign languages through the sliding window, and carrying out average segmentation and recombination on each divided data to obtain a plurality of new data;
the LSTM-based neural network structure is used for training the new data obtained by the data segmentation and recombination module to obtain a plurality of sign language word predicted values;
and the threshold-based multi-voting decision module judges and analyzes the plurality of sign language word predicted values identified by the LSTM-based neural network structure by using a threshold-based multi-voting strategy to obtain an identification result.
8. The continuous Chinese sign language recognition system based on sliding window segmentation as claimed in claim 7, wherein the data preprocessing module comprises an information filtering unit and a filtering unit;
the information screening unit is used for screening effective information in the signals by utilizing data normalization and the starting time and the ending time of the normalized signals;
and the filtering unit is used for filtering the screened effective information by adopting a Butterworth filter.
9. The continuous Chinese sign language recognition system based on sliding window segmentation as claimed in claim 7, wherein the data segmentation and recombination module comprises a continuous sign language segmentation unit and a single sign language segmentation and recombination unit;
the continuous sign language segmentation unit is used for carrying out single sign language word segmentation on the continuous sign language through a sliding window;
and the single sign language segmentation and recombination unit is used for carrying out average segmentation and recombination on the data segmented by each continuous sign language segmentation unit to obtain a plurality of new data.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202010734304.6A CN111914724B (en) | 2020-07-27 | 2020-07-27 | Continuous Chinese sign language identification method and system based on sliding window segmentation |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202010734304.6A CN111914724B (en) | 2020-07-27 | 2020-07-27 | Continuous Chinese sign language identification method and system based on sliding window segmentation |
Publications (2)
Publication Number | Publication Date |
---|---|
CN111914724A true CN111914724A (en) | 2020-11-10 |
CN111914724B CN111914724B (en) | 2023-10-27 |
Family
ID=73281861
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202010734304.6A Active CN111914724B (en) | 2020-07-27 | 2020-07-27 | Continuous Chinese sign language identification method and system based on sliding window segmentation |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN111914724B (en) |
-
2020
- 2020-07-27 CN CN202010734304.6A patent/CN111914724B/en active Active
Patent Citations (7)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US5699441A (en) * | 1992-03-10 | 1997-12-16 | Hitachi, Ltd. | Continuous sign-language recognition apparatus and input apparatus |
KR20030030232A (en) * | 2001-10-09 | 2003-04-18 | 한국과학기술원 | Method and System for recognizing continuous sign language based on computer vision |
US20170300124A1 (en) * | 2017-03-06 | 2017-10-19 | Microsoft Technology Licensing, Llc | Ultrasonic based gesture recognition |
CN109271901A (en) * | 2018-08-31 | 2019-01-25 | 武汉大学 | A kind of sign Language Recognition Method based on Multi-source Information Fusion |
CN109902554A (en) * | 2019-01-09 | 2019-06-18 | 天津大学 | A kind of recognition methods of the sign language based on commercial Wi-Fi |
CN110286774A (en) * | 2019-07-03 | 2019-09-27 | 中国科学技术大学 | A kind of sign Language Recognition Method based on Wrist-sport sensor |
CN111340005A (en) * | 2020-04-16 | 2020-06-26 | 深圳市康鸿泰科技有限公司 | Sign language identification method and system |
Non-Patent Citations (4)
Title |
---|
- Liu Xiao; Yuan Guan; Zhang Yanmei; Yan Qiuyan; Wang Zhixiao: "Gesture Recognition Based on Adaptive Multi-Classifier Fusion", Computer Science, no. 07 *
- Zhang Xiaobing; Gong Haigang; Yang Fan; Dai Xili: "Research on End-to-End Sentence-Level Chinese Lip Language Recognition", Journal of Software, no. 06 *
- Wang Wenhui; Chen Xiang; Yang Ping; Li Yun; Yang Jihai: "Research on Chinese Sign Language Recognition Based on Multi-Sensor Information Detection and Fusion", Chinese Journal of Biomedical Engineering, no. 05 *
- Wang Chunli, Gao Wen, Ma Jiyong, Gao Xiujuan: "A Root-Based Method for Chinese Sign Language Recognition", Journal of Computer Research and Development, no. 02 *
Cited By (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN112906498A (en) * | 2021-01-29 | 2021-06-04 | 中国科学技术大学 | Sign language action recognition method and device |
CN114115531A (en) * | 2021-11-11 | 2022-03-01 | 合肥工业大学 | End-to-end sign language identification method based on attention mechanism |
Also Published As
Publication number | Publication date |
---|---|
CN111914724B (en) | 2023-10-27 |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | |
SE01 | Entry into force of request for substantive examination | |
GR01 | Patent grant | |