CN113642422A - Continuous Chinese sign language recognition method - Google Patents

Continuous Chinese sign language recognition method

Info

Publication number
CN113642422A
Authority
CN
China
Prior art keywords: video, word, sign language, encoder, self-encoder
Prior art date
Legal status
Pending
Application number
CN202110848023.8A
Other languages
Chinese (zh)
Inventor
马乐
吴晓越
黄仰来
李志伟
徐东甫
Current Assignee
Northeast Electric Power University
Original Assignee
Northeast Dianli University
Priority date
Filing date
Publication date
Application filed by Northeast Dianli University filed Critical Northeast Dianli University
Priority to CN202110848023.8A
Publication of CN113642422A
Status: Pending (current)

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/04 Architecture, e.g. interconnection topology
    • G06N3/044 Recurrent networks, e.g. Hopfield networks
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/04 Architecture, e.g. interconnection topology
    • G06N3/045 Combinations of networks
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/08 Learning methods
    • G06N3/084 Backpropagation, e.g. using gradient descent

Abstract

The invention discloses a continuous Chinese sign language recognition method comprising the following steps: acquiring video data of a sign language presenter; performing region-of-interest (ROI) processing on the video; constructing an autoencoder and inputting the processed video into it to obtain a feature vector for each video frame; inputting the processed video into a key frame identification module to identify key frames; generating a time-based attention curve for each word from the obtained key frame information; fusing each attention curve with the feature vectors produced by the autoencoder and inputting the result into a long short-term memory (LSTM) network to obtain a regression result for the video segment corresponding to each word; and, once all video segments have been recognized, combining the recognized words to complete recognition of the continuous sign language video's semantics. The method segments continuous video and trains it word by word, can recognize each word in a video, avoids training separately on sentences that contain the same word, and effectively recognizes continuous sign language with different word combinations.

Description

Continuous Chinese sign language recognition method
Technical Field
The invention belongs to the technical field of information processing, and in particular relates to a continuous Chinese sign language recognition method.
Background
Sign language is the main communication mode of hearing-impaired people and plays a large role in daily life. However, because most hearing people in China have never studied Chinese sign language, hearing-impaired people face many difficulties in expressing their personal needs. Continuous sign language is the most common form in which hearing-impaired people express semantics, and its coherent spatial motion and intuitive, easily understood semantics give it high research value in society.
Existing continuous sign language recognition methods follow three steps: (1) train a recognition model by continually expanding training samples of various semantics and extracting features from the videos; (2) extract features from the sign language video demonstrated by the presenter to be recognized; (3) input the extracted features of the presenter's video into the model for classification and output the classification result as the recognition result. However, continuous sign language combines vocabulary in many different ways; the above approach can only recognize the semantics of the fixed vocabulary combinations present in the training samples and struggles to adapt to the diversity of continuous sign language.
Therefore, how to recognize continuous sign language with different vocabulary combinations more effectively is a problem urgently needing a solution.
Disclosure of Invention
In view of the above, the invention provides a continuous sign language recognition method that takes the key frames of the video to be recognized as cues and, aided by an attention mechanism applied between key frames, accurately recognizes the semantics of each word in continuous sign language, thereby meeting the need to recognize continuous sign language with diverse vocabulary combinations.
The application provides a continuous Chinese sign language recognition method; the recognition process, shown in FIG. 1, comprises the following steps:
acquiring sign language presenter video data information;
performing ROI (region of interest) processing on the video;
inputting the processed video into an autoencoder to obtain a feature vector for each video frame;
inputting the processed video into a key frame identification module to identify key frames;
generating a time-based attention curve for each word from the obtained key frame information;
fusing each obtained attention curve with the feature vectors generated by the autoencoder and inputting the result into a long short-term memory (LSTM) network to obtain a regression result for the video segment corresponding to each word in the video;
and performing approximate matching on the regression result and the word vector to obtain and output a final semantic result.
The video data of the sign language presenter is acquired with an RGB color camera, and the ROI processing magnifies the presenter's limb-motion information in the video to obtain more distinct motion features.
Optical flow is applied to the processed video to locate the pauses within the continuous motion, which serve as key frames.
The autoencoder is designed based on a convolutional neural network (CNN) and extracts a feature vector of the motion information in each frame.
The per-frame feature vectors are arranged in video-frame order to obtain a group of vectors that closely describes the video's continuous motion features.
The attention curve is designed with a Gaussian function: from each key frame and its related information, the Gaussian function generates an attention curve for the video segment between each pair of key frames.
Based on the obtained attention curve and the per-frame feature vectors, the features of the corresponding word in the video are amplified while the features of the other words in the same video are attenuated, so that each word in the video can be recognized individually.
Each word is recognized by regression on the video's feature vectors; the network's regression output is then matched against the word vectors, and the semantic vocabulary item corresponding to the closest word vector is output.
Preferably, the method uses a long short-term memory (LSTM) network to recognize the temporally and spatially continuous motion features of the group of feature vectors corresponding to a word, avoiding the preprocessing step of unifying video lengths required by traditional recognition methods.
Preferably, the method gains the ability to recognize different word combinations by amplifying, segmenting, and recognizing the features of the continuous video.
According to the technical scheme provided by the invention, combining video key frame screening with an attention mechanism effectively recognizes continuous sign language. Moreover, during training the recognition model needs only continuous sign language video as input to achieve the effect of training on isolated sign language videos, which greatly reduces the labor of manually annotating video and yields efficient continuous sign language recognition.
Drawings
In order to more clearly illustrate the technical solutions of the embodiments of the present invention, the drawings needed to be used in the description of the embodiments are briefly introduced below, and it is obvious that the drawings in the following description are only some embodiments of the present invention, and it is obvious for those skilled in the art to obtain other drawings based on these drawings without creative efforts.
FIG. 1 is a flow chart of a recognition process of a continuous Chinese sign language recognition method according to an embodiment of the present invention;
FIG. 2 is a graph of the effect of attention curve generation;
FIG. 3 is a flow chart of the training process of the continuous Chinese sign language recognition method according to an embodiment of the present invention.
Detailed Description
The invention is described in further detail below with reference to the accompanying drawings and specific embodiments. It is to be understood that the described embodiments are only a few, and not all, embodiments of the invention. All other embodiments obtained by a person skilled in the art without creative effort based on the embodiments of the present invention fall within the scope of the present invention.
The flow chart of the training process of the continuous Chinese sign language recognition method is shown in FIG. 3. The method uses an RGB color camera to acquire data from the performer, and the acquired video data is classified and stored by meaning.
Before training, ROI processing must be applied to the video. The ROI processing is based on face recognition: since most Chinese sign language demonstration takes place in the upper half of the body, a fixed window covering the upper torso and the arm-swing area is cropped according to the position of the presenter's face in the video and the fraction of the frame that the face occupies. The presenter's feature information obtained in this way is further magnified.
In addition, irrelevant information such as the background image and the surrounding environment remains in the video after feature amplification. Foreground segmentation is therefore used to separate the performer from the environment, further amplifying the performer's body-motion feature information.
Because the performer's build and distance from the camera give each ROI-processed video a different size, the processed images are uniformly resized to 224 × 224 to improve the accuracy of subsequent feature extraction and recognition.
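As an illustration only, the sketch below implements such an ROI pipeline with OpenCV face detection, foreground segmentation, and resizing; the window scale factors, the Haar cascade model, and the MOG2 background subtractor are assumptions, since the patent does not name specific algorithms.

```python
import cv2

# Hypothetical ROI pipeline: cascade model and scale factors are assumed.
face_cascade = cv2.CascadeClassifier(
    cv2.data.haarcascades + "haarcascade_frontalface_default.xml")
bg_subtractor = cv2.createBackgroundSubtractorMOG2(detectShadows=False)

def roi_process(frame):
    """Suppress the background, crop a fixed window around the upper body
    based on the detected face, and resize to 224x224."""
    fg_mask = bg_subtractor.apply(frame)                # foreground segmentation
    fg = cv2.bitwise_and(frame, frame, mask=fg_mask)
    gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)
    faces = face_cascade.detectMultiScale(gray, 1.1, 5)
    if len(faces) == 0:
        return cv2.resize(fg, (224, 224))
    x, y, w, h = max(faces, key=lambda f: f[2] * f[3])  # largest face
    # Window sized relative to the face, wide enough for the arm-swing area.
    x0, x1 = max(0, x - 2 * w), min(frame.shape[1], x + 3 * w)
    y0, y1 = max(0, y - h // 2), min(frame.shape[0], y + 4 * h)
    return cv2.resize(fg[y0:y1, x0:x1], (224, 224))
```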
During training the autoencoder is trained independently: encoding-decoding training is carried out on the ROI-processed video so that each frame passed through the encoder yields a feature vector Tf that closely describes the frame's motion features. The representational power of the feature vector is judged by observing whether the decoder can restore the video frame from it.
Both the encoder and the decoder of the autoencoder are designed with convolutional neural networks (CNNs).
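A minimal PyTorch sketch of such a convolutional autoencoder follows; the layer sizes and the 60-dimensional latent vector (chosen here to match the 1 × 60 word vectors used later) are assumptions rather than values given in the patent.

```python
import torch.nn as nn

class ConvAutoencoder(nn.Module):
    """Sketch of a CNN encoder/decoder pair for 224x224 RGB frames."""
    def __init__(self, latent_dim=60):   # latent size is an assumption
        super().__init__()
        self.encoder = nn.Sequential(
            nn.Conv2d(3, 16, 3, stride=2, padding=1), nn.ReLU(),   # 224 -> 112
            nn.Conv2d(16, 32, 3, stride=2, padding=1), nn.ReLU(),  # 112 -> 56
            nn.Conv2d(32, 64, 3, stride=2, padding=1), nn.ReLU(),  # 56 -> 28
            nn.Flatten(),
            nn.Linear(64 * 28 * 28, latent_dim),
        )
        self.decoder = nn.Sequential(
            nn.Linear(latent_dim, 64 * 28 * 28), nn.ReLU(),
            nn.Unflatten(1, (64, 28, 28)),
            nn.ConvTranspose2d(64, 32, 2, stride=2), nn.ReLU(),    # 28 -> 56
            nn.ConvTranspose2d(32, 16, 2, stride=2), nn.ReLU(),    # 56 -> 112
            nn.ConvTranspose2d(16, 3, 2, stride=2), nn.Sigmoid(),  # 112 -> 224
        )

    def forward(self, x):
        tf = self.encoder(x)              # per-frame feature vector Tf
        return self.decoder(tf), tf       # reconstruction judges Tf's quality
```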
During demonstration there is an obvious pause between the sign language motions of consecutive words, and the range of hand motion during a pause is smaller than during movement. The optical flow between temporally adjacent frames is therefore computed, and a histogram of the optical flow vectors over the image region is accumulated.
Because each optical flow field contains tiny fluctuations, a threshold is applied to each pixel's optical flow magnitude during histogram accumulation so that these fluctuations do not affect the statistics.
After the optical flow histogram statistics are complete, the key frame segment marking the transition between two words in the sign language video is determined by a threshold, and the middle frame of that segment is stored as the key frame.
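The sketch below illustrates one way to find such pause-based key frames using OpenCV's dense (Farneback) optical flow. For brevity it thresholds per-pixel magnitudes and sums them instead of building a full histogram, and both threshold values are assumptions.

```python
import cv2
import numpy as np

def find_key_frames(frames, pixel_thresh=0.5, motion_thresh=100.0):
    """Return the middle frame index of each low-motion (pause) run."""
    motion, prev = [], cv2.cvtColor(frames[0], cv2.COLOR_BGR2GRAY)
    for frame in frames[1:]:
        gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)
        flow = cv2.calcOpticalFlowFarneback(prev, gray, None,
                                            0.5, 3, 15, 3, 5, 1.2, 0)
        mag = np.linalg.norm(flow, axis=2)
        mag[mag < pixel_thresh] = 0.0      # suppress tiny per-pixel changes
        motion.append(mag.sum())
        prev = gray
    low = [i for i, m in enumerate(motion) if m < motion_thresh]
    key_frames, run = [], []
    for i in low:
        if run and i != run[-1] + 1:                    # a pause run ended:
            key_frames.append(run[len(run) // 2])       # keep its middle frame
            run = []
        run.append(i)
    if run:
        key_frames.append(run[len(run) // 2])
    return key_frames
```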
μ and σ describe, for a segmented word, the position of its video segment within the whole video: μ is the index of the middle frame of the video segment between two key frames, and σ is the distance from the frame at μ to the nearest key frame (the segment boundary).
The obtained μ and σ values, together with the number of video frames i, are input group by group into the attention mechanism module to generate an attention curve W of length i; each (μ, σ) pair corresponds to one attention curve W.
The attention curve is obtained by using a gaussian function, which is defined as follows:
$$W(i) = a\,e^{-\frac{(i-\mu)^2}{2\sigma^2}}$$
where a is an amplitude parameter, i is the index of the current video frame, and μ and σ are, respectively, the middle-frame index of a segmented word's video segment and that frame's distance from the segment boundary. FIG. 2 visualizes the relationship between μ, σ, and i and the generated attention curve and the video.
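A small sketch of the curve generation, with μ and σ derived from a pair of key frames as described above (the amplitude a = 1 is an assumption):

```python
import numpy as np

def attention_curve(mu, sigma, num_frames, a=1.0):
    """Gaussian attention curve with one weight per video frame."""
    i = np.arange(num_frames)
    return a * np.exp(-((i - mu) ** 2) / (2 * sigma ** 2))

def mu_sigma_between(k0, k1):
    """Derive (mu, sigma) for the word segment between key frames k0 < k1."""
    mu = (k0 + k1) // 2          # index of the segment's middle frame
    sigma = max(mu - k0, 1)      # distance from mu to the nearest key frame
    return mu, sigma
```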
Each obtained attention curve is fused with the video's group of feature vectors. As the different Gaussian distributions in FIG. 2 show, each attention curve enhances the feature vectors of the video segment determined by its μ and σ values, thereby associating the corresponding word with the corresponding video segment.
Meanwhile, because the length of the attention curve exactly matches the number of video frames i, no feature data is lost or truncated.
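The fusion can be sketched, for example, as an element-wise scaling of each frame's feature vector by its attention weight; this particular fusion scheme is an assumption consistent with the amplify/attenuate behavior described above.

```python
import numpy as np

def fuse(features, mu, sigma):
    """Scale each frame's feature vector (rows of `features`) by its
    attention weight, amplifying the word's segment and attenuating
    the rest of the video. Uses attention_curve from the sketch above."""
    W = attention_curve(mu, sigma, features.shape[0])
    return W[:, None] * features
```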
The fused attention curve and feature-vector group are input into the recognition module for training. The recognition module is designed as a long short-term memory (LSTM) network; both the network output and the word vectors are sized 1 × 60 to simplify constructing and computing the loss function, and the computed result is backpropagated to complete the neural network's training.
The LSTM is a recurrent neural network suited to processing and predicting important events separated by relatively long intervals and delays in a time series. Continuous sign language recognition is exactly such a continuous-sequence feature recognition problem: its semantics exhibit clear contextual relationships within the continuous motion. During training the network therefore not only stores previously learned content but also relates subsequently learned content to it, which reduces the influence of differing video lengths on learning and lets the network learn continuous sign language motion features effectively.
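A minimal PyTorch sketch of such an LSTM recognition module, regressing a variable-length fused sequence to a 1 × 60 word vector (the hidden size is an assumption):

```python
import torch.nn as nn

class WordRegressor(nn.Module):
    """LSTM that maps a fused feature sequence to a 60-d word vector."""
    def __init__(self, feat_dim=60, hidden_dim=128, word_dim=60):
        super().__init__()
        self.lstm = nn.LSTM(feat_dim, hidden_dim, batch_first=True)
        self.fc = nn.Linear(hidden_dim, word_dim)

    def forward(self, seq):          # seq: (batch, num_frames, feat_dim)
        _, (h_n, _) = self.lstm(seq) # final hidden state summarizes the motion
        return self.fc(h_n[-1])      # (batch, 60) predicted word vector
```

Because the LSTM consumes a sequence step by step, videos of different lengths need no padding to a common length, matching the point above.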
Sentences are segmented with the jieba module in Python so that the words making up a sentence stand alone; the segmented words are then trained with the word2vec function of the Gensim module, and the trained model generates a word vector for each input word.
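A sketch of this labeling pipeline with jieba and Gensim (gensim ≥ 4 API; the two-sentence corpus is only a placeholder):

```python
import jieba
from gensim.models import Word2Vec

# Segment each label sentence into words, then train 60-d word vectors so
# that every word has a 1x60 regression target for the network.
corpus = ["我去学校", "今天天气很好"]          # placeholder label sentences
tokenized = [list(jieba.cut(s)) for s in corpus]
w2v = Word2Vec(tokenized, vector_size=60, min_count=1)
vec = w2v.wv[tokenized[0][0]]                 # 1x60 vector for one word
```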
During training, because the network output has the same dimensionality as the word vector, the loss function is constructed with MSELoss (mean squared error loss), computed as follows:
$$\mathrm{loss}(x_i, y_i) = (x_i - y_i)^2$$
where x and y have the same dimensions and may be vectors or matrices, and i is an element index.
The result of the loss computation is backpropagated to adjust the parameters of the LSTM so that the network's prediction approaches the corresponding word vector as closely as possible, i.e., the loss value keeps approaching 0. When the loss of every training sample falls below 0.001, the network is considered to produce correct predictions; training then stops and the network model is saved.
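A sketch of this training loop (the Adam optimizer and learning rate are assumptions; WordRegressor is the LSTM sketch above):

```python
import torch
import torch.nn as nn

model = WordRegressor()
criterion = nn.MSELoss()
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)

def train(samples, max_epochs=500):
    """samples: list of (fused_seq, target_word_vec) tensor pairs.
    Stops once every sample's loss is below 0.001, as described above."""
    for _ in range(max_epochs):
        losses = []
        for seq, target in samples:
            optimizer.zero_grad()
            pred = model(seq.unsqueeze(0)).squeeze(0)  # add/remove batch dim
            loss = criterion(pred, target)
            loss.backward()                            # backpropagation
            optimizer.step()
            losses.append(loss.item())
        if max(losses) < 1e-3:                         # stopping criterion
            break
    torch.save(model.state_dict(), "sign_lstm.pt")
```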
The recognition process mirrors the training process. The RGB camera captures the performer's motion and ROI preprocessing is applied; key information is obtained from the processed video and the video is segmented; the relevant information is input into the attention mechanism to generate the corresponding attention curves, which are fused with the video feature-vector sequence and input into the network. Since the recognition module is the model already completed in training, it directly produces a prediction for the input. The prediction then undergoes approximate matching, i.e., the standard word vector most similar to it is found, recognizing the corresponding word in the sign language video. After all video segments are recognized, the recognized words are combined to complete recognition of the continuous sign language video's semantics.
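The approximate matching step can be sketched as a cosine-similarity search over the trained word vectors; cosine similarity is an assumed measure here, since the text says only "most similar".

```python
import numpy as np

def nearest_word(pred, w2v):
    """Match a predicted 1x60 vector to the closest trained word vector."""
    vecs = w2v.wv.vectors                      # (vocab_size, 60)
    sims = vecs @ pred / (np.linalg.norm(vecs, axis=1) * np.linalg.norm(pred))
    return w2v.wv.index_to_key[int(np.argmax(sims))]
```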
The method thus segments continuous video and trains it word by word; it can recognize each word in a video, avoids separately training sentences that contain the same word, and effectively recognizes continuous sign language with different word combinations.
The above description of the disclosed embodiments enables those skilled in the art to implement or use the invention. Various modifications to these embodiments will be readily apparent to those skilled in the art. Therefore, any modification, equivalent replacement, or improvement made within the spirit and principles of the present invention shall fall within its scope of protection.

Claims (5)

1. A continuous sign language recognition method, comprising the following steps:
acquiring sign language presenter video data information;
performing ROI (region of interest) processing on the video;
inputting the processed video into an autoencoder to obtain a feature vector for each video frame;
inputting the processed video into a key frame identification module to identify key frames;
generating a time-based attention curve for each word from the obtained key frame information;
fusing each obtained attention curve with the feature vectors generated by the autoencoder and inputting the result into a long short-term memory (LSTM) network to obtain a regression result for the video segment corresponding to each word in the video;
and performing approximate matching on the regression result and the word vector to obtain and output a final semantic result.
2. The method of claim 1, wherein the ROI processing of the motion feature information of the video image comprises the following steps:
step 1: cropping the human-body contour image region from the obtained sign language video based on face recognition;
step 2: separating the performer from the environment by foreground segmentation, further amplifying the performer's body-motion feature information.
3. The method of claim 1, wherein the feature vectors are obtained with an autoencoder, implemented through the following steps:
step 1: decomposing the training samples frame by frame and inputting the frames into the autoencoder for training;
step 2: extracting image features with the encoder of the autoencoder and converting them into a feature vector, restoring the image from the feature vector with the decoder, and training the autoencoder by comparing the generated image with the original image so that the generated image restores the input image as closely as possible;
step 3: after training is finished, saving the autoencoder model; an image input to the autoencoder then yields a feature vector that uniquely represents it.
4. The method of claim 1, wherein the key frames and their related information are obtained by an optical flow method: a histogram of the optical flow vectors in the image region is accumulated; the key frame segment of the transition between two words in the sign language video is determined by setting a threshold; the middle frame of the key frame segment is stored as the key frame; and from the positions of two key frames, related information is obtained, such as the middle-frame index of a segmented word's video segment within the video and that frame's distance from the video-segment boundary.
5. The method of claim 1, wherein the neural network takes the feature vectors and the attention curve as input, builds a loss function from the network's predicted output and the word vector, and completes training by backpropagation, through the following steps:
step 1: designing the attention curve with a Gaussian function, taking the related data information obtained in claim 4 and the video length as input to obtain an attention curve;
step 2: fusing each attention curve with the feature vectors, inputting the result word by word into the neural network for training, computing the mean squared error between the network's predicted output and the standard word vector, and adjusting the network parameters by backpropagating the result, thereby training the network model on the video segment corresponding to each word of the continuous video.
CN202110848023.8A 2021-07-27 2021-07-27 Continuous Chinese sign language recognition method Pending CN113642422A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202110848023.8A CN113642422A (en) 2021-07-27 2021-07-27 Continuous Chinese sign language recognition method

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202110848023.8A CN113642422A (en) 2021-07-27 2021-07-27 Continuous Chinese sign language recognition method

Publications (1)

Publication Number Publication Date
CN113642422A 2021-11-12

Family

ID=78418474

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202110848023.8A Pending CN113642422A (en) 2021-07-27 2021-07-27 Continuous Chinese sign language recognition method

Country Status (1)

Country Link
CN (1) CN113642422A (en)

Patent Citations (27)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JP2000311180A (en) * 1999-03-11 2000-11-07 Fuji Xerox Co Ltd Method for feature set selection, method for generating video image class stastic model, method for classifying and segmenting video frame, method for determining similarity of video frame, computer-readable medium, and computer system
US20040088723A1 (en) * 2002-11-01 2004-05-06 Yu-Fei Ma Systems and methods for generating a video summary
CN102089616A (en) * 2008-06-03 2011-06-08 焕·J·郑 Interferometric defect detection and classification
CN101655859A (en) * 2009-07-10 2010-02-24 北京大学 Method for fast removing redundancy key frames and device thereof
US20120143358A1 (en) * 2009-10-27 2012-06-07 Harmonix Music Systems, Inc. Movement based recognition and evaluation
US20120123780A1 (en) * 2010-11-15 2012-05-17 Futurewei Technologies, Inc. Method and system for video summarization
WO2012078702A1 (en) * 2010-12-10 2012-06-14 Eastman Kodak Company Video key frame extraction using sparse representation
US20180204111A1 (en) * 2013-02-28 2018-07-19 Z Advanced Computing, Inc. System and Method for Extremely Efficient Image and Pattern Recognition and Artificial Intelligence Platform
US20200184278A1 (en) * 2014-03-18 2020-06-11 Z Advanced Computing, Inc. System and Method for Extremely Efficient Image and Pattern Recognition and Artificial Intelligence Platform
CN105005769A (en) * 2015-07-08 2015-10-28 山东大学 Deep information based sign language recognition method
CN106210444A (en) * 2016-07-04 2016-12-07 石家庄铁道大学 Kinestate self adaptation key frame extracting method
CN107027051A (en) * 2016-07-26 2017-08-08 中国科学院自动化研究所 A kind of video key frame extracting method based on linear dynamic system
CN107748761A (en) * 2017-09-26 2018-03-02 广东工业大学 A kind of extraction method of key frame of video frequency abstract
CN107784118A (en) * 2017-11-14 2018-03-09 北京林业大学 A kind of Video Key information extracting system semantic for user interest
CN108347625A (en) * 2018-03-09 2018-07-31 北京数码视讯软件技术发展有限公司 A kind of method and apparatus of TS Streaming Medias positioning
CN109409231A (en) * 2018-09-27 2019-03-01 合肥工业大学 Multiple features fusion sign Language Recognition Method based on adaptive hidden Markov
CN110019817A (en) * 2018-12-04 2019-07-16 阿里巴巴集团控股有限公司 A kind of detection method, device and the electronic equipment of text in video information
CN109871781A (en) * 2019-01-28 2019-06-11 山东大学 Dynamic gesture identification method and system based on multi-modal 3D convolutional neural networks
WO2020258661A1 (en) * 2019-06-26 2020-12-30 平安科技(深圳)有限公司 Speaking person separation method and apparatus based on recurrent neural network and acoustic features
CN110399850A (en) * 2019-07-30 2019-11-01 西安工业大学 A kind of continuous sign language recognition method based on deep neural network
CN110569823A (en) * 2019-09-18 2019-12-13 西安工业大学 sign language identification and skeleton generation method based on RNN
CN111158491A (en) * 2019-12-31 2020-05-15 苏州莱孚斯特电子科技有限公司 Gesture recognition man-machine interaction method applied to vehicle-mounted HUD
CN111325099A (en) * 2020-01-21 2020-06-23 南京邮电大学 Sign language identification method and system based on double-current space-time diagram convolutional neural network
CN111340005A (en) * 2020-04-16 2020-06-26 深圳市康鸿泰科技有限公司 Sign language identification method and system
CN112241470A (en) * 2020-09-24 2021-01-19 北京影谱科技股份有限公司 Video classification method and system
CN112257513A (en) * 2020-09-27 2021-01-22 南京工业大学 Training method, translation method and system for sign language video translation model
CN112464816A (en) * 2020-11-27 2021-03-09 南京特殊教育师范学院 Local sign language identification method and device based on secondary transfer learning

Non-Patent Citations (5)

* Cited by examiner, † Cited by third party
Title
JUAN P. VELASQUEZ et al., "Dynamic Sign Language Recognition Using Gaussian Process Dynamical Models", International Work-Conference on the Interplay Between Natural and Artificial Computation, pp. 491-500 *
SHENGWEI ZHANG et al., "Research on Dynamic Sign Language Recognition Based on Key Frame Weighted of DTW", International Conference on Multimedia Technology and Enhanced Learning, pp. 11-20 *
SHILIANG HUANG et al., "A Novel Chinese Sign Language Recognition Method Based on Keyframe-Centered Clips", IEEE Signal Processing Letters, vol. 25, no. 3, pp. 442-446 *
YING Rui et al., "Human action recognition based on motion blocks and key frames", Journal of Fudan University (Natural Science), vol. 53, no. 6, pp. 815-822 *
XIE Qina et al., "A survey of sign language recognition methods and technology", Computer Engineering and Applications, vol. 57, no. 18, pp. 1-12 *


Legal Events

PB01 Publication
SE01 Entry into force of request for substantive examination