CN113642422B - Continuous Chinese sign language recognition method - Google Patents

Continuous Chinese sign language recognition method

Info

Publication number
CN113642422B
CN113642422B
Authority
CN
China
Prior art keywords
video
word
sign language
autoencoder
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202110848023.8A
Other languages
Chinese (zh)
Other versions
CN113642422A (en)
Inventor
马乐
吴晓越
黄仰来
李志伟
徐东甫
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Northeast Electric Power University
Original Assignee
Northeast Dianli University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Northeast Dianli University filed Critical Northeast Dianli University
Priority to CN202110848023.8A priority Critical patent/CN113642422B/en
Publication of CN113642422A publication Critical patent/CN113642422A/en
Application granted granted Critical
Publication of CN113642422B publication Critical patent/CN113642422B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Links

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/04 Architecture, e.g. interconnection topology
    • G06N3/044 Recurrent networks, e.g. Hopfield networks
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/04 Architecture, e.g. interconnection topology
    • G06N3/045 Combinations of networks
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/08 Learning methods
    • G06N3/084 Backpropagation, e.g. using gradient descent

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • General Health & Medical Sciences (AREA)
  • Biomedical Technology (AREA)
  • Biophysics (AREA)
  • Computational Linguistics (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Evolutionary Computation (AREA)
  • Artificial Intelligence (AREA)
  • Molecular Biology (AREA)
  • Computing Systems (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • Health & Medical Sciences (AREA)
  • Image Analysis (AREA)

Abstract

The invention discloses a continuous Chinese sign language recognition method, which comprises the following steps: acquiring video data of a sign language demonstrator; performing ROI (Region of Interest) processing on the video; constructing an autoencoder and inputting the processed video into it to obtain a feature vector for each frame of the video; inputting the processed video into a key frame identification module to identify key frames; generating a time-series-based attention curve for each word from the obtained key frame information; fusing each obtained attention curve with the feature vectors generated by the autoencoder and inputting the result into a long short-term memory network to obtain a regression result for the video segment corresponding to each word in the video; and, after all video segments have been recognized, combining the recognized words to complete recognition of the semantics of the continuous sign language video. The method effectively segments continuous video and trains word by word, can recognize each word in a video, avoids separately training sentences that contain the same word, and effectively recognizes continuous sign language with different vocabulary combinations.

Description

Continuous Chinese sign language recognition method
Technical Field
The invention belongs to the technical field of information processing, and particularly relates to a continuous Chinese sign language recognition method.
Background
Sign language, as the primary means of communication for hearing-impaired people, plays a major role in daily life. However, because most hearing people in China have not learned Chinese sign language, hearing-impaired people face many difficulties in expressing their needs. Continuous sign language is the most common form in which hearing-impaired people express semantics; with its coherent spatial actions and intuitive, easily understood meaning, it has high research value for real-world society.
Existing continuous sign language recognition methods comprise the following three steps: (1) training a recognition model by continuously expanding the training data samples of various semantics and extracting features from the videos; (2) extracting features from the sign language video demonstrated by the presenter to be recognized; (3) inputting the extracted features of the presenter's sign language video into the model for classification, and outputting the classification result as the recognition result. However, the vocabulary combinations of continuous sign language are highly varied, and such methods can only recognize the semantics of the fixed vocabulary combinations present in the training samples, making it difficult to adapt to the diversity of continuous sign language.
Therefore, how to recognize continuous sign language with different vocabulary combinations more effectively is an urgent problem to be solved.
Disclosure of Invention
In view of the above, the invention provides a continuous sign language recognition method that takes the key frames of the video to be recognized as cues and, assisted by an attention mechanism between the key frames, accurately recognizes the semantics of each word in continuous sign language, thereby meeting the diversity requirement of the different vocabulary combinations of continuous sign language.
The application provides a continuous Chinese sign language recognition method, whose recognition flow is shown in FIG. 1. The method comprises:
acquiring video data of a sign language demonstrator;
performing ROI (Region of Interest) processing on the video;
inputting the processed video into an autoencoder to obtain a feature vector for each frame of the video;
inputting the processed video into a key frame identification module to identify key frames;
generating a time-series-based attention curve for each word from the obtained key frame information;
fusing each obtained attention curve with the feature vectors generated by the autoencoder and inputting the result into a long short-term memory (LSTM) network to obtain a regression result for the video segment corresponding to each word in the video;
and closely matching the regression result against the word vectors to obtain and output the final semantic result.
The video data of the sign language demonstrator is acquired with an RGB color camera, and the ROI processing amplifies the limb motion information of the performer in the video to obtain more distinct motion features.
Optical flow is applied to the processed video to locate the pauses within the continuous actions, and these pauses are taken as key frames.
The autoencoder is designed based on a convolutional neural network (CNN) to obtain a feature vector of the motion information in each frame image.
The feature vectors of the frames are arranged in video-frame order to obtain a group of vectors that closely describes the continuous motion features of the video.
The attention curve is designed with a Gaussian function, and the attention curve of each video segment between key frames is generated from the obtained key frame information.
Based on the obtained attention curve and the per-frame feature vectors, the features of the corresponding word in the video are amplified while the features of the other words in the same video are suppressed, so that each word in the video can be recognized individually.
Each word is recognized by regressing over the video's feature vectors; the result of the network regression is closely matched against the word vectors, and the semantic vocabulary corresponding to the best-matching word vector is output.
Preferably, the method uses a long short-term memory network to recognize the action features of a group of feature vectors corresponding to a word with temporal and spatial continuity, avoiding the preprocessing step of unifying the length of the video data that traditional recognition methods require.
Preferably, the method achieves recognition of different word combinations by amplifying, segmenting and recognizing the features of the continuous video.
According to the technical scheme provided by the invention, continuous sign language can be recognized effectively by combining video key frame screening with an attention mechanism. In addition, during training of the recognition model, inputting continuous sign language video achieves the same effect as training on isolated sign language videos, which markedly reduces the labor of manually annotating videos while achieving efficient continuous sign language recognition.
Drawings
For a clearer description of the technical solutions of the embodiments of the present invention, the drawings needed in the description of the embodiments are briefly introduced below. It is obvious that the drawings described below are only some embodiments of the present invention, and that other drawings may be obtained from them by a person skilled in the art without inventive effort.
FIG. 1 is a flowchart of the recognition process of the continuous Chinese sign language recognition method according to an embodiment of the present invention;
FIG. 2 is a graph of the attention curve generation effect;
FIG. 3 is a flowchart of the training process of the continuous Chinese sign language recognition method according to an embodiment of the present invention.
Detailed Description
The invention will be described in further detail below with reference to the drawings and the detailed description. It is apparent that the described embodiments are only some, not all, embodiments of the invention. All other embodiments obtained by a person of ordinary skill in the art based on the embodiments of the present invention without inventive effort fall within the scope of the present invention.
A flowchart of the training process of the continuous Chinese sign language recognition method is shown in FIG. 3. The method provided by the invention uses an RGB color camera to acquire video data of the performer, and the acquired video data is classified and stored according to its semantics.
Before training, the ROI processing stage detects the demonstrator's face and uses its position in the video and its size relative to the whole frame to crop a fixed window covering the upper torso and the arm-waving area, exploiting the fact that most Chinese sign language is demonstrated in front of the upper torso.
In addition, the video after this feature amplification still contains irrelevant information such as the background image and surrounding environment. Foreground segmentation is therefore used to separate the performer from the environment, further amplifying the limb action features of the sign language performer.
Because of differences in the performers' builds and their distances from the camera, the videos produced by ROI processing vary in size; to improve the accuracy of subsequent feature extraction and recognition, the processed images are resized to a uniform 224 × 224.
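By way of illustration only, this ROI step could be sketched as follows in Python with OpenCV; the Haar cascade, the window proportions derived from the detected face box, and the fallback when no face is found are assumptions of this sketch, not values specified by the patent:

    import cv2

    # Hypothetical ROI step: face detection, fixed-window crop of the upper
    # torso and arm area, then resizing to the uniform 224 x 224 input size.
    face_cascade = cv2.CascadeClassifier(
        cv2.data.haarcascades + "haarcascade_frontalface_default.xml")

    def roi_crop(frame):
        gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)
        faces = face_cascade.detectMultiScale(gray, 1.1, 5)
        if len(faces) == 0:
            return cv2.resize(frame, (224, 224))            # assumed fallback
        x, y, w, h = max(faces, key=lambda f: f[2] * f[3])  # largest face
        # Fixed window scaled from the face position and size
        # (the proportions are illustrative assumptions).
        top = max(0, y - h // 2)
        left = max(0, x - 2 * w)
        right = min(frame.shape[1], x + 3 * w)
        bottom = min(frame.shape[0], y + 4 * h)
        return cv2.resize(frame[top:bottom, left:right], (224, 224))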
During training, the autoencoder is trained independently: by encoding and decoding the ROI-processed video data, each video frame passed through the encoder yields a feature vector Tf that closely describes its motion features, and the representational power of the feature vector is judged by observing whether the decoder can restore the video frame from it.
Both the encoder and the decoder of the autoencoder are designed with convolutional neural networks (CNNs).
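A minimal sketch of such an encoder-decoder pair, assuming PyTorch; the channel counts and the dimension of the feature vector Tf are illustrative assumptions:

    import torch.nn as nn

    class ConvAutoencoder(nn.Module):
        def __init__(self, feat_dim=256):
            super().__init__()
            self.encoder = nn.Sequential(      # 3 x 224 x 224 -> feat_dim
                nn.Conv2d(3, 16, 3, stride=2, padding=1), nn.ReLU(),
                nn.Conv2d(16, 32, 3, stride=2, padding=1), nn.ReLU(),
                nn.Conv2d(32, 64, 3, stride=2, padding=1), nn.ReLU(),
                nn.Flatten(),
                nn.Linear(64 * 28 * 28, feat_dim),
            )
            self.decoder = nn.Sequential(      # feat_dim -> 3 x 224 x 224
                nn.Linear(feat_dim, 64 * 28 * 28),
                nn.Unflatten(1, (64, 28, 28)),
                nn.ConvTranspose2d(64, 32, 4, stride=2, padding=1), nn.ReLU(),
                nn.ConvTranspose2d(32, 16, 4, stride=2, padding=1), nn.ReLU(),
                nn.ConvTranspose2d(16, 3, 4, stride=2, padding=1), nn.Sigmoid(),
            )

        def forward(self, x):
            tf = self.encoder(x)               # per-frame feature vector Tf
            return self.decoder(tf), tf

Training would minimize a reconstruction loss (for example nn.MSELoss) between the decoder output and the input frame; how faithfully the decoder restores the frame indicates how well Tf characterizes it.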
During demonstration, there is a distinct pause between the sign language actions of consecutive words, in which the range of hand motion is smaller than during an action. The invention therefore uses an optical flow method to compute the optical flow between temporally adjacent frames and to build an optical flow vector histogram over the image area.
When building the optical flow vector histogram, since every optical flow exhibits tiny changes, a threshold is applied to the optical flow vector value of each pixel to prevent these tiny changes from affecting the statistics.
After the optical flow vector histogram has been computed, the key frame segments marking the transition between two words in the sign language video are determined by a threshold, and the middle frame of each key frame segment is stored as a key frame.
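A minimal sketch of this key frame step, assuming OpenCV's Farneback optical flow; for brevity the sketch sums the thresholded flow magnitudes instead of building a full histogram, and both threshold values are assumptions:

    import cv2

    PIXEL_THRESH = 0.5     # assumed per-pixel threshold for tiny changes
    PAUSE_THRESH = 1000.0  # assumed total-motion threshold for a pause

    def find_key_frames(frames):
        """frames: list of grayscale images; returns key frame indices."""
        pause_runs, run = [], []
        for i in range(1, len(frames)):
            flow = cv2.calcOpticalFlowFarneback(
                frames[i - 1], frames[i], None, 0.5, 3, 15, 3, 5, 1.2, 0)
            mag, _ = cv2.cartToPolar(flow[..., 0], flow[..., 1])
            mag[mag < PIXEL_THRESH] = 0.0   # suppress tiny flow changes
            if mag.sum() < PAUSE_THRESH:    # little motion: part of a pause
                run.append(i)
            elif run:
                pause_runs.append(run)
                run = []
        if run:
            pause_runs.append(run)
        # the middle frame of each pause segment is stored as a key frame
        return [r[len(r) // 2] for r in pause_runs]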
μ and σ are, respectively, the index of the middle frame of the video segment corresponding to a segmented word and the distance from that middle frame to the segment boundary: μ is the index of the middle frame between two key frames, and σ is the number of frames from μ to the nearest key frame.
The obtained μ and σ values, together with the video frame length i, are input group by group into the attention mechanism module to generate attention curves W of length i; each (μ, σ) pair corresponds to one attention curve W.
The attention curve is obtained using a Gaussian function, defined as follows:
W(i) = a · exp(−(i − μ)² / (2σ²))
where a is an amplitude parameter, i is the index of the current video frame, and μ and σ are the middle-frame index of the video segment corresponding to a segmented word and the distance of that middle frame from the segment boundary. FIG. 2 shows intuitively how μ, σ and i relate to the generated attention curve and the video.
The obtained attention curves are each fused with the group of video feature vectors. As the different Gaussian distributions in FIG. 2 show, each attention curve strengthens the feature vectors over the video segment determined by its μ and σ values, so that each word becomes associated with its corresponding video segment.
Meanwhile, the length of each attention curve exactly matches the video frame length i, so no feature data is lost or truncated.
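A minimal sketch of attention curve generation and fusion, using NumPy; the amplitude a = 1 and the helper that derives μ and σ from a pair of key frame indices are assumptions of this sketch:

    import numpy as np

    def attention_curve(mu, sigma, length, a=1.0):
        """W(i) = a * exp(-(i - mu)^2 / (2 * sigma^2)) over frame indices."""
        i = np.arange(length)
        return a * np.exp(-((i - mu) ** 2) / (2.0 * sigma ** 2))

    def mu_sigma(k0, k1):
        """Derive mu and sigma for the word segment between key frames k0 < k1."""
        mu = (k0 + k1) // 2                    # index of the middle frame
        sigma = max(min(mu - k0, k1 - mu), 1)  # frames to the nearest key frame
        return mu, sigma

    def fuse(features, mu, sigma):
        """features: (num_frames, feat_dim) array of per-frame Tf vectors."""
        w = attention_curve(mu, sigma, len(features))
        return features * w[:, None]           # amplify one word, damp the rest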
Each attention curve, fused with the group of video feature vectors, is input into the recognition module for training. The recognition module is designed with a long short-term memory (LSTM) network; both the network output and the word vectors are sized 1 × 60 to simplify forming and computing the loss function, and the computed result completes the training of the neural network through back propagation.
The long short-term memory network is a time-recursive neural network suited to processing and predicting important events with relatively long intervals and delays in a time series. Continuous sign language recognition is a feature recognition problem over a continuous time series, and the contexts within continuous actions are clearly related; during training, the network can therefore retain previously learned content and relate subsequently learned content to it, which reduces the influence of varying video sequence lengths on learning and allows continuous sign language action features to be learned effectively.
Sentences are segmented with the jieba module in Python so that the words composing a sentence are separated one by one; the segmented words are then used to train a word2vec model from the Gensim module, and the trained model can generate the word vector corresponding to any input word.
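A minimal sketch of this word vector step; jieba and Gensim's word2vec are the tools the text names, while the placeholder sentences and min_count=1 are illustrative, and vector_size=60 matches the 1 × 60 word vectors described above:

    import jieba
    from gensim.models import Word2Vec

    sentence_labels = ["我去学校", "你好吗"]    # placeholder sentence labels
    corpus = [list(jieba.cut(s)) for s in sentence_labels]  # word segmentation
    model = Word2Vec(corpus, vector_size=60, min_count=1)   # 60-dim vectors
    vec = model.wv["学校"]                      # word vector for an input word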
Since the output of the network has the same dimensionality as the word vectors during training, MSELoss (the mean square error loss) is used to construct the loss function. MSELoss is calculated as follows:
loss(xᵢ, yᵢ) = (xᵢ − yᵢ)²
where x and y have the same dimensions, may be vectors or matrices, and i is an index.
The result of the loss function is used to adjust the parameters of the long short-term memory network through back propagation, so that the predicted value generated by the network approximates the corresponding word vector as closely as possible, that is, the value of the loss function continuously approaches 0. When the loss produced by every training sample is below 0.001, the network is considered to generate correct predictions; training is stopped and the network model is saved.
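A minimal sketch of the recognition module and its training loop, assuming PyTorch; the hidden size, optimizer, learning rate and the empty train_pairs placeholder are assumptions of this sketch:

    import torch
    import torch.nn as nn

    class WordRegressor(nn.Module):
        def __init__(self, feat_dim=256, hidden=128, word_dim=60):
            super().__init__()
            self.lstm = nn.LSTM(feat_dim, hidden, batch_first=True)
            self.fc = nn.Linear(hidden, word_dim)  # output sized 1 x 60

        def forward(self, seq):                    # seq: (1, frames, feat_dim)
            out, _ = self.lstm(seq)
            return self.fc(out[:, -1])             # regress to a word vector

    model = WordRegressor()
    criterion = nn.MSELoss()
    optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)

    train_pairs = []  # placeholder: (fused feature sequence, 1 x 60 target) pairs
    for epoch in range(1000):
        worst = 0.0
        for seq, word_vec in train_pairs:
            pred = model(seq)
            loss = criterion(pred, word_vec)       # mean square loss
            optimizer.zero_grad()
            loss.backward()                        # back propagation
            optimizer.step()
            worst = max(worst, loss.item())
        if worst < 0.001:                          # stopping rule from the text
            break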
In the recognition process, as in training, an RGB camera captures the performer's actions and ROI preprocessing is applied. Key information is extracted from the processed video and the video is segmented; the relevant information is input into the attention mechanism to generate the corresponding attention curves, which are fused with the video feature vector sequence and input into the network. Since the recognition module network used during recognition is the model completed during training, the input information directly yields the corresponding prediction. The prediction is then closely matched, that is, the standard word vector most similar to it is found, which recognizes the corresponding word in the sign language video. After all video segments have been recognized, the recognized words are combined, completing the semantic recognition of the continuous sign language video.
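A minimal sketch of the close matching step; cosine similarity is an assumption here, since the text does not name the similarity measure used to find the most similar standard word vector:

    import numpy as np

    def close_match(pred, word_vectors):
        """word_vectors: dict mapping each vocabulary word to its standard vector."""
        def cos(a, b):
            return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))
        return max(word_vectors, key=lambda w: cos(pred, word_vectors[w]))

    # sentence = "".join(close_match(p, word_vectors) for p in per_segment_preds)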
The method effectively segments continuous videos and trains word by word; it can recognize each word in a video, avoids separately training sentences that contain the same word, and effectively recognizes continuous sign language with different vocabulary combinations.
The foregoing embodiments are readily understood by those skilled in the art, and various modifications to them will be readily apparent. Any modifications, equivalent substitutions and improvements made within the spirit and principles of the present invention shall therefore fall within the scope of the present invention.

Claims (5)

1. A continuous sign language recognition method, the method comprising:
acquiring video data of a sign language demonstrator;
performing ROI (Region of Interest) processing on the video;
inputting the processed video into an autoencoder to obtain a feature vector for each frame of the video;
inputting the processed video into a key frame identification module, and identifying key frames and their information by an optical flow method, including: building an optical flow vector histogram over the image area; determining, by a threshold, the key frame segments of transition between two words in the sign language video; storing the middle frame of each key frame segment as a key frame; and obtaining, from the positions of two key frames, the middle-frame index of the video segment corresponding to a segmented word and the distance of that middle frame from the segment boundary;
generating a time-series-based attention curve for each word from the obtained key frame information;
designing the time-series-based attention curve with a Gaussian function, taking the middle-frame index as the mean of the Gaussian function and the distance from the segment boundary as its standard deviation, fusing the curve with the feature vectors generated by the autoencoder, and inputting the result into a long short-term memory network to obtain a regression result for the video segment corresponding to each word in the video;
and closely matching the regression result against the word vectors to obtain and output the final semantic result.
2. The method according to claim 1, wherein the ROI processing is performed on the motion feature information of the video images, comprising the steps of:
Step 1: cropping the human body contour image region from the acquired sign language video based on facial recognition;
Step 2: separating the performer from the environment by foreground segmentation, further amplifying the limb action feature information of the sign language performer.
3. The method according to claim 1, wherein the feature vectors are obtained by the autoencoder, comprising the steps of:
Step 1: disassembling the training samples frame by frame and inputting them into the autoencoder for training;
Step 2: the encoder in the autoencoder extracts image features and converts them into feature vectors, and the decoder restores the feature vectors into images; the autoencoder is trained by comparing each generated image with the original image, so that the generated image restores the input image as closely as possible;
Step 3: after training, the autoencoder model is saved; every image input into the autoencoder then yields a feature vector that uniquely represents it.
4. The method of claim 1, wherein the key frames and key frame information are identified by an optical flow method: an optical flow vector histogram is built over the image area; the key frame segments of transition between two words in the sign language video are determined by a threshold; the middle frame of each key frame segment is stored as a key frame; and the middle-frame index of the video segment corresponding to a segmented word and the distance of that middle frame from the segment boundary are obtained from the positions of two key frames.
5. The method of claim 1, wherein the neural network takes the fusion of the feature vectors and the attention curve as input, a loss function is formed from the predicted value output by the neural network and the word vector, and training is accomplished by back propagation, comprising the steps of:
Step 1: designing the attention curve with a Gaussian function, taking the obtained middle-frame index, the distance from the segment boundary and the video length as inputs to obtain an attention curve;
Step 2: fusing each attention curve with the feature vectors, inputting the result into the neural network for training word by word, computing the mean square loss function between the predicted value output by the network and the standard word vector, and adjusting the network parameters by back-propagating the computed result, thereby training the network model on the video segment corresponding to each word of the continuous video.
CN202110848023.8A 2021-07-27 2021-07-27 Continuous Chinese sign language recognition method Active CN113642422B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202110848023.8A CN113642422B (en) 2021-07-27 2021-07-27 Continuous Chinese sign language recognition method

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202110848023.8A CN113642422B (en) 2021-07-27 2021-07-27 Continuous Chinese sign language recognition method

Publications (2)

Publication Number Publication Date
CN113642422A CN113642422A (en) 2021-11-12
CN113642422B (en) 2024-05-24

Family

ID=78418474

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202110848023.8A Active CN113642422B (en) 2021-07-27 2021-07-27 Continuous Chinese sign language recognition method

Country Status (1)

Country Link
CN (1) CN113642422B (en)

Family Cites Families (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20040088723A1 (en) * 2002-11-01 2004-05-06 Yu-Fei Ma Systems and methods for generating a video summary
US9981193B2 (en) * 2009-10-27 2018-05-29 Harmonix Music Systems, Inc. Movement based recognition and evaluation
EP2641401B1 (en) * 2010-11-15 2017-04-05 Huawei Technologies Co., Ltd. Method and system for video summarization
US11195057B2 (en) * 2014-03-18 2021-12-07 Z Advanced Computing, Inc. System and method for extremely efficient image and pattern recognition and artificial intelligence platform
US11074495B2 (en) * 2013-02-28 2021-07-27 Z Advanced Computing, Inc. (Zac) System and method for extremely efficient image and pattern recognition and artificial intelligence platform

Patent Citations (22)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JP2000311180A (en) * 1999-03-11 2000-11-07 Fuji Xerox Co Ltd Method for feature set selection, method for generating video image class statistic model, method for classifying and segmenting video frame, method for determining similarity of video frame, computer-readable medium, and computer system
CN102089616A (en) * 2008-06-03 2011-06-08 焕·J·郑 Interferometric defect detection and classification
CN101655859A (en) * 2009-07-10 2010-02-24 北京大学 Method for fast removing redundancy key frames and device thereof
WO2012078702A1 (en) * 2010-12-10 2012-06-14 Eastman Kodak Company Video key frame extraction using sparse representation
CN105005769A (en) * 2015-07-08 2015-10-28 山东大学 Deep information based sign language recognition method
CN106210444A (en) * 2016-07-04 2016-12-07 石家庄铁道大学 Kinestate self adaptation key frame extracting method
CN107027051A (en) * 2016-07-26 2017-08-08 中国科学院自动化研究所 A kind of video key frame extracting method based on linear dynamic system
CN107748761A (en) * 2017-09-26 2018-03-02 广东工业大学 A kind of extraction method of key frame of video frequency abstract
CN107784118A (en) * 2017-11-14 2018-03-09 北京林业大学 A kind of Video Key information extracting system semantic for user interest
CN108347625A (en) * 2018-03-09 2018-07-31 北京数码视讯软件技术发展有限公司 A kind of method and apparatus of TS Streaming Medias positioning
CN109409231A (en) * 2018-09-27 2019-03-01 合肥工业大学 Multiple features fusion sign Language Recognition Method based on adaptive hidden Markov
CN110019817A (en) * 2018-12-04 2019-07-16 阿里巴巴集团控股有限公司 A kind of detection method, device and the electronic equipment of text in video information
CN109871781A (en) * 2019-01-28 2019-06-11 山东大学 Dynamic gesture identification method and system based on multi-modal 3D convolutional neural networks
WO2020258661A1 (en) * 2019-06-26 2020-12-30 平安科技(深圳)有限公司 Speaking person separation method and apparatus based on recurrent neural network and acoustic features
CN110399850A (en) * 2019-07-30 2019-11-01 西安工业大学 A kind of continuous sign language recognition method based on deep neural network
CN110569823A (en) * 2019-09-18 2019-12-13 西安工业大学 sign language identification and skeleton generation method based on RNN
CN111158491A (en) * 2019-12-31 2020-05-15 苏州莱孚斯特电子科技有限公司 Gesture recognition man-machine interaction method applied to vehicle-mounted HUD
CN111325099A (en) * 2020-01-21 2020-06-23 南京邮电大学 Sign language identification method and system based on double-current space-time diagram convolutional neural network
CN111340005A (en) * 2020-04-16 2020-06-26 深圳市康鸿泰科技有限公司 Sign language identification method and system
CN112241470A (en) * 2020-09-24 2021-01-19 北京影谱科技股份有限公司 Video classification method and system
CN112257513A (en) * 2020-09-27 2021-01-22 南京工业大学 Training method, translation method and system for sign language video translation model
CN112464816A (en) * 2020-11-27 2021-03-09 南京特殊教育师范学院 Local sign language identification method and device based on secondary transfer learning

Non-Patent Citations (5)

* Cited by examiner, † Cited by third party
Title
"A Novel Chinese Sign Language Recognition Method Based on Keyframe-Centered Clips";Shiliang Huang等;《IEEE SIGNAL PROCESSING LETTERS》;第25卷(第3期);第442-446页 *
"Dynamic Sign Language Recognition Using Gaussian Process Dynamical Models";Juan P. Velasquez等;《International Work-Conference on the Interplay Between Natural and Artificial Computation》;第491-500页 *
"Research on Dynamic Sign Language Recognition Based on Key Frame Weighted of DTW";ShengWei Zhang等;《International Conference on Multimedia Technology and Enhanced Learning》;第11-20页 *
"基于运动块及关键帧的人体动作识别";应锐等;《复旦学报(自然科学版)》;第53卷(第6期);第815-822页 *
"手语识别方法与技术综述";解启娜等;《计算机工程与应用》;第57卷(第18期);第1-12页 *

Also Published As

Publication number Publication date
CN113642422A (en) 2021-11-12

Similar Documents

Publication Publication Date Title
CN110188343B (en) Multi-mode emotion recognition method based on fusion attention network
CN106919903B (en) robust continuous emotion tracking method based on deep learning
Akmeliawati et al. Real-time Malaysian sign language translation using colour segmentation and neural network
NadeemHashmi et al. A lip reading model using CNN with batch normalization
CN111339837A (en) Continuous sign language recognition method
CN110210416B (en) Sign language recognition system optimization method and device based on dynamic pseudo tag decoding
CN111104884A (en) Chinese lip language identification method based on two-stage neural network model
Sharma et al. Vision-based sign language recognition system: A Comprehensive Review
CN114973412B (en) Lip language identification method and system
CN115187704A (en) Virtual anchor generation method, device, equipment and storage medium
CN115964638A (en) Multi-mode social data emotion classification method, system, terminal, equipment and application
CN114694255B (en) Sentence-level lip language recognition method based on channel attention and time convolution network
Mistry et al. Indian sign language recognition using deep learning
Singh et al. Action recognition in dark videos using spatio-temporal features and bidirectional encoder representations from transformers
Abrar et al. Deep lip reading-a deep learning based lip-reading software for the hearing impaired
Vayadande et al. Lipreadnet: A deep learning approach to lip reading
CN113642422B (en) Continuous Chinese sign language recognition method
Silveira et al. SynLibras: A Disentangled Deep Generative Model for Brazilian Sign Language Synthesis
Chowdhury et al. Text Extraction through Video Lip Reading Using Deep Learning
Gavade et al. Facial Expression Recognition in Videos by learning Spatio-Temporal Features with Deep Neural Networks
Hallyal et al. Optimized recognition of CAPTCHA through attention models
CN111209433A (en) Video classification algorithm based on feature enhancement
Kumaragurubaran et al. Unlocking Sign Language Communication: A Deep Learning Paradigm for Overcoming Accessibility Challenges
Zhang et al. EvSign: Sign Language Recognition and Translation with Streaming Events
Manglani et al. Lip Reading Into Text Using Deep Learning

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant