CN113642422A - Continuous Chinese sign language recognition method - Google Patents
- Publication number
- CN113642422A (application number CN202110848023.8A)
- Authority
- CN
- China
- Prior art keywords
- video; word; sign language; self-encoder
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Classifications
- G—PHYSICS; G06—COMPUTING; G06N—Computing arrangements based on specific computational models; G06N3/00—Computing arrangements based on biological models; G06N3/02—Neural networks
- G06N3/044 — Recurrent networks, e.g. Hopfield networks
- G06N3/045 — Combinations of networks
- G06N3/084 — Backpropagation, e.g. using gradient descent
Abstract
The invention discloses a continuous Chinese sign language recognition method comprising the following steps: acquiring video data of a sign language presenter; performing region-of-interest (ROI) processing on the video; constructing a self-encoder and inputting the processed video into it to obtain a feature vector for each frame of the video; inputting the processed video into a key frame identification module to identify key frames; generating a time-based attention curve for each word from the obtained key frame information; fusing each attention curve with the feature vectors produced by the self-encoder and inputting the result into a long short-term memory network to obtain a regression result for the video segment corresponding to each word; and, once all video segments have been recognized, combining the recognized words to complete recognition of the semantics of the continuous sign language video. The method effectively segments the continuous video and trains word by word, recognizes each word in the video, avoids separate training on every sentence that contains the same word, and effectively recognizes continuous sign language with different word combinations.
Description
Technical Field
The invention belongs to the technical field of information processing, and in particular relates to a method for recognizing continuous Chinese sign language.
Background
Sign language is the main means of communication for hearing-impaired people and plays a large role in daily life. However, because most hearing people in China have never learned sign language, hearing-impaired people face many difficulties in expressing their needs. Continuous sign language is the most common form in which hearing-impaired people express semantics; its coherent spatial motion and intuitive, easily understood semantics give it high research value in society.
Existing continuous sign language recognition methods take the following three steps: (1) train a recognition model by continually expanding training samples of various semantics and extracting features from the videos; (2) extract features from the sign language video demonstrated by the presenter to be recognized; (3) input the extracted features into the model for classification and output the classification result as the recognition result. However, because continuous sign language combines vocabulary in many ways, such methods can only recognize the fixed vocabulary combinations present in the training samples and adapt poorly to the diversity of continuous sign language.
Therefore, how to recognize continuous sign language with different vocabulary combinations more effectively is an urgent problem.
Disclosure of Invention
In view of the above, the invention provides a continuous sign language recognition method that uses the key frames of the video to be recognized as cues and, aided by an attention mechanism spanning the key frames, accurately recognizes the semantics of each word in continuous sign language, thereby handling the diversity of vocabulary combinations in continuous sign language.
The application provides a continuous Chinese sign language recognition method; the recognition process is shown in FIG. 1 and comprises the following steps:
acquiring sign language presenter video data information;
performing ROI (region of interest) processing on the video;
inputting the processed video into a self-encoder to obtain a feature vector of each frame of the video;
inputting the processed video into a key frame identification module to identify key frames;
generating a time-based attention curve for each word from the obtained key frame information;
fusing the obtained attention curve with the feature vectors generated by the self-encoder and inputting the result into a long short-term memory network to obtain a regression result for the video segment corresponding to each word in the video;
and performing approximate matching between the regression result and the word vectors to obtain and output the final semantic result.
The video data of the sign language presenter is acquired with an RGB color camera; the ROI processing amplifies the performer's limb motion in the video to obtain more salient action features.
Optical flow is applied to the processed video to locate the pauses within the continuous motion, which are taken as key frames.
The self-encoder is designed on the basis of a convolutional neural network (CNN) to obtain a feature vector of the action information in each frame image.
Arranging the per-frame feature vectors in video-frame order yields a group of vectors that closely describes the video's continuous motion features.
The attention curve is designed with a Gaussian function: given the key frames and their related information, the Gaussian function generates the attention curve of the video segment between each pair of key frames.
Based on the obtained attention curve and the per-frame feature vectors of the video, the features of the corresponding word are amplified while the features of the other words in the same video are suppressed, enabling each word in the video to be recognized individually.
Each word is recognized by regressing on the video's feature vectors; the network's regression result is then matched against the word vectors for similarity, and the semantic vocabulary corresponding to the closest word vector is output.
Preferably, the method uses a long short-term memory network to recognize, over temporal and spatial continuity, the action features of the group of feature vectors corresponding to a word, which avoids the preprocessing step of traditional recognition methods that unifies the lengths of video data.
Preferably, the method achieves recognition of different word combinations by amplifying, segmenting, and recognizing the features of the continuous video.
According to the technical scheme provided by the invention, combining video key frame screening with an attention mechanism effectively recognizes continuous sign language. Moreover, during training the recognition model achieves the effect of training on isolated sign language videos while only continuous sign language videos are input, which greatly reduces the labor of manually annotating videos and yields efficient continuous sign language recognition.
Drawings
In order to illustrate the technical solutions of the embodiments of the invention more clearly, the drawings needed in the description of the embodiments are briefly introduced below. The drawings described below are only some embodiments of the invention; those skilled in the art can obtain other drawings from them without creative effort.
FIG. 1 is a flow chart of a recognition process of a continuous Chinese sign language recognition method according to an embodiment of the present invention;
FIG. 2 is a graph of the effect of attention curve generation;
FIG. 3 is a flow chart of a training process of a continuous Chinese sign language recognition method according to an embodiment of the present invention.
Detailed Description
The invention is described in further detail below with reference to the accompanying drawings and specific embodiments. It is to be understood that the described embodiments are only some, not all, embodiments of the invention. All other embodiments obtained by those skilled in the art without creative effort based on these embodiments fall within the scope of the invention.
FIG. 3 shows the flow chart of the training process of the continuous Chinese sign language recognition method. The method uses an RGB color camera to acquire data from the performers, and the acquired video data is classified and stored according to meaning.
Before training, ROI processing must be applied to the video. The ROI step is based on face recognition: since most Chinese sign language is demonstrated with the upper half of the body, a fixed window covering the upper torso and the arm-swing area is cropped according to the position of the demonstrator's face in the video and the fraction of the frame area that the face occupies. This further amplifies the demonstrator's feature information.
After this feature amplification, irrelevant information such as the background image and the surrounding environment still remains in the video. Foreground segmentation is therefore used to separate the performer from the environment, further amplifying the sign language performer's body-motion feature information.
Because a performer's build and distance from the camera make the size of each ROI-processed video differ, the processed images are unified to 224 × 224 to improve the accuracy of subsequent feature extraction and recognition.
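The fixed-window interception and the 224 × 224 unification described above can be sketched as follows. This is a minimal illustration, not the patent's exact procedure: the window proportions (about three face-widths wide and four face-heights tall) and the nearest-neighbour resize are assumptions, and the face box is presumed to come from an external face detector.

```python
import numpy as np

def roi_crop(frame, face_box, out_size=224):
    """Crop a fixed window covering the upper torso and arm-swing area,
    scaled from the detected face box, then resize to out_size x out_size.
    Window proportions are illustrative guesses, not from the patent."""
    x, y, w, h = face_box                    # face bounding box (x, y, w, h)
    cx = x + w // 2
    left = max(cx - 3 * w // 2, 0)           # ~3 face-widths wide
    right = min(cx + 3 * w // 2, frame.shape[1])
    top = max(y - h // 2, 0)                 # from just above the head
    bottom = min(y + 4 * h, frame.shape[0])  # down ~4 face-heights
    crop = frame[top:bottom, left:right]
    # nearest-neighbour resize with plain numpy indexing
    ys = np.linspace(0, crop.shape[0] - 1, out_size).astype(int)
    xs = np.linspace(0, crop.shape[1] - 1, out_size).astype(int)
    return crop[np.ix_(ys, xs)]
```

In practice the face box would come from a face detector and the resize from an image library, but the window arithmetic is the same.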
During training, the self-encoder is trained independently: the ROI-processed video data undergoes encode-decode training, so that a video frame passed through the encoder yields a feature vector Tf that closely describes its action features. The representational power of the feature vector is judged by whether the decoder can reconstruct the video frame from it.
Both the encoder and the decoder of the self-encoder are designed with convolutional neural networks (CNN).
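A minimal CNN self-encoder in this spirit might look as follows (PyTorch). The patent only states that the encoder and decoder are CNN-based, so the layer sizes and the 60-dimensional code here are assumptions, chosen to match the 1 × 60 vectors used later in the description.

```python
import torch
import torch.nn as nn

class FrameAutoencoder(nn.Module):
    """Illustrative CNN auto-encoder for 224x224 RGB frames; layer
    sizes and the 60-dim code are assumptions, not from the patent."""
    def __init__(self, code_dim=60):
        super().__init__()
        self.encoder = nn.Sequential(
            nn.Conv2d(3, 16, 4, stride=4), nn.ReLU(),   # 224 -> 56
            nn.Conv2d(16, 32, 4, stride=4), nn.ReLU(),  # 56 -> 14
            nn.Flatten(),
            nn.Linear(32 * 14 * 14, code_dim),          # frame -> Tf
        )
        self.decoder = nn.Sequential(
            nn.Linear(code_dim, 32 * 14 * 14), nn.ReLU(),
            nn.Unflatten(1, (32, 14, 14)),
            nn.ConvTranspose2d(32, 16, 4, stride=4), nn.ReLU(),  # 14 -> 56
            nn.ConvTranspose2d(16, 3, 4, stride=4), nn.Sigmoid() # 56 -> 224
        )

    def forward(self, x):
        code = self.encoder(x)            # feature vector Tf
        return self.decoder(code), code   # reconstruction + code
```

Training would minimize the reconstruction error between input and decoded frames, matching the "compare generated image with original image" criterion in the description.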
During demonstration there is a distinct pause between the sign movements of consecutive words, during which the hands move less than during the movements themselves. The optical flow between temporally adjacent frames is therefore computed by an optical flow method, and a histogram of the optical-flow vectors over the image area is accumulated.
Because every optical-flow vector fluctuates slightly, a threshold is set on the flow magnitude of each pixel so that these small fluctuations do not affect the statistics.
After the optical-flow histogram statistics are complete, a threshold identifies the key frame segment of the transition between two words in the sign language video, and the middle frame of that segment is stored as the key frame.
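The pause-detection step above can be sketched as follows, assuming the per-frame optical-flow statistics have already been reduced to one mean flow magnitude per frame. The threshold values are illustrative; the patent sets them empirically.

```python
import numpy as np

def find_key_frames(flow_mag, pause_thresh=0.5, min_len=3):
    """Given the mean optical-flow magnitude per frame, return the
    middle frame of each low-motion run (a pause between two signs).
    pause_thresh and min_len are illustrative, not from the patent."""
    low = flow_mag < pause_thresh
    keys, start = [], None
    for i, flag in enumerate(np.append(low, False)):  # sentinel ends last run
        if flag and start is None:
            start = i                                 # run of low motion begins
        elif not flag and start is not None:
            if i - start >= min_len:                  # long enough to be a pause
                keys.append((start + i - 1) // 2)     # middle frame of the pause
            start = None
    return keys
```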
μ and σ describe the video segment of a segmented word: μ is the index of the middle frame of the video segment between two key frames, and σ is the distance from that middle frame to the nearer key frame.
The obtained μ and σ values, together with the video frame length i, are input group by group to the attention mechanism module, which generates for each (μ, σ) pair an attention curve W of length i.
The attention curve is obtained with a Gaussian function, defined as follows:

W(i) = a · exp(−(i − μ)² / (2σ²))

where a is the amplitude parameter, i is the index of the current video frame, and μ and σ are, respectively, the index of the middle frame of the word's video segment and the distance of that frame from the segment boundary. FIG. 2 visualizes the relationship between μ, σ, i, the generated attention curve, and the video.
Each obtained attention curve is fused with the video feature vector group. As the different Gaussian distributions in FIG. 2 show, each attention curve enhances the feature vectors of the video segment determined by its μ and σ values, thereby associating the corresponding word with the corresponding video segment.
Moreover, because the length of the attention curve equals the video frame length i, no feature data is lost or truncated.
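The fusion step can be illustrated as a simple per-frame scaling of the feature vectors by the attention weights (the patent does not specify the fusion operator, so element-wise weighting is an assumption):

```python
import numpy as np

def fuse(features, curve):
    """Weight each frame's feature vector by its attention value.
    features: (num_frames, feat_dim); curve: (num_frames,)."""
    return features * curve[:, None]   # broadcast weight over feat_dim
```

Frames near the curve's peak keep their features nearly unchanged, while frames belonging to other words are attenuated toward zero.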
The fused attention curve and video feature vector group are input to the recognition module for training. The recognition module is designed as a long short-term memory (LSTM) network; both the network output and the word vectors are sized 1 × 60 to simplify forming and computing the loss function, and the computed result is back-propagated to complete the training of the neural network.
The long short-term memory network is a recurrent neural network suited to processing and predicting events separated by relatively long intervals and delays in a time series. Continuous sign language recognition is a feature recognition problem over a continuous time series, and the semantics of continuous sign language exhibit clear contextual relations within the continuous motion. During training the network therefore not only retains what it learned earlier but also relates subsequently learned content to it, which reduces the influence of differing video sequence lengths on learning and lets the network learn continuous sign language action features effectively.
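A minimal LSTM recognition head of this kind might be (PyTorch; the hidden size is an assumption, while the 60-dimensional output follows the 1 × 60 size stated above). Because the LSTM consumes the whole sequence and only its final state is regressed, videos of different lengths need no padding to a common length, as the description notes.

```python
import torch
import torch.nn as nn

class SignWordRecognizer(nn.Module):
    """LSTM regression head: a variable-length sequence of fused frame
    features in, a 1x60 word-vector prediction out. Hidden size 128 is
    an assumption; 60 matches the word-vector size in the description."""
    def __init__(self, feat_dim=60, hidden=128, word_dim=60):
        super().__init__()
        self.lstm = nn.LSTM(feat_dim, hidden, batch_first=True)
        self.head = nn.Linear(hidden, word_dim)

    def forward(self, x):          # x: (batch, num_frames, feat_dim)
        _, (h, _) = self.lstm(x)   # h[-1]: final hidden state
        return self.head(h[-1])    # (batch, word_dim) regression result
```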
Sentences are segmented with the jieba module in Python so that the words composing a sentence stand alone; the segmented words are then trained with the word2vec function of the Gensim module, and the trained model generates a word vector for each input word.
During training, because the network output has the same dimensionality as the word vectors, the loss function is constructed with MSELoss (mean squared error), computed per element as:

loss(x_i, y_i) = (x_i − y_i)²

where x and y have the same dimensions (vectors or matrices) and i is a subscript.
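The per-element loss above, averaged over a vector, is the usual mean squared error; a small sketch:

```python
import numpy as np

def mse_loss(pred, target):
    """Mean over all elements of (x_i - y_i)^2, as in the description."""
    pred = np.asarray(pred, dtype=float)
    target = np.asarray(target, dtype=float)
    return float(np.mean((pred - target) ** 2))
```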
The result computed by the loss function is back-propagated to adjust the parameters of the long short-term memory network, so that the prediction generated by the network approaches the corresponding word vector as closely as possible, i.e., the value of the loss function steadily approaches 0. When the loss produced by every training sample is below 0.001, the network is considered to generate correct predictions; training then stops and the network model is saved.
The recognition process mirrors the training process: the RGB camera captures the performer's actions and ROI preprocessing is applied; the processed video yields the key information and is segmented; the relevant information is fed to the attention mechanism to generate the corresponding attention curves, which are fused with the video feature vector sequence and input to the network. Since the recognition module is the model already completed during training, it directly produces a prediction for the input. The prediction then undergoes approximate matching, i.e., the standard word vector most similar to it is found, recognizing the corresponding word in the sign language video. After all video segments are recognized, the recognized words are combined to complete recognition of the continuous sign language video's semantics.
The method thus effectively segments the continuous video and trains word by word, recognizes each word in the video, avoids separate training on every sentence that contains the same word, and effectively recognizes continuous sign language with different word combinations.
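The approximate-matching step, i.e. finding the standard word vector most similar to the network's prediction, can be sketched with cosine similarity (the description says only "close matching", so using cosine rather than Euclidean distance is an assumption):

```python
import numpy as np

def nearest_word(pred, vocab_vecs, vocab_words):
    """Return the vocabulary word whose standard word vector is most
    similar (by cosine similarity) to the network's 1x60 prediction."""
    v = pred / (np.linalg.norm(pred) + 1e-12)
    m = vocab_vecs / (np.linalg.norm(vocab_vecs, axis=1, keepdims=True) + 1e-12)
    return vocab_words[int(np.argmax(m @ v))]   # best-matching row
```

Running this once per recognized video segment and concatenating the results yields the sentence-level semantics described above.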
The above description of the disclosed embodiments enables those skilled in the art to understand the invention. Various modifications to these embodiments will be readily apparent to those skilled in the art. Therefore, any modification, equivalent replacement, or improvement made within the spirit and principles of the invention shall be included in the protection scope of the invention.
Claims (5)
1. A continuous sign language recognition method, comprising the following steps:
acquiring sign language presenter video data information;
performing ROI (region of interest) processing on the video;
inputting the processed video into a self-encoder to obtain a feature vector of each frame of the video;
inputting the processed video into a key frame identification module to identify key frames;
generating a time-based attention curve for each word from the obtained key frame information;
fusing the obtained attention curve with the feature vectors generated by the self-encoder and inputting the result into a long short-term memory network to obtain a regression result for the video segment corresponding to each word in the video;
and performing approximate matching between the regression result and the word vectors to obtain and output the final semantic result.
2. The method of claim 1, wherein the ROI processing is performed on the motion feature information of the video images, comprising:
Step 1: intercepting the human-contour image region of the obtained sign language video based on face recognition;
Step 2: separating the performer from the environment by foreground segmentation, further amplifying the sign language performer's body-motion feature information.
3. The method of claim 1, wherein the feature vectors are obtained with a self-encoder, implemented as follows:
Step 1: decomposing the training samples frame by frame and inputting them into the self-encoder for training;
Step 2: the encoder in the self-encoder extracts image features and converts them into feature vectors, the decoder restores the image from the feature vectors, and the self-encoder is trained by comparing the generated image with the original image so that the generated image reproduces the input image as closely as possible;
Step 3: after training, the self-encoder model is saved; an image input to the self-encoder then yields a feature vector that uniquely represents the image.
4. The method of claim 1, wherein the key frames and related information are obtained by an optical flow method: the histogram of optical-flow vectors over the image area is accumulated; a threshold identifies the key frame segment of the transition between two words in the sign language video; the middle frame of that segment is stored as the key frame; and from the positions of two key frames the related information is obtained, namely the index of the middle frame of a segmented word's video segment and the distance of that frame from the segment boundary.
5. The method of claim 1, wherein the neural network takes the feature vectors and the attention curve as input, forms a loss function from the network's predicted value and the word vector, and completes training by back propagation, comprising:
Step 1: designing the attention curve with a Gaussian function, taking the related data information obtained in claim 4 and the video length as input to obtain the attention curve;
Step 2: fusing the attention curve with the feature vectors and inputting the result word by word into the neural network for training; computing the mean squared loss between the predicted value output by the network and the standard word vector; and adjusting the network parameters by back-propagating the computed result, thereby training the network model on the video segment corresponding to each word of the continuous video.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202110848023.8A CN113642422A (en) | 2021-07-27 | 2021-07-27 | Continuous Chinese sign language recognition method |
Publications (1)
Publication Number | Publication Date |
---|---|
CN113642422A true CN113642422A (en) | 2021-11-12 |
Family
ID=78418474
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202110848023.8A Pending CN113642422A (en) | 2021-07-27 | 2021-07-27 | Continuous Chinese sign language recognition method |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN113642422A (en) |
Citations (27)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
JP2000311180A (en) * | 1999-03-11 | 2000-11-07 | Fuji Xerox Co Ltd | Method for feature set selection, method for generating video image class stastic model, method for classifying and segmenting video frame, method for determining similarity of video frame, computer-readable medium, and computer system |
US20040088723A1 (en) * | 2002-11-01 | 2004-05-06 | Yu-Fei Ma | Systems and methods for generating a video summary |
CN101655859A (en) * | 2009-07-10 | 2010-02-24 | 北京大学 | Method for fast removing redundancy key frames and device thereof |
CN102089616A (en) * | 2008-06-03 | 2011-06-08 | 焕·J·郑 | Interferometric defect detection and classification |
US20120123780A1 (en) * | 2010-11-15 | 2012-05-17 | Futurewei Technologies, Inc. | Method and system for video summarization |
US20120143358A1 (en) * | 2009-10-27 | 2012-06-07 | Harmonix Music Systems, Inc. | Movement based recognition and evaluation |
WO2012078702A1 (en) * | 2010-12-10 | 2012-06-14 | Eastman Kodak Company | Video key frame extraction using sparse representation |
CN105005769A (en) * | 2015-07-08 | 2015-10-28 | 山东大学 | Deep information based sign language recognition method |
CN106210444A (en) * | 2016-07-04 | 2016-12-07 | 石家庄铁道大学 | Kinestate self adaptation key frame extracting method |
CN107027051A (en) * | 2016-07-26 | 2017-08-08 | 中国科学院自动化研究所 | A kind of video key frame extracting method based on linear dynamic system |
CN107748761A (en) * | 2017-09-26 | 2018-03-02 | 广东工业大学 | A kind of extraction method of key frame of video frequency abstract |
CN107784118A (en) * | 2017-11-14 | 2018-03-09 | 北京林业大学 | A kind of Video Key information extracting system semantic for user interest |
US20180204111A1 (en) * | 2013-02-28 | 2018-07-19 | Z Advanced Computing, Inc. | System and Method for Extremely Efficient Image and Pattern Recognition and Artificial Intelligence Platform |
CN108347625A (en) * | 2018-03-09 | 2018-07-31 | 北京数码视讯软件技术发展有限公司 | A kind of method and apparatus of TS Streaming Medias positioning |
CN109409231A (en) * | 2018-09-27 | 2019-03-01 | 合肥工业大学 | Multiple features fusion sign Language Recognition Method based on adaptive hidden Markov |
CN109871781A (en) * | 2019-01-28 | 2019-06-11 | 山东大学 | Dynamic gesture identification method and system based on multi-modal 3D convolutional neural networks |
CN110019817A (en) * | 2018-12-04 | 2019-07-16 | 阿里巴巴集团控股有限公司 | A kind of detection method, device and the electronic equipment of text in video information |
CN110399850A (en) * | 2019-07-30 | 2019-11-01 | 西安工业大学 | A kind of continuous sign language recognition method based on deep neural network |
CN110569823A (en) * | 2019-09-18 | 2019-12-13 | 西安工业大学 | sign language identification and skeleton generation method based on RNN |
CN111158491A (en) * | 2019-12-31 | 2020-05-15 | 苏州莱孚斯特电子科技有限公司 | Gesture recognition man-machine interaction method applied to vehicle-mounted HUD |
US20200184278A1 (en) * | 2014-03-18 | 2020-06-11 | Z Advanced Computing, Inc. | System and Method for Extremely Efficient Image and Pattern Recognition and Artificial Intelligence Platform |
CN111325099A (en) * | 2020-01-21 | 2020-06-23 | 南京邮电大学 | Sign language identification method and system based on double-current space-time diagram convolutional neural network |
CN111340005A (en) * | 2020-04-16 | 2020-06-26 | 深圳市康鸿泰科技有限公司 | Sign language identification method and system |
WO2020258661A1 (en) * | 2019-06-26 | 2020-12-30 | 平安科技(深圳)有限公司 | Speaking person separation method and apparatus based on recurrent neural network and acoustic features |
CN112241470A (en) * | 2020-09-24 | 2021-01-19 | 北京影谱科技股份有限公司 | Video classification method and system |
CN112257513A (en) * | 2020-09-27 | 2021-01-22 | 南京工业大学 | Training method, translation method and system for sign language video translation model |
CN112464816A (en) * | 2020-11-27 | 2021-03-09 | 南京特殊教育师范学院 | Local sign language identification method and device based on secondary transfer learning |
Patent Citations (27)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
JP2000311180A (en) * | 1999-03-11 | 2000-11-07 | Fuji Xerox Co Ltd | Method for feature set selection, method for generating video image class stastic model, method for classifying and segmenting video frame, method for determining similarity of video frame, computer-readable medium, and computer system |
US20040088723A1 (en) * | 2002-11-01 | 2004-05-06 | Yu-Fei Ma | Systems and methods for generating a video summary |
CN102089616A (en) * | 2008-06-03 | 2011-06-08 | 焕·J·郑 | Interferometric defect detection and classification |
CN101655859A (en) * | 2009-07-10 | 2010-02-24 | 北京大学 | Method for fast removing redundancy key frames and device thereof |
US20120143358A1 (en) * | 2009-10-27 | 2012-06-07 | Harmonix Music Systems, Inc. | Movement based recognition and evaluation |
US20120123780A1 (en) * | 2010-11-15 | 2012-05-17 | Futurewei Technologies, Inc. | Method and system for video summarization |
WO2012078702A1 (en) * | 2010-12-10 | 2012-06-14 | Eastman Kodak Company | Video key frame extraction using sparse representation |
US20180204111A1 (en) * | 2013-02-28 | 2018-07-19 | Z Advanced Computing, Inc. | System and Method for Extremely Efficient Image and Pattern Recognition and Artificial Intelligence Platform |
US20200184278A1 (en) * | 2014-03-18 | 2020-06-11 | Z Advanced Computing, Inc. | System and Method for Extremely Efficient Image and Pattern Recognition and Artificial Intelligence Platform |
CN105005769A (en) * | 2015-07-08 | 2015-10-28 | 山东大学 | Depth-information-based sign language recognition method |
CN106210444A (en) * | 2016-07-04 | 2016-12-07 | 石家庄铁道大学 | Motion-state-adaptive key frame extraction method |
CN107027051A (en) * | 2016-07-26 | 2017-08-08 | 中国科学院自动化研究所 | Video key frame extraction method based on linear dynamic systems |
CN107748761A (en) * | 2017-09-26 | 2018-03-02 | 广东工业大学 | Key frame extraction method for video summarization |
CN107784118A (en) * | 2017-11-14 | 2018-03-09 | 北京林业大学 | Video key information extraction system oriented to user-interest semantics |
CN108347625A (en) * | 2018-03-09 | 2018-07-31 | 北京数码视讯软件技术发展有限公司 | Method and apparatus for TS streaming media positioning |
CN109409231A (en) * | 2018-09-27 | 2019-03-01 | 合肥工业大学 | Multi-feature fusion sign language recognition method based on adaptive hidden Markov models |
CN110019817A (en) * | 2018-12-04 | 2019-07-16 | 阿里巴巴集团控股有限公司 | Method, apparatus and electronic device for detecting text in video information |
CN109871781A (en) * | 2019-01-28 | 2019-06-11 | 山东大学 | Dynamic gesture recognition method and system based on multimodal 3D convolutional neural networks |
WO2020258661A1 (en) * | 2019-06-26 | 2020-12-30 | 平安科技(深圳)有限公司 | Speaker separation method and apparatus based on recurrent neural networks and acoustic features |
CN110399850A (en) * | 2019-07-30 | 2019-11-01 | 西安工业大学 | Continuous sign language recognition method based on deep neural networks |
CN110569823A (en) * | 2019-09-18 | 2019-12-13 | 西安工业大学 | Sign language recognition and skeleton generation method based on RNN |
CN111158491A (en) * | 2019-12-31 | 2020-05-15 | 苏州莱孚斯特电子科技有限公司 | Gesture-recognition human-machine interaction method for vehicle-mounted HUD |
CN111325099A (en) * | 2020-01-21 | 2020-06-23 | 南京邮电大学 | Sign language recognition method and system based on a dual-stream spatio-temporal graph convolutional neural network |
CN111340005A (en) * | 2020-04-16 | 2020-06-26 | 深圳市康鸿泰科技有限公司 | Sign language recognition method and system |
CN112241470A (en) * | 2020-09-24 | 2021-01-19 | 北京影谱科技股份有限公司 | Video classification method and system |
CN112257513A (en) * | 2020-09-27 | 2021-01-22 | 南京工业大学 | Training method, translation method and system for sign language video translation model |
CN112464816A (en) * | 2020-11-27 | 2021-03-09 | 南京特殊教育师范学院 | Local sign language identification method and device based on secondary transfer learning |
Non-Patent Citations (5)
Title |
---|
JUAN P. VELASQUEZ et al.: "Dynamic Sign Language Recognition Using Gaussian Process Dynamical Models", International Work-Conference on the Interplay Between Natural and Artificial Computation, pages 491-500 * |
SHENGWEI ZHANG et al.: "Research on Dynamic Sign Language Recognition Based on Key Frame Weighted of DTW", International Conference on Multimedia Technology and Enhanced Learning, pages 11-20 * |
SHILIANG HUANG et al.: "A Novel Chinese Sign Language Recognition Method Based on Keyframe-Centered Clips", IEEE Signal Processing Letters, vol. 25, no. 3, pages 442-446 * |
YING Rui et al.: "Human Action Recognition Based on Motion Blocks and Key Frames", Journal of Fudan University (Natural Science), vol. 53, no. 6, pages 815-822 * |
XIE Qina et al.: "A Survey of Sign Language Recognition Methods and Technologies", Computer Engineering and Applications, vol. 57, no. 18, pages 1-12 * |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
Akmeliawati et al. | Real-time Malaysian sign language translation using colour segmentation and neural network | |
Oliver et al. | Layered representations for human activity recognition | |
Yang et al. | Sign language spotting with a threshold model based on conditional random fields | |
Dabre et al. | Machine learning model for sign language interpretation using webcam images | |
Nimisha et al. | A brief review of the recent trends in sign language recognition | |
NadeemHashmi et al. | A lip reading model using CNN with batch normalization | |
Sharma et al. | Vision-based sign language recognition system: A Comprehensive Review | |
CN111354246A (en) | System and method for helping deaf-mute people communicate | |
De Coster et al. | Machine translation from signed to spoken languages: State of the art and challenges | |
Gogate et al. | Real time emotion recognition and gender classification | |
CN116129013A (en) | Method, device and storage medium for generating virtual human animation video | |
CN115964638A (en) | Multimodal social data emotion classification method, system, terminal, device and application | |
CN114694255A (en) | Sentence-level lip reading method based on channel attention and a temporal convolutional network | |
Abrar et al. | Deep lip reading-a deep learning based lip-reading software for the hearing impaired | |
Mistree et al. | Towards Indian sign language sentence recognition using INSIGNVID: Indian sign language video dataset | |
CN116564338B (en) | Speech animation generation method, apparatus, electronic device and medium | |
Tewari et al. | Real Time Sign Language Recognition Framework For Two Way Communication | |
Shokoori et al. | Sign language recognition and translation into pashto language alphabets | |
CN114882590B (en) | Lip reading method based on event-camera multi-granularity spatio-temporal feature perception | |
Avula et al. | CNN based recognition of emotion and speech from gestures and facial expressions | |
CN113642422A (en) | Continuous Chinese sign language recognition method | |
CN112135200B (en) | Video description generation method for compressed videos | |
CN112926665A (en) | Text line recognition system based on domain adaptation and method of use | |
Vayadande et al. | LipReadNet: A Deep Learning Approach to Lip Reading | |
Chanda et al. | Automatic hand gesture recognition with semantic segmentation and deep learning |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication ||
SE01 | Entry into force of request for substantive examination ||