CN112132094B - Continuous sign language recognition system based on multi-language collaboration - Google Patents
- Publication number
- CN112132094B (application CN202011060272.2A)
- Authority
- CN
- China
- Prior art keywords
- sign language
- language
- shared
- sequence
- sign
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Active
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V40/00—Recognition of biometric, human-related or animal-related patterns in image or video data
- G06V40/20—Movements or behaviour, e.g. gesture recognition
- G06V40/28—Recognition of hand or arm movements, e.g. recognition of deaf sign language
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F18/00—Pattern recognition
- G06F18/20—Analysing
- G06F18/24—Classification techniques
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/04—Architecture, e.g. interconnection topology
- G06N3/044—Recurrent networks, e.g. Hopfield networks
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/04—Architecture, e.g. interconnection topology
- G06N3/045—Combinations of networks
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/08—Learning methods
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V20/00—Scenes; Scene-specific elements
- G06V20/40—Scenes; Scene-specific elements in video content
Landscapes
- Engineering & Computer Science (AREA)
- Theoretical Computer Science (AREA)
- Physics & Mathematics (AREA)
- General Physics & Mathematics (AREA)
- Data Mining & Analysis (AREA)
- Life Sciences & Earth Sciences (AREA)
- Artificial Intelligence (AREA)
- General Health & Medical Sciences (AREA)
- Health & Medical Sciences (AREA)
- Evolutionary Computation (AREA)
- General Engineering & Computer Science (AREA)
- Molecular Biology (AREA)
- Software Systems (AREA)
- Mathematical Physics (AREA)
- Biomedical Technology (AREA)
- Biophysics (AREA)
- Computational Linguistics (AREA)
- Computing Systems (AREA)
- Computer Vision & Pattern Recognition (AREA)
- Multimedia (AREA)
- Bioinformatics & Cheminformatics (AREA)
- Bioinformatics & Computational Biology (AREA)
- Evolutionary Biology (AREA)
- Psychiatry (AREA)
- Social Psychology (AREA)
- Human Computer Interaction (AREA)
- Image Analysis (AREA)
Abstract
The invention discloses a continuous sign language recognition system based on multi-language collaboration. It uses a common visual feature encoder to extract feature expressions and, for sign languages of different languages, uses different temporal modeling networks (the target sequence models) to learn the linguistic characteristics of each sign language. A shared temporal encoder (the shared sequence model) expresses the visual patterns common to different sign languages and is initialized with language embedding vectors. Through multi-language collaborative training, multi-language sign language recognition is realized within a single framework, the visual commonalities among different sign languages are fully mined, and sign language recognition performance is improved.
Description
Technical Field
The invention relates to the technical field of action recognition in computer vision, in particular to a continuous sign language recognition system based on multi-language collaboration.
Background
In the continuous sign language recognition problem, each sign language video is labeled with an ordered sequence of sign language words, so the problem can essentially be regarded as learning the mapping between a video sequence and an annotated text sequence. Generally, a continuous sign language recognition system is composed of a visual feature encoder and a temporal modeling model. The feature representation of the video plays a very important role in continuous sign language recognition; early work used hand features such as SIFT and HOG to characterize hand shapes and trajectories. With the successful application of deep learning in computer vision, two-dimensional convolutional neural networks for image representation and three-dimensional convolutional neural networks for video representation were introduced into sign language recognition. Related work in end-to-end systems uses 2D CNNs to extract RGB image information and achieves good performance; to model temporal dependencies, sign language recognition methods based on three-dimensional convolution kernels have also been proposed. In another video representation scheme, a 2D convolutional network and 1D temporal convolutions perform the spatio-temporal encoding of the sign language video, and the visual features extracted in this way outperform other methods on the continuous sign language recognition task.
The sequence learning model in continuous sign language recognition can be realized with connectionist temporal classification (CTC), hidden Markov models, encoder-decoder networks, and so on. Recurrent neural networks have been successfully applied to many sequence learning tasks and were introduced into the continuous sign language recognition problem; the bidirectional LSTM-CTC structure is one of the most widely used baselines in sign language recognition. In addition, some work embeds hidden Markov models in neural networks for sign language recognition. Similar to machine translation, attention-based encoder-decoder networks are also used to learn the mapping between videos and annotations, thereby implementing the tasks of sign language recognition and sign language translation.
In the machine translation task, most methods likewise focus on the single-language translation problem from a source to a target language, and end-to-end solutions based on deep neural networks have made important progress on this type of problem. A machine translation system can extend the single-language method to the multi-language translation task in several ways. By adding a language identifier at the beginning of the sentence to be translated, a monolingual model can be applied to multi-language translation with a simple extension. To alleviate the problem of limited corpus resources, end-to-end twin networks have been used to generate additional sentences from the corpus for data augmentation. Furthermore, by using different parameter-sharing strategies, the model size of a multi-lingual system can be balanced.
The prior art mainly has the following defects:
1) Like natural languages, the sign languages of different countries and regions are not the same; each has its own unique grammatical structure and vocabulary. In other words, it is difficult for people using different sign languages to understand each other's sign language semantics. Existing video sign language recognition methods usually address sign language recognition for a single language, which limits the practical application and deployment of sign language recognition systems.
2) Most existing multi-language sign language recognition algorithms train, on the same network architecture, separate sets of model parameters for different sign language datasets. This approach can achieve a certain effect, but it ignores the fact that similar visual patterns exist across different sign languages, and such separate, independent training does not help the model mine the commonalities among sign languages.
Disclosure of Invention
The invention aims to provide a continuous sign language recognition system based on multi-language collaboration, which realizes multi-language sign language recognition within a single framework and whose recognition performance is superior to that of separately trained recognizers.
The purpose of the invention is realized by the following technical scheme:
a continuous sign language recognition system based on multi-language collaboration, comprising: a shared visual feature encoder, a shared sequence model, and several target sequence models; wherein:
the shared visual feature encoder is used for extracting the visual features from sign language videos of all languages and inputting them to the shared sequence model and each target sequence model respectively;
the shared sequence model is used for expressing the visual patterns common to different sign language languages, learning the commonality among them, and is initialized with language embedding vectors;
each target sequence model is used for learning, in conjunction with the output of the shared sequence model, the mapping between the visual features of the corresponding language and the corresponding sign language words;
in the training stage, all target sequence models are jointly optimized; each trained target sequence model can predict the probability distribution over the sign language words corresponding to a sign language video of its language.
According to the technical scheme provided by the invention, a common visual feature encoder (the shared visual feature encoder) is used to extract feature expressions, and different temporal modeling networks (the target sequence models) learn the linguistic characteristics of the corresponding sign languages. A shared temporal encoder (the shared sequence model) expresses the visual patterns common to different sign languages and is initialized with language embedding vectors. Through multi-language collaborative training, multi-language sign language recognition is realized within a single framework, the visual commonalities among different sign languages are fully mined, and sign language recognition performance is improved.
Drawings
In order to more clearly illustrate the technical solutions of the embodiments of the present invention, the drawings needed to be used in the description of the embodiments are briefly introduced below, and it is obvious that the drawings in the following description are only some embodiments of the present invention, and it is obvious for those skilled in the art to obtain other drawings based on the drawings without creative efforts.
FIG. 1 is a diagram illustrating three basic frameworks for multi-language identification according to an embodiment of the present invention;
FIG. 2 is a schematic diagram of a continuous sign language recognition system based on multi-language collaboration according to an embodiment of the present invention;
fig. 3 is a schematic diagram of network iterative optimization provided in the embodiment of the present invention;
fig. 4 is a schematic diagram of obtaining alignment labels based on maximized probability according to an embodiment of the present invention.
Detailed Description
The technical solutions in the embodiments of the present invention are clearly and completely described below with reference to the drawings in the embodiments of the present invention, and it is obvious that the described embodiments are only a part of the embodiments of the present invention, and not all embodiments. All other embodiments, which can be derived by a person skilled in the art from the embodiments of the present invention without making any creative effort, shall fall within the protection scope of the present invention.
Most existing sign language recognition frameworks recognize only a single sign language, yet different sign languages share common visual patterns, and training multiple independent models on different sign language datasets ignores the commonalities among the sign languages of different languages. The invention expresses the visual patterns common to different sign languages through a shared temporal encoder and realizes multi-language sign language recognition within a single framework through multi-language collaborative training; the recognition performance is superior to the results of independent training.
Similar to multi-language machine translation, three system architectures for multi-language sign language recognition are considered, as shown in FIG. 1. 1) The simplest approach is to use one shared visual encoder and one shared sequence model, as shown in part (a) of FIG. 1; this simple architecture can be implemented without major changes to existing continuous sign language systems. However, it impairs the sequence model's ability to handle the correspondence mappings of multiple languages. 2) Different sign languages use different sequence models but share the same visual encoder, as shown in part (b) of FIG. 1. Each branch functions like a classical sign language recognition system, and no information is shared between the target branches; this design is language-specific, but the complementarity between different sign languages cannot be exploited. 3) To combine the advantages of the first two architectures, the third approach adds an additional shared sequence model to learn the commonalities among different sign languages, as shown in part (c) of FIG. 1.
The continuous sign language recognition system based on multi-language collaboration provided by the embodiment of the invention adopts the structure shown in part (c) of FIG. 1. It uses a common CNN-TCN for feature extraction for all input sign languages. An independent target sequence model learns the correspondence between the visual features and the sign language words of each language. Furthermore, in order to model the visual patterns shared between different sign languages, a shared sequence model is used for the commonalities among all sign languages. Each branch (target sequence model) in the system is optimized with a CTC loss. FIG. 2 shows the main structure of the system, which comprises: a shared visual feature encoder, a shared sequence model, and several target sequence models. The principles of the various parts of the system and the training scheme are described below.
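To make the three-part data flow concrete, the following minimal sketch shows one shared encoder feeding both a shared branch and one of K language-specific branches. The toy dimensions, the mean-pooling "encoder", the single-layer "sequence models", and all names here are illustrative stand-ins assumed for exposition, not the patented CNN-TCN and BLSTM modules.

```python
import numpy as np

rng = np.random.default_rng(0)

def softmax(z):
    e = np.exp(z - z.max(axis=-1, keepdims=True))
    return e / e.sum(axis=-1, keepdims=True)

class MultiLingualSLR:
    """Toy stand-in for the three-part system: one shared visual
    encoder, one shared sequence model, K language-specific branches."""
    def __init__(self, feat_dim=8, hidden=6, vocab_sizes=(5, 7)):
        self.W_enc = rng.standard_normal((feat_dim, hidden)) * 0.1
        self.W_shared = rng.standard_normal((hidden, hidden)) * 0.1
        # one embedding vector per sign language, biasing the shared branch
        self.lang_emb = rng.standard_normal((len(vocab_sizes), hidden)) * 0.1
        self.W_tgt = [rng.standard_normal((2 * hidden, v)) * 0.1
                      for v in vocab_sizes]

    def forward(self, frames, k):
        # shared visual encoder: 4-frame segments -> features (length N/4)
        T = len(frames) // 4
        segs = frames[:T * 4].reshape(T, 4, -1).mean(axis=1)
        F = np.tanh(segs @ self.W_enc)
        # shared sequence model, conditioned on the k-th language embedding
        Os = np.tanh(F @ self.W_shared + self.lang_emb[k])
        # k-th target branch consumes both F and the shared output
        return softmax(np.concatenate([F, Os], axis=1) @ self.W_tgt[k])
```

A 16-frame input for language k = 0 then yields a 4-step sequence of probability distributions over that language's vocabulary; the same encoder weights serve every language, while only the target branch differs.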
I. Shared visual feature encoder.
The shared visual feature encoder is used for extracting the visual features from sign language videos of all languages and inputting them to the shared sequence model and each target sequence model respectively.
In the embodiment of the invention, the shared visual feature encoder sequentially comprises: a spatial convolutional network (Spatial CNN) and a temporal convolutional network (Temporal CNN), which are used to extract the visual features of the video.
As indicated by the dashed box at the bottom of FIG. 2:
The spatial convolutional network mainly comprises, in sequence: a first convolution layer, a first max-pooling layer, second and third convolution layers, two Inception layers, a second max-pooling layer, five Inception layers, a third max-pooling layer, two Inception layers, and a fourth max-pooling layer.
The temporal convolutional network mainly comprises two convolution layers and two max-pooling layers, arranged alternately (convolution, pooling, convolution, pooling).
The temporal receptive field of the shared visual feature encoder is 16 frames. Denote the shared visual feature encoder as E_v. For a sign language video of any language, X = {x_t}_{t=1}^{N}, the visual features output by the shared visual feature encoder are represented as:

F = E_v(X) = {f_i}_{i=1}^{N/4},

where x_t represents the t-th video frame, N is the number of video frames, and f_i represents the visual feature of the i-th video segment (the video frames under the corresponding receptive field); the encoder reduces the output sequence length to N/4 of the input.
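As a sketch of why the temporal network divides the sequence length by four: each of its two stride-2 max-pooling stages halves the length, while length-preserving convolutions leave it unchanged. The kernel weights and sizes below are illustrative assumptions, not the patented parameters.

```python
import numpy as np

def conv1d_same(x, kernel):
    # length-preserving 1-D convolution over the time axis
    return np.convolve(x, kernel, mode="same")

def max_pool1d(x, size=2, stride=2):
    T = (len(x) - size) // stride + 1
    return np.array([x[i * stride:i * stride + size].max() for i in range(T)])

def temporal_cnn(x):
    # conv -> pool -> conv -> pool, each pool halving the length: N -> N/4
    x = max_pool1d(conv1d_same(x, np.array([0.25, 0.5, 0.25])))
    x = max_pool1d(conv1d_same(x, np.array([0.25, 0.5, 0.25])))
    return x

out = temporal_cnn(np.arange(16, dtype=float))
print(len(out))  # 16 / 4 = 4
```

With N = 16 input steps the output has 4 steps, matching the stated N/4 reduction and the 16-frame receptive field of one output feature.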
II. Sequence learning.
Long-term dependencies can be effectively modeled by a long short-term memory network (LSTM). The LSTM unit at the current time t is described by a cell state C_t and a hidden state h_t; its basic idea is to control the update of the cell and hidden states by introducing gate structures. The LSTM cell has three different gates: an input gate i_t, a forget gate g_t (written g_t here because f_t denotes the input feature), and an output gate o_t, computed as:

i_t = σ(W_i [f_t, h_{t-1}] + b_i),
g_t = σ(W_g [f_t, h_{t-1}] + b_g),
o_t = σ(W_o [f_t, h_{t-1}] + b_o),
c̃_t = tanh(W_c [f_t, h_{t-1}] + b_c),

where σ is the activation function, t denotes the time step, f_t is the input feature, and W and b are the linear mapping weights and biases. The current cell state and hidden state are then updated as:

C_t = g_t ⊙ C_{t-1} + i_t ⊙ c̃_t,
h_t = o_t ⊙ tanh(C_t),

where ⊙ denotes the element-wise product of vectors.
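The LSTM update described above can be sketched as a single NumPy step; the stacked-weight layout and the zero-parameter sanity check are illustrative assumptions.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def lstm_cell(f_t, h_prev, c_prev, W, b):
    """One LSTM step: W maps the concatenated [f_t, h_prev] to the
    stacked input/forget/output gates and candidate state; b is the bias."""
    d = h_prev.shape[0]
    z = W @ np.concatenate([f_t, h_prev]) + b
    i_t = sigmoid(z[0:d])            # input gate
    g_t = sigmoid(z[d:2 * d])        # forget gate
    o_t = sigmoid(z[2 * d:3 * d])    # output gate
    c_tilde = np.tanh(z[3 * d:4 * d])      # candidate cell state
    c_t = g_t * c_prev + i_t * c_tilde     # element-wise products
    h_t = o_t * np.tanh(c_t)
    return h_t, c_t

# sanity check: with all-zero parameters every gate is 0.5,
# so a cell state of 1 decays to 0.5 and h = 0.5 * tanh(0.5)
h, c = lstm_cell(np.zeros(4), np.zeros(3), np.ones(3),
                 np.zeros((12, 7)), np.zeros(12))
```

The zero-parameter case makes the gating behavior visible: the forget gate halves the previous cell state, and the output gate passes half of tanh(C_t).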
In the present invention, in order to model bidirectional temporal information, bidirectional long short-term memory networks (BLSTM) are used for temporal encoding. Two different BLSTM networks are used in this framework for different purposes. On the one hand, a separate sequence model (the target sequence model) is used to learn the mapping between visual features and sign language words, since each language has its own unique rules; the separate sequence modeling branches help capture the characteristics of each particular sign language and reduce interference between cross-language sequence modeling. On the other hand, in order to encode the similar visual patterns of different sign languages, the invention introduces a shared sequence model to learn the commonalities among them. To embed the language identity information, the state of the shared model is initialized with different language embedding vectors.
1. Shared sequence model.
The shared sequence model is used for expressing the visual patterns common to different sign language languages, learning the commonality among them, and is initialized with language embedding vectors.
In order to embed language-category information in the shared sequence model, the category information of each language is encoded by an embedding layer and used to initialize the BLSTM of the shared sequence model, so as to distinguish the different languages.
For the input visual features F, the output features O_s are expressed as:

O_s = BLSTM_s(F; h_0 = e_k, c_0 = e_k),

where h_0 and c_0 are respectively the initial hidden state and cell state of the bidirectional long short-term memory network, e_k is the class embedding vector representing the k-th sign language, and BLSTM_s denotes the shared sequence model.
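The embedding-as-initial-state idea can be sketched as follows. To stay compact, the recurrent unit is simplified here to a plain tanh RNN run in both directions (the LSTM cell state is omitted), so only the initialization O_s = BLSTM_s(F; h_0 = e_k, c_0 = e_k) is illustrated; all shapes and names are assumptions.

```python
import numpy as np

def rnn_pass(F, h0, W_x, W_h):
    h, out = h0, []
    for f in F:                       # one step per visual feature
        h = np.tanh(W_x @ f + W_h @ h)
        out.append(h)
    return np.array(out)

def shared_sequence_model(F, e_k, W_x, W_h):
    # forward and backward passes, both initialized with the k-th
    # language embedding e_k, then concatenated (BLSTM-style output)
    fwd = rnn_pass(F, e_k, W_x, W_h)
    bwd = rnn_pass(F[::-1], e_k, W_x, W_h)[::-1]
    return np.concatenate([fwd, bwd], axis=1)

rng = np.random.default_rng(1)
T, din, d = 5, 4, 3
F_in = rng.standard_normal((T, din))
lang_emb = rng.standard_normal((2, d)) * 0.1   # one embedding per language
W_x = rng.standard_normal((d, din)) * 0.1
W_h = rng.standard_normal((d, d)) * 0.1
Os = shared_sequence_model(F_in, lang_emb[0], W_x, W_h)
```

The same weights are reused for every language; only the initial state e_k differs, which is how the shared branch is told which sign language it is currently encoding.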
2. A target sequence model.
Each target sequence model is used for learning, in conjunction with the output of the shared sequence model, the mapping between the visual features of the corresponding language and the corresponding sign language words.
and thirdly, optimizing the model.
1. Multi-language joint optimization.
In the embodiment of the invention, all target sequence models are jointly optimized in the training stage; each trained target sequence model can predict the probability distribution over the sign language words corresponding to a sign language video of its language.
In order to obtain the probability distribution over the target-language words, the output O^(k) of the target sequence model is mapped to an unnormalized log-probability space by a fully connected layer and normalized, expressed as:

Y^(k) = softmax(W_f^(k) O^(k) + b_f^(k)),

where the superscript k denotes the sign language identity, W_f^(k) and b_f^(k) are respectively the weight and bias parameters of the fully connected layer, and Y_{t,s} is the probability that the t-th video segment belongs to the sign language word s.
In the training stage, the connectionist temporal classification (CTC) loss is used for optimization. With joint optimization, the total loss function is the sum of the CTC loss functions of all target sequence models, expressed as:

L = Σ_{k=1}^{K} L_CTC^(k),

where K is the total number of target sequence models and L_CTC^(k) is the CTC loss function of the target sequence model for the k-th sign language, computed from Y^(k).
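The joint objective sums one CTC loss per language branch. The sketch below implements the standard CTC forward algorithm with a blank symbol and the summed total; the blank index, vocabulary layout, and toy inputs are assumptions for illustration.

```python
import numpy as np

def ctc_nll(log_probs, labels, blank=0):
    """CTC negative log-likelihood of `labels` given per-segment
    log-probabilities of shape (T, vocab)."""
    T = log_probs.shape[0]
    ext = [blank]
    for s in labels:                 # interleave labels with blanks
        ext += [s, blank]
    S, NEG = len(ext), -1e30
    alpha = np.full(S, NEG)          # forward variables, log space
    alpha[0] = log_probs[0, blank]
    if S > 1:
        alpha[1] = log_probs[0, ext[1]]
    for t in range(1, T):
        prev = alpha.copy()
        for s in range(S):
            cands = [prev[s]]
            if s >= 1:
                cands.append(prev[s - 1])
            if s >= 2 and ext[s] != blank and ext[s] != ext[s - 2]:
                cands.append(prev[s - 2])   # skip a blank between labels
            alpha[s] = np.logaddexp.reduce(cands) + log_probs[t, ext[s]]
    return -np.logaddexp(alpha[S - 1], alpha[S - 2] if S > 1 else NEG)

def total_loss(branch_log_probs, branch_labels):
    # joint optimization: sum of the K per-language CTC losses
    return sum(ctc_nll(lp, lab)
               for lp, lab in zip(branch_log_probs, branch_labels))
```

For example, with a single segment whose probabilities over {blank, word} are (0.2, 0.8), the loss of the label sequence [1] reduces to −log 0.8; the joint loss is simply the sum of such terms over the K branches.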
2. Shared visual feature encoder optimization.
Existing studies have shown that iterative training of the CNN is an effective way to further improve performance. The idea is to obtain the alignment between the input video and the sign language words and use it to fine-tune the feature extraction network; the optimization process is shown in fig. 3. On this basis, the embodiment of the present invention provides a method for obtaining the alignment between a sign language video and its sign language annotation sequence based on a maximum-probability decoding algorithm, so as to fine-tune the shared visual feature encoder, which comprises:

After the probability distribution Y^(k) over the sign language words is obtained through the target sequence model, the category probability columns corresponding to the sign language words of the annotation sequence are extracted from Y^(k) in order and concatenated into a new probability matrix Y^(k)', as shown in fig. 4, where T is the number of video segments. A dynamic programming algorithm is then used to find the maximum-probability path through Y^(k)'.

Let P_{i,j} denote the maximum probability of aligning the feature sequence f_1, f_2, …, f_i with the annotation sequence s_1, s_2, …, s_j. The dynamic programming transfer equation is expressed as:

P_{i,j} = Y^(k)'_{i,j} + max(P_{i-1,j}, P_{i-1,j-1}),

where Y^(k)'_{i,j} is the element in row i and column j of the new probability matrix Y^(k)', i.e. the probability that the i-th video segment belongs to the sign language word s_j; i ≤ N/4.

Through the above, the alignment between the sign language video and the sign language word annotation is obtained, that is, segment-level category pseudo labels (video segment pseudo labels). The shared visual feature encoder is then optimized by treating this as a video classification task. The optimized shared visual feature encoder provides the pre-training parameters, and the whole framework is trained end-to-end again, thereby realizing continuous iterative optimization.
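The transfer equation and the pseudo-label extraction can be sketched as follows — a minimal NumPy illustration. The boundary condition that the first segment aligns to the first annotated word and the backtracking tie-breaking are assumptions consistent with a monotonic alignment; probabilities are summed along the path exactly as in the equation.

```python
import numpy as np

def align_max_prob(Y):
    """Maximum-probability alignment of T video segments to L annotated
    words, per P[i,j] = Y'[i,j] + max(P[i-1,j], P[i-1,j-1]).
    Y: (T, L) matrix, Y[i, j] = probability that segment i is word j.
    Returns the path score and one pseudo label (word index) per segment."""
    T, L = Y.shape
    NEG = -np.inf
    P = np.full((T, L), NEG)
    P[0, 0] = Y[0, 0]                 # first segment aligns to the first word
    for i in range(1, T):
        for j in range(L):
            stay = P[i - 1, j]        # segment i continues word j
            step = P[i - 1, j - 1] if j > 0 else NEG   # or advances one word
            if max(stay, step) > NEG:
                P[i, j] = Y[i, j] + max(stay, step)
    # backtrack from P[T-1, L-1] to recover segment-level pseudo labels
    j, labels = L - 1, [L - 1]
    for i in range(T - 1, 0, -1):
        if j > 0 and P[i - 1, j - 1] >= P[i - 1, j]:
            j -= 1
        labels.append(j)
    return P[T - 1, L - 1], labels[::-1]
```

For a toy Y^(k)' of three segments and two words, [[0.9, 0.1], [0.8, 0.2], [0.1, 0.9]], the maximum-probability path assigns the first two segments to word 0 and the last to word 1, with path score 2.6; those per-segment indices are the pseudo labels used to fine-tune the encoder.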
According to the scheme of the embodiment of the invention, on the one hand, multi-language sign language recognition within a single framework is realized through multi-language collaborative training, the visual commonalities among different sign languages are fully mined, and sign language recognition performance is improved. On the other hand, the shared visual feature encoder is improved by obtaining the alignment between the video and the sign language annotation sequence through the maximum-probability decoding algorithm.
Through the above description of the embodiments, it is clear to those skilled in the art that the above embodiments can be implemented by software, and can also be implemented by software plus a necessary general hardware platform. With this understanding, the technical solutions of the embodiments can be embodied in the form of a software product, which can be stored in a non-volatile storage medium (which can be a CD-ROM, a usb disk, a removable hard disk, etc.), and includes several instructions for enabling a computer device (which can be a personal computer, a server, or a network device, etc.) to execute the methods according to the embodiments of the present invention.
The above description is only for the preferred embodiment of the present invention, but the scope of the present invention is not limited thereto, and any changes or substitutions that can be easily conceived by those skilled in the art within the technical scope of the present invention are included in the scope of the present invention. Therefore, the protection scope of the present invention should be subject to the protection scope of the claims.
Claims (5)
1. A continuous sign language recognition system based on multi-language collaboration, comprising: a shared visual feature encoder, a shared sequence model, and a number of target sequence models; wherein:
the shared visual feature encoder is used for extracting the visual features from sign language videos of all languages and inputting them to the shared sequence model and each target sequence model respectively;
the shared sequence model is used for expressing the visual patterns common to different sign language languages, learning the commonality among them, and is initialized with language embedding vectors;
each target sequence model for learning a mapping between a respective language visual feature and a respective sign language word in conjunction with an output of the shared sequence model;
in the training stage, performing joint optimization on all target sequence models; each trained target sequence model can predict the probability distribution of the sign language words corresponding to the sign language videos of the corresponding languages;
a maximum-probability decoding algorithm is used for obtaining the alignment between the sign language video and the sign language annotation sequence, so as to fine-tune the shared visual feature encoder, implemented as follows:
after the probability distribution Y^(k) over the sign language words is obtained through the target sequence model, the probability columns corresponding to the sign language words of the annotation sequence are extracted from Y^(k) in order and concatenated into a new probability matrix Y^(k)'; a dynamic programming algorithm is used to find the maximum-probability path through Y^(k)';
let P_{i,j} denote the maximum probability of aligning the feature sequence f_1, f_2, …, f_i with the annotation sequence s_1, s_2, …, s_j; the dynamic programming transfer equation is expressed as:
P_{i,j} = Y^(k)'_{i,j} + max(P_{i-1,j}, P_{i-1,j-1}),
wherein Y^(k)'_{i,j} is the element in row i and column j of the new probability matrix Y^(k)', i.e. the probability that the i-th video segment belongs to the sign language word s_j;
through the above operations, the alignment between the sign language video and the sign language word annotation is obtained, namely the video segment pseudo labels, with which the shared visual feature encoder is optimized.
2. The continuous sign language recognition system based on multi-language collaboration according to claim 1, wherein the shared visual feature encoder comprises, in sequence: a spatial convolutional network and a temporal convolutional network; wherein:
the spatial convolutional network comprises, in sequence: a first convolution layer, a first max-pooling layer, second and third convolution layers, two Inception layers, a second max-pooling layer, five Inception layers, a third max-pooling layer, two Inception layers, and a fourth max-pooling layer;
the temporal convolutional network comprises two convolution layers and two max-pooling layers, the convolution layers and max-pooling layers being arranged alternately;
denote the shared visual feature encoder as E_v; for a sign language video of any language, X = {x_t}_{t=1}^{N}, the visual features output by the shared visual feature encoder are represented as:
F = E_v(X) = {f_i}_{i=1}^{N/4},
wherein x_t is the t-th video frame, and f_i is the visual feature of the i-th video segment, a video segment being the video frames corresponding to the temporal receptive field of the shared visual feature encoder.
3. The continuous sign language recognition system based on multi-language collaboration according to claim 1, wherein the shared sequence model is implemented by a bidirectional long short-term memory network; for the input visual features F, the output O_s is expressed as:
O_s = BLSTM_s(F; h_0 = e_k, c_0 = e_k),
wherein h_0 and c_0 are respectively the initial hidden state and cell state of the bidirectional long short-term memory network, and e_k is the class embedding vector for the k-th sign language.
4. The continuous sign language recognition system based on multi-language collaboration according to claim 1, wherein each target sequence model is implemented by a bidirectional long short-term memory network initialized with a zero vector:
O^(k) = BLSTM^(k)([F, O_s]; h_0 = 0, c_0 = 0),
wherein F and O_s are respectively the outputs of the shared visual feature encoder and of the shared sequence model, and h_0 and c_0 are respectively the initial hidden state and cell state of the bidirectional long short-term memory network.
5. The continuous sign language recognition system based on multi-language collaboration according to claim 1, wherein
the output O^(k) of the target sequence model is mapped to an unnormalized log-probability space by a fully connected layer and normalized, expressed as:
Y^(k) = softmax(W_f^(k) O^(k) + b_f^(k)),
wherein the superscript k denotes the category identity of the sign language, W_f^(k) and b_f^(k) are respectively the weight and bias parameters of the fully connected layer, and Y_{t,s} is the probability that the t-th video segment belongs to the sign language word s;
in the training stage, the connectionist temporal classification loss CTC is used for optimization;
with joint optimization, the total loss function is the sum of the CTC loss functions of all target sequence models, expressed as:
L = Σ_{k=1}^{K} L_CTC^(k),
wherein K is the total number of target sequence models and L_CTC^(k) is the CTC loss of the target sequence model for the k-th sign language.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202011060272.2A CN112132094B (en) | 2020-09-30 | 2020-09-30 | Continuous sign language recognition system based on multi-language collaboration |
Publications (2)
Publication Number | Publication Date |
---|---|
CN112132094A CN112132094A (en) | 2020-12-25 |
CN112132094B true CN112132094B (en) | 2022-07-15 |
Family
ID=73843529
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202011060272.2A Active CN112132094B (en) | 2020-09-30 | 2020-09-30 | Continuous sign language recognition system based on multi-language collaboration |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN112132094B (en) |
Families Citing this family (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN112861827B (en) * | 2021-04-08 | 2022-09-06 | 中国科学技术大学 | Sign language translation method and system using single language material translation |
CN113992894A (en) * | 2021-10-27 | 2022-01-28 | 甘肃风尚电子科技信息有限公司 | Abnormal event identification system based on monitoring video time sequence action positioning and abnormal detection |
Citations (4)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN108647603A (en) * | 2018-04-28 | 2018-10-12 | 清华大学 | Semi-supervised continuous sign language interpretation method based on attention mechanism and device |
CN110210416A (en) * | 2019-06-05 | 2019-09-06 | 中国科学技术大学 | Based on the decoded sign Language Recognition optimization method and device of dynamic pseudo label |
CN110874537A (en) * | 2018-08-31 | 2020-03-10 | 阿里巴巴集团控股有限公司 | Generation method of multi-language translation model, translation method and translation equipment |
CN111325099A (en) * | 2020-01-21 | 2020-06-23 | 南京邮电大学 | Sign language identification method and system based on double-current space-time diagram convolutional neural network |
Family Cites Families (1)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US10867595B2 (en) * | 2017-05-19 | 2020-12-15 | Baidu Usa Llc | Cold fusing sequence-to-sequence models with language models |
Non-Patent Citations (2)
Title |
---|
Research of a Sign Language Translation System Based on Deep Learning; Siming He et al.; 2019 International Conference on Artificial Intelligence and Advanced Manufacturing (AIAM); 2020-01-09; pp. 392-396 *
Research on Dynamic Gesture Recognition Fusing Wide Residual and Long Short-Term Memory Networks; Liang Zhijie et al.; Application Research of Computers (《计算机应用研究》); 2019-12-31; Vol. 36, No. 12, pp. 3846-3852 *
Also Published As
Publication number | Publication date |
---|---|
CN112132094A (en) | 2020-12-25 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
Guo et al. | Back to mlp: A simple baseline for human motion prediction | |
CN110334361B (en) | Neural machine translation method for Chinese language | |
CN108733792B (en) | Entity relation extraction method | |
CN110490946B (en) | Text image generation method based on cross-modal similarity and antagonism network generation | |
Jang et al. | Recurrent neural network-based semantic variational autoencoder for sequence-to-sequence learning | |
CN112100351A (en) | Method and equipment for constructing intelligent question-answering system through question generation data set | |
CN111783462A (en) | Chinese named entity recognition model and method based on dual neural network fusion | |
CN111368565A (en) | Text translation method, text translation device, storage medium and computer equipment | |
Ruan et al. | Survey: Transformer based video-language pre-training | |
CN112364174A (en) | Patient medical record similarity evaluation method and system based on knowledge graph | |
CN110059324B (en) | Neural network machine translation method and device based on dependency information supervision | |
CN112132094B (en) | Continuous sign language recognition system based on multi-language collaboration | |
Tang et al. | Deep sequential fusion LSTM network for image description | |
CN111985205A (en) | Aspect level emotion classification model | |
CN110991290A (en) | Video description method based on semantic guidance and memory mechanism | |
Luo et al. | Hierarchical transfer learning architecture for low-resource neural machine translation | |
CN113204633A (en) | Semantic matching distillation method and device | |
Song et al. | Parallel temporal encoder for sign language translation | |
Qing-Dao-Er-Ji et al. | Research on Mongolian-Chinese machine translation based on the end-to-end neural network | |
Basmatkar et al. | Survey on neural machine translation for multilingual translation system | |
CN111145914B (en) | Method and device for determining text entity of lung cancer clinical disease seed bank | |
CN116432019A (en) | Data processing method and related equipment | |
CN114254645A (en) | Artificial intelligence auxiliary writing system | |
Shirghasemi et al. | The impact of active learning algorithm on a cross-lingual model in a Persian sentiment task | |
Wang et al. | Multimodal object classification using bidirectional gated recurrent unit networks |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||