WO2016168591A1 - System and method for automated sign language recognition - Google Patents
System and method for automated sign language recognition
- Publication number
- WO2016168591A1 (PCT/US2016/027746)
- Authority
- WO
- WIPO (PCT)
- Prior art keywords
- hand
- features
- signs
- processor
- user
- Prior art date
Links
- 238000000034 method Methods 0.000 title claims abstract description 98
- 230000033001 locomotion Effects 0.000 claims abstract description 95
- 230000036544 posture Effects 0.000 claims abstract description 54
- 230000007704 transition Effects 0.000 claims description 77
- 238000012549 training Methods 0.000 claims description 74
- 238000000605 extraction Methods 0.000 claims description 14
- 239000000284 extract Substances 0.000 claims description 11
- 230000005484 gravity Effects 0.000 claims description 8
- 230000008569 process Effects 0.000 description 57
- 210000004247 hand Anatomy 0.000 description 31
- 210000003811 finger Anatomy 0.000 description 15
- 238000012545 processing Methods 0.000 description 11
- 238000010586 diagram Methods 0.000 description 8
- 239000000203 mixture Substances 0.000 description 8
- 230000003068 static effect Effects 0.000 description 8
- 230000006870 function Effects 0.000 description 6
- 230000001133 acceleration Effects 0.000 description 5
- 239000013598 vector Substances 0.000 description 5
- 206010048865 Hypoacusis Diseases 0.000 description 4
- 230000009471 action Effects 0.000 description 4
- 230000009286 beneficial effect Effects 0.000 description 4
- 230000008901 benefit Effects 0.000 description 4
- 230000002730 additional effect Effects 0.000 description 3
- 238000013459 approach Methods 0.000 description 3
- 230000001143 conditioned effect Effects 0.000 description 3
- 238000001514 detection method Methods 0.000 description 3
- 210000004553 finger phalanx Anatomy 0.000 description 3
- 238000013519 translation Methods 0.000 description 3
- 238000005516 engineering process Methods 0.000 description 2
- 238000009472 formulation Methods 0.000 description 2
- 230000006872 improvement Effects 0.000 description 2
- 238000005259 measurement Methods 0.000 description 2
- 238000012986 modification Methods 0.000 description 2
- 230000004048 modification Effects 0.000 description 2
- 238000011160 research Methods 0.000 description 2
- 210000003813 thumb Anatomy 0.000 description 2
- 206010011878 Deafness Diseases 0.000 description 1
- 241000282412 Homo Species 0.000 description 1
- 238000007476 Maximum Likelihood Methods 0.000 description 1
- 230000004075 alteration Effects 0.000 description 1
- 238000003491 array Methods 0.000 description 1
- 210000000988 bone and bone Anatomy 0.000 description 1
- 238000013480 data collection Methods 0.000 description 1
- 238000013500 data storage Methods 0.000 description 1
- 238000013461 design Methods 0.000 description 1
- 238000002474 experimental method Methods 0.000 description 1
- 230000008921 facial expression Effects 0.000 description 1
- 230000036541 health Effects 0.000 description 1
- 230000010370 hearing loss Effects 0.000 description 1
- 231100000888 hearing loss Toxicity 0.000 description 1
- 208000016354 hearing loss disease Diseases 0.000 description 1
- 238000012804 iterative process Methods 0.000 description 1
- 210000001503 joint Anatomy 0.000 description 1
- 230000008520 organization Effects 0.000 description 1
- 230000001737 promoting effect Effects 0.000 description 1
- 238000005070 sampling Methods 0.000 description 1
- 230000011218 segmentation Effects 0.000 description 1
- 238000000926 separation method Methods 0.000 description 1
- 239000007787 solid Substances 0.000 description 1
- 238000012546 transfer Methods 0.000 description 1
- 210000000707 wrist Anatomy 0.000 description 1
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F18/00—Pattern recognition
- G06F18/20—Analysing
- G06F18/21—Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F18/00—Pattern recognition
- G06F18/20—Analysing
- G06F18/29—Graphical models, e.g. Bayesian networks
- G06F18/295—Markov models or related models, e.g. semi-Markov models; Markov random fields; Networks embedding Markov models
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F3/00—Input arrangements for transferring data to be processed into a form capable of being handled by the computer; Output arrangements for transferring data from processing unit to output unit, e.g. interface arrangements
- G06F3/01—Input arrangements or combined input and output arrangements for interaction between user and computer
- G06F3/011—Arrangements for interaction with the human body, e.g. for user immersion in virtual reality
- G06F3/014—Hand-worn input/output arrangements, e.g. data gloves
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F3/00—Input arrangements for transferring data to be processed into a form capable of being handled by the computer; Output arrangements for transferring data from processing unit to output unit, e.g. interface arrangements
- G06F3/01—Input arrangements or combined input and output arrangements for interaction between user and computer
- G06F3/017—Gesture based interaction, e.g. based on a set of recognized hand gestures
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V40/00—Recognition of biometric, human-related or animal-related patterns in image or video data
- G06V40/20—Movements or behaviour, e.g. gesture recognition
- G06V40/28—Recognition of hand or arm movements, e.g. recognition of deaf sign language
-
- G—PHYSICS
- G09—EDUCATION; CRYPTOGRAPHY; DISPLAY; ADVERTISING; SEALS
- G09B—EDUCATIONAL OR DEMONSTRATION APPLIANCES; APPLIANCES FOR TEACHING, OR COMMUNICATING WITH, THE BLIND, DEAF OR MUTE; MODELS; PLANETARIA; GLOBES; MAPS; DIAGRAMS
- G09B21/00—Teaching, or communicating with, the blind, deaf or mute
- G09B21/009—Teaching or communicating with deaf persons
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
- G10L15/00—Speech recognition
- G10L15/08—Speech classification or search
- G10L15/14—Speech classification or search using statistical models, e.g. Hidden Markov Models [HMMs]
- G10L15/142—Hidden Markov Models [HMMs]
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
- G10L15/00—Speech recognition
- G10L15/24—Speech recognition using non-acoustical features
Definitions
- This disclosure relates generally to the field of sign language recognition (SLR) and, more specifically, to systems and methods for SLR including Hidden Markov Models (HMMs) and the modeling of non-dominant-hand features in sign language recognition.
- SLR sign language recognition
- HMMs Hidden Markov Models
- Sign languages are natural languages mainly used by hard-of-hearing (HoH) people. While spoken languages use voices to convey meaning, sign languages utilize hand shape, orientation and movement, sometimes with the aid of facial expression and body movement, to express thoughts. Modern research has proved that sign languages are true languages and can be the first languages of HoH people.
- The World Health Organization estimates that over 5% of the global population (including 32 million children) has disabling hearing loss.
- The technology of SLR may lead to products that can benefit the HoH communities in various ways.
- However, compared with speech recognition (SR), SLR has received relatively limited research effort and remains immature. Introducing mature SR techniques into SLR may lead to substantial progress in SLR.
- SLR and SR are similar in the sense that both attempt to convert a sequence of input signals into a sequence of text words. While the input signals for SR are voice signals, those for SLR can be sensor signals from a wearable device and/or image sequences from a camera. This similarity indicates that SR frameworks can also be applicable to SLR, although certain strategies are needed to handle the special characteristics of sign languages.
- The first characteristic is that the dominant hand and the non-dominant hand are not equally important in sign languages. For example, frequently used sign-language words often involve only the dominant hand in signing.
- The second is that for dominant-hand-only signs, the non-dominant-hand signals may not be stable and are influenced by neighboring signs or even additional actions. Strategies to handle these characteristics of the non-dominant hand are thus beneficial for sign language recognition.
- The method includes generation of explicit models for the transition signals before, between, and after signs within an HMM framework.
- In traditional speech recognition, an HMM-based framework trains HMMs for phonemes as well as for the short pauses between words and the silence between phrases.
- the speech recognition HMM then connects the phoneme HMMs into word HMMs and connects the word HMMs with the short pause/silence HMMs into a decoding network.
- Recognition is conducted by searching for the path with the highest probability in the decoding network given the input signals.
- FIG. 2 illustrates a prior art decoding network 200 for SR along with the HMM structures.
- the HMM framework for SLR differs from existing SR frameworks.
- The system generates an HMM framework for SLR by (1) training an HMM for each gesture that forms an individual sign; (2) training a transition HMM for the transition signals between two consecutive signs within a larger phrase, a start HMM for the transition signals before the first sign of a phrase begins, and an end HMM for the signals after the last sign of a phrase finishes; and (3) connecting the sign HMMs with the transition/start/end HMMs into a decoding network similar to that of SR, as illustrated in FIG. 3.
- the system generates an HMM-based recognition framework for the task of SLR.
- The SLR framework generates explicit models of transition signals to reduce the influence of neighboring signs on the modeling of signs, which improves the recognition accuracy.
- The SLR system also incorporates features that are extracted from the dominant hand and the non-dominant hand into the training of HMMs. While the input of traditional speech recognition comes from the single voice of a user, the input of the SLR system comes from two channels, which correspond to the movements of the left and right hands.
- One characteristic of sign language is that the usage of the dominant and non-dominant hands is unbalanced. The dominant hand is used in most signs, while the non-dominant hand appears less frequently and mainly plays an assisting role compared with the dominant hand. This characteristic indicates that incorporating dominant-hand and non-dominant-hand information in a distinguishing way may be beneficial.
- The system optionally employs two embodiments to keep only key information from the non-dominant hand in modeling and thus make the SLR system focus more on the dominant hand. In one embodiment, the system only includes features from the non-dominant hand related to the hand global angles in modeling.
- The SLR systems and methods described herein also model the non-dominant-hand variances for dominant-hand-only signs.
- The non-dominant-hand variance models are designed to handle the special characteristic of sign language that the non-dominant-hand inputs of dominant-hand-only signs are influenced by neighboring signs or even additional actions.
- the system design includes a Tying-with-Transition (TWT) solution to reduce the impact of this phenomenon on recognition performance.
- TWT Tying-with-Transition
- One embodiment of a training and sign language recognition method ties the non-dominant-hand related parameters of the transition model with those of the models for dominant-hand-only signs, and updates the decoding network by including the tied model as an alternative option for each dominant-hand-only sign model in the decoding network if this sign follows a double-hand sign.
- TWT Tying-with-Transition
- A training method for automated sign language recognition includes receiving, with an input device, a plurality of training inputs corresponding to a plurality of predetermined sequences of signs from hand movements and postures of a user, extracting, with a processor, a first set of features from the training inputs corresponding to hand movements and postures of the user for each sign in the plurality of predetermined sequences of signs, generating, with the processor, a first Hidden Markov Model (HMM) based on the first set of features from the training inputs corresponding to hand movements and postures for each sign in the predetermined sequences of signs, extracting, with the processor, a second set of features from the training inputs corresponding to hand movements and postures of the user for transitions between signs in the plurality of predetermined sequences of signs, generating, with the processor, a second HMM based on the second set of features from the training inputs and the predetermined sequences of signs, extracting, with the processor, a third set of features from the training inputs corresponding to hand movements and postures of the user before a first sign and after a last sign in the plurality of predetermined sequences of signs, and generating, with the processor, a third HMM based on the third set of features from the training inputs and the predetermined sequences of signs.
- a method for automated sign language recognition includes receiving, with an input device, an input based on a plurality of hand movements and postures of a user that correspond to a sequence of signs, extracting, with a processor, a plurality of features from the input corresponding to the plurality of hand movements and postures, identifying, with the processor, a start of the sequence of signs in the input based on a first set of features in the plurality of features and a first Hidden Markov Model (HMM) stored in the memory, identifying, with the processor, a first sign in the input based on a second set of features in the plurality of features and a second HMM stored in the memory, and generating, with an output device, an output corresponding to the first sign from the input.
- HMM Hidden Markov Model
- a system for automated sign language recognition includes an input device configured to receive input corresponding to a plurality of hand movements and postures from a user corresponding to a sequence of signs, an output device, a memory, and a processor operatively connected to the input device, the output device, and the memory.
- The processor is configured to receive an input from the input device based on a plurality of hand movements and postures of a user that correspond to a sequence of signs, extract a plurality of features from the input corresponding to the plurality of hand movements and postures, identify a start of the sequence of signs in the input based on a first set of features in the plurality of features and a first Hidden Markov Model (HMM) stored in the memory, identify a first sign in the input based on a second set of features in the plurality of features and a second HMM stored in the memory, and generate an output with the output device corresponding to the first sign from the input.
- HMM Hidden Markov Model
- FIG. 1 is a schematic diagram of a system that performs automated sign language recognition.
- FIG. 2 is a diagram of a prior art Hidden Markov Model (HMM) decoding network that is used in speech recognition.
- HMM Hidden Markov Model
- FIG. 3 is a diagram of an HMM decoding network that is used in sign-language recognition.
- FIG. 4 is a diagram depicting hand movements and postures in a sign language phrase.
- FIG. 5 is a depiction of an input device that includes gloves with sensors to detect the motion, position, orientation, and shape of the hands of a user who provides sign language input to the system of FIG. 1.
- FIG. 6 is a drawing of the structure of a human hand and different locations in the hand for sensors to detect the motion and shape of the individual fingers in the hand and the motion, position, and orientation of the entire hand.
- FIG. 7 is a diagram of coordinate on each sensor in a three-dimensional space used during feature extraction for input data corresponding to movements of the hands of a user in the system of FIG. 1.
- FIG. 8 is a block diagram of a process for training HMMs in the system of FIG. 1.
- FIG. 9 is a block diagram of a process for performing sign language recognition using the system of FIG. 1. DETAILED DESCRIPTION
- The terms "sign phoneme" or, more simply, "phoneme" are used interchangeably and refer to a gesture, handshape, location, palm orientation, or other hand posture using one or two hands that corresponds to the smallest unit of a sign language.
- Each sign in a predetermined sign language includes at least one sign phoneme, and as used herein the term "sign” refers to a word or other unit of language that is formed from one or more sign phonemes. In some instances a single sign phoneme corresponds to a single word, while in other instances multiple phonemes form a single word depending upon the conventions of the sign language.
- transition refers to the movements of the hands that are made between individual signs, such as between sequences of sign phonemes that form individual words or between the phonemes within a single sign.
- a transition movement is not a typical unit of a sign language that has independent meaning, but transition movements still provide important contextual information to indicate the end of a first word or phoneme and the beginning of a second word or phoneme.
- the term “st/end” refers to hand movements and postures of the user for starting and ending a sequence of signs, including predetermined sequences of signs that are used in a training process for sign language recognition and for non-predetermined sequences of signs that an automated sign language recognition system detects during operation.
- phrase refers to any sequence of signs that include some form of start and end hand motions along with motions for at least one individual sign. In phrases that include multiple signs, a transition hand movement separates individual signs within the phrase. Phrases can correspond to any sequence of interrelated signs such as signs corresponding to multiple words in a complete sentence or shorter word sequence. A phrase can also include, for example, sequences of numbers or individual letters in a predetermined sign language.
- the term "dominant hand” refers to a hand of the user that performs a greater number of movements when producing signs.
- the term “non-dominant hand” refers to the other hand that performs fewer movements.
- Many sign language conventions specify that the dominant hand performs most of the movements to form signs while the non-dominant hand performs fewer gestures to form the signs.
- the dominant hand is the right hand of the user while the non-dominant hand is the left hand, although the roles of the hands may be reversed for some signers, such as left-handed signers, or for different sign language conventions where the left hand is the dominant hand for forming signs.
- a sign language recognition (SLR) system records data based on the hand gestures of a user who produces one or more signs in at least one form of recognized sign language.
- the system receives input using, for example, gloves that include motion and position sensors or from recorded video data of the hands while the user performs sign gestures.
- the system also includes one or more digital processing devices, data storage devices that hold a system of Hidden Markov Models (HMMs) that are used to recognize sign language gestures, and an output device.
- The digital processing devices analyze recorded data from the gestures and identify the meanings of words in the sign language from one or more hand gestures that correspond to different sign phonemes, with reference to transition movements as the user repositions his or her hands to form different sign phonemes in one or more words.
- HMMs Hidden Markov Models
- the system also distinguishes between left-hand and right-hand gestures and analyzes the movements of both hands since some forms of sign language use the left and right hands for different purposes.
- The output device generates a text or audio translation of recognized signs, and the output can be provided to other humans for translation or to additional computing systems that receive the recognized signs to enable automated sign language recognition as a human-machine interface.
- FIG. 1 depicts an embodiment of a system 100 that performs automated sign language recognition.
- the system 100 includes a processor 104, memory 108, one or more input devices 132, and one or more output devices 136.
- The processor 104 is, for example, a digital microprocessor that includes one or more central processing unit (CPU) cores and optionally one or more graphics processing units (GPUs), digital signal processors (DSPs), field programmable gate arrays (FPGAs), and application specific integrated circuits (ASICs) for processing data from the input devices 132 and generating output using the output devices 136.
- the input devices 132 include two gloves that the user 102 wears while producing sign language input.
- the gloves include accelerometers and gyroscope sensors that provide input data tracking the position, movement, orientation, and hand shape of both the dominant hand and non-dominant hand of the user 102 as the user 102 performs different sequences of signs.
- The memory 108 includes one or more volatile memory devices such as dynamic or static random access memory (RAM) and one or more non-volatile memory devices such as a magnetic disk or solid state storage device that store program instructions 110, model data, and other forms of data for operation of the system 100.
- RAM dynamic or static random access memory
- the memory 108 includes a sign language phrase model 112 that is formed from three classes of different Hidden Markov Models (HMMs) including a phrase st/end (st/end) model 116, individual sign models 120, and a sign transition model 124.
- HMMs Hidden Markov Models
- the combined HMMs 116 - 124 form a sign language recognition decoding network 300.
- the overall sign language phrase recognition model 112 uses the individual HMMs 116 - 124 in the decoding network 300 to identify the beginning and end of phrases, individual signs within a phrase, and transitions between signs in multi-sign phrases in features that are extracted from the input received from the input devices 132.
- the sign language phrase recognition model in the memory 108 also includes a language model 128 that converts identified signs from the sign language phrase recognition model 112 into phrases/sentences for one or more languages, such as Chinese or American Sign Language phrases/sentences.
- the memory 108 also stores training data 130 that the processor 104 uses to generate the HMMs in the sign language phrase recognition model 112.
- a user 102 provides sign language input to the system 100 using the input device 132.
- the user 102 provides sign inputs to the input devices 132 for one or more predetermined sequences of signs that the processor 104 stores with the training data 130 in the memory 108.
- the processor 104 uses the predetermined structure of the sign sequence and observed features that are extracted from the training input data to train the multiple HMMs for the phrase st/end model 116, individual sign models 120, and sign transition model 124.
- the user 102 provides sign inputs to the system 100 using the input devices 132 and the system 100 uses the trained HMMs 116 - 124 along with the language model 128 to identify sequences of signs based on the hand movements and postures of the user 102.
- the system 100 generates an output using the output devices 136 to, for example, generate an audio output in a predetermined language based on the identified signs in each phrase.
- the output device 136 is a display screen or network adapter that displays or transmits, respectively, a text representation of the identified signs.
- the system 100 receives the signs as command inputs and the processor 104 executes stored program instructions 110 based on the identified command inputs to produce an output using one or more of the output devices 136.
- The sign language recognition (SLR) system 100 trains HMMs and builds a decoding network in a similar way to a conventional HMM-based speech recognition system.
- The SLR system views every unique basic sign as a sign phoneme and trains an HMM for it.
- Each sign word can thus be modeled by connecting the HMMs of the component sign phonemes, just like speech words are modeled by connecting voice phoneme HMMs.
- The decoding network is constructed by connecting sign word models using a language model, which can be grammars, n-grams, or other types of language model.
- the special treatment needed for SLR lies in the modeling of transition.
- The system trains an HMM, called the transition model, to model the transition signals between neighboring signs.
- a start (st) HMM and an end HMM are also trained to model the transition signals in a phrase before the first sign and after the last sign respectively.
- The system then modifies the decoding network by inserting the transition model between neighboring sign phonemes, inserting the start model before the phrase start nodes, and inserting the end model after the phrase end nodes.
- FIG. 3 illustrates the decoding network along with the HMM structures in the system 100 in more detail.
- an HMM-based SLR decoding network 300 incorporates all of the HMMs, including sign HMMs 120, the transition HMM 124, and the st/end HMM 116. All of these HMMs adopt a left-to-right HMM structure.
- the HMM decoding network 300 uses a fixed number of states within the sign HMM 120 to encode each of the potential sets of features for different sign phonemes.
- the numbers of states are tuned for the sign HMM 120, the transition HMM 124, and the st/end HMMs 116, respectively.
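- As an illustration only (not part of the patent disclosure), the sketch below shows one way a decoding network of sign, transition, and start/end HMMs could be wired together; the class names and graph structure are hypothetical assumptions, and each `hmm` field stands for a trained left-to-right HMM as described above.

```python
# Hypothetical sketch: assembling an SLR decoding network from trained HMMs.
# The node/graph representation is an illustrative assumption, not the patent's
# data structure.
from dataclasses import dataclass, field

@dataclass
class Node:
    name: str                 # e.g. "start", "sign:WHERE", "transition", "end"
    hmm: object               # trained left-to-right HMM for this unit
    successors: list = field(default_factory=list)

def build_decoding_network(sign_hmms, transition_hmm, start_hmm, end_hmm):
    """Connect start -> sign -> (transition -> sign)* -> end."""
    start = Node("start", start_hmm)
    end = Node("end", end_hmm)
    transition = Node("transition", transition_hmm)
    signs = [Node(f"sign:{name}", hmm) for name, hmm in sign_hmms.items()]
    for sign in signs:
        start.successors.append(sign)       # a phrase may begin with any sign
        sign.successors.append(transition)  # after a sign: transition to the next sign...
        sign.successors.append(end)         # ...or the end of the phrase
    transition.successors.extend(signs)     # transitions loop back into the sign models
    return start
```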
- FIG. 8 is a block diagram of a training process 800 for training HMMs that identify features corresponding to hand movements and postures for each sign, transition between signs, and start and end hand movements and postures for predetermined sequences of signs corresponding to multiple phrases in a training process.
- a description of the process 800 performing an action or function refers to the operation of a processor to execute stored program instructions to perform the function or action in association with other components in the automated sign language recognition system.
- the process 800 is described in conjunction with the system 100 of FIG. 1 for illustrative purposes.
- Process 800 begins as the system 100 receives predetermined sequences of signs in a sign language from the input device 132 including generated sensor data from both the dominant and non-dominant hands of the user 102 or other users who perform the training process (block 804).
- each sequence of sign language inputs includes a set of phonemes that correspond to a single word in a predetermined sign language, but the inputs also include start and end features from the user and transitions between sign phonemes.
- FIG. 4 depicts an example of an input with a sequence of individual signs that form one predetermined sequence during the training process.
- each of the signs in FIG. 4 forms one input sequence to the training process 800.
- the training process 800 does not require input for all possible words and phrases in the language, but the system 100 receives a representative sample of the sign phonemes, st/end signals, and transitions during the process 800 to enable recognition of sign language words that are not directly included in the training data inputs.
- the sequence 400 includes a phrase formed from four individual signs for the words "at” (sign 404), "where” (sign 408), and two signs that specify the word "register” (signs 412 and 416).
- the system 100 also receives input data for the hand movements and postures that the user performs at the beginning and end of the predetermined sequence along with transition movements for the hands that occur between the signs in the sequence.
- the terms “hand posture” or more simply “posture” refer to the orientation and shape of the hand, including either or both of the palm and fingers, that conveys information in sign phonemes along with the motion of the hand and fingers on the hand.
- The system 100 receives multiple inputs for a wide range of phrases from one or more users to provide a set of training data that not only provides training information for the recognition of individual signs, but also includes input data corresponding to the st/end hand motions for phrases and to the transitions between signs within the phrases.
- The system 100 collects training inputs that include a certain number of samples for each of the sign words in the vocabulary, including both signs that include a single phoneme and multi-phoneme signs.
- The system 100 classifies the input data into several segments: the component signs, preceded by the start part (from the beginning of the input to the start of the first sign), followed by the end part (from the finishing point of the last sign to the end of the input), and the transition parts between signs.
- Each sign phoneme model can thus be trained on all of the segments of the corresponding sign in the training data, while the transition, start, and end models are trained on their corresponding segments, respectively, as sketched below.
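- A minimal sketch of this segment-based bookkeeping follows, assuming segment labels such as "start", "transition", "end", and per-sign names are available from the manually provided timestamps described below; the helper and label names are illustrative, not taken from the patent.

```python
# Hypothetical sketch: grouping the labeled segments of each training phrase so
# that every model (per-sign, transition, start, end) is trained only on its own
# segments.
from collections import defaultdict

def collect_training_segments(phrases):
    """phrases: list of (segments, labels) pairs, where segments is a list of
    per-segment feature arrays and labels is a parallel list such as
    ["start", "AT", "transition", "WHERE", "transition", "REGISTER", "end"]."""
    per_model = defaultdict(list)
    for segments, labels in phrases:
        for segment, label in zip(segments, labels):
            per_model[label].append(segment)
    return per_model   # e.g. per_model["transition"] holds every transition segment
```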
- the system 100 uses gloves with sensors that record acceleration and hand angle position information for the dominant and non-dominant hands of the user.
- FIG. 5 depicts examples of gloves 500 that are worn during the process 800 and during subsequent sign language recognition processing.
- The gloves 500 include multiple sensors, such as micro-electromechanical systems (MEMS) sensors, that record the attitude, orientation, and acceleration of different portions of the dominant and non-dominant hands of the user as the user performs hand movements and postures to form the signs in the predetermined sequence.
- MEMS micro-electromechanical systems
- FIG. 6 depicts a more detailed view of one embodiment of the sensors 600 in a glove for a right hand.
- the sensors 600 include multiple sensors 604A and 604B that generate data for the position and movement of the phalanges (fingers) with the sensors 604A on the fingers and 604B on the thumb.
- The sensors 600 also include sensors 608 that generate data for the position and movement of the metacarpal region (palm) of the hand and a sensor 612 that measures the position and movement of the carpal region (wrist) of the hand. While FIG. 6 depicts a sensor arrangement for the right hand, the system 100 includes a glove with a similar sensor arrangement for the left hand as well.
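- The hypothetical record below illustrates one possible per-frame data layout for such a glove, loosely following the sensor groups of FIG. 6; the field names and the quaternion convention are assumptions made for illustration only.

```python
# Hypothetical per-frame glove record; the fields mirror the finger (604A/604B),
# palm (608), and wrist (612) sensor groups but are not the patent's data format.
from dataclasses import dataclass
from typing import List, Tuple

Quaternion = Tuple[float, float, float, float]   # (w, x, y, z) unit quaternion

@dataclass
class GloveFrame:
    timestamp: float                              # seconds since capture start
    finger_orientations: List[Quaternion]         # one per finger/thumb sensor
    palm_orientations: List[Quaternion]           # metacarpal (palm) sensors
    wrist_orientation: Quaternion                 # carpal (wrist) sensor
    acceleration: Tuple[float, float, float]      # hand acceleration, m/s^2
```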
- the process 800 continues as the system 100 performs feature extraction to identify the features that correspond to individual signs, sign transitions, and the st/end movements in the input corresponding to the predetermined sequence of sign phonemes (block 808).
- the feature extraction process enables the system 100 to identify characteristics of the sensor data for the hand movement/posture inputs in a uniform manner to train the HMMs 116 - 124 for subsequent sign language recognition operations.
- The system 100 receives manually inputted timestamps that categorize different sets of features for hand movements and postures that correspond to the st/end of the input, to the different signs, and to the transitions between signs in each training sequence.
- The system 100 optionally records time information for the received training inputs.
- The extracted features include both finger related (FR) features, which correspond to the movements and shapes of individual fingers, and global angle (GA) features, which correspond to the overall angles and movements of the hands.
- the system 100 extracts different sets of features for the input data from the dominant hand and the non-dominant hand.
- the extracted features for both hands form the basis of training data to train the HMMs 116 - 124 in the system 100.
- the feature set can be grouped into two categories: finger related and global angle related, which are referred to as FR features and GA features, respectively, herein.
- the FR features include the closeness of each finger, the openness of the thumb, and the deltas (i.e., the differences between the current and the previous frames) of these features.
- the GA features refer to the posture of the hand and are derived from the global polar angles, x, y and z, for a hand.
- The GA features include trigonometric functions of the angles and the deltas of the angles for the hand as the hand moves to different positions and orientations to produce signs, inter-phoneme transitions, and phrase start/end gestures and postures.
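- The sketch below shows, under assumed feature definitions, how per-frame FR and GA values and their frame-to-frame deltas could be packed into a feature vector; it is not the patent's exact feature set.

```python
# Hypothetical sketch of building per-frame feature vectors from finger-related
# (FR) and global-angle (GA) measurements, plus delta features between frames.
import numpy as np

def frame_features(finger_closeness, thumb_openness, hand_angles):
    """finger_closeness: per-finger closeness values; thumb_openness: scalar;
    hand_angles: (rx, ry, rz) global hand angles in radians."""
    fr = np.concatenate([finger_closeness, [thumb_openness]])
    ga = np.concatenate([np.sin(hand_angles), np.cos(hand_angles)])
    return np.concatenate([fr, ga])

def add_deltas(frames):
    """frames: (T, D) array of per-frame features; append frame-to-frame deltas."""
    deltas = np.diff(frames, axis=0, prepend=frames[:1])
    return np.concatenate([frames, deltas], axis=1)   # shape (T, 2*D)
```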
- The system 100 uses quaternions identified from angular velocity and acceleration to represent the accumulated rotation and orientation of both hands in a three-dimensional space, such as the space in which the hands move to produce the training sequences. Quaternions avoid gimbal lock, and because quaternion multiplication is non-commutative, the representation accurately reflects the order-dependent physical characteristics of the rotational movements of the hands of the user.
- FIG. 7 depicts components of the quaternions that the process 800 uses to extract the features.
- FIG. 7 depicts three spatial axes of movement, x_s, y_s, and z_s, for the hand of the user 102 while forming sign phonemes or transitions.
- The axes x_g, y_g, and z_g indicate a predetermined absolute coordinate system, and the orientation of the hand changes within this absolute coordinate system.
- FIG. 7 also depicts the sensors in the input device gloves 132 that the user 102 wears to provide input to the system 100. In FIG. 7, the downward direction of gravity serves as an initial attitude unit quaternion, p_s.
- The user 102 hangs his or her arms with hands and fingers extended downward, parallel to the direction of gravity (shown as g_s in FIG. 7) and aligned with the quaternion p_s, at the beginning of each training sign input sequence. This enables each initial quaternion p_s in the input data to be aligned with the single reference quaternion for gravity without requiring complex calibration for the individual bones in the fingers and other portions of the hands of the user.
- the system 100 extracts two types of features from the input data.
- the first type of feature corresponds to features of the hand shape (finger related, or FR), and the second type of feature corresponds to the hand direction (global angle of hand, or GA).
- The system 100 extracts the FR features for each finger. In some embodiments, the system 100 extracts the FR features for individual fingers only for the dominant hand, while extracting the GA features for both the dominant and non-dominant hands, as described in more detail below. In other embodiments, the system 100 extracts both the FR and GA features for both the dominant and non-dominant hands.
- To extract the GA features for the direction of the hand, more specifically the direction of the palm, the system 100 identifies the mean of the directions of the sensors that are associated with the palm of the hand, such as sensors '2a', '3a', '4a', and '5a' in FIG. 6.
- The system 100 extracts a feature that fully describes the direction of the hand in space based on two perpendicular directions: a first direction along the palm plane and a second direction along the normal of the palm plane.
- the system 100 optionally identifies velocity and acceleration features of the finger bones or of the palm of the hand by identifying multiple rotational positions over time.
- The system 100 uses a quaternion representation of the hand orientation that incorporates multiple measurements from the sensors generated at a series of predetermined sampling times, instead of the single measurement p_s, to estimate the accumulated rotation of either or both of the dominant and non-dominant hands.
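- As a hedged illustration, the sketch below accumulates hand rotation from gyroscope samples starting at the gravity-aligned attitude p_s using a standard first-order quaternion integration; the integration scheme and interface are assumptions rather than the patent's specific method.

```python
# Hypothetical sketch: integrating gyroscope angular-velocity samples into an
# accumulated unit quaternion, starting from the gravity-aligned attitude p_s.
import numpy as np

def quat_multiply(q, r):
    """Hamilton product of two quaternions given as (w, x, y, z)."""
    w1, x1, y1, z1 = q
    w2, x2, y2, z2 = r
    return np.array([
        w1*w2 - x1*x2 - y1*y2 - z1*z2,
        w1*x2 + x1*w2 + y1*z2 - z1*y2,
        w1*y2 - x1*z2 + y1*w2 + z1*x2,
        w1*z2 + x1*y2 - y1*x2 + z1*w2,
    ])

def accumulate_rotation(p_s, angular_velocities, dt):
    """p_s: initial attitude quaternion (w, x, y, z); angular_velocities: (T, 3)
    gyroscope samples in rad/s; dt: sampling interval in seconds."""
    q = np.asarray(p_s, dtype=float)
    for omega in angular_velocities:
        angle = np.linalg.norm(omega) * dt
        if angle > 0.0:
            axis = omega / np.linalg.norm(omega)
            dq = np.concatenate([[np.cos(angle / 2.0)], np.sin(angle / 2.0) * axis])
            q = quat_multiply(q, dq)       # apply the incremental body-frame rotation
            q /= np.linalg.norm(q)         # keep the result a unit quaternion
    return q
```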
- Process 800 continues as the system 100 uses the extracted feature data from multiple predetermined sequences of sign inputs to train the HMMs 116 - 124 (block 812).
- The processor 104 generates the HMM 116 for the features that are extracted from the input data corresponding to the start and end hand movements and postures, the HMM 120 for the features that are extracted from the input data corresponding to the sign hand movements and postures, and the HMM 124 for the features that are extracted from the input data corresponding to the transition hand movements and postures.
- The system 100 performs a training process to generate the HMMs based on training inputs from the user 102 and the predetermined sign input sequences in the training data 130.
- a word may contain one or more sign phonemes.
- the system 100 generates the HMM 120 for the sign phonemes to accurately model words based on three requirements.
- Words have to be modeled in order to evaluate the feature probability, P(X|W).
- language models have to be used to calculate the prior sentence probability, P(W).
- word models and language models have to be combined to form networks of HMM states in which the decoding for the most likely word string W* in the search space can be performed.
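- For reference, this standard HMM decoding objective can be written compactly as follows (conventional notation, restated for clarity rather than quoted from the patent):

```latex
% Standard Bayes decision rule for HMM decoding: choose the sign/word string W*
% that maximizes the product of the observation likelihood and the language-model
% prior, given the extracted feature sequence X.
W^{*} \;=\; \arg\max_{W} P(W \mid X) \;=\; \arg\max_{W} P(X \mid W)\, P(W)
```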
- the processor 104 performs the training process to generate the HMM 120 for the sign phonemes based on the observed features for the signs conditioned upon the signs in the predetermined training sequences of signs.
- the processor 104 similarly performs the training process for the transitions 124 based on the observed features for the transitions conditioned upon the transitions between signs in the predetermined training sequences of signs and the training process for the phrase st/end hand movements 116 based on the observed features for the st/end hand movements conditioned upon the st/end hand movements in the predetermined training data 130.
- the system 100 uses the extracted features from the feature data as a plurality of different feature vectors that form the inputs for the training processes of each of the separate HMMs including the st/end HMM 116, sign HMM 120, and transition HMM 124.
- the feature vectors include selected features from both the dominant hand and the non-dominant hand.
- the feature vectors include features from both hands that are recorded concurrently to map the movements of both hands that occur concurrently while the hands of the user 102 move to form signs and st/end or transition movements.
- the system 100 trains the sign phoneme models 120 based on the segments of the corresponding signs.
- Alternative training methods are based on the detailed properties of the start and end segments, which may vary based on different contexts for receiving training phrases.
- the start segment for each phrase contains static hand position and posture signals followed by movement signals, while every end segment contains movement signals which may or may not be followed by static signals.
- the movement signals can be viewed as a type of transition.
- The system 100 labels each start segment as a static segment followed by a transition segment, and labels each end segment as a transition segment followed by a static segment, if the static segment exists.
- the training process for all of the HMMs in the decoding network 300 includes the start and end segments with both the static segments for hand posture and the dynamic segments for hand movement.
- the system 100 generates the start model HMM 116 by connecting the static model with the transition model.
- The end model HMM includes two alternatives: one connecting the transition model with the static model, and the other including only the transition model.
- The model HMMs 116 - 124 use diagonal-covariance Gaussian mixtures, and the system 100 re-estimates the HMMs iteratively after initialization as well as after splitting the Gaussian mixtures, as sketched below.
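- The sketch below shows one way a diagonal-covariance Gaussian-mixture HMM with a left-to-right topology could be trained for each unit (sign phoneme, transition, start, end). The use of the open-source hmmlearn package, the state count, and the mixture count are assumptions for illustration, not the patent's implementation.

```python
# Illustrative sketch (assumed toolkit: hmmlearn): one left-to-right
# diagonal-covariance Gaussian-mixture HMM per unit, trained on that unit's
# feature segments.
import numpy as np
from hmmlearn.hmm import GMMHMM

def make_left_to_right_hmm(n_states, n_mix=2):
    model = GMMHMM(n_components=n_states, n_mix=n_mix,
                   covariance_type="diag", n_iter=20,
                   init_params="mcw", params="stmcw")
    # Left-to-right topology: start in state 0; allow only self-loops and
    # transitions to the next state (structural zeros stay zero under EM).
    startprob = np.zeros(n_states)
    startprob[0] = 1.0
    transmat = np.zeros((n_states, n_states))
    for i in range(n_states):
        transmat[i, i] += 0.5
        transmat[i, min(i + 1, n_states - 1)] += 0.5
    model.startprob_ = startprob
    model.transmat_ = transmat
    return model

def train_unit_model(segments, n_states=5):
    """segments: list of (T_i, D) feature arrays assigned to this unit."""
    X = np.concatenate(segments, axis=0)
    lengths = [len(segment) for segment in segments]
    model = make_left_to_right_hmm(n_states)
    model.fit(X, lengths)
    return model
```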
- the training process implicitly includes Gaussian splitting that occurs within different sets of hand motions in the separate HMMs during the iterative process described above.
- The transition model may contain multiple Gaussian mixture components in each state, so the transition HMM 124 implicitly classifies the transition hand movements and postures that occur between the different signs encoded in the separate sign HMMs 120.
- the system 100 trains a "universal" transition HMM 124 that can identify the transitions that occur between multiple signs and does not require the system 100 to be trained to recognize tightly connected two-sign and three-sign sequences to accurately recognize sign language input.
- The st/end HMM 116 serves a similar role in enabling the system 100 to identify the features that correspond to the start and end of each phrase.
- Process 800 continues as the processor 104 stores the separate HMMs 116 - 124 in the memory 108 (block 816).
- the system 100 uses the trained HMMs to recognize sequences of signs that a user inputs to the system 100 using the input devices 132.
- The trained HMMs can, of course, recognize sequences of signs that correspond to the training data, but the system 100 is also configured to recognize a wide range of sign language sequences that do not exactly match the training sign sequences, because the processor 104 uses the language model 128 to build a decoding network 300 that combines the separate HMMs to recognize individual signs and the transitions between signs in many combinations that are not necessarily included in the training data directly.
- the system 100 performs a different feature extraction process and HMM generation process for the processing of blocks 812 and 816 in the process 800.
- The system 100 employs a Tying-with-Transition (TWT) solution to enhance the robustness of the sign language recognition system toward dominant-hand-only sign phonemes that immediately follow a double-hand sign in a given sign-sequence phrase.
- TWT Tying-with-Transition
- For such signs, the non-dominant hand should involve no movement by definition but, in reality, may move due to the neighboring signs or additional actions.
- The system 100 separates the dominant-hand and non-dominant-hand features extracted during the processing of block 808 into two data streams for HMM modeling. All of the HMM models 116 - 124 are then trained using the same training procedures described above based on the two data streams, which results in two sets of Gaussian mixtures in each HMM state for the dominant-hand and non-dominant-hand data streams, respectively.
- the processor 104 generates the HMM models 120 for signs based on a first portion of the first set of features from the training inputs corresponding to hand movements and postures for a dominant hand of the user that form the sign phonemes during the training process and based on a portion of the second set of features from the training data corresponding to the hand movements and postures of a non-dominant hand of the user to perform transitions during the training process.
- The system 100 generates an additional enhanced HMM model for each of the dominant-hand-only sign models, based on the trained HMM models for both hands, by replacing the non-dominant-hand related Gaussian mixtures in the corresponding sign model with the non-dominant-hand related Gaussian mixtures in the transition model 124. In this way, the non-dominant-hand variances are covered for the dominant-hand-only signs by the enhanced sign models.
- Both the enhanced dominant-hand-only sign models and the originally trained models (i.e., the st/end model 116, sign models 120, and transition model 124) are utilized in block 816 for the TWT solution.
- the decoding network 300 is modified to incorporate the enhanced dominant-hand-only sign models based on the neighboring signs.
- If a dominant-hand-only sign follows a double-hand sign, its sign model in the decoding network 300 is replaced with a confusion set of two parallel models, which includes the corresponding enhanced model and the original model for this dominant-hand-only sign.
- the recognition procedure based on this modified decoding network remains the same.
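- A hypothetical sketch of the TWT idea follows: the enhanced model for a dominant-hand-only sign keeps its own dominant-hand mixtures but reuses the transition model's non-dominant-hand mixtures, and the decoding network then offers the enhanced and original models as parallel options; the two-stream model objects shown here are assumptions, not the patent's data structures.

```python
# Hypothetical sketch of Tying-with-Transition (TWT). Models are assumed to
# expose a .states list whose entries hold Gaussian-mixture parameters in
# .dominant and .non_dominant attributes (one per data stream), and the sign
# and transition models are assumed to use the same number of states.
import copy

def make_enhanced_sign_model(sign_model, transition_model):
    """Tie the non-dominant-hand mixtures of a dominant-hand-only sign model to
    those of the transition model."""
    enhanced = copy.deepcopy(sign_model)
    for enhanced_state, transition_state in zip(enhanced.states, transition_model.states):
        enhanced_state.non_dominant = copy.deepcopy(transition_state.non_dominant)
    return enhanced

def decoding_options(sign_name, sign_models, enhanced_models, follows_double_hand_sign):
    """Return the parallel (confusion-set) options for this sign in the network."""
    if follows_double_hand_sign and sign_name in enhanced_models:
        return [sign_models[sign_name], enhanced_models[sign_name]]
    return [sign_models[sign_name]]
```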
- FIG. 9 depicts a process 900 for automated sign language recognition.
- a description of the process 900 performing an action or function refers to the operation of a processor to execute stored program instructions to perform the function or action in association with other components in the automated sign language recognition system.
- the process 900 is described in conjunction with the system 100 of FIG. 1 for illustrative purposes.
- Process 900 begins as the system 100 receives dominant hand and non-dominant hand input from the user 102 via the input devices 132 (block 904).
- the input devices are the gloves 500 that include sensors to record the hand movements and postures for the user 102 as the user 102 performs hand movements and postures corresponding to a sequence of signs.
- Process 900 continues as the system 100 extracts features from the input data (block 908).
- the system 100 performs the feature extraction processing of block 908 in substantially the same manner as the feature extraction that is described above in conjunction with the processing of block 808 in the process 800.
- the processor 104 extracts the same FR and GA features for the input data from the dominant hand and extracts the GA features and optionally the FR features for the input data from the non-dominant hand in the same manner as in the process 800 to produce feature vectors in the recognition process in the same manner as during the earlier training process.
- Process 900 continues as the system 100 analyzes the extracted features from the input to identify a start of a sequence of at least one sign that forms a phrase (block 912).
- the processor 104 uses the st/end HMM 116 in the sign language recognition model 112 to identify features that correspond to the hand movements and postures associated with beginning a new sequence of signs.
- The processor 104 uses the same sets of extracted features for recognition with the HMM models in the decoding network 300 as are used during the training process of FIG. 8, such as the FR and GA features for the dominant hand and the GA features for the non-dominant hand, or the FR and GA features of the dominant hand in combination with the transition features of the non-dominant hand in the TWT embodiment.
- As indicated in FIG. 3, the HMMs 116 - 124 are linked to each other, so after detection of the start of a sequence, the processor 104 analyzes the next set of features that occur in the time interval following the detection of the initial start feature with a high probability weighting toward the features corresponding to the phonemes in a given sign (block 916). The processor 104 uses the sign HMMs 120 to identify the next sign in the sequence based on the next set of features that are extracted from the input sequence.
- the sign input data from the user 102 includes either a transition to a new sign or an input that signifies the end of a phrase.
- the processor 104 uses both the transition HMM 124 and st/end HMM 116 to identify a set of features that precede and follow a sign in the input data (block 920).
- The system 100 interprets an entire phrase of sign inputs as a whole, and the processor 104 uses the HMM network 300 in the sign language recognition model 112 to identify the input based on the maximum likelihood of the word sequence, considering the sequence not only forward in time but also backward in time once the system 100 has received the entire phrase.
- The processor 104 identifies a path through the HMM network 300 using a maximum a posteriori (MAP) estimation process with a Bayes decision formulation to find the sign phonemes and sequences of signs in the trained HMMs that have the greatest likelihood of matching the extracted features from the input data, and thereby determines whether a set of features corresponds to a transition between signs or to the end of a phrase.
- MAP maximum a posteriori
- the processor 104 also identifies the separation in time between the detection of features, which may indicate the likelihood of the features corresponding to a brief transition with a short time interval between signs or the end of a phrase with a somewhat longer time interval between signs. If the system 100 detects a transition but not the end of the sequence in the phrase (block 924), then the process 900 continues as the system 100 identifies the next sign in the sequence (block 916).
- a sign phoneme here refers to the smallest units that bear linguistic meaning in a predetermined sign language.
- The memory 108 stores the HMMs in the decoding network 300, which includes all of the recognized sign-language phonemes along with the st/end and transition gestures, for recognition of phrases from the user 102 that are supported by the language model, which can be a grammar, an n-gram model, or another type of language model.
- The hypothesized sign sequence with the highest P(X|W)·P(W) value among the multiple hypotheses generated is used as the result of the sign language recognition procedure for the detected phrase.
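- A minimal sketch of this selection step, assuming a simple list of hypotheses carrying their model scores (an illustrative assumption, not the patent's data structures):

```python
# Hypothetical sketch: pick the hypothesis with the highest combined score
# log P(X|W) + log P(W) among the candidates produced by the decoding network.
import math

def best_hypothesis(hypotheses):
    """hypotheses: list of (sign_sequence, log_likelihood, language_model_prob)
    tuples; returns the sign sequence with the highest combined score."""
    def combined_score(hypothesis):
        _, log_px_given_w, p_w = hypothesis
        return log_px_given_w + math.log(p_w)
    return max(hypotheses, key=combined_score)[0]
```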
- Block 928 further translates that sign sequence from a predetermined sign language into one or more different spoken or written languages such as Chinese or English.
- the translation procedure may include reordering the words in the output from the actual order of the signs as entered by the user 102 or adding additional articles, conjunctions, prepositions, and other words that are not directly included in the sign language to the final output.
- After receiving one or more phrases, the system 100 generates an output corresponding to the identified words from the sign language input (block 932). As described above, in some embodiments the processor 104 generates synthesized speech for the words in the output using a loudspeaker or other audio output device 136. In this mode, the system 100 can produce audio output for recipients who can hear but are unable to understand the signs from the user 102. In another configuration, the output device 136 includes a display device that generates a text output corresponding to the signs from the user 102. In still another configuration, the system 100 performs a command based on the sign inputs from the user 102. For example, in some embodiments the system 100 is further connected to a home automation system or another system that normally receives speech input, but in the configuration of FIG. 1 the system can accept sign language input to provide similar functionality to users who communicate via sign language instead of spoken words.
Landscapes
- Engineering & Computer Science (AREA)
- Theoretical Computer Science (AREA)
- Physics & Mathematics (AREA)
- General Physics & Mathematics (AREA)
- General Engineering & Computer Science (AREA)
- Health & Medical Sciences (AREA)
- Human Computer Interaction (AREA)
- Data Mining & Analysis (AREA)
- Computer Vision & Pattern Recognition (AREA)
- General Health & Medical Sciences (AREA)
- Audiology, Speech & Language Pathology (AREA)
- Multimedia (AREA)
- Evolutionary Biology (AREA)
- Bioinformatics & Computational Biology (AREA)
- Bioinformatics & Cheminformatics (AREA)
- Evolutionary Computation (AREA)
- Life Sciences & Earth Sciences (AREA)
- Artificial Intelligence (AREA)
- Educational Administration (AREA)
- Educational Technology (AREA)
- Business, Economics & Management (AREA)
- Acoustics & Sound (AREA)
- Computational Linguistics (AREA)
- Psychiatry (AREA)
- Social Psychology (AREA)
- Probability & Statistics with Applications (AREA)
- User Interface Of Digital Computer (AREA)
Abstract
A method for sign language recognition includes receiving, with an input device, an input based on a plurality of hand movements and postures of a user that correspond to a sequence of signs, extracting a plurality of features from the input corresponding to the plurality of hand movements and postures, identifying a start of the sequence of signs in the input based on a first set of features in the plurality of features and a first Hidden Markov Model (HMM) stored in a memory, and identifying a first sign in the input based on a second set of features in the plurality of features and a second HMM stored in the memory. The method also includes generating an output corresponding to the first sign from the input.
Description
System and Method For Automated Sign Language Recognition
CLAIM OF PRIORITY
[0001] This application claims priority to U.S. Provisional Application No. 62/148,204, which is entitled "System and Method For Automated Sign Language Recognition," and was filed on April 16, 2015, the entire contents of which are hereby incorporated by reference herein. This application claims further priority to U.S. Provisional Application No. 62/173,878, which is entitled "Sign Transition Modeling and a Scalable Solution to Continuous Sign Language Recognition for Real-World Applications," and was filed on June 10, 2015, the entire contents of which are hereby incorporated by reference herein.
FIELD
[0002] This disclosure relates generally to the field of sign language recognition (SLR) and, more specifically, to systems and methods for SLR including Hidden Markov Models (HMMs) and the modeling of non-dominant-hand features in sign language recognition.
BACKGROUND
[0003] Sign languages are natural languages mainly used by hard-of-hearing (HoH) people. While spoken languages use voices to convey meaning, sign languages utilize hand shape, orientation and movement, sometimes with the aid of facial expression and body movement, to express thoughts. Modern research has proved that sign languages are true languages and can be the first languages of HoH people.
[0004] The World Health Organization estimates that over 5% of the global population (including 32 million children) has disabling hearing loss. The technology of SLR may lead to products that can benefit the HoH communities in various ways. However, compared with speech recognition (SR), SLR has received relatively limited research effort and remains immature. Introducing mature SR techniques into SLR may lead to substantial progress in SLR.
[0005] The tasks of SLR and SR are similar in the sense that they both attempt to convert a sequence of input signals into a sequence of text words. While the input signals for SR are voice signals, those for SLR can be sensor signals from a wearable device and/or image sequences from a camera. This similarity indicates that SR frameworks can also be applicable to SLR, although certain strategies are needed to handle the special characteristics of sign languages.
[0006] One major characteristic of sign language is that in the signals of a sign phrase, there are transition parts of the signals between every two sequential signs, before the first sign, and after the last sign. In previous works for small-vocabulary continuous SLR, the transition parts are normally ignored in modeling. For larger-vocabulary continuous SLR, however, modeling the transition parts explicitly has been proven to be beneficial. Previous works that explicitly model transition signals require transition models that are structurally different from the sign models and significantly increase the complexity of recognition. Consequently, improvements to SLR systems and methods that improve the recognition of sign language gestures for sign languages with a wide range of vocabulary sizes would be beneficial.
[0007] For sign languages, there are other special characteristics related to the non-dominant hand. The first characteristic is that the dominant hand and the non-dominant hand are not equally important in sign languages. For example, frequently used sign-language words often involve only the dominant hand in signing. The second is that for dominant-hand-only signs, the non-dominant-hand signals may not be stable and are influenced by neighboring signs or even additional actions. Strategies to handle these characteristics related to the non-dominant hand are thus beneficial for sign language recognition.
SUMMARY
[0008] The method includes generation of explicit models for the transition signals before, between, and after signs within an HMM framework. In traditional speech recognition, an HMM-based framework trains HMMs for phonemes as well as for the short pause between words and the silence between phrases. The speech recognition framework then connects the phoneme HMMs into word HMMs and connects the word HMMs with the short pause/silence HMMs into a decoding network. Recognition is conducted by searching for the path with the highest probability in the decoding network given the input signals. FIG. 2 illustrates a prior art decoding network 200 for SR along with the HMM structures.
[0009] The HMM framework for SLR differs from existing SR frameworks. The system generates an HMM framework for SLR through (1) training an HMM for each gesture that forms an individual sign; (2) training a transition HMM for the transition signals between two continuous signs within a larger phrase, a start HMM for the transition signals before the first sign of a sign phrase begins, and an end HMM for the transition signals after the last sign of a phrase finishes; and (3) connecting the sign HMMs and the transition/start/end HMMs into a decoding network similar to that of SR, as illustrated in FIG. 3. In this way, the system generates an HMM-based recognition framework for the task of SLR. The SLR framework generates explicit models of transition signals to reduce the influence of neighboring signs on the modeling of signs, which improves the recognition accuracy.
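By way of illustration only, the following Python sketch shows one way such a decoding network could be assembled from trained models; the function name build_decoding_network, the node labels, and the example vocabulary are hypothetical and are not part of this disclosure.

```python
# Illustrative sketch: assembling sign, transition, and start/end HMMs into a
# decoding network analogous to FIG. 3.  All names are hypothetical.

def build_decoding_network(sign_models, transition_model, start_model, end_model):
    """Return an adjacency list of HMM nodes forming the decoding network.

    Each phrase path is: start -> sign -> (transition -> sign)* -> end,
    so every sign node connects both to the shared transition node and to
    the end node, and the transition node connects back to every sign node.
    """
    network = {"<phrase_start>": [start_model]}
    network[start_model] = list(sign_models)           # first sign of a phrase
    for sign in sign_models:
        network[sign] = [transition_model, end_model]  # continue or finish
    network[transition_model] = list(sign_models)      # next sign after a transition
    network[end_model] = ["<phrase_end>"]
    return network


if __name__ == "__main__":
    signs = ["AT", "WHERE", "REGISTER_1", "REGISTER_2"]   # hypothetical vocabulary
    net = build_decoding_network(signs, "TRANSITION", "START", "END")
    for node, successors in net.items():
        print(f"{node:>14} -> {successors}")
```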
[0010] The SLR system also incorporates features that are extracted from the dominant hand and non-dominant hand into the training of HMMs. While the input for traditional speech recognition comes from the single voice of a user, the input of the SLR system comes from two channels, which correspond to the movements of the left and right hands. One characteristic of sign language is that the usage of the dominant and non-dominant hands is unbalanced. The dominant hand is used in most signs, while the non-dominant hand appears less frequently and mainly plays an assisting role. This characteristic indicates that incorporating dominant-hand and non-dominant-hand information in a distinguishing way may be beneficial. The system optionally employs two embodiments to keep only key information from the non-dominant hand in modeling and thus make the SLR system focus more on the dominant hand. In one embodiment, the system only includes features from the non-dominant hand related to the hand global angles in modeling.
[0011] Experimental results indicate that, compared with including left-hand and right-hand information equally, the first approach effectively improves the recognition accuracy while the second approach provides comparable accuracy. One additional advantage is that with either approach the non-dominant-hand device (e.g., a digital glove) used to collect sign signals can be substantially simplified, requiring only one or several sensors to obtain the hand global angles during sign movements.
[0012] The SLR systems and methods described herein also model the non-dominant-hand variances for dominant-hand-only signs. The non-dominant-hand variance models are designed to handle the special characteristic of sign language that the non-dominant-hand inputs of dominant-hand-only signs are influenced by the neighboring signs or even additional actions. The system design includes a Tying-with-Transition (TWT) solution to reduce the impact of this phenomenon on recognition performance. One embodiment of a training and sign language recognition method ties the non-dominant-hand related parameters of the transition model with
those of the models for dominant-hand-only signs, and updates the decoding network by including the tied model as an alternative option for each dominant-hand-only sign model in the decoding network if this sign follows a double-hand sign. The advantage of adopting this solution is an increase in the robustness of the SLR system.
[0013] As mentioned in the last two paragraphs of the background section, previous works in SLR either did not explicitly model transition signals or modeled transition signals at the cost of introducing complex recognition procedures. Prior art SLR systems also do not distinguish between the left-hand and right-hand gestures as described herein. The SLR systems and methods described herein therefore differ from previous works.
[0014] In one embodiment, a training method for automated sign language recognition has been developed. The method includes receiving, with an input device, a plurality of training inputs corresponding to a plurality of predetermined sequences of signs from hand movements and postures of a user, extracting, with the processor, a first set of features from the training inputs corresponding to hand movements and postures of the user for each sign in the plurality of predetermined sequences of signs, generating, with the processor, a first Hidden Markov Model (HMM) based on the first set of features from the training inputs corresponding to hand movements and postures for each sign in the predetermined sequences of signs, extracting, with the processor, a second set of features from the training inputs corresponding to hand movements and postures of the user for transitions between signs in the plurality of predetermined sequences of signs, generating, with the processor, a second HMM based on the second set of features from the training inputs and the predetermined sequences of signs, extracting, with the processor, a third set of features from the training inputs corresponding to hand movements and postures of the user for starting and ending each predetermined sequence of signs in the plurality of
predetermined sequences of signs, generating with the processor a third HMM based on the third set of features from the training inputs and the predetermined sequences of signs, and storing, with the processor, the first HMM, second HMM, and third HMM in a memory for recognition of additional signs received from the input device.
[0015] In another embodiment, a method for automated sign language recognition has been developed. The method includes receiving, with an input device, an input based on a plurality of hand movements and postures of a user that correspond to a sequence of signs, extracting, with a processor, a plurality of features from the input corresponding to the plurality of hand movements and postures, identifying, with the processor, a start of the sequence of signs in the input based on a first set of features in the plurality of features and a first Hidden Markov Model (HMM) stored in the memory, identifying, with the processor, a first sign in the input based on a second set of features in the plurality of features and a second HMM stored in the memory, and generating, with an output device, an output corresponding to the first sign from the input.
[0016] In another embodiment, a system for automated sign language recognition has been developed. The system includes an input device configured to receive input corresponding to a plurality of hand movements and postures from a user corresponding to a sequence of signs, an output device, a memory, and a processor operatively connected to the input device, the output device, and the memory. The processor is configured to receive an input from the input device based on a plurality of hand movements and postures of a user that correspond to a sequence of signs, extract a plurality of features from the input corresponding to the plurality of hand movements and postures, identify a start of the sequence of signs in the input based on a first set of features in the plurality of features and a first Hidden Markov Model (HMM) stored in the memory, identify a first sign in the input based on a second set of features in the plurality of
features and a second HMM stored in the memory, and generate an output with the output device corresponding to the first sign from the input.
BRIEF DESCRIPTION OF THE DRAWINGS
[0017] FIG. 1 is a schematic diagram of a system that performs automated sign language recognition.
[0018] FIG. 2 is a diagram of a prior art Hidden Markov Model (HMM) decoding network that is used in speech recognition.
[0019] FIG. 3 is a diagram of an HMM decoding network that is used in sign-language recognition.
[0020] FIG. 4 is a diagram depicting hand movements and postures in a sign language phrase.
[0021] FIG. 5 is a depiction of an input device that includes gloves with sensors to detect the motion, position, orientation, and shape of the hands of a user who provides sign language input to the system of FIG. 1.
[0022] FIG. 6 is a drawing of the structure of a human hand and different locations in the hand for sensors to detect the motion and shape of the individual fingers in the hand and the motion, position, and orientation of the entire hand.
[0023] FIG. 7 is a diagram of the coordinate axes of each sensor in a three-dimensional space used during feature extraction for input data corresponding to movements of the hands of a user in the system of FIG. 1.
[0024] FIG. 8 is a block diagram of a process for training HMMs in the system of FIG. 1.
[0025] FIG. 9 is a block diagram of a process for performing sign language recognition using the system of FIG. 1.
DETAILED DESCRIPTION
[0026] For the purposes of promoting an understanding of the principles of the embodiments disclosed herein, reference is now made to the drawings and descriptions in the following written specification. No limitation to the scope of the subject matter is intended by the references. The present disclosure also includes any alterations and modifications to the illustrated embodiments and includes further applications of the principles of the disclosed embodiments as would normally occur to one skilled in the art to which this disclosure pertains.
[0027] As used herein, the terms "sign phoneme" or more simply "phoneme" are used interchangeably and refer to a gesture, handshape, location, palm orientation, or other hand posture using one or two hands that corresponds to the smallest unit of a sign language. Each sign in a predetermined sign language includes at least one sign phoneme, and as used herein the term "sign" refers to a word or other unit of language that is formed from one or more sign phonemes. In some instances a single sign phoneme corresponds to a single word, while in other instances multiple phonemes form a single word depending upon the conventions of the sign language. Many signed words can use the same phoneme or different phonemes more than once in a sequence to form a single word or express an idea in the context of a larger sentence or phrase. As used herein, the term "transition" refers to the movements of the hands that are made between individual signs, such as between sequences of sign phonemes that form individual words or between the phonemes within a single sign. Unlike a sign phoneme, a transition movement is not a typical unit of a sign language that has independent meaning, but transition movements still provide important contextual information to indicate the end of a first word or phoneme and the beginning of a second word or phoneme. As used herein, the term "st/end"
refers to hand movements and postures of the user for starting and ending a sequence of signs, including predetermined sequences of signs that are used in a training process for sign language recognition and for non-predetermined sequences of signs that an automated sign language recognition system detects during operation. As used herein, the term "phrase" refers to any sequence of signs that include some form of start and end hand motions along with motions for at least one individual sign. In phrases that include multiple signs, a transition hand movement separates individual signs within the phrase. Phrases can correspond to any sequence of interrelated signs such as signs corresponding to multiple words in a complete sentence or shorter word sequence. A phrase can also include, for example, sequences of numbers or individual letters in a predetermined sign language.
[0028] As used herein, the term "dominant hand" refers to a hand of the user that performs a greater number of movements when producing signs. The term "non-dominant hand" refers to the other hand that performs fewer movements. Many sign language conventions specify that the dominant hand performs most of the movements to form signs while the non-dominant hand performs fewer gestures to form the signs. In many instances the dominant hand is the right hand of the user while the non-dominant hand is the left hand, although the roles of the hands may be reversed for some signers, such as left-handed signers, or for different sign language conventions where the left hand is the dominant hand for forming signs.
[0029] As described below, a sign language recognition (SLR) system records data based on the hand gestures of a user who produces one or more signs in at least one form of recognized sign language. The system receives input using, for example, gloves that include motion and position sensors or from recorded video data of the hands while the user performs sign gestures. The system also includes one or more digital processing devices, data storage devices that hold a
system of Hidden Markov Models (HMMs) that are used to recognize sign language gestures, and an output device. The digital processing devices analyze recorded data from the gestures and identify the meanings of words in the sign language from one or more hand gestures that correspond to different sign phonemes and with reference to transition movements as the user repositions his or her hands to form different sign phonemes in one or more words. In some embodiments, the system also distinguishes between left-hand and right-hand gestures and analyzes the movements of both hands since some forms of sign language use the left and right hands for different purposes. The output device generates a text or audio translation of recognized signs, and the output can be provided to other humans for translation or to additional computing systems to enable automated sign language recognition as a human-machine interface.
[0030] FIG. 1 depicts an embodiment of a system 100 that performs automated sign language recognition. The system 100 includes a processor 104, memory 108, one or more input devices 132, and one or more output devices 136. The processor 104 is, for example, a digital microprocessor that includes one or more central processing unit (CPU) cores and optionally one or more graphics processing units (GPUs), digital signal processors (DSPs), field programmable gate arrays (FPGAs), and application specific integrated circuits (ASICs) for processing data from the input devices 132 and generating output using the output devices 136. In one embodiment of the system 100 that is described in more detail below, the input devices 132 include two gloves that the user 102 wears while producing sign language input. The gloves include accelerometer and gyroscope sensors that provide input data tracking the position, movement, orientation, and hand shape of both the dominant hand and non-dominant hand of the user 102 as the user 102 performs different sequences of signs.
[0031] The memory 108 includes one or more volatile memory devices such as dynamic or static random access memory (RAM) and one or more non-volatile memory devices such as a magnetic disk or solid state storage device that store program instructions 110, model data, and other forms of data for operation of the system 100. In the embodiment of FIG. 1, the memory 108 includes a sign language phrase model 112 that is formed from three classes of Hidden Markov Models (HMMs): a phrase start/end (st/end) model 116, individual sign models 120, and a sign transition model 124. The combined HMMs 116 - 124 form a sign language recognition decoding network 300. The overall sign language phrase recognition model 112 uses the individual HMMs 116 - 124 in the decoding network 300 to identify the beginning and end of phrases, individual signs within a phrase, and transitions between signs in multi-sign phrases in features that are extracted from the input received from the input devices 132. The sign language phrase recognition model in the memory 108 also includes a language model 128 that converts identified signs from the sign language phrase recognition model 112 into phrases/sentences for one or more languages, such as Chinese or American Sign Language phrases/sentences. The memory 108 also stores training data 130 that the processor 104 uses to generate the HMMs in the sign language phrase recognition model 112.
[0032] During operation, a user 102 provides sign language input to the system 100 using the input device 132. In a training mode, the user 102 provides sign inputs to the input devices 132 for one or more predetermined sequences of signs that the processor 104 stores with the training data 130 in the memory 108. As described in more detail below, the processor 104 uses the predetermined structure of the sign sequence and observed features that are extracted from the training input data to train the multiple HMMs for the phrase st/end model 116, individual sign models 120, and sign transition model 124. In a recognition mode, the user 102 provides sign
inputs to the system 100 using the input devices 132 and the system 100 uses the trained HMMs 116 - 124 along with the language model 128 to identify sequences of signs based on the hand movements and postures of the user 102. The system 100 generates an output using the output devices 136 to, for example, generate an audio output in a predetermined language based on the identified signs in each phrase. In other embodiments, the output device 136 is a display screen or network adapter that displays or transmits, respectively, a text representation of the identified signs. In still other embodiments, the system 100 receives the signs as command inputs and the processor 104 executes stored program instructions 110 based on the identified command inputs to produce an output using one or more of the output devices 136.
[0033] The sign language recognition (SLR) system 100 trains HMMs and builds a decoding network in a manner similar to a conventional HMM-based speech recognition system. The SLR system views every unique basic sign as a sign phoneme and trains an HMM for it. Each sign word can thus be modeled by connecting the HMMs of the component sign phonemes, just as spoken words are modeled by connecting voice phoneme HMMs. The decoding network is constructed by connecting sign word models using a language model, which can be grammars, n-grams, or other types of language model. The special treatment needed for SLR lies in the modeling of transitions. The system trains an HMM, called the transition model, to model the transition signals between neighboring signs. A start (st) HMM and an end HMM are also trained to model the transition signals in a phrase before the first sign and after the last sign, respectively. The system then modifies the decoding network by inserting the transition model between neighboring sign phonemes, inserting the start model before phrase start nodes, and inserting the end model after the phrase end nodes. FIG. 3 illustrates the decoding network along with the HMM structures in the system 100 in more detail. In FIG. 3, an HMM-based SLR decoding
network 300 incorporates all of the HMMs, including sign HMMs 120, the transition HMM 124, and the st/end HMM 116. All of these HMMs adopt a left-to-right HMM structure. The HMM decoding network 300 uses a fixed number of states within the sign HMM 120 to encode each of the potential sets of features for different sign phonemes. The numbers of states are tuned for the sign HMM 120, the transition HMM 124, and the st/end HMMs 116, respectively.
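A minimal sketch of the left-to-right structure is given below, assuming a simple self-loop/advance initialization; the state counts and self-loop probability shown are illustrative placeholders rather than the tuned values described above.

```python
import numpy as np

def left_to_right_transition_matrix(num_states, self_loop_prob=0.6):
    """Initial transition matrix for a left-to-right HMM: each state either
    stays in place (self-loop) or advances to the next state; no skips or
    backward jumps are allowed."""
    A = np.zeros((num_states, num_states))
    for i in range(num_states - 1):
        A[i, i] = self_loop_prob
        A[i, i + 1] = 1.0 - self_loop_prob
    A[-1, -1] = 1.0  # final state loops until the model is exited in the network
    return A

# Hypothetical, untuned state counts for the different model types.
print(left_to_right_transition_matrix(5))   # e.g., a sign HMM
print(left_to_right_transition_matrix(3))   # e.g., the transition HMM
```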
[0034] FIG. 8 is a block diagram of a training process 800 for training HMMs that identify features corresponding to hand movements and postures for each sign, transition between signs, and start and end hand movements and postures for predetermined sequences of signs corresponding to multiple phrases in a training process. As set forth below, a description of the process 800 performing an action or function refers to the operation of a processor to execute stored program instructions to perform the function or action in association with other components in the automated sign language recognition system. The process 800 is described in conjunction with the system 100 of FIG. 1 for illustrative purposes.
[0035] Process 800 begins as the system 100 receives predetermined sequences of signs in a sign language from the input device 132 including generated sensor data from both the dominant and non-dominant hands of the user 102 or other users who perform the training process (block 804). In some embodiments, each sequence of sign language inputs includes a set of phonemes that correspond to a single word in a predetermined sign language, but the inputs also include start and end features from the user and transitions between sign phonemes. FIG. 4 depicts an example of an input with a sequence of individual signs that form one predetermined sequence during the training process. In some embodiments of the process 800, each of the signs in FIG. 4 forms one input sequence to the training process 800. Since signs are formed from individual phonemes, the training process 800 does not require input for all possible words and phrases in
the language, but the system 100 receives a representative sample of the sign phonemes, st/end signals, and transitions during the process 800 to enable recognition of sign language words that are not directly included in the training data inputs. In FIG. 4 the sequence 400 includes a phrase formed from four individual signs for the words "at" (sign 404), "where" (sign 408), and two signs that specify the word "register" (signs 412 and 416). In addition to the hand movements and/or postures that form the signs, the system 100 also receives input data for the hand movements and postures that the user performs at the beginning and end of the predetermined sequence along with transition movements for the hands that occur between the signs in the sequence. As used herein, the terms "hand posture" or more simply "posture" refer to the orientation and shape of the hand, including either or both of the palm and fingers, that conveys information in sign phonemes along with the motion of the hand and fingers on the hand. During the process 800 the system 100 receives multiple inputs for a wide range of phrases from one or more users to provide a set of training data that not only provide training information for the recognition of individual signs, but further include input data corresponding to the st/end hand motions for phrases and for transitions between signs within the phrases.
[0036] During the process 800, the system 100 collects training inputs that include a certain number of samples for each of the sign words in the vocabulary, including both signs that include a single phoneme and multi-phoneme signs. To make full use of the captured data, the system 100 classifies the input data into several segments, i.e., the component signs, preceded by the start part (i.e., from the beginning to the start of the first sign), followed by the end part (i.e., from the finishing point of the last sign to the end), and the transition parts. Each sign phoneme model can thus be trained on all of the segments of the focused signs in the training data, while the transition, start, and end models can be trained on the corresponding segments, respectively. The main advantages of this data collection methodology are as follows: (i) when the vocabulary size increases, the need for training data increases only linearly; (ii) every word in the vocabulary, including uncommon words, can be robustly trained; and (iii) it is feasible to label word samples to provide reliable segmentation information for training, which is especially valuable for noisy real-world data.
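The segmentation described above can be illustrated with the following hedged Python sketch; the function name, the tuple layout of the labels, and the frame indices in the example are assumptions made for illustration.

```python
def segment_training_phrase(num_frames, sign_boundaries):
    """Split a recorded phrase into start / sign / transition / end segments.

    `sign_boundaries` is a list of (label, first_frame, last_frame) tuples in
    temporal order; frames before the first sign become the start segment,
    frames between signs become transition segments, and frames after the
    last sign become the end segment.  Boundary values here are hypothetical.
    """
    segments = [("START", 0, sign_boundaries[0][1] - 1)]
    for i, (label, first, last) in enumerate(sign_boundaries):
        segments.append((label, first, last))
        if i + 1 < len(sign_boundaries):
            segments.append(("TRANSITION", last + 1, sign_boundaries[i + 1][1] - 1))
    segments.append(("END", sign_boundaries[-1][2] + 1, num_frames - 1))
    return segments


if __name__ == "__main__":
    # A 300-frame phrase containing "AT" and "WHERE" (frame indices are made up).
    print(segment_training_phrase(300, [("AT", 40, 120), ("WHERE", 150, 240)]))
```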
[0037] As described above, in one embodiment the system 100 uses gloves with sensors that record acceleration and hand angle position information for the dominant and non-dominant hands of the user. FIG. 5 depicts examples of gloves 500 that are worn during the process 800 and during subsequent sign language recognition processing. The gloves 500 include multiple sensors, such as micro-electromechanical systems (MEMS) sensors, that record the attitude, orientation, and acceleration of different portions of the dominant and non-dominant hands of the user as the user performs hand movements/postures to form the signs in the predetermined sequence. FIG. 6 depicts a more detailed view of one embodiment of the sensors 600 in a glove for a right hand. The sensors 600 include multiple sensors 604A and 604B that generate data for the position and movement of the phalanges (fingers), with the sensors 604A on the fingers and 604B on the thumb. The sensors 600 also include sensors 608 that generate data for the position and movement of the metacarpal region (palm) of the hand and a sensor 612 that measures the position and movement of the carpal region (wrist) of the hand. While FIG. 6 depicts a sensor arrangement for the right hand, the system 100 includes a glove with a similar sensor arrangement for the left hand as well.
[0038] Referring again to FIG. 8, the process 800 continues as the system 100 performs feature extraction to identify the features that correspond to individual signs, sign transitions, and the st/end movements in the input corresponding to the predetermined sequence of sign phonemes
(block 808). The feature extraction process enables the system 100 to identify characteristics of the sensor data for the hand movement/posture inputs in a uniform manner to train the HMMs 116 - 124 for subsequent sign language recognition operations. In some embodiments, the system 100 receives manually inputted timestamps that categorize different sets of features for hand movements and postures that correspond to the st/end of the input, to different
predetermined sign phonemes in each input sequence, and to the transitions between sign phonemes. Additionally, the system 100 optionally records time information for the
predetermined sequence, which includes recorded data for hand movements/postures that occur in a predetermined order, to distinguish between the different sets of hand movements and postures for the feature extraction process. As described in more detail below, the extracted features include both finger related (FR) features that correspond to the movements and shapes of individual fingers and to global angle (GA) features that correspond to the overall angles and movements of the hands. In some embodiments the system 100 extracts different sets of features for the input data from the dominant hand and the non-dominant hand. The extracted features for both hands form the basis of training data to train the HMMs 116 - 124 in the system 100.
[0039] For the dominant and non-dominant hands, the feature set can be grouped into two categories: finger related and global angle related, which are referred to herein as FR features and GA features, respectively. The FR features include the closeness of each finger, the openness of the thumb, and the deltas (i.e., the differences between the current and the previous frames) of these features. The GA features describe the posture of the hand and are derived from the global polar angles of the hand about the x, y, and z axes. The GA features include trigonometric functions of the angles and the deltas of the angles for the hand as the hand moves to different positions and orientations to produce signs, inter-phoneme transitions, and phrase start/end gestures and postures.
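As a hedged illustration of the delta features mentioned above, the sketch below appends first-order differences to a sequence of per-frame static features; the feature values and the zero-padded first frame are assumptions of this example.

```python
import numpy as np

def add_delta_features(frames):
    """Append first-order deltas (current frame minus previous frame) to each
    per-frame feature vector; the first frame is given a zero delta."""
    frames = np.asarray(frames, dtype=float)
    deltas = np.vstack([np.zeros_like(frames[:1]), np.diff(frames, axis=0)])
    return np.hstack([frames, deltas])

# Three frames of hypothetical static features (e.g., finger closeness values).
static = [[0.1, 0.9, 0.8], [0.2, 0.7, 0.8], [0.4, 0.5, 0.9]]
print(add_delta_features(static))
```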
[0040] Given a series of angular velocity values ωs(t) sampled by a sensor s at time t, the angle rotated in each sample period can be estimated as θs(t) = ((ωs(t) + ωs(t−1)) / 2)·Δt, where Δt is the interval between consecutive samples. If the rotation quaternion for each sample period is qs,t and ps is a given initial attitude quaternion, then the current orientation after M accumulated rotations is Qs,M ps (Qs,M)^-1, where Qs,M = qs,M qs,M−1 ··· qs,1. The processor 104 identifies the rotation and angular position of both the dominant and non-dominant hands using the Hamilton product of the quaternions based on the different sensor inputs and features that are generated at different times as the system 100 receives input from the user 102. In the process 800, the system 100 uses the quaternions identified from angular velocity and acceleration to represent the accumulated rotations and angular positions of both hands in three-dimensional space, such as the three-dimensional space in which the hands move to produce the training sequences, in a simple manner that avoids gimbal lock. Because quaternion operations are mathematically non-commutative, this representation accurately reflects the order-dependent physical characteristics of the rotational movements of the hands of the user.
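The accumulation of per-sample rotations can be sketched as follows; the sampling rate, the axis conventions, and the choice of composing new rotations on the left (matching the reconstructed product Qs,M = qs,M ··· qs,1) are assumptions of this illustrative example, not requirements of the disclosure.

```python
import numpy as np

def quat_multiply(q1, q2):
    """Hamilton product of two quaternions given as (w, x, y, z)."""
    w1, x1, y1, z1 = q1
    w2, x2, y2, z2 = q2
    return np.array([
        w1*w2 - x1*x2 - y1*y2 - z1*z2,
        w1*x2 + x1*w2 + y1*z2 - z1*y2,
        w1*y2 - x1*z2 + y1*w2 + z1*x2,
        w1*z2 + x1*y2 - y1*x2 + z1*w2,
    ])

def quat_conjugate(q):
    w, x, y, z = q
    return np.array([w, -x, -y, -z])

def step_rotation(omega_prev, omega_curr, dt):
    """Rotation quaternion for one sample period, using the average of two
    consecutive angular-velocity samples to estimate the rotated angle."""
    omega = 0.5 * (np.asarray(omega_prev, float) + np.asarray(omega_curr, float))
    angle = np.linalg.norm(omega) * dt
    if angle < 1e-12:
        return np.array([1.0, 0.0, 0.0, 0.0])
    axis = omega / np.linalg.norm(omega)
    return np.concatenate([[np.cos(angle / 2)], np.sin(angle / 2) * axis])

def accumulate_orientation(p_init, omegas, dt):
    """Accumulate per-sample rotations (newest on the left, as an assumed
    convention) and apply them to the initial attitude quaternion p_init."""
    Q = np.array([1.0, 0.0, 0.0, 0.0])
    for prev, curr in zip(omegas[:-1], omegas[1:]):
        Q = quat_multiply(step_rotation(prev, curr, dt), Q)
    return quat_multiply(quat_multiply(Q, p_init), quat_conjugate(Q))

# Hypothetical gyroscope samples (rad/s) at an assumed 100 Hz sample rate.
gyro = [[0.0, 0.0, 1.0]] * 50            # steady rotation about the z axis
p0 = np.array([0.0, 1.0, 0.0, 0.0])      # pure quaternion for the initial attitude
print(accumulate_orientation(p0, gyro, dt=0.01))
```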
[0041] FIG. 7 depicts components of the quaternions that the process 800 uses to extract the features. FIG. 7 depicts three spatial axes of movement xs, ys, and zs for the hand of the user 102 while forming sign phonemes or transitions. The axes xg, yg, and zg indicate a predetermined absolute coordinate system, and the orientation of the hand changes within the absolute coordinate system. FIG. 7 also depicts the coordinate axes associated with the sensors in the input device gloves 132 that the user 102 wears to provide input to the system 100. In FIG. 7, the downward direction of gravity serves as an initial attitude unit quaternion, ps. The user 102 hangs his or her arms with hands and fingers extended downward in parallel to the direction of gravity (shown as gs in FIG. 7), in the direction of the quaternion ps, at the beginning of each training sign input sequence. This enables each initial quaternion ps in the input data to be aligned with the single reference quaternion for gravity without requiring complex calibration for the individual bones in the fingers and other portions of the hands of the user.
[0042] Given sampled acceleration values as(t) from a sensor s, the system 100 estimates the gravity direction using the mean of the first k samples, gs = (1/k) Σt=1..k as(t). The system 100 then estimates the direction in which a finger bone points in the global coordinate space as os(t) = q0 Qs,t ps (Qs,t)^-1 (q0)^-1, where q0 is the rotation from the initial sensor coordinate frame to the global coordinate frame.
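A possible realization of the gravity estimate and of the sensor-to-global rotation q0 is sketched below; the number of calibration samples, the choice of the global "down" axis, and the function names are assumptions of this example.

```python
import numpy as np

def estimate_gravity(accel_samples, k=20):
    """Estimate the gravity direction in the sensor frame as the mean of the
    first k accelerometer samples (the hands are held still, pointing down)."""
    g = np.mean(np.asarray(accel_samples, dtype=float)[:k], axis=0)
    return g / np.linalg.norm(g)

def rotation_between(u, v):
    """Unit quaternion (w, x, y, z) rotating unit vector u onto unit vector v;
    one way the sensor-to-global rotation q0 could be obtained."""
    u, v = np.asarray(u, float), np.asarray(v, float)
    q = np.concatenate([[1.0 + float(np.dot(u, v))], np.cross(u, v)])
    n = np.linalg.norm(q)
    if n < 1e-9:                       # u and v are opposite; pick any valid axis
        return np.array([0.0, 1.0, 0.0, 0.0])
    return q / n

# Hypothetical accelerometer samples (m/s^2) while the arms hang downward.
accel = [[0.1, -0.2, 9.8]] * 30
g_sensor = estimate_gravity(accel)
q0 = rotation_between(g_sensor, [0.0, 0.0, 1.0])   # assumed global "down" is +z
print(g_sensor, q0)
```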
[0043] As mentioned above, the system 100 extracts two types of features from the input data. The first type of feature corresponds to features of the hand shape (finger related, or FR), and the second type of feature corresponds to the hand direction (global angle of hand, or GA). In one embodiment, the FR feature for the hand shape is the cosine distance between the directions of adjacent finger bones, As,r = os · or. For instance, if s = '2c' and r = '2b', as depicted in the hand configuration of FIG. 6, then As,r describes a rotation about the proximal interphalangeal joint (the second far-end joint of the index finger). The system 100 extracts the FR features for each finger. In some embodiments, the system 100 extracts the FR features for individual fingers only in the dominant hand while extracting the GA features for both the dominant and non-dominant hands, as described in more detail below. In other embodiments, the system 100 extracts both the FR and GA features for both the dominant and non-dominant hands.
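The FR feature described above reduces to a cosine similarity between two bone direction vectors, as in the following sketch; the example directions are invented for illustration.

```python
import numpy as np

def fr_feature(dir_a, dir_b):
    """Finger-related (FR) feature: cosine of the angle between the global
    direction vectors of two adjacent finger bones (e.g., sensors '2b' and '2c')."""
    a, b = np.asarray(dir_a, float), np.asarray(dir_b, float)
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

# Hypothetical bone directions for two phalanges of the index finger.
proximal = [0.0, 1.0, 0.0]
middle = [0.0, 0.7, -0.7]             # finger bent roughly 45 degrees at the joint
print(fr_feature(proximal, middle))   # ~0.707; a value near 1 means a straight finger
```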
[0044] To extract the GA features for the direction of the hand, more specifically the direction of the palm, the system 100 identifies the mean of the directions of sensors that are associated with the palm of the hand, such as sensors '2a', '3a', '4a' and '5a' in FIG. 6. The system 100 extracts a feature that fully describes the direction of the hand in space based on two perpendicular directions, a first direction along the palm plane and a second direction along the normal of the palm plane. Additionally, the system 100 optionally identifies velocity and acceleration features of the finger bones or of the palm of the hand by identifying multiple rotational positions over time. The system 100 uses the quaternion representation Qs,t, which incorporates multiple measurements from the sensors generated at a series of predetermined sampling times, instead of the single initial attitude quaternion ps to estimate the accumulated rotation of either or both of the dominant and non-dominant hands.
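A hedged sketch of the palm-direction GA feature follows; which sensors are treated as palm sensors, and the use of a cross product of two of their directions to approximate the palm-plane normal, are assumptions of this example.

```python
import numpy as np

def ga_features(palm_sensor_dirs):
    """Global-angle (GA) features for one hand: the mean direction of the palm
    sensors (a vector in the palm plane) and an estimate of the palm-plane
    normal, two perpendicular directions that together fix the hand orientation."""
    dirs = np.asarray(palm_sensor_dirs, dtype=float)
    palm_dir = dirs.mean(axis=0)
    palm_dir /= np.linalg.norm(palm_dir)
    # Approximate the normal from two non-parallel palm sensor directions.
    normal = np.cross(dirs[0], dirs[-1])
    normal /= np.linalg.norm(normal)
    return np.concatenate([palm_dir, normal])

# Hypothetical global directions of the metacarpal sensors ('2a'..'5a' in FIG. 6).
palm = [[0.1, 0.95, 0.0], [0.0, 1.0, 0.05], [-0.05, 1.0, 0.1], [-0.1, 0.95, 0.15]]
print(ga_features(palm))
```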
[0045] Process 800 continues as the system 100 uses the extracted feature data from multiple predetermined sequences of sign inputs to train the HMMs 116 - 124 (block 812). The processor 104 generates the HMM 116 for the features that are extracted from the input data corresponding to the start/end hand movements and postures, the HMM 120 for the features that are extracted from the input data corresponding to the sign hand movements and postures, and the HMM 124 for the features that are extracted from the input data corresponding to the transition hand movements and postures. To generate each of the HMMs, the system 100 performs a training process based on training inputs from the user 102 and the predetermined sign input sequences in the training data 130.
[0046] For sign languages, a word may contain one or more sign phonemes. The system 100 generates the HMM 120 for the sign phonemes to accurately model words based on three requirements. First, words have to be modeled in order to evaluate the feature probability, P(X|W). Second, language models have to be used to calculate the prior sentence probability, P(W). Third, the word models and language models have to be combined to form networks of HMM states in which the decoding for the most likely word string W* in the search space can
be performed. During the process 800, the processor 104 performs the training process to generate the HMM 120 for the sign phonemes based on the observed features for the signs conditioned upon the signs in the predetermined training sequences of signs. The processor 104 similarly performs the training process for the transitions 124 based on the observed features for the transitions conditioned upon the transitions between signs in the predetermined training sequences of signs and the training process for the phrase st/end hand movements 116 based on the observed features for the st/end hand movements conditioned upon the st/end hand movements in the predetermined training data 130.
[0047] The system 100 uses the extracted features from the feature data as a plurality of different feature vectors that form the inputs for the training processes of each of the separate HMMs including the st/end HMM 116, sign HMM 120, and transition HMM 124. The feature vectors include selected features from both the dominant hand and the non-dominant hand. The feature vectors include features from both hands that are recorded concurrently to map the movements of both hands that occur concurrently while the hands of the user 102 move to form signs and st/end or transition movements.
[0048] In one embodiment, the system 100 trains the sign phoneme models 120 based on the segments of the corresponding signs. Alternative training methods are based on the detailed properties of the start and end segments, which may vary based on different contexts for receiving training phrases. To normalize the start and end of each phrase during the training process, one training process embodiment requires the signers to put both hands down before and after signing each phrase. The start segment for each phrase contains static hand position and posture signals followed by movement signals, while every end segment contains movement signals which may or may not be followed by static signals. The movement signals can be
viewed as a type of transition. During the process 800, the system 100 labels each start segment into a static segment and a transition segment, and labels each end segment into a transition segment and a static segment, if the static segment exists. The training process for all of the HMMs in the decoding network 300 includes the start and end segments with both the static segments for hand posture and the dynamic segments for hand movement. The system 100 generates the start model HMM 116 by connecting the static model with the transition model. The end model HMM includes two alternatives, one connecting the transition model with the static model, and the other only including the transition model. In the system 100, the model HMMs 116 - 124 are diagonal covariance Gaussian mixtures, and the system 100 re-estimates the HMMs iteratively after initialization as well as after splitting the Gaussian mixtures.
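The Gaussian mixture splitting step mentioned above can be sketched as follows; the perturbation factor and the diagonal-covariance representation as plain arrays are assumptions of this illustrative example, and the re-estimation step itself is omitted.

```python
import numpy as np

def split_gaussians(means, variances, weights, perturbation=0.2):
    """One mixture-splitting step for a diagonal-covariance Gaussian mixture:
    each component is replaced by two copies with perturbed means and halved
    weights, after which the HMM would be re-estimated on the training data."""
    means = np.asarray(means, float)
    variances = np.asarray(variances, float)
    offset = perturbation * np.sqrt(variances)
    new_means = np.vstack([means + offset, means - offset])
    new_vars = np.vstack([variances, variances])
    new_weights = np.concatenate([weights, weights]) / 2.0
    return new_means, new_vars, new_weights

# A single-component state in a hypothetical 3-dimensional feature space.
m, v, w = split_gaussians([[0.0, 1.0, -0.5]], [[0.5, 0.2, 0.1]], np.array([1.0]))
print(m, v, w, sep="\n")
```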
[0049] The training process implicitly includes Gaussian splitting that occurs within different sets of hand motions in the separate HMMs during the iterative process described above. For example, through the standard HMM training procedure of iterative re-estimation and Gaussian splitting, the transition model may contain multiple Gaussian mixture components in each state and thus implicitly perform classification, using the transition HMM 124, on the transition hand movements and postures that occur between different signs that are encoded in the separate sign HMM 120. During process 800 the system 100 trains a "universal" transition HMM 124 that can identify the transitions that occur between multiple signs and does not require the system 100 to be trained to recognize tightly connected two-sign and three-sign sequences to accurately recognize sign language input. The st/end HMM 116 also serves a similar role to enable the system 100 to identify the features that correspond to the start and end of each phrase.
[0050] Process 800 continues as the processor 104 stores the separate HMMs 116 - 124 in the memory 108 (block 816). As described in more detail below, during a sign language recognition
process the system 100 uses the trained HMMs to recognize sequences of signs that a user inputs to the system 100 using the input devices 132. The trained HMMs can of course recognize sequences of signs that correspond to the training data, but the system 100 is also configured to recognize a wide range of sign language sequences that do not exactly match the training sign sequences since the processor 104 uses the language model 128 to build a decoding network 300 that includes the separate HMMs to recognize individual signs and the transitions between signs with many different combinations that are not necessarily included in the training data directly.
[0051] In another embodiment of the process 800, the system 100 performs a different feature extraction process and HMM generation process for the processing of blocks 812 and 816 in the process 800. In the other embodiment, the system 100 employs a Tying-with-Transition (TWT) solution to enhance the robustness of the sign language recognition system towards the dominant-hand-only sign phoneme that immediately follows a double-hand sign in a given sign- sequence phrase. For dominant-hand-only signs, the non-dominant hand should involve no movements by definition, but in reality, may move due to the neighboring signs or additional action.
[0052] In the TWT embodiment, the system 100 separates the dominant-hand and non-dominant-hand features for extraction during the processing of block 808 into two data streams for HMM modeling. All the HMM models 116 - 124 are then trained using the same training procedures described above based on the two data streams, which results in two sets of Gaussian mixtures in each HMM state for the dominant-hand and non-dominant-hand related data streams respectively. Thus, in the TWT embodiment the processor 104 generates the HMM models 120 for signs based on a first portion of the first set of features from the training inputs corresponding to hand movements and postures for a dominant hand of the user that form the sign phonemes
during the training process and based on a portion of the second set of features from the training data corresponding to the hand movements and postures of a non-dominant hand of the user to perform transitions during the training process.
[0053] The system 100 generates an additional enhanced HMM model for each of the dominant-hand-only sign models based on the trained HMM models for both hands by replacing the non-dominant-hand related Gaussian mixtures in the focused sign model with the non-dominant-hand related Gaussian mixtures in the transition model 124. Note that in this way, the non-dominant-hand variances are covered for the dominant-hand-only signs by the enhanced sign models. Both the enhanced dominant-hand-only sign models and the originally trained models (i.e., st/end model 116, sign models 120, and transition model 124) are utilized in block 816 for the TWT solution. The decoding network 300 is modified to incorporate the enhanced dominant-hand-only sign models based on the neighboring signs. If a sign in the decoding network 300 is a dominant-hand-only sign and the preceding sign in the same path involves the non-dominant hand, this sign model in the decoding network 300 is replaced with a confusion set of two parallel models, which include the corresponding enhanced model and the original model for this dominant-hand-only sign. The recognition procedure based on this modified decoding network remains the same.
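The TWT tying operation can be sketched in Python as follows; the two-stream dictionary representation of an HMM state, the equal state counts of the sign and transition models, and all identifier names are assumptions of this example.

```python
import copy

def tie_with_transition(sign_model, transition_model):
    """Tying-with-Transition (TWT) sketch: build an enhanced dominant-hand-only
    sign model whose non-dominant-hand stream parameters are copied (tied) from
    the transition model, while the dominant-hand stream stays unchanged.
    Assumes both models have the same number of states."""
    enhanced = copy.deepcopy(sign_model)
    for state, transition_state in zip(enhanced["states"], transition_model["states"]):
        state["non_dominant_gmm"] = copy.deepcopy(transition_state["non_dominant_gmm"])
    return enhanced

# Toy two-state models; each state holds one GMM per hand stream (values made up).
sign = {"states": [{"dominant_gmm": "sign_s1_dom", "non_dominant_gmm": "sign_s1_nd"},
                   {"dominant_gmm": "sign_s2_dom", "non_dominant_gmm": "sign_s2_nd"}]}
transition = {"states": [{"dominant_gmm": "tr_s1_dom", "non_dominant_gmm": "tr_s1_nd"},
                         {"dominant_gmm": "tr_s2_dom", "non_dominant_gmm": "tr_s2_nd"}]}
enhanced = tie_with_transition(sign, transition)
# In the decoding network, a dominant-hand-only sign that follows a double-hand
# sign would then be represented by a confusion set of {original, enhanced} models.
print(enhanced["states"][0])   # dominant stream kept, non-dominant stream tied
```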
[0054] FIG. 9 depicts a process 900 for automated sign language recognition. As set forth below, a description of the process 900 performing an action or function refers to the operation of a processor to execute stored program instructions to perform the function or action in association with other components in the automated sign language recognition system. The process 900 is described in conjunction with the system 100 of FIG. 1 for illustrative purposes.
[0055] Process 900 begins as the system 100 receives dominant hand and non-dominant hand input from the user 102 via the input devices 132 (block 904). As described above, in one embodiment the input devices are the gloves 500 that include sensors to record the hand movements and postures for the user 102 as the user 102 performs hand movements and postures corresponding to a sequence of signs.
[0056] Process 900 continues as the system 100 extracts features from the input data (block 908). The system 100 performs the feature extraction processing of block 908 in substantially the same manner as the feature extraction that is described above in conjunction with the processing of block 808 in the process 800. For example, the processor 104 extracts the same FR and GA features for the input data from the dominant hand and extracts the GA features and optionally the FR features for the input data from the non-dominant hand in the same manner as in the process 800 to produce feature vectors in the recognition process in the same manner as during the earlier training process.
[0057] Process 900 continues as the system 100 analyzes the extracted features from the input to identify a start of a sequence of at least one sign that forms a phrase (block 912). In the system 100, the processor 104 uses the st/end HMM 116 in the sign language recognition model 112 to identify features that correspond to the hand movements and postures associated with beginning a new sequence of signs. The processor 104 uses the same sets of extracted features for recognition using the HMM models in the decoding network 300 as are used during the training process of FIG. 8, such as sets of FR and GA features for the dominant hand and GA features for the non-dominant hand or the FR and GA features of the dominant hand in combination with the transition features of the non-dominant hand in the TWT embodiment. As indicated in FIG. 1, the HMMs 116 - 124 are linked to each other so after detection of the start of a sequence, the
processor 104 analyzes the next set of features that occur in the time interval following the detection of the initial start feature with a high probability weighting toward the features corresponding to the phonemes in a given sign (block 916). The processor 104 uses the HMM 120 to identify the next sign in the sequence based on the next set of features that are extracted from the input sequence.
[0058] During process 900, the sign input data from the user 102 includes either a transition to a new sign or an input that signifies the end of a phrase. For example, in a multi-sign phrase the user performs a series of signs with transitions between the signs until the phrase is completed, while in a single-sign phrase the user produces an end-of-phrase motion after the sign. During process 900, the processor 104 uses both the transition HMM 124 and the st/end HMM 116 to identify the sets of features that precede and follow a sign in the input data (block 920). In some embodiments, the system 100 interprets an entire phrase of sign inputs as a whole, and the processor 104 uses the HMM network 300 in the sign language recognition model 112 to identify input based on the maximum likelihoods of word sequences extending not only forward in time but also backward in time once the system 100 has received the entire phrase. The processor 104 identifies a path using the HMM network 300 and a maximum a posteriori (MAP) estimation process with a Bayes decision formulation to identify the sign phonemes and sequences of signs in the trained HMMs that have the greatest likelihood of matching the extracted features from the input data, and to determine whether a given set of features corresponds to a transition between signs or to the end of a phrase. Additionally, in some configurations the processor 104 also identifies the separation in time between the detection of features, which may indicate the likelihood of the features corresponding to a brief transition with a short time interval between signs or the end of
a phrase with a somewhat longer time interval between signs. If the system 100 detects a transition but not the end of the sequence in the phrase (block 924), then the process 900 continues as the system 100 identifies the next sign in the sequence (block 916).
[0059] The processing of blocks 916 - 924 continues until the processor 104 identifies features that correspond to the end of the sequence of signs in the phrase (block 924). The recognition procedure for the detected phrase (block 928) simultaneously generates multiple hypothesized sign sequences, which are the top-ranked paths with the highest P(X|W)·P(W) values (as in the formula below) in the decoding network 300 given the inputs related to the phrase, using standard HMM decoding technology. During the process 900 the system 100 performs a maximum a posteriori (MAP) process with a Bayes decision formulation to identify the paths in the HMM decoding network 300 for recognition of the sign input from the user 102. The system 100 conducts sign language recognition based on the following formula:
W* = argmaxW P(W|X) = argmaxW P(X|W) P(W),
where X = {x1, x2, ···, xT} is a sequence of T observed feature vectors corresponding to the unknown input sentence, W = {w1, w2, ···, wM} is any sequence of M signs into which the input may be transcribed, and the maximization is taken over the set of all allowable phrases representing the search space of the language model (e.g., grammars). As described above, a sign phoneme here refers to the smallest unit that bears linguistic meaning in a predetermined sign language.
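A hedged sketch of applying this decision rule to a small set of hypothesized sign sequences follows; the log-probability values are placeholders and do not correspond to any trained model.

```python
import math

def select_best_hypothesis(hypotheses):
    """Pick the sign sequence W* maximizing log P(X|W) + log P(W), i.e., the
    Bayes decision rule stated above, over a set of hypothesized decoding paths."""
    return max(hypotheses, key=lambda h: h["log_likelihood"] + h["log_prior"])

hypotheses = [
    {"signs": ["AT", "WHERE", "REGISTER"],
     "log_likelihood": math.log(2e-5), "log_prior": math.log(0.02)},
    {"signs": ["AT", "WHAT", "REGISTER"],
     "log_likelihood": math.log(3e-5), "log_prior": math.log(0.004)},
]
print(select_best_hypothesis(hypotheses)["signs"])   # the first hypothesis wins
```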
[0060] In the system 100, the memory 108 stores the HMMs in the decoding network 300, which includes all of the recognized sign-language phonemes along with the st/end and transition gestures, for recognition of phrases from the user 102 that are supported by the language model, which can be grammars, n-gram models, or other types of language model. The hypothesized sign sequence with the highest P(X|W)·P(W) value among the multiple hypotheses generated is
used as the result of the sign language recognition procedure for the detected phrase. Block 928 further translates that sign sequence from a predetermined sign language into one or more different spoken or written languages such as Chinese or English. To ensure a proper grammatical output, the translation procedure may include reordering the words in the output from the actual order of the signs as entered by the user 102 or adding additional articles, conjunctions, prepositions, and other words that are not directly included in the sign language to the final output.
[0061] After receiving one or more phrases, the system 100 generates an output corresponding to the identified words from the sign language input (block 932). As described above, in some embodiments the processor 104 generates synthesized speech for the words in the output using a loudspeaker or other audio output device 136. In this mode, the system 100 can produce audio output for recipients who can hear but are unable to understand the signs from the user 102. In another configuration, the output device 136 includes a display device that generates a text output corresponding to the signs from the user 102. In still another configuration, the system 100 performs a command based on the sign inputs from the user 102. For example, in some embodiments the system 100 is further connected to a home automation system or other system that normally receives speech input, but in the configuration of FIG. 1 the system can accept sign language input to provide similar functionality to users who communicate via sign language instead of spoken words.
[0062] It will be appreciated that variants of the above-disclosed and other features and functions, or alternatives thereof, may be desirably combined into many other different systems, applications or methods. Various presently unforeseen or unanticipated alternatives,
modifications, variations or improvements may be subsequently made by those skilled in the art that are also intended to be encompassed by the following claims.
Claims
1. A training method for automated sign language recognition comprising:
receiving, with an input device, a plurality of training inputs corresponding to a plurality of predetermined sequences of signs from hand movements and postures of a user;
extracting, with the processor, a first set of features from the training inputs corresponding to hand movements and postures of the user for each sign in the plurality of predetermined sequences of signs;
generating, with the processor, a first Hidden Markov Model (HMM) based on the first set of features from the training inputs corresponding to hand movements and postures for each sign in the predetermined sequences of signs;
extracting, with the processor, a second set of features from the training inputs corresponding to hand movements and postures of the user for transitions between signs in the plurality of predetermined sequences of signs;
generating, with the processor, a second HMM based on the second set of features from the training inputs and the predetermined sequences of signs;
extracting, with the processor, a third set of features from the training inputs corresponding to hand movements and postures of the user for starting and ending each predetermined sequence of signs in the plurality of predetermined sequences of signs;
generating with the processor a third HMM based on the third set of features from the training inputs and the predetermined sequences of signs; and
storing, with the processor, the first HMM, second HMM, and third HMM in a memory for recognition of additional signs received from the input device.
2. The method of claim 1, the receiving of the plurality of training inputs further comprising:
receiving, with the input device, a first plurality of inputs from a dominant hand of the user corresponding to the predetermined sequences of signs;
receiving, with the input device, a second plurality of inputs from a non-dominant hand of the user corresponding to the predetermined sequences of signs.
3. The method of claim 2, the receiving of the training inputs further comprising:
receiving, from a first plurality of sensors in a glove worn on the dominant hand of the user, a first set of finger position data corresponding to at least two fingers in the dominant hand of the user;
receiving, from the first plurality of sensors in the glove worn on the dominant hand of the user, a first set of hand angle data corresponding to an angle of the dominant hand of the user; and
receiving, from a second plurality of sensors in a glove worn on the non-dominant hand of the user, a second set of hand angle data corresponding to an angle of the non-dominant hand of the user.
4. The method of claim 3, the extraction of the first set of features further comprising: identifying, with the processor, at least one feature in the first set of features as a cosine distance between a first finger and a second finger in the at least two fingers of the dominant hand based on the first set of finger position data.
5. The method of claim 3, the extraction of the second set of features further
comprising:
identifying, with the processor, a first hand angle in the first set of hand angle data generated at a first time, the first hand angle being parallel to a predetermined direction of gravity;
identifying, with the processor, a second hand angle in the first set of hand angle data generated at a second time; and
extracting, with the processor, an angular position of the dominant hand as a feature in the second set of features based on the first hand angle, the second hand angle, and a quaternion.
6. The method of claim 3, the extraction of the third set of features further comprising: identifying, with the processor, a first hand angle in the second set of hand angle data generated at a first time, the first hand angle being parallel to a predetermined direction of gravity;
identifying, with the processor, a second hand angle in the second set of hand angle data generated at a second time; and
extracting, with the processor, an angular position of the non-dominant hand as a feature in the second set of features based on the first hand angle, the second hand angle, and a quaternion.
7. The method of claim 1, the generation of the first HMM further comprising:
generating, with the processor, the first HMM based on a first portion of the first set of features from the training inputs corresponding to the hand movements and postures for a dominant hand of the user for each sign in the plurality of predetermined sequences of signs and a second portion of the second set of features from the training data corresponding to the hand movements and postures of a non-dominant hand of the user for the transitions between signs in the plurality of predetermined sequences of signs.
8. A method for automated sign language recognition comprising:
receiving, with an input device, an input based on a plurality of hand movements and postures of a user that correspond to a sequence of signs;
extracting, with a processor, a plurality of features from the input corresponding to the plurality of hand movements and postures;
identifying, with the processor, a start of the sequence of signs in the input based on a first set of features in the plurality of features and a first Hidden Markov Model (HMM) stored in the memory;
identifying, with the processor, a first sign in the input based on a second set of features in the plurality of features and a second HMM stored in the memory; and
generating, with an output device, an output corresponding to the first sign from the input.
9. The method of claim 8 further comprising:
identifying, with the processor, a first transition corresponding to hand
movements and postures that occur between signs in the input based on a third set of features in the plurality of features and a third HMM stored in the memory;
identifying, with the processor, a second sign in the input based on a fourth set of features in the plurality of features and the second HMM stored in the memory; and
10. The method of claim 9 further comprising:
identifying, with the processor, an end of the sequence of signs in the input based on a fifth set of features in the plurality of features and the first HMM stored in the memory; and
generating, with the output device, the output corresponding to the first sign and the second sign from the input after identifying the end of the sequence of signs.
11. The method of claim 8, the receiving of the input further comprising:
receiving, with the input device, a first plurality of inputs corresponding to a plurality of movements of a dominant hand of the user; and
receiving, with the input device, a second plurality of inputs corresponding to a plurality of movements of a non-dominant hand of the user.
12. The method of claim 11, the receiving of the input further comprising:
receiving, from a first plurality of sensors in a glove worn on the dominant hand of the user, a first set of finger position data corresponding to at least two fingers in the dominant hand of the user;
receiving, from the first plurality of sensors in the glove worn on the dominant hand of the user, a first set of hand angle data corresponding to an angle of the dominant hand of the user; and
receiving, from a second plurality of sensors in a glove worn on the non-dominant hand of the user, a second set of hand angle data corresponding to an angle of the non-dominant hand of the user.
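Claims 11 and 12 describe the raw inputs: finger position data and hand angle data from a glove on the dominant hand, plus hand angle data from a glove on the non-dominant hand. A minimal per-frame container for that data might look like the sketch below; the field names and vector shapes are assumptions.

```python
# A minimal container for the per-frame glove measurements described in
# claims 11-12; the field names and shapes are assumptions, not the patent's.
from dataclasses import dataclass
from typing import Tuple

Vector3 = Tuple[float, float, float]

@dataclass
class GloveFrame:
    """One time step of sensor data from both gloves."""
    dominant_fingers: Tuple[Vector3, ...]  # first set of finger position data
    dominant_hand_angle: Vector3           # first set of hand angle data
    nondominant_hand_angle: Vector3        # second set of hand angle data

frame = GloveFrame(
    dominant_fingers=((0.2, 0.9, 0.1), (0.1, 0.95, 0.05)),
    dominant_hand_angle=(0.3, 0.1, -0.9),
    nondominant_hand_angle=(0.0, 0.0, -1.0),
)
print(frame)
```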
13. The method of claim 12, the extraction of the features further comprising:
identifying, with the processor, at least one feature as a cosine distance between a first finger and a second finger in the at least two fingers of the dominant hand based on the first set of finger position data.
14. The method of claim 12, the extraction of the features further comprising:
identifying, with the processor, a first hand angle in the first set of hand angle data generated at a first time, the first hand angle being parallel to a predetermined direction of gravity;
identifying, with the processor, a second hand angle in the first set of hand angle data generated at a second time; and
extracting, with the processor, a position of the dominant hand as a feature based on the first hand angle, the second hand angle, and a quaternion.
15. The method of claim 12, the extraction of the features further comprising:
identifying, with the processor, a first hand angle in the second set of hand angle data generated at a first time, the first hand angle being parallel to a predetermined direction of gravity;
identifying, with the processor, a second hand angle in the second set of hand angle data generated at a second time; and
extracting, with the processor, a position of the non-dominant hand as a feature based on the first hand angle, the second hand angle, and a quaternion.
16. A system for automated sign language recognition comprising:
an input device configured to receive input corresponding to a plurality of hand movements and postures of a user that correspond to a sequence of signs;
an output device;
a memory; and
a processor operatively connected to the input device, the output device, and the memory, the processor being configured to:
receive an input from the input device based on a plurality of hand movements and postures of a user that correspond to a sequence of signs;
extract a plurality of features from the input corresponding to the plurality of hand movements and postures;
identify a start of the sequence of signs in the input based on a first set of features in the plurality of features and a first Hidden Markov Model (HMM) stored in the memory;
identify a first sign in the input based on a second set of features in the plurality of features and a second HMM stored in the memory; and
generate an output with the output device corresponding to the first sign from the input.
17. The system of claim 16, the processor being further configured to:
identify a first transition corresponding to hand movements and postures that occur between signs in the input based on a third set of features in the plurality of features and a third HMM stored in the memory;
identify a second sign in the input based on a fourth set of features in the plurality of features and the second HMM stored in the memory; and
generate the output corresponding to the first sign and the second sign from the input.
18. The system of claim 17, the processor being further configured to:
identify an end of the sequence of signs in the input based on a fifth set of features in the plurality of features and the first HMM stored in the memory; and
generate the output corresponding to the first sign and the second sign from the input after identifying the end of the sequence of signs.
19. The system of claim 16, the processor being further configured to:
receive a first plurality of inputs corresponding to a plurality of movements of a dominant hand of the user from the input device; and
receive a second plurality of inputs corresponding to a plurality of movements of a non-dominant hand of the user from the input device.
20. The system of claim 19, the input device further comprising:
a first glove including a first plurality of sensors;
a second glove including a second plurality of sensors; and
the processor being further configured to:
receive, from the first plurality of sensors in the first glove worn on the dominant hand of the user, a first set of finger position data corresponding to at least two fingers in the dominant hand of the user;
receive, from the first plurality of sensors in the first glove worn on the dominant hand of the user, a first set of hand angle data corresponding to an angle of the dominant hand of the user; and
receive, from the second plurality of sensors in the second glove worn on the non-dominant hand of the user, a second set of hand angle data corresponding to an angle of the non-dominant hand of the user.
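To tie the device claims together, the sketch below assembles one per-frame feature vector from the two-glove data of claims 19-20, combining a finger cosine distance (claim 13) with gravity-referenced hand-angle features for both hands (claims 14 and 15). The specific feature layout, the gravity vector, and all names are illustrative assumptions; such vectors would then be scored against the HMMs held in the memory of claim 16.

```python
# Sketch of one per-frame feature vector built from two-glove sensor data.
# All names, shapes, and the feature layout are illustrative assumptions.
import math

def cos_dist(u, v):
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return 1.0 - dot / (nu * nv)

def tilt_from_gravity(hand_angle, gravity=(0.0, 0.0, -1.0)):
    """Angle (radians) between the measured hand direction and gravity."""
    return math.acos(max(-1.0, min(1.0, 1.0 - cos_dist(hand_angle, gravity))))

def frame_features(dom_fingers, dom_angle, nondom_angle):
    index_f, middle_f = dom_fingers
    return [
        cos_dist(index_f, middle_f),      # dominant-hand finger feature
        tilt_from_gravity(dom_angle),     # dominant-hand angular position
        tilt_from_gravity(nondom_angle),  # non-dominant-hand angular position
    ]

features = frame_features(
    dom_fingers=((0.2, 0.9, 0.1), (0.1, 0.95, 0.05)),
    dom_angle=(0.3, 0.1, -0.9),
    nondom_angle=(0.0, 0.0, -1.0),
)
print("per-frame feature vector:", [round(f, 3) for f in features])
```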
Priority Applications (2)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
EP16780832.8A EP3284019A4 (en) | 2015-04-16 | 2016-04-15 | System and method for automated sign language recognition |
CN201680035117.0A CN107690651B (en) | 2015-04-16 | 2016-04-15 | System and method for automated sign language recognition |
Applications Claiming Priority (4)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
US201562148204P | 2015-04-16 | 2015-04-16 | |
US62/148,204 | 2015-04-16 | ||
US201562173878P | 2015-06-10 | 2015-06-10 | |
US62/173,878 | 2015-06-10 |
Publications (1)
Publication Number | Publication Date |
---|---|
WO2016168591A1 true WO2016168591A1 (en) | 2016-10-20 |
Family
ID=57126332
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
PCT/US2016/027746 WO2016168591A1 (en) | 2015-04-16 | 2016-04-15 | System and method for automated sign language recognition |
Country Status (4)
Country | Link |
---|---|
US (1) | US10109219B2 (en) |
EP (1) | EP3284019A4 (en) |
CN (1) | CN107690651B (en) |
WO (1) | WO2016168591A1 (en) |
Families Citing this family (11)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US10146318B2 (en) | 2014-06-13 | 2018-12-04 | Thomas Malzbender | Techniques for using gesture recognition to effectuate character selection |
KR102450803B1 (en) * | 2016-02-11 | 2022-10-05 | 한국전자통신연구원 | Duplex sign language translation apparatus and the apparatus for performing the duplex sign language translation method |
SG11202000636XA (en) * | 2017-08-01 | 2020-02-27 | Visa Int Service Ass | Private data processing |
US10275646B2 (en) | 2017-08-03 | 2019-04-30 | Gyrfalcon Technology Inc. | Motion recognition via a two-dimensional symbol having multiple ideograms contained therein |
CN108256458B (en) * | 2018-01-04 | 2020-08-04 | 东北大学 | Bidirectional real-time translation system and method for deaf natural sign language |
CN110597446A (en) * | 2018-06-13 | 2019-12-20 | 北京小鸟听听科技有限公司 | Gesture recognition method and electronic equipment |
CN108960126A (en) * | 2018-06-29 | 2018-12-07 | 北京百度网讯科技有限公司 | Method, apparatus, equipment and the system of sign language interpreter |
CN110286774B (en) * | 2019-07-03 | 2021-08-13 | 中国科学技术大学 | Sign language identification method based on wrist motion sensor |
CN110348420B (en) * | 2019-07-18 | 2022-03-18 | 腾讯科技(深圳)有限公司 | Sign language recognition method and device, computer readable storage medium and computer equipment |
CN114639158A (en) * | 2020-11-30 | 2022-06-17 | 伊姆西Ip控股有限责任公司 | Computer interaction method, apparatus and program product |
US11503361B1 (en) * | 2021-07-26 | 2022-11-15 | Sony Group Corporation | Using signing for input to search fields |
Family Cites Families (54)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
JPS60136892A (en) * | 1983-12-26 | 1985-07-20 | Hitachi Ltd | On-line recognition device of hand written graphic |
US5956419A (en) * | 1995-04-28 | 1999-09-21 | Xerox Corporation | Unsupervised training of character templates using unsegmented samples |
US5737593A (en) * | 1995-06-02 | 1998-04-07 | International Business Machines Corporation | System and method for defining shapes with which to mine time sequences in computerized databases |
US5917941A (en) * | 1995-08-08 | 1999-06-29 | Apple Computer, Inc. | Character segmentation technique with integrated word search for handwriting recognition |
US20070177804A1 (en) * | 2006-01-30 | 2007-08-02 | Apple Computer, Inc. | Multi-touch gesture dictionary |
WO1999057648A1 (en) * | 1998-05-07 | 1999-11-11 | Art - Advanced Recognition Technologies Ltd. | Handwritten and voice control of vehicle components |
US6917693B1 (en) * | 1999-12-20 | 2005-07-12 | Ford Global Technologies, Llc | Vehicle data acquisition and display assembly |
US6788809B1 (en) * | 2000-06-30 | 2004-09-07 | Intel Corporation | System and method for gesture recognition in three dimensions using stereo imaging and color vision |
US7092870B1 (en) * | 2000-09-15 | 2006-08-15 | International Business Machines Corporation | System and method for managing a textual archive using semantic units |
US7974843B2 (en) * | 2002-01-17 | 2011-07-05 | Siemens Aktiengesellschaft | Operating method for an automated language recognizer intended for the speaker-independent language recognition of words in different languages and automated language recognizer |
US7565295B1 (en) * | 2003-08-28 | 2009-07-21 | The George Washington University | Method and apparatus for translating hand gestures |
US7426301B2 (en) * | 2004-06-28 | 2008-09-16 | Mitsubishi Electric Research Laboratories, Inc. | Usual event detection in a video using object and frame features |
US7295904B2 (en) * | 2004-08-31 | 2007-11-13 | International Business Machines Corporation | Touch gesture based interface for motor vehicle |
JP4092371B2 (en) * | 2005-02-15 | 2008-05-28 | 有限会社Kiteイメージ・テクノロジーズ | Handwritten character recognition method, handwritten character recognition system, handwritten character recognition program, and recording medium |
US20140035805A1 (en) * | 2009-04-02 | 2014-02-06 | David MINNEN | Spatial operating environment (soe) with markerless gestural control |
US20090278915A1 (en) * | 2006-02-08 | 2009-11-12 | Oblong Industries, Inc. | Gesture-Based Control System For Vehicle Interfaces |
US20100023314A1 (en) * | 2006-08-13 | 2010-01-28 | Jose Hernandez-Rebollar | ASL Glove with 3-Axis Accelerometers |
US20080120108A1 (en) * | 2006-11-16 | 2008-05-22 | Frank Kao-Ping Soong | Multi-space distribution for pattern recognition based on mixed continuous and discrete observations |
US8352265B1 (en) * | 2007-12-24 | 2013-01-08 | Edward Lin | Hardware implemented backend search engine for a high-rate speech recognition system |
US8131086B2 (en) * | 2008-09-24 | 2012-03-06 | Microsoft Corporation | Kernelized spatial-contextual image classification |
US8150160B2 (en) * | 2009-03-26 | 2012-04-03 | King Fahd University Of Petroleum & Minerals | Automatic Arabic text image optical character recognition method |
US9551590B2 (en) * | 2009-08-28 | 2017-01-24 | Robert Bosch Gmbh | Gesture-based information and command entry for motor vehicle |
US20120249797A1 (en) * | 2010-02-28 | 2012-10-04 | Osterhout Group, Inc. | Head-worn adaptive display |
US8493174B2 (en) * | 2010-03-24 | 2013-07-23 | Dharma P. Agrawal | Apparatus for instantaneous translation of sign language |
CN101794528B (en) * | 2010-04-02 | 2012-03-14 | 北京大学软件与微电子学院无锡产学研合作教育基地 | Gesture language-voice bidirectional translation system |
US20110254765A1 (en) * | 2010-04-18 | 2011-10-20 | Primesense Ltd. | Remote text input using handwriting |
US9396385B2 (en) * | 2010-08-26 | 2016-07-19 | Blast Motion Inc. | Integrated sensor and video motion analysis method |
US8817087B2 (en) * | 2010-11-01 | 2014-08-26 | Robert Bosch Gmbh | Robust video-based handwriting and gesture recognition for in-car applications |
US8572764B2 (en) * | 2010-12-09 | 2013-11-05 | Dieter Thellmann | Exercising glove |
US8811701B2 (en) * | 2011-02-23 | 2014-08-19 | Siemens Aktiengesellschaft | Systems and method for automatic prostate localization in MR images using random walker segmentation initialized via boosted classifiers |
US9189068B2 (en) * | 2011-03-14 | 2015-11-17 | Lg Electronics Inc. | Apparatus and a method for gesture recognition |
CN102129860B (en) * | 2011-04-07 | 2012-07-04 | 南京邮电大学 | Text-related speaker recognition method based on infinite-state hidden Markov model |
TWI578977B (en) * | 2011-04-07 | 2017-04-21 | 香港中文大學 | Device for retinal image analysis |
US8760395B2 (en) * | 2011-05-31 | 2014-06-24 | Microsoft Corporation | Gesture recognition techniques |
US8885882B1 (en) * | 2011-07-14 | 2014-11-11 | The Research Foundation For The State University Of New York | Real time eye tracking for human computer interaction |
US8971572B1 (en) * | 2011-08-12 | 2015-03-03 | The Research Foundation For The State University Of New York | Hand pointing estimation for human computer interaction |
US9002099B2 (en) * | 2011-09-11 | 2015-04-07 | Apple Inc. | Learning-based estimation of hand and finger pose |
US10795448B2 (en) * | 2011-09-29 | 2020-10-06 | Magic Leap, Inc. | Tactile glove for human-computer interaction |
JP5593339B2 (en) * | 2012-02-07 | 2014-09-24 | 日本システムウエア株式会社 | Gesture recognition device using a steering wheel of an automobile, hand recognition method and program thereof |
WO2013128291A2 (en) * | 2012-02-29 | 2013-09-06 | Robert Bosch Gmbh | Method of fusing multiple information sources in image-based gesture recognition system |
US9448636B2 (en) * | 2012-04-18 | 2016-09-20 | Arb Labs Inc. | Identifying gestures using gesture data compressed by PCA, principal joint variable analysis, and compressed feature matrices |
US9536135B2 (en) * | 2012-06-18 | 2017-01-03 | Microsoft Technology Licensing, Llc | Dynamic hand gesture recognition using depth data |
DE102012216193B4 (en) * | 2012-09-12 | 2020-07-30 | Continental Automotive Gmbh | Method and device for operating a motor vehicle component using gestures |
CN102982315B (en) * | 2012-11-05 | 2016-06-15 | 中国科学院计算技术研究所 | The Hand Gesture Segmentation recognition methods of a kind of non-gesture mode of automatic detection and system |
US9696867B2 (en) * | 2013-01-15 | 2017-07-04 | Leap Motion, Inc. | Dynamic user interactions for display control and identifying dominant gestures |
US9955895B2 (en) * | 2013-11-05 | 2018-05-01 | The Research Foundation For The State University Of New York | Wearable head-mounted, glass-style computing devices with EOG acquisition and analysis for human-computer interfaces |
CN103593680B (en) * | 2013-11-19 | 2016-09-14 | 南京大学 | A kind of dynamic gesture identification method based on the study of HMM independent increment |
WO2015103693A1 (en) * | 2014-01-07 | 2015-07-16 | Arb Labs Inc. | Systems and methods of monitoring activities at a gaming venue |
WO2015126095A1 (en) * | 2014-02-21 | 2015-08-27 | 삼성전자 주식회사 | Electronic device |
US10013710B2 (en) * | 2014-04-17 | 2018-07-03 | Ebay Inc. | Fashion preference analysis |
US9552070B2 (en) * | 2014-09-23 | 2017-01-24 | Microsoft Technology Licensing, Llc | Tracking hand/body pose |
US10725533B2 (en) * | 2014-09-26 | 2020-07-28 | Intel Corporation | Systems, apparatuses, and methods for gesture recognition and interaction |
CN110794960B (en) * | 2014-12-08 | 2024-02-06 | 罗希特·塞思 | Wearable wireless HMI device |
US9836620B2 (en) * | 2014-12-30 | 2017-12-05 | Samsung Electronic Co., Ltd. | Computing system for privacy-aware sharing management and method of operation thereof |
2016
- 2016-04-15 US US15/099,628 patent/US10109219B2/en active Active
- 2016-04-15 WO PCT/US2016/027746 patent/WO2016168591A1/en active Application Filing
- 2016-04-15 CN CN201680035117.0A patent/CN107690651B/en active Active
- 2016-04-15 EP EP16780832.8A patent/EP3284019A4/en active Pending
Patent Citations (5)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US20040002838A1 (en) * | 2002-06-27 | 2004-01-01 | Oliver Nuria M. | Layered models for context awareness |
US20050256817A1 (en) * | 2004-05-12 | 2005-11-17 | Wren Christopher R | Determining temporal patterns in sensed data sequences by hierarchical decomposition of hidden Markov models |
WO2009055148A1 (en) * | 2007-10-26 | 2009-04-30 | Honda Motor Co., Ltd. | Hand sign recognition using label assignment |
US20110301934A1 (en) * | 2010-06-04 | 2011-12-08 | Microsoft Corporation | Machine based sign language interpreter |
US20130294651A1 (en) * | 2010-12-29 | 2013-11-07 | Thomson Licensing | System and method for gesture recognition |
Non-Patent Citations (3)
Title |
---|
DAVIS J ET AL., VISUAL GESTURE RECOGNITION, vol. 141, no. 2, 1 April 1994 (1994-04-01), pages 101 - 6, ISSN: 1350-245X |
GAOLIN FANG ET AL., LARGE-VOCABULARY CONTINUOUS SIGN LANGUAGE RECOGNITION BASED ON TRANSITION-MOVEMENT MODELS, vol. 365, no. 1, 1 January 2007 (2007-01-01), pages 1 - 9, ISSN: 1083-4427
See also references of EP3284019A4 |
Cited By (85)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US11778149B2 (en) | 2011-05-11 | 2023-10-03 | Snap Inc. | Headware with computer and optical element for use therewith and systems utilizing same |
US12034690B2 (en) | 2013-05-30 | 2024-07-09 | Snap Inc. | Maintaining a message thread with opt-in permanence for entries |
US11514947B1 (en) | 2014-02-05 | 2022-11-29 | Snap Inc. | Method for real-time video processing involving changing features of an object in the video |
US11743219B2 (en) | 2014-05-09 | 2023-08-29 | Snap Inc. | Dynamic configuration of application component tiles |
US11972014B2 (en) | 2014-05-28 | 2024-04-30 | Snap Inc. | Apparatus and method for automated privacy protection in distributed images |
US11595569B2 (en) | 2014-07-07 | 2023-02-28 | Snap Inc. | Supplying content aware photo filters |
US11855947B1 (en) | 2014-10-02 | 2023-12-26 | Snap Inc. | Gallery of ephemeral messages |
US12113764B2 (en) | 2014-10-02 | 2024-10-08 | Snap Inc. | Automated management of ephemeral message collections |
US11977732B2 (en) | 2014-11-26 | 2024-05-07 | Snap Inc. | Hybridization of voice notes and calling |
US11803345B2 (en) | 2014-12-19 | 2023-10-31 | Snap Inc. | Gallery of messages from individuals with a shared interest |
US11783862B2 (en) | 2014-12-19 | 2023-10-10 | Snap Inc. | Routing messages by message parameter |
US11627141B2 (en) | 2015-03-18 | 2023-04-11 | Snap Inc. | Geo-fence authorization provisioning |
US11902287B2 (en) | 2015-03-18 | 2024-02-13 | Snap Inc. | Geo-fence authorization provisioning |
US11690014B2 (en) | 2015-05-14 | 2023-06-27 | Snap Inc. | Systems and methods for wearable initiated handshaking |
US11861068B2 (en) | 2015-06-16 | 2024-01-02 | Snap Inc. | Radial gesture navigation |
US11727660B2 (en) | 2016-01-29 | 2023-08-15 | Snap Inc. | Local augmented reality persistent sticker objects |
US11662900B2 (en) | 2016-05-31 | 2023-05-30 | Snap Inc. | Application control using a gesture based trigger |
US11720126B2 (en) | 2016-06-30 | 2023-08-08 | Snap Inc. | Motion and image-based control system |
US11676412B2 (en) | 2016-06-30 | 2023-06-13 | Snap Inc. | Object modeling and replacement in a video stream |
US11892859B2 (en) | 2016-06-30 | 2024-02-06 | Snap Inc. | Remoteless control of drone behavior |
US11962598B2 (en) | 2016-10-10 | 2024-04-16 | Snap Inc. | Social media post subscribe requests for buffer user accounts |
US11790276B2 (en) | 2017-07-18 | 2023-10-17 | Snap Inc. | Virtual object machine learning |
US11995108B2 (en) | 2017-07-31 | 2024-05-28 | Snap Inc. | Systems, devices, and methods for content selection |
US11863508B2 (en) | 2017-07-31 | 2024-01-02 | Snap Inc. | Progressive attachments system |
US11714280B2 (en) | 2017-08-25 | 2023-08-01 | Snap Inc. | Wristwatch based interface for augmented reality eyewear |
WO2019094618A1 (en) * | 2017-11-08 | 2019-05-16 | Signall Technologies Zrt | Computer vision based sign language interpreter |
US11847426B2 (en) | 2017-11-08 | 2023-12-19 | Snap Inc. | Computer vision based sign language interpreter |
US11558325B2 (en) | 2018-01-02 | 2023-01-17 | Snap Inc. | Generating interactive messages with asynchronous media content |
US11716301B2 (en) | 2018-01-02 | 2023-08-01 | Snap Inc. | Generating interactive messages with asynchronous media content |
US11722444B2 (en) | 2018-06-08 | 2023-08-08 | Snap Inc. | Generating interactive messages with entity assets |
US11734844B2 (en) | 2018-12-05 | 2023-08-22 | Snap Inc. | 3D hand shape and pose estimation |
US11546280B2 (en) | 2019-03-29 | 2023-01-03 | Snap Inc. | Messaging system with discard user interface |
US12034686B2 (en) | 2019-03-29 | 2024-07-09 | Snap Inc. | Messaging system with discard user interface |
US11726642B2 (en) | 2019-03-29 | 2023-08-15 | Snap Inc. | Messaging system with message transmission user interface |
US11809696B2 (en) | 2019-06-03 | 2023-11-07 | Snap Inc. | User interfaces to facilitate multiple modes of electronic communication |
US11599255B2 (en) | 2019-06-03 | 2023-03-07 | Snap Inc. | User interfaces to facilitate multiple modes of electronic communication |
US11790625B2 (en) | 2019-06-28 | 2023-10-17 | Snap Inc. | Messaging system with augmented reality messages |
US11714535B2 (en) | 2019-07-11 | 2023-08-01 | Snap Inc. | Edge gesture interface with smart interactions |
US11551374B2 (en) | 2019-09-09 | 2023-01-10 | Snap Inc. | Hand pose estimation from stereo cameras |
US11880509B2 (en) | 2019-09-09 | 2024-01-23 | Snap Inc. | Hand pose estimation from stereo cameras |
US11776194B2 (en) | 2019-12-30 | 2023-10-03 | Snap Inc | Animated pull-to-refresh |
US11876763B2 (en) | 2020-02-28 | 2024-01-16 | Snap Inc. | Access and routing of interactive messages |
US11775079B2 (en) | 2020-03-26 | 2023-10-03 | Snap Inc. | Navigating through augmented reality content |
US11675494B2 (en) | 2020-03-26 | 2023-06-13 | Snap Inc. | Combining first user interface content into second user interface |
US11960651B2 (en) | 2020-03-30 | 2024-04-16 | Snap Inc. | Gesture-based shared AR session creation |
US11991469B2 (en) | 2020-06-30 | 2024-05-21 | Snap Inc. | Skeletal tracking for real-time virtual effects |
US11832015B2 (en) | 2020-08-13 | 2023-11-28 | Snap Inc. | User interface for pose driven virtual effects |
US11671559B2 (en) | 2020-09-30 | 2023-06-06 | Snap Inc. | Real time video editing |
US11943562B2 (en) | 2020-09-30 | 2024-03-26 | Snap Inc. | Real time video editing |
US11797162B2 (en) | 2020-12-22 | 2023-10-24 | Snap Inc. | 3D painting on an eyewear device |
US11782577B2 (en) | 2020-12-22 | 2023-10-10 | Snap Inc. | Media content player on an eyewear device |
US12105283B2 (en) | 2020-12-22 | 2024-10-01 | Snap Inc. | Conversation interface on an eyewear device |
US11941166B2 (en) | 2020-12-29 | 2024-03-26 | Snap Inc. | Body UI for augmented reality components |
US12008152B1 (en) | 2020-12-31 | 2024-06-11 | Snap Inc. | Distance determination for mixed reality interaction |
US20230097257A1 (en) | 2020-12-31 | 2023-03-30 | Snap Inc. | Electronic communication interface with haptic feedback response |
US11997422B2 (en) | 2020-12-31 | 2024-05-28 | Snap Inc. | Real-time video communication interface with haptic feedback response |
US11989348B2 (en) | 2020-12-31 | 2024-05-21 | Snap Inc. | Media content items with haptic feedback augmentations |
US11734959B2 (en) | 2021-03-16 | 2023-08-22 | Snap Inc. | Activating hands-free mode on mirroring device |
US11978283B2 (en) | 2021-03-16 | 2024-05-07 | Snap Inc. | Mirroring device with a hands-free mode |
USD998637S1 (en) | 2021-03-16 | 2023-09-12 | Snap Inc. | Display screen or portion thereof with a graphical user interface |
US11798201B2 (en) | 2021-03-16 | 2023-10-24 | Snap Inc. | Mirroring device with whole-body outfits |
USD1029031S1 (en) | 2021-03-16 | 2024-05-28 | Snap Inc. | Display screen or portion thereof with a graphical user interface |
US11908243B2 (en) | 2021-03-16 | 2024-02-20 | Snap Inc. | Menu hierarchy navigation on electronic mirroring devices |
US11809633B2 (en) | 2021-03-16 | 2023-11-07 | Snap Inc. | Mirroring device with pointing based navigation |
US12050729B2 (en) | 2021-03-31 | 2024-07-30 | Snap Inc. | Real-time communication interface with haptic and audio feedback response |
US11880542B2 (en) | 2021-05-19 | 2024-01-23 | Snap Inc. | Touchpad input for augmented reality display device |
US11928306B2 (en) | 2021-05-19 | 2024-03-12 | Snap Inc. | Touchpad navigation for augmented reality display device |
US12056832B2 (en) | 2021-09-01 | 2024-08-06 | Snap Inc. | Controlling interactive fashion based on body gestures |
US11670059B2 (en) | 2021-09-01 | 2023-06-06 | Snap Inc. | Controlling interactive fashion based on body gestures |
CN113822186A (en) * | 2021-09-10 | 2021-12-21 | 阿里巴巴达摩院(杭州)科技有限公司 | Sign language translation, customer service, communication method, device and readable medium |
US11960784B2 (en) | 2021-12-07 | 2024-04-16 | Snap Inc. | Shared augmented reality unboxing experience |
US11748958B2 (en) | 2021-12-07 | 2023-09-05 | Snap Inc. | Augmented reality unboxing experience |
US11934628B2 (en) | 2022-03-14 | 2024-03-19 | Snap Inc. | 3D user interface depth forgiveness |
US11960653B2 (en) | 2022-05-10 | 2024-04-16 | Snap Inc. | Controlling augmented reality effects through multi-modal human interaction |
US12001878B2 (en) | 2022-06-03 | 2024-06-04 | Snap Inc. | Auto-recovery for AR wearable devices |
US12002168B2 (en) | 2022-06-20 | 2024-06-04 | Snap Inc. | Low latency hand-tracking in augmented reality systems |
US12069399B2 (en) | 2022-07-07 | 2024-08-20 | Snap Inc. | Dynamically switching between RGB and IR capture |
US11948266B1 (en) | 2022-09-09 | 2024-04-02 | Snap Inc. | Virtual object manipulation with gestures in a messaging system |
US11995780B2 (en) | 2022-09-09 | 2024-05-28 | Snap Inc. | Shooting interaction using augmented reality content in a messaging system |
US11797099B1 (en) | 2022-09-19 | 2023-10-24 | Snap Inc. | Visual and audio wake commands |
US12105891B2 (en) | 2022-09-22 | 2024-10-01 | Snap Inc. | Steerable camera for AR hand tracking |
US11747912B1 (en) | 2022-09-22 | 2023-09-05 | Snap Inc. | Steerable camera for AR hand tracking |
US12112025B2 (en) | 2023-02-16 | 2024-10-08 | Snap Inc. | Gesture-driven message content resizing |
US12131015B2 (en) | 2023-04-10 | 2024-10-29 | Snap Inc. | Application control using a gesture based trigger |
US12093443B1 (en) | 2023-10-30 | 2024-09-17 | Snap Inc. | Grasping virtual objects with real hands for extended reality |
Also Published As
Publication number | Publication date |
---|---|
US10109219B2 (en) | 2018-10-23 |
EP3284019A4 (en) | 2018-12-05 |
CN107690651A (en) | 2018-02-13 |
CN107690651B (en) | 2022-06-28 |
US20160307469A1 (en) | 2016-10-20 |
EP3284019A1 (en) | 2018-02-21 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
US10109219B2 (en) | System and method for automated sign language recognition | |
Li et al. | Sign transition modeling and a scalable solution to continuous sign language recognition for real-world applications | |
Wang et al. | An approach based on phonemes to large vocabulary Chinese sign language recognition | |
Vogler et al. | ASL recognition based on a coupling between HMMs and 3D motion analysis | |
Al-Ahdal et al. | Review in sign language recognition systems | |
Gao et al. | Sign language recognition based on HMM/ANN/DP | |
Ong et al. | Automatic sign language analysis: A survey and the future beyond lexical meaning | |
Bauer et al. | Towards an automatic sign language recognition system using subunits | |
EP2911089B1 (en) | Method and system for handwriting and gesture recognition | |
Yang et al. | Robust sign language recognition by combining manual and non-manual features based on conditional random field and support vector machine | |
EP3465616A1 (en) | Automatic body movement recognition and association system | |
WO2015171646A1 (en) | Method and system for speech input | |
Ma et al. | A continuous Chinese sign language recognition system | |
Hienz et al. | HMM-based continuous sign language recognition using stochastic grammars | |
Xue et al. | A Chinese sign language recognition system using leap motion | |
RU2737231C1 (en) | Method of multimodal contactless control of mobile information robot | |
Riad et al. | Signsworld; deeping into the silence world and hearing its signs (state of the art) | |
Fraiwan et al. | A Kinect-based system for Arabic sign language to speech translation | |
Elakkiya | Recognition of Russian and Indian sign languages used by the deaf people | |
Rybach et al. | Appearance-based features for automatic continuous sign language recognition | |
Yuan et al. | Recognition of strong and weak connection models in continuous sign language | |
CN115457654A (en) | Real-time video stream sign language identification method based on human body key points | |
Ten Holt et al. | Why don't you see what I mean? Prospects and Limitations of current Automatic Sign Recognition Research | |
Zhao et al. | Realizing speech to gesture conversion by keyword spotting | |
Okada et al. | Recognizing words from gestures: Discovering gesture descriptors associated with spoken utterances |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
121 | Ep: the epo has been informed by wipo that ep was designated in this application |
Ref document number: 16780832 Country of ref document: EP Kind code of ref document: A1 |
REEP | Request for entry into the european phase |
Ref document number: 2016780832 Country of ref document: EP |
NENP | Non-entry into the national phase |
Ref country code: DE |