WO2001059741A1 - Sign language to speech converting method and apparatus - Google Patents
- Publication number
- WO2001059741A1 WO2001059741A1 PCT/EP2001/000478 EP0100478W WO0159741A1 WO 2001059741 A1 WO2001059741 A1 WO 2001059741A1 EP 0100478 W EP0100478 W EP 0100478W WO 0159741 A1 WO0159741 A1 WO 0159741A1
- Authority
- WO
- WIPO (PCT)
- Prior art keywords
- words
- speech
- gestures
- natural language
- speech synthesizer
- Prior art date
Classifications
-
- G—PHYSICS
- G09—EDUCATION; CRYPTOGRAPHY; DISPLAY; ADVERTISING; SEALS
- G09B—EDUCATIONAL OR DEMONSTRATION APPLIANCES; APPLIANCES FOR TEACHING, OR COMMUNICATING WITH, THE BLIND, DEAF OR MUTE; MODELS; PLANETARIA; GLOBES; MAPS; DIAGRAMS
- G09B21/00—Teaching, or communicating with, the blind, deaf or mute
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V40/00—Recognition of biometric, human-related or animal-related patterns in image or video data
- G06V40/20—Movements or behaviour, e.g. gesture recognition
Definitions
- the invention relates to sign language translators and specifically to such translators that convert sign language directly to spoken words using a portable computer.
- Data gloves have been used for classification of sign language.
- static finger-spelling is translated into letters or words, while dynamic gestures (movement) are ignored.
- Discrete Hidden Markov models with data glove inputs allow interactive learning, which has been used successfully to train a series of gestures. This technology is described in "On-line, interactive learning of gestures for human/robot interfaces," Christopher Lee and Yangsheng Xu, IEEE Int'l. Conf. on Robotics and Automation, vol. 4, pp. 2982-2987, 1996.
- a neural network trained specifically by a user has been shown to be able to recognize small sets of letters signed by dynamic finger-spelling. This technology is described in "A multi-stage approach to fingerspelling and gesture recognition," R. Erenshteyn and P. Laskov, Proc. Workshop on the Integration of Gesture in Language and Speech, Wilmington, DE, 1996.
- Another prior art system tracks gestures continuously using colored gloves and camera-based image processing techniques.
- the system allows no fingerspelling and encumbers the user with a video input system and the requirement of wearing specially colored gloves as well as the need to remain in the field of view of one or more cameras.
- This technology is described in "Visual recognition of American Sign Language using Hidden Markov models," Thad Starner, Master's thesis, The Media Laboratory, MIT, 1995.
- Data gloves have been proposed for mapping hand gestures into text using neural networks.
- This technology is described in "Glove-Talk II: Mapping hand gestures to speech using neural networks - an approach to building adaptive interfaces," Sidney Fels, PhD thesis, Univ. Toronto, 1994.
- Real-time processing using neural networks requires tremendous processing power.
- a portable appliance converts gesture-based inputs from a signer to audible speech in real time.
- the device employs a portable main processor, for example, one of the portable computers now in common use.
- Dynamic and static gestures are classified by a Continuous Hidden Markov model (CHMM) which is capable of robust and rapid real time classification of both static and dynamic gestures.
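The classification step described above can be sketched in outline: each gesture class gets its own continuous (Gaussian-emission) HMM, and an incoming sequence of glove readings is assigned to the class whose model scores it highest. The following Python sketch is illustrative only — the single-state toy models, class names, and two-dimensional feature layout are assumptions, not taken from the patent — but the forward algorithm in log space is the standard scoring step such a classifier would use:

```python
import numpy as np

def log_gauss(x, means, variances):
    # Diagonal-Gaussian log-density of one observation x under each hidden state.
    return -0.5 * np.sum(np.log(2 * np.pi * variances)
                         + (x - means) ** 2 / variances, axis=-1)

def forward_loglik(obs, pi, A, means, variances):
    # Log-likelihood of an observation sequence under a Gaussian-emission HMM,
    # computed with the forward algorithm in log space (log-sum-exp for stability).
    alpha = np.log(pi) + log_gauss(obs[0], means, variances)
    for x in obs[1:]:
        m = alpha.max()
        alpha = m + np.log(np.exp(alpha - m) @ A) + log_gauss(x, means, variances)
    m = alpha.max()
    return m + np.log(np.sum(np.exp(alpha - m)))

def classify(obs, models):
    # Score the sequence under every gesture's model; return the best class.
    # models maps gesture name -> (pi, A, means, variances).
    return max(models, key=lambda name: forward_loglik(obs, *models[name]))
```

Because each gesture's model is scored independently, a low winning likelihood can also serve as the "low index of confidence" signal mentioned later in the description.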
- a natural language processor is used to transform the gesture classes into grammatically correct sequences of words.
- a speech synthesizer converts the word sequences into audible speech.
- the invention achieves gains in both portability and utility by its use of HMM to classify gestures.
- Such models are forgiving and relatively computationally undemanding. Thus, they can handle variation in the form of an input and still generate a proper classification. In addition, they are far more sparing in their use of computational resources than, say, neural networks.
- the use of a data-glove as an input and a speaker as an output offers a high degree of portability of the appliance. Additionally, the use of a data-glove allows a relatively small-bandwidth port to be used.
- alternatively, the output may be text or another symbolic form delivered through a port to a speech engine, with the speech synthesized by an inexpensive external processor system.
- the processing unit could already have a sound card with speech synthesis capability, as do many personal digital assistants (PDAs).
- Fig. 1 is an illustration of a portable sign language-to-speech converter according to an embodiment of the invention.
- data gloves 130 and position sensors 110 apply hand- position and configuration data to a gesture recognition processor 120.
- the gesture recognition processor 120 classifies hand gestures into discrete symbols identifiable with words and generates outputs in real time indicating the words classified. Where classifications produce a low index of confidence, this information may also be output.
- the classification information is applied in turn to a natural language processor 140 that converts the words into full grammatical sentences and phrases, which may be output as text or as some other more compact symbolic form.
- the output of the natural language processor 140 is applied to a speech synthesizer 150.
- the speech synthesizer 150 generates a sound signal that may be output to a speaker 195.
- the sound signal may be generated at a port 160 connectable to, for example, headphones (not shown), to allow private use or use in a noisy environment. This might be particularly useful where the signer is a good lip-reader, because conversations can remain completely private from non-lip-readers.
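The data flow of Fig. 1 — glove and position sensors feeding the gesture recognition processor 120, then the natural language processor 140, then the speech synthesizer 150 — can be sketched as a composition of three stages. The class name, stage signatures, and byte-valued audio stub below are illustrative assumptions, not part of the patent:

```python
from dataclasses import dataclass
from typing import Callable, List

@dataclass
class SignToSpeechPipeline:
    # Stage names mirror the blocks of Fig. 1; signatures are illustrative.
    recognize: Callable[[List[float]], str]   # one frame of glove data -> word class
    rephrase: Callable[[List[str]], str]      # word stream -> grammatical sentence
    synthesize: Callable[[str], bytes]        # sentence -> audio (stubbed as bytes)

    def run(self, frames: List[List[float]]) -> bytes:
        words = [self.recognize(f) for f in frames]  # gesture recognition 120
        sentence = self.rephrase(words)              # natural language proc. 140
        return self.synthesize(sentence)             # speech synthesizer 150
```

Keeping the stages behind plain callables reflects the description's point that each block (e.g. the synthesizer, or even the language processor) is replaceable or omissible without disturbing the rest of the chain.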
- the data glove 130 and position sensor 110 may be any electro-mechanical device effective to generate signals responsively to fingerspelling and sign language gestures.
- inertial sensors with direct and integrated signals may provide velocity and position information for various parts of the hand, such as the wrist, some or all fingertips, etc.
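As a toy illustration of the "direct and integrated signals" mentioned above, an accelerometer reading can be numerically integrated once to obtain velocity and again to obtain position. The sampling rate, duration, and units here are assumptions chosen for illustration:

```python
import numpy as np

def integrate(signal, dt):
    # Trapezoidal cumulative integration of a uniformly sampled signal.
    return np.concatenate(([0.0], np.cumsum((signal[1:] + signal[:-1]) * dt / 2.0)))

dt = 0.01                           # assumed 100 Hz sampling
accel = np.ones(101)                # constant 1 m/s^2 held for one second
velocity = integrate(accel, dt)     # ramps linearly from 0 to 1 m/s
position = integrate(velocity, dt)  # reaches 0.5 m after one second
```

In practice such integration drifts with sensor bias, which is one reason the description pairs inertial sensing with direct position sensors 110.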
- data-gloves currently on the market and used for control applications may be utilized.
- the types of inputs required to form a practical device for this application are becoming clearer as research continues in this area.
- various prototypes discussed above have proven that hand configuration, position, and velocity information can be distilled into a manageable dataspace (a reasonable number of independent inputs) and these inputs applied to various types of recognition processors to classify sign-language-type gestures.
- the gesture recognition processor 120 can be based on various different technologies effective to classify the gesture inputs.
- Present technology in software and hardware makes a Continuous Hidden Markov Model (CHMM) strategy the preferred approach.
- Another advantage of CHMM classification technology is that such classifiers tend to be tolerant of variation in the input values and relative values.
- as processor speed, integration scale, size, and cost of computing hardware evolve, other classification technologies may prove appropriate, for example, neural-network-based classifiers.
- the gesture recognition processor 120 outputs a class indicator for each recognized gesture.
- a stream of such indicators is applied to the natural language processor, which adds missing words to form grammatical sentences and phrases. Since sign language does not necessarily include all elements of normal speech - obvious and essential components of grammar, such as subjects and articles, may be omitted - the natural language processor may insert these before application to the speech synthesizer 150.
- the natural language processor 140 identifies ungrammatical usage and corrects it. Such techniques are well-developed for word processors and can be applied directly in the instant context. Note that the natural language processor 140 is not essential, since the ungrammatical speech corresponding to sign language may still be recognizable.
- it may be best, therefore, for the natural language processor to make no modifications where the confidence corresponding to a change is low. That is, the natural language processor may be tuned to make changes only when a confidence measure for the contemplated change is high, since comprehensible speech may be derived directly from the output of the gesture recognition processor.
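A minimal sketch of such a confidence gate, assuming a hypothetical rule table that maps a sign gloss lacking subject and auxiliary verb to a fuller English sentence (the rule, gloss, and threshold value are invented for illustration): an expansion is applied only when its confidence clears the threshold, and otherwise the gloss passes through unchanged.

```python
def expand_gloss(words, confidence, threshold=0.8):
    # Hypothetical rule table: gloss word sequence -> expanded sentence.
    rules = {("store", "go"): ["i", "am", "going", "to", "the", "store"]}
    key = tuple(w.lower() for w in words)
    if key in rules and confidence >= threshold:
        return rules[key]
    # Low confidence (or no matching rule): pass the gloss through unchanged,
    # since the ungrammatical word stream is usually still intelligible.
    return [w.lower() for w in words]
```

A production system would derive the confidence from the parser or classifier itself rather than take it as an argument; the point here is only the gating behavior.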
- the speech synthesizer 150 may be any word-to-audio conversion device, such as a text-to-speech converter. Preferably the speech is output to a small speaker or other audio transducer. Note that text need not be an intermediate product in the instant invention; however, it may facilitate the use of off-the-shelf devices such as existing text-to-speech converters.
Abstract
Description
Claims
Priority Applications (2)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
JP2001558982A JP2003522978A (en) | 2000-02-10 | 2001-01-17 | Method and apparatus for converting sign language into speech |
EP01900465A EP1181679A1 (en) | 2000-02-10 | 2001-01-17 | Sign language to speech converting method and apparatus |
Applications Claiming Priority (2)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
US50189400A | 2000-02-10 | 2000-02-10 | |
US09/501,894 | 2000-02-10 |
Publications (1)
Publication Number | Publication Date |
---|---|
WO2001059741A1 true WO2001059741A1 (en) | 2001-08-16 |
Family
ID=23995449
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
PCT/EP2001/000478 WO2001059741A1 (en) | 2000-02-10 | 2001-01-17 | Sign language to speech converting method and apparatus |
Country Status (3)
Country | Link |
---|---|
EP (1) | EP1181679A1 (en) |
JP (1) | JP2003522978A (en) |
WO (1) | WO2001059741A1 (en) |
Families Citing this family (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
JP5114871B2 (en) * | 2006-05-31 | 2013-01-09 | 沖電気工業株式会社 | Video providing device |
CN108229318A (en) * | 2017-11-28 | 2018-06-29 | 北京市商汤科技开发有限公司 | The training method and device of gesture identification and gesture identification network, equipment, medium |
WO2023166557A1 (en) * | 2022-03-01 | 2023-09-07 | 日本電気株式会社 | Speech recognition system, speech recognition method, and recording medium |
- 2001
- 2001-01-17 JP JP2001558982A patent/JP2003522978A/en not_active Withdrawn
- 2001-01-17 WO PCT/EP2001/000478 patent/WO2001059741A1/en not_active Application Discontinuation
- 2001-01-17 EP EP01900465A patent/EP1181679A1/en not_active Withdrawn
Patent Citations (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US5047952A (en) * | 1988-10-14 | 1991-09-10 | The Board Of Trustees Of The Leland Stanford Junior University | Communication system for deaf, deaf-blind, or non-vocal individuals using instrumented glove |
EP0560587A2 (en) * | 1992-03-10 | 1993-09-15 | Hitachi, Ltd. | Sign language translation system and method |
US6141643A (en) * | 1998-11-25 | 2000-10-31 | Harmon; Steve | Data input glove having conductive finger pads and thumb pad, and uses therefor |
Non-Patent Citations (2)
Title |
---|
FELS S S ET AL: "GLOVE-TALK: A NEURAL NETWORK INTERFACE BETWEEN A DATA-GLOVE AND A SPEECH SYNTHESIZER", IEEE TRANSACTIONS ON NEURAL NETWORKS,US,IEEE INC, NEW YORK, vol. 4, no. 1, 1993, pages 2 - 8, XP000331412, ISSN: 1045-9227 * |
LEE C ET AL: "ONLINE, INTERACTIVE LEARNING OF GESTURES FOR HUMAN/ROBOT INTERFACES", PROCEEDINGS OF THE INTERNATIONAL CONFERENCE ON ROBOTICS AND AUTOMATION,US,NEW YORK, IEEE, vol. CONF. 13, 22 April 1996 (1996-04-22), pages 2982 - 2987, XP000773139, ISBN: 0-7802-2989-8 * |
Cited By (10)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
WO2002047099A1 (en) * | 2000-12-09 | 2002-06-13 | Energy Storage Systems Pty Ltd | A connection between a conductive substrate and a laminate |
WO2004114107A1 (en) * | 2003-06-20 | 2004-12-29 | Nadeem Mohammad Qadir | Human-assistive wearable audio-visual inter-communication apparatus. |
EP2825938A1 (en) * | 2012-03-15 | 2015-01-21 | Ibrahim Farid Cherradi El Fadili | Extending the free fingers typing technology and introducing the finger taps language technology |
US10296085B2 (en) | 2014-03-05 | 2019-05-21 | Markantus Ag | Relatively simple and inexpensive finger operated control device including piezoelectric sensors for gesture input, and method thereof |
CN104064187A (en) * | 2014-07-09 | 2014-09-24 | 张江杰 | Sign language conversion voice system |
US10424224B2 (en) | 2014-08-20 | 2019-09-24 | Robert Bosch Gmbh | Glove for use in collecting data for sign language recognition |
US10334103B2 (en) | 2017-01-25 | 2019-06-25 | International Business Machines Corporation | Message translation for cognitive assistance |
US10902743B2 (en) | 2017-04-14 | 2021-01-26 | Arizona Board Of Regents On Behalf Of Arizona State University | Gesture recognition and communication |
CN111428802A (en) * | 2020-03-31 | 2020-07-17 | 上海市计量测试技术研究院 | Sign language translation method based on support vector machine |
CN111428802B (en) * | 2020-03-31 | 2023-02-07 | 上海市计量测试技术研究院 | Sign language translation method based on support vector machine |
Also Published As
Publication number | Publication date |
---|---|
JP2003522978A (en) | 2003-07-29 |
EP1181679A1 (en) | 2002-02-27 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
Vijayalakshmi et al. | Sign language to speech conversion | |
Mehdi et al. | Sign language recognition using sensor gloves | |
KR101229034B1 (en) | Multimodal unification of articulation for device interfacing | |
JP6815899B2 (en) | Output statement generator, output statement generator and output statement generator | |
Yousaf et al. | A novel technique for speech recognition and visualization based mobile application to support two-way communication between deaf-mute and normal peoples | |
CN113748462A (en) | Determining input for a speech processing engine | |
KR20080023030A (en) | On-line speaker recognition method and apparatus for thereof | |
WO2001059741A1 (en) | Sign language to speech converting method and apparatus | |
Swee et al. | Wireless data gloves Malay sign language recognition system | |
Mian Qaisar | Isolated speech recognition and its transformation in visual signs | |
Priya et al. | Indian and english language to sign language translator-an automated portable two way communicator for bridging normal and deprived ones | |
Raut et al. | Hand sign interpreter | |
Swee et al. | Malay sign language gesture recognition system | |
Riad et al. | Signsworld; deeping into the silence world and hearing its signs (state of the art) | |
EP4131256A1 (en) | Voice recognition system and method using accelerometers for sensing bone conduction | |
Hatwar et al. | Home automation system based on gesture recognition system | |
Khambaty et al. | Cost effective portable system for sign language gesture recognition | |
Kou et al. | Design by talking with computers | |
Lin et al. | Acoustical implicit communication in human-robot interaction | |
Hernandez-Rebollar | Gesture-driven American sign language phraselator | |
Jayapriya et al. | Development of MEMS sensor-based double handed gesture-to-speech conversion system | |
Dhal | Controlling Devices Through Voice Based on AVR Microcontroller | |
US20230386491A1 (en) | Artificial intelligence device | |
Jian | Gesture recognition using windowed dynamic time warping | |
Huang et al. | Office presence detection using multimodal context information |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
AK | Designated states |
Kind code of ref document: A1 Designated state(s): JP |
|
AL | Designated countries for regional patents |
Kind code of ref document: A1 Designated state(s): AT BE CH CY DE DK ES FI FR GB GR IE IT LU MC NL PT SE TR |
|
WWE | Wipo information: entry into national phase |
Ref document number: 2001900465 Country of ref document: EP |
|
ENP | Entry into the national phase |
Ref country code: JP Ref document number: 2001 558982 Kind code of ref document: A Format of ref document f/p: F |
|
121 | Ep: the epo has been informed by wipo that ep was designated in this application | ||
WWP | Wipo information: published in national office |
Ref document number: 2001900465 Country of ref document: EP |
|
WWW | Wipo information: withdrawn in national office |
Ref document number: 2001900465 Country of ref document: EP |