US7107215B2 - Determining a compact model to transcribe the arabic language acoustically in a well defined basic phonetic study - Google Patents
- Publication number
- US7107215B2 (application US09/835,535)
- Authority
- US
- United States
- Prior art keywords
- language
- phonetics
- transcribe
- phonetic
- well
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Expired - Fee Related, expires
Classifications
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
- G10L15/00—Speech recognition
- G10L15/08—Speech classification or search
- G10L15/18—Speech classification or search using natural language modelling
- G10L15/183—Speech classification or search using natural language modelling using context dependencies, e.g. language models
- G10L15/187—Phonemic context, e.g. pronunciation rules, phonotactical constraints or phoneme n-grams
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
- G10L15/00—Speech recognition
- G10L15/02—Feature extraction for speech recognition; Selection of recognition unit
- G10L2015/025—Phonemes, fenemes or fenones being the recognition units
Definitions
- The present invention relates generally to the field of controlling a computer dictation application using multi-gender human voice instead of a keyboard. More specifically, the present invention is related to determining a compact model to transcribe the Arabic language acoustically in a well-defined basic phonetic study.
- Phonetics as defined by the Merriam-Webster® dictionary (Collegiate 10th ed.), is a system of speech sounds of a language or group of languages, and further comprises the study and systematic classification of the sounds made in spoken utterance. Hence, the phonetic system represents the practical application of this science to language study. An important part of phonetics is phonemes.
- Phonemes as defined by Merriam-Webster® dictionary (Collegiate 10th ed.), are abstract units of the phonetics system (associated with a particular language) that correspond to a group of speech sounds. For example, velar
- Allophone, as defined by the Merriam-Webster® dictionary (Collegiate 10th ed.), is one of two or more variants of the same phoneme.
- of spin are allophones of the phoneme
- Orthography is another system associated with the sounds of a given language.
- Orthography, as defined by the Merriam-Webster® dictionary (Collegiate 10th ed.), is the representation of the sounds of a language by letters and diacritics.
- A diacritic is further defined as a mark near or through an orthographic or phonetic character or combination of characters indicating a phonetic value different from that given the unmarked or otherwise marked element.
- An example of a diacritic is the acute accents of résumé, which are added to the letter e to indicate a special phonetic value.
- Graphemes are the set of units of a writing system (as letters and letter combinations) that represent a phoneme.
- Geminated graphemes are a sequence of identical speech sounds (as in meanness or Italian notte).
- ASR automatic speech recognition
- The present invention provides a method and a system for developing a compact model to transcribe the Arabic language acoustically based on a well-defined basic phonetic study.
- The compact model is accomplished in the present invention by reducing the set of phonemes.
- Table 4 represents the minimized set used in the dictation system. More specifically, the Arabic words provided as examples in Tables 1 and 4 illustrate that, in the instance of gemination, only one grapheme (and not a doubled one) is used, even though the sound is still doubled phonemically. The same holds for vowels: while there are almost six degrees of vowels in Table 1, there are only three in Table 4. Hence, the difference in pronunciation is not taken into account in the written text. Accordingly, the present invention provides a set of phonemes to be used by Arabic dictation software capable of automatic speech recognition.
- FIG. 1 illustrates the 1993 version of the International Phonetic Alphabet.
- FIG. 2 illustrates the method associated with the preferred embodiment of the present invention for determining a compact model to transcribe the Arabic language acoustically based on a well-defined basic phonetic study.
- FIG. 3 illustrates in further detail the data extraction step of FIG. 2.
- FIG. 4 illustrates the composition of the maximal set described in the method of FIG. 2.
- FIG. 5 illustrates the various kinds of phonemes.
- FIG. 6 illustrates the reduction of the maximal set for the text-to-speech system and the automatic speech recognition sets in a preferred embodiment of the invention.
- A general description of a basic phonetic study follows. The study starts with identifying a language on which a basic phonetic study needs to be performed. Any material related to the phonology and phonetics of the identified language is then collected (or alternatively extracted from a database over a network). This provides an overview of the phonetic structure of the identified language. Furthermore, technological problems and transcription problems associated with the language are identified. For example, literature in Arabic phonetics uses the terms “emphatic”, “pharyngealized”, and “velarized”, which exhibit clear differences that mark their uniqueness. Additionally, it is necessary to interpret the symbols in the literature and find a mapping to a single and more recent phonetic alphabet based on feature description rather than symbol shapes.
- The International Phonetic Alphabet (IPA) was used in conjunction with this invention.
- The IPA, as defined by the International Phonetic Association (http://www.arts.gla.ac.uk/IPA/ipa.html), is a standard set of symbols for transcribing the sounds of spoken languages.
- The above-mentioned website provides a full chart of IPA symbols, as reproduced in FIG. 1.
- Charts for consonants, vowels, tones and accents, suprasegmentals, diacritics and other symbols are also provided.
- The last version of the IPA dates to 1993, as shown in FIG. 1.
- A structured table is constructed with the following information: i) all phonemes of the language, ii) all allophones of the language and their relation to the phonemes, iii) a preliminary set of rules governing the selection of allophones, iv) a set of examples, and v) the most common representation of the sounds using Roman letters.
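- The five kinds of information in the structured table can be pictured as one record per phoneme. The following is a minimal sketch in Python, not the patent's own data format: the class name `PhonemeEntry`, its field names, the helper `allophone_index`, and the placeholder symbols `P1`, `P2` are all assumptions made for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class PhonemeEntry:
    """One row of the structured table (all names here are illustrative)."""
    phoneme: str                                          # i) the phoneme symbol
    allophones: list = field(default_factory=list)        # ii) its allophones
    selection_rules: list = field(default_factory=list)   # iii) rules for choosing an allophone
    examples: list = field(default_factory=list)          # iv) example words
    roman: str = ""                                       # v) common Roman-letter representation


def allophone_index(table):
    """Map every allophone back to its phoneme, rejecting ambiguous entries."""
    index = {}
    for entry in table:
        for allophone in entry.allophones:
            if allophone in index:
                raise ValueError(f"allophone {allophone!r} listed under two phonemes")
            index[allophone] = entry.phoneme
    return index


# Placeholder rows; the symbols are hypothetical and carry no phonetic claim.
TABLE = [
    PhonemeEntry("P1", ["P1_a", "P1_b"], ["placeholder rule"], ["placeholder example"], "p"),
    PhonemeEntry("P2", ["P2_a"], [], [], "q"),
]
INDEX = allophone_index(TABLE)
```

- Keeping the allophone-to-phoneme relation in a separate index makes item ii) checkable: an allophone that appears under two phonemes is reported instead of being silently overwritten.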
- FIG. 2 illustrates the method 100 associated with the preferred embodiment of the present invention for determining a compact model to transcribe the Arabic language acoustically (based on a well-defined basic phonetic study).
- A language for which a compact model is to be developed is identified 102.
- Information regarding the identified language is extracted or collected 104.
- Data extraction can be accomplished via a variety of means including, but not limited to, extracting data regarding the Arabic language via a network (such as the Internet, a Local Area Network (LAN), or a Wide Area Network (WAN)) or a database (local or remote).
- A list is created where the phonological and phonetic units are defined 106.
- The variations in the Arabic language are identified 108.
- A maximal set is created that contains all phonemes, allophones, and transliteration symbols associated with the Arabic language 110.
- Transliteration refers to the process of representing or spelling a word (in a first language) in the characters of another alphabet (second language).
- The maximal set is reduced 112 to provide a compact set to transcribe the Arabic language acoustically. The reduction step is explained in detail in the following sections.
- Terminological problems are identified 202. Certain terms that have been used by several phonological linguists in their attempt to define and describe the nature of various Arabic sounds have proved invalid; e.g., whereas a few linguists may include phonemes like /F 7 /,/R 7 /, and /X/ in the category of emphatics, others may include them in the category of pharyngeals. Because there is no final consensus, the most appropriate category was selected according to the influence of these phonemes on the neighboring vowels.
- Transcription problems associated with the language in question (e.g., Arabic) are also identified.
- In contrast to the IPA, whose special symbols might cause technical problems if used in the present system, the transcription set was limited to characters that can be typed easily on the keyboard (ASCII characters). Furthermore, phonological and phonetic units were extracted or collected 206 and a feature set was established based on this information 208. Next, a representative symbol for the transcription alphabet is selected 210 and a structured source is built 212.
- Our structured source consists of Phonemes, which are divided into three main units: Consonants, Vowels and Semi-Vowels.
- The unit “Consonants” includes a variety of allophones and geminations. Allophones may have their own gemination variety.
- The unit “Vowels” has a variety of allophones only, while the unit “Semi-Vowels” has just a gemination variety. The features of these units are determined according to three conditions: place of articulation, manner of articulation, and whether the sound is voiced or voiceless.
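- The three conditions (place, manner, voicing) amount to a small feature record per symbol. The sketch below is an illustration only: the `Features` class, the `FEATURES` table, and the helper `differ_only_in_voicing` are hypothetical names, and the four entries use generic IPA chart values rather than anything taken from the patent's tables.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Features:
    place: str     # place of articulation
    manner: str    # manner of articulation
    voiced: bool   # True for voiced, False for voiceless


# Generic IPA chart values, used purely as sample data (assumption).
FEATURES = {
    "b": Features("bilabial", "plosive", True),
    "t": Features("alveolar", "plosive", False),
    "d": Features("alveolar", "plosive", True),
    "s": Features("alveolar", "fricative", False),
}


def differ_only_in_voicing(a, b):
    """True when two symbols share place and manner but not voicing."""
    fa, fb = FEATURES[a], FEATURES[b]
    return fa.place == fb.place and fa.manner == fb.manner and fa.voiced != fb.voiced
```

- Describing symbols by features rather than by shape is what allows symbols from different sources to be compared and mapped to a single alphabet.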
- FIG. 4 illustrates the composition of the maximal set described in step 110 of FIG. 2 .
- Maximal set 300 comprises (but is not limited to): phonemes 302 , allophones 304 , a set of rules governing the selection of allophones 306 , a set of examples 308 , and the transliteration symbols 310 .
- Although the preferred language of this application is Arabic, one skilled in the art could extend the present invention to cover other similar languages. A detailed description of the Arabic phonetic study as per the present invention is given below.
- Arabic dialects: natively learned varieties that are used in informal situations and in the everyday communication of a geographically defined community.
- Arabic letters need to be transliterated; in other words, they need to be represented by Roman letters in such a way that there is a one-to-one mapping between the two character systems. Not only characters but also diacritics need to be transliterated. Therefore, Arabic distinctive phonetic groups were created. For example, as illustrated in FIG. 4:
- Pharyngeal phonemes like /t % K/, /D % K/, and /d % K/ were created.
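- The one-to-one requirement on the transliteration can be checked mechanically. The following Python sketch assumes a tiny mapping of four Arabic letters and three short-vowel diacritics to single Roman characters; the table `TRANSLIT` and the function names are illustrative choices and are not the symbol set of the invention.

```python
# Hypothetical mapping: Arabic letters and diacritics to single Roman characters.
TRANSLIT = {
    "\u0628": "b",  # ARABIC LETTER BEH
    "\u062A": "t",  # ARABIC LETTER TEH
    "\u062F": "d",  # ARABIC LETTER DAL
    "\u0643": "k",  # ARABIC LETTER KAF
    "\u064E": "a",  # ARABIC FATHA (diacritic)
    "\u064F": "u",  # ARABIC DAMMA (diacritic)
    "\u0650": "i",  # ARABIC KASRA (diacritic)
}


def invert(mapping):
    """Return the reverse mapping, failing if the mapping is not one-to-one."""
    inverse = {}
    for source, target in mapping.items():
        if target in inverse:
            raise ValueError(f"{target!r} is used for two different characters")
        inverse[target] = source
    return inverse


def transliterate(text, mapping):
    """Replace each character using the mapping; unknown characters raise KeyError."""
    return "".join(mapping[ch] for ch in text)


INVERSE = invert(TRANSLIT)
WORD = "\u0643\u064E\u062A\u064E\u0628\u064E"  # kaf, fatha, teh, fatha, beh, fatha
ROMAN = transliterate(WORD, TRANSLIT)
```

- Because every Roman symbol in this sketch is a single character, `transliterate(ROMAN, INVERSE)` recovers the original string exactly, which is the practical meaning of a one-to-one mapping that covers diacritics as well as letters.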
- The Arabic language has a more distinctive syllabification and lexical stress system than any other language.
- A maximal set is created that contains all phonemes, the allophones of the language, a preliminary set of rules governing the selection of allophones, a set of examples, and transliteration symbols.
- All the phonemes and allophones with which any given text message can be conveyed are found. For example, i) all the allophones for the vowels are identified; ii) allophones that represent any borrowed word in Arabic are identified; and iii) in the case of gemination, symbols are added to represent the phoneme when it is geminated. Thus, geminated phonemes, otherwise represented by doubling the original symbol, are represented by a new symbol.
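- Replacing a doubled symbol with a dedicated geminate symbol is a simple pass over the transcription. The sketch below is a hedged illustration: the function `mark_gemination`, the sample symbols, and the geminate symbol `t2` are hypothetical, since the actual symbols of the maximal set are not reproduced here.

```python
def mark_gemination(symbols, geminate_symbol):
    """Collapse each adjacent pair of identical symbols into its geminate symbol.

    `symbols` is a list of transcription symbols; `geminate_symbol` maps a
    plain symbol to the new symbol used when it is geminated (hypothetical).
    Symbols without an entry in `geminate_symbol` are left doubled.
    """
    out = []
    i = 0
    while i < len(symbols):
        current = symbols[i]
        is_doubled = i + 1 < len(symbols) and symbols[i + 1] == current
        if is_doubled and current in geminate_symbol:
            out.append(geminate_symbol[current])
            i += 2  # consume both halves of the geminate
        else:
            out.append(current)
            i += 1
    return out


GEMINATES = {"t": "t2"}  # illustrative entry only
RESULT = mark_gemination(["k", "a", "t", "t", "a", "b"], GEMINATES)
```

- With the sample input, the doubled `t` becomes the single symbol `t2`, so the transcription is one symbol shorter while the maximal set gains one symbol.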
- The automatic speech recognition (ASR) set is smaller than the text-to-speech (TTS) set, which reduces memory consumption in the resident computer system and makes the compact set of phonetics easier to store.
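- The difference in size between the two sets can be sketched as two reductions of the same maximal set. Everything in the Python example below is an assumption made for illustration: the symbols in `MAXIMAL`, the idea that the TTS set keeps every symbol while the ASR set merges vowel allophones and geminates, and the name `reduce_set`.

```python
# Hypothetical maximal set: six vowel variants, one consonant, one geminate.
MAXIMAL = ["a1", "a2", "i1", "i2", "u1", "u2", "t", "t2"]

# Assumed ASR reduction: vowel variants merge into three vowels and the
# geminate falls back to its plain symbol.
ASR_MAP = {
    "a1": "a", "a2": "a",
    "i1": "i", "i2": "i",
    "u1": "u", "u2": "u",
    "t2": "t",
}


def reduce_set(symbols, mapping):
    """Apply a reduction mapping and return the distinct symbols, sorted."""
    return sorted({mapping.get(symbol, symbol) for symbol in symbols})


TTS_SET = reduce_set(MAXIMAL, {})       # assumed: nothing is merged for TTS
ASR_SET = reduce_set(MAXIMAL, ASR_MAP)  # merged set used for recognition
```

- In this toy setting the TTS set has eight symbols and the ASR set four, which is the kind of reduction that lowers the storage needed for the recognition models.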
- The present invention may be implemented on conventional computing equipment, a multi-nodal system (e.g., LAN), or a networking system (e.g., Internet, WWW, wireless web). All programming and data related thereto are stored in computer memory, static or dynamic, and may be retrieved by the user in any of: conventional computer storage, display (e.g., CRT), and/or hardcopy (i.e., printed) formats.
- The programming of the present invention may be implemented by one of skill in the art of automatic speech recognition (ASR).
Priority Applications (3)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
US09/835,535 US7107215B2 (en) | 2001-04-16 | 2001-04-16 | Determining a compact model to transcribe the arabic language acoustically in a well defined basic phonetic study |
EP02008591A EP1251490A1 (fr) | 2001-04-16 | 2002-04-16 | Compact phonetic model for Arabic language recognition |
CA002384518A CA2384518A1 (fr) | 2001-04-16 | 2002-04-16 | Determining a compact model to transcribe the Arabic language verbally in a well-defined basic phonetic study |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
US09/835,535 US7107215B2 (en) | 2001-04-16 | 2001-04-16 | Determining a compact model to transcribe the arabic language acoustically in a well defined basic phonetic study |
Publications (2)
Publication Number | Publication Date |
---|---|
US20030040909A1 (en) | 2003-02-27 |
US7107215B2 (en) | 2006-09-12 |
Family
ID=25269758
Family Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
US09/835,535 Expired - Fee Related US7107215B2 (en) | 2001-04-16 | 2001-04-16 | Determining a compact model to transcribe the arabic language acoustically in a well defined basic phonetic study |
Country Status (3)
Country | Link |
---|---|
US (1) | US7107215B2 (fr) |
EP (1) | EP1251490A1 (fr) |
CA (1) | CA2384518A1 (fr) |
Cited By (39)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US20030191626A1 (en) * | 2002-03-11 | 2003-10-09 | Yaser Al-Onaizan | Named entity translation |
US20050033565A1 (en) * | 2003-07-02 | 2005-02-10 | Philipp Koehn | Empirical methods for splitting compound words with application to machine translation |
US20050054322A1 (en) * | 2001-04-04 | 2005-03-10 | Elsey Nicholas J. | Technique for effectively communicating travel directions |
US20050228643A1 (en) * | 2004-03-23 | 2005-10-13 | Munteanu Dragos S | Discovery of parallel text portions in comparable collections of corpora and training using comparable texts |
US20050234701A1 (en) * | 2004-03-15 | 2005-10-20 | Jonathan Graehl | Training tree transducers |
US20060015320A1 (en) * | 2004-04-16 | 2006-01-19 | Och Franz J | Selection and use of nonstatistical translation components in a statistical machine translation framework |
US20060195312A1 (en) * | 2001-05-31 | 2006-08-31 | University Of Southern California | Integer programming decoder for machine translation |
US20070033001A1 (en) * | 2005-08-03 | 2007-02-08 | Ion Muslea | Identifying documents which form translated pairs, within a document collection |
US20070094169A1 (en) * | 2005-09-09 | 2007-04-26 | Kenji Yamada | Adapter for allowing both online and offline training of a text to text system |
US7389222B1 (en) | 2005-08-02 | 2008-06-17 | Language Weaver, Inc. | Task parallelization in a text-to-text system |
US7430503B1 (en) * | 2004-08-24 | 2008-09-30 | The United States Of America As Represented By The Director, National Security Agency | Method of combining corpora to achieve consistency in phonetic labeling |
US7974833B2 (en) | 2005-06-21 | 2011-07-05 | Language Weaver, Inc. | Weighted system of expressing language information using a compact notation |
US20110275037A1 (en) * | 2010-05-07 | 2011-11-10 | King Abdulaziz City For Science And Technology | System and method of transliterating names between different languages |
US20120035928A1 (en) * | 2002-12-16 | 2012-02-09 | Nuance Communications, Inc. | Speaker adaptation of vocabulary for speech recognition |
US8214196B2 (en) | 2001-07-03 | 2012-07-03 | University Of Southern California | Syntax-based statistical translation model |
US8234106B2 (en) | 2002-03-26 | 2012-07-31 | University Of Southern California | Building a translation lexicon from comparable, non-parallel corpora |
US8380486B2 (en) | 2009-10-01 | 2013-02-19 | Language Weaver, Inc. | Providing machine-generated translations and corresponding trust levels |
US8433556B2 (en) | 2006-11-02 | 2013-04-30 | University Of Southern California | Semi-supervised training for statistical word alignment |
US8468149B1 (en) | 2007-01-26 | 2013-06-18 | Language Weaver, Inc. | Multi-lingual online community |
US8548794B2 (en) | 2003-07-02 | 2013-10-01 | University Of Southern California | Statistical noun phrase translation |
US8600728B2 (en) | 2004-10-12 | 2013-12-03 | University Of Southern California | Training for a text-to-text application which uses string to tree conversion for training and decoding |
US8615389B1 (en) | 2007-03-16 | 2013-12-24 | Language Weaver, Inc. | Generation and exploitation of an approximate language model |
US8676563B2 (en) | 2009-10-01 | 2014-03-18 | Language Weaver, Inc. | Providing human-generated and machine-generated trusted translations |
US8694303B2 (en) | 2011-06-15 | 2014-04-08 | Language Weaver, Inc. | Systems and methods for tuning parameters in statistical machine translation |
US8825466B1 (en) | 2007-06-08 | 2014-09-02 | Language Weaver, Inc. | Modification of annotated bilingual segment pairs in syntax-based machine translation |
US8831928B2 (en) | 2007-04-04 | 2014-09-09 | Language Weaver, Inc. | Customizable machine translation service |
US8886515B2 (en) | 2011-10-19 | 2014-11-11 | Language Weaver, Inc. | Systems and methods for enhancing machine translation post edit review processes |
US8886518B1 (en) | 2006-08-07 | 2014-11-11 | Language Weaver, Inc. | System and method for capitalizing machine translated text |
US8886517B2 (en) | 2005-06-17 | 2014-11-11 | Language Weaver, Inc. | Trust scoring for language translation systems |
US8942973B2 (en) | 2012-03-09 | 2015-01-27 | Language Weaver, Inc. | Content page URL translation |
US8943080B2 (en) | 2006-04-07 | 2015-01-27 | University Of Southern California | Systems and methods for identifying parallel documents and sentence fragments in multilingual document collections |
US8990064B2 (en) | 2009-07-28 | 2015-03-24 | Language Weaver, Inc. | Translating documents based on content |
US9122674B1 (en) | 2006-12-15 | 2015-09-01 | Language Weaver, Inc. | Use of annotations in statistical machine translation |
US9152622B2 (en) | 2012-11-26 | 2015-10-06 | Language Weaver, Inc. | Personalized machine translation via online adaptation |
US9213694B2 (en) | 2013-10-10 | 2015-12-15 | Language Weaver, Inc. | Efficient online domain adaptation |
US10261994B2 (en) | 2012-05-25 | 2019-04-16 | Sdl Inc. | Method and system for automatic management of reputation of translators |
US10319252B2 (en) | 2005-11-09 | 2019-06-11 | Sdl Inc. | Language capability assessment and training apparatus and techniques |
US10417646B2 (en) | 2010-03-09 | 2019-09-17 | Sdl Inc. | Predicting the cost associated with translating textual content |
US11003838B2 (en) | 2011-04-18 | 2021-05-11 | Sdl Inc. | Systems and methods for monitoring post translation editing |
Families Citing this family (14)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US9122655B2 (en) * | 2004-11-15 | 2015-09-01 | International Business Machines Corporation | Pre-translation testing of bi-directional language display |
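KR100739726B1 (ko) * | 2005-08-30 | 2007-07-13 | 삼성전자주식회사 | String matching method and system, and computer-readable recording medium recording the method |
KR101265263B1 (ko) * | 2006-01-02 | 2013-05-16 | 삼성전자주식회사 | String matching method and system using phonetic symbols, and computer-readable recording medium recording the method |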
US20080300861A1 (en) * | 2007-06-04 | 2008-12-04 | Ossama Emam | Word formation method and system |
US8719016B1 (en) | 2009-04-07 | 2014-05-06 | Verint Americas Inc. | Speech analytics system and system and method for determining structured speech |
WO2010125736A1 (fr) * | 2009-04-30 | 2010-11-04 | 日本電気株式会社 | Language model creation device, language model creation method, and computer-readable recording medium |
US8672681B2 (en) * | 2009-10-29 | 2014-03-18 | Gadi BenMark Markovitch | System and method for conditioning a child to learn any language without an accent |
US8473280B2 (en) * | 2010-08-06 | 2013-06-25 | King Abdulaziz City for Science & Technology | System and methods for cost-effective bilingual texting |
US9966064B2 (en) | 2012-07-18 | 2018-05-08 | International Business Machines Corporation | Dialect-specific acoustic language modeling and speech recognition |
WO2015161493A1 (fr) * | 2014-04-24 | 2015-10-29 | Motorola Solutions, Inc. | Method and apparatus for enhancing alveolar trill |
US10002543B2 (en) * | 2014-11-04 | 2018-06-19 | Knotbird LLC | System and methods for transforming language into interactive elements |
US20170148341A1 (en) * | 2015-11-25 | 2017-05-25 | David A. Boulton | Methodology and system for teaching reading |
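CN111008824B (zh) * | 2019-12-13 | 2023-08-18 | 陇东学院 | Scientific research help-seeking management system based on multidisciplinary integration of network and geographic information technology |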
US11783813B1 (en) * | 2021-05-02 | 2023-10-10 | Abbas Rafii | Methods and systems for improving word discrimination with phonologically-trained machine learning models |
Citations (12)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US5758023A (en) * | 1993-07-13 | 1998-05-26 | Bordeaux; Theodore Austin | Multi-language speech recognition system |
US5930754A (en) * | 1997-06-13 | 1999-07-27 | Motorola, Inc. | Method, device and article of manufacture for neural-network based orthography-phonetics transformation |
US5933804A (en) * | 1997-04-10 | 1999-08-03 | Microsoft Corporation | Extensible speech recognition system that provides a user with audio feedback |
US5953701A (en) | 1998-01-22 | 1999-09-14 | International Business Machines Corporation | Speech recognition models combining gender-dependent and gender-independent phone states and using phonetic-context-dependence |
US6016470A (en) | 1997-11-12 | 2000-01-18 | Gte Internetworking Incorporated | Rejection grammar using selected phonemes for speech recognition system |
US6347298B2 (en) * | 1998-12-16 | 2002-02-12 | Compaq Computer Corporation | Computer apparatus for text-to-speech synthesizer dictionary reduction |
US6363342B2 (en) * | 1998-12-18 | 2002-03-26 | Matsushita Electric Industrial Co., Ltd. | System for developing word-pronunciation pairs |
US6434521B1 (en) * | 1999-06-24 | 2002-08-13 | Speechworks International, Inc. | Automatically determining words for updating in a pronunciation dictionary in a speech recognition system |
US20020143543A1 (en) * | 2001-03-30 | 2002-10-03 | Sudheer Sirivara | Compressing & using a concatenative speech database in text-to-speech systems |
US6490557B1 (en) * | 1998-03-05 | 2002-12-03 | John C. Jeppesen | Method and apparatus for training an ultra-large vocabulary, continuous speech, speaker independent, automatic speech recognition system and consequential database |
US6546369B1 (en) * | 1999-05-05 | 2003-04-08 | Nokia Corporation | Text-based speech synthesis method containing synthetic speech comparisons and updates |
US6738738B2 (en) * | 2000-12-23 | 2004-05-18 | Tellme Networks, Inc. | Automated transformation from American English to British English |
-
2001
- 2001-04-16 US US09/835,535 patent/US7107215B2/en not_active Expired - Fee Related
-
2002
- 2002-04-16 CA CA002384518A patent/CA2384518A1/fr not_active Abandoned
- 2002-04-16 EP EP02008591A patent/EP1251490A1/fr not_active Ceased
Non-Patent Citations (3)
Title |
---|
Mouri-Beji, F., "A Statistical Model for an Automatic Procedure to Compress a Word Transcription Dictionary", Proceedings, Advances in Pattern Recognition, 1037-1044 (1998). |
Selouni, S. et al., "Recognition of Arabic Phonetic Features Using Neural Networks and Knowledge Based System: A Comparative Study", IEEE 404-411 (1998). |
Shultz, T., et al., "Multilingual and Crosslingual Speech Recognition", Proceedings of the DARPA Broadcast News Transcription and Understanding Workshop 1-4 (1998). |
Cited By (55)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US20050054322A1 (en) * | 2001-04-04 | 2005-03-10 | Elsey Nicholas J. | Technique for effectively communicating travel directions |
US7493101B2 (en) * | 2001-04-04 | 2009-02-17 | Grape Technology Group, Inc. | Text to speech conversion method |
US20060195312A1 (en) * | 2001-05-31 | 2006-08-31 | University Of Southern California | Integer programming decoder for machine translation |
US8214196B2 (en) | 2001-07-03 | 2012-07-03 | University Of Southern California | Syntax-based statistical translation model |
US7249013B2 (en) * | 2002-03-11 | 2007-07-24 | University Of Southern California | Named entity translation |
US7580830B2 (en) * | 2002-03-11 | 2009-08-25 | University Of Southern California | Named entity translation |
US20030191626A1 (en) * | 2002-03-11 | 2003-10-09 | Yaser Al-Onaizan | Named entity translation |
US20080114583A1 (en) * | 2002-03-11 | 2008-05-15 | University Of Southern California | Named entity translation |
US8234106B2 (en) | 2002-03-26 | 2012-07-31 | University Of Southern California | Building a translation lexicon from comparable, non-parallel corpora |
US8731928B2 (en) * | 2002-12-16 | 2014-05-20 | Nuance Communications, Inc. | Speaker adaptation of vocabulary for speech recognition |
US8417527B2 (en) * | 2002-12-16 | 2013-04-09 | Nuance Communications, Inc. | Speaker adaptation of vocabulary for speech recognition |
US20120035928A1 (en) * | 2002-12-16 | 2012-02-09 | Nuance Communications, Inc. | Speaker adaptation of vocabulary for speech recognition |
US7711545B2 (en) | 2003-07-02 | 2010-05-04 | Language Weaver, Inc. | Empirical methods for splitting compound words with application to machine translation |
US8548794B2 (en) | 2003-07-02 | 2013-10-01 | University Of Southern California | Statistical noun phrase translation |
US20050033565A1 (en) * | 2003-07-02 | 2005-02-10 | Philipp Koehn | Empirical methods for splitting compound words with application to machine translation |
US20050234701A1 (en) * | 2004-03-15 | 2005-10-20 | Jonathan Graehl | Training tree transducers |
US7698125B2 (en) | 2004-03-15 | 2010-04-13 | Language Weaver, Inc. | Training tree transducers for probabilistic operations |
US8296127B2 (en) | 2004-03-23 | 2012-10-23 | University Of Southern California | Discovery of parallel text portions in comparable collections of corpora and training using comparable texts |
US20050228643A1 (en) * | 2004-03-23 | 2005-10-13 | Munteanu Dragos S | Discovery of parallel text portions in comparable collections of corpora and training using comparable texts |
US8977536B2 (en) | 2004-04-16 | 2015-03-10 | University Of Southern California | Method and system for translating information with a higher probability of a correct translation |
US8666725B2 (en) | 2004-04-16 | 2014-03-04 | University Of Southern California | Selection and use of nonstatistical translation components in a statistical machine translation framework |
US20060015320A1 (en) * | 2004-04-16 | 2006-01-19 | Och Franz J | Selection and use of nonstatistical translation components in a statistical machine translation framework |
US7430503B1 (en) * | 2004-08-24 | 2008-09-30 | The United States Of America As Represented By The Director, National Security Agency | Method of combining corpora to achieve consistency in phonetic labeling |
US8600728B2 (en) | 2004-10-12 | 2013-12-03 | University Of Southern California | Training for a text-to-text application which uses string to tree conversion for training and decoding |
US8886517B2 (en) | 2005-06-17 | 2014-11-11 | Language Weaver, Inc. | Trust scoring for language translation systems |
US7974833B2 (en) | 2005-06-21 | 2011-07-05 | Language Weaver, Inc. | Weighted system of expressing language information using a compact notation |
US7389222B1 (en) | 2005-08-02 | 2008-06-17 | Language Weaver, Inc. | Task parallelization in a text-to-text system |
US7813918B2 (en) | 2005-08-03 | 2010-10-12 | Language Weaver, Inc. | Identifying documents which form translated pairs, within a document collection |
US20070033001A1 (en) * | 2005-08-03 | 2007-02-08 | Ion Muslea | Identifying documents which form translated pairs, within a document collection |
US20070094169A1 (en) * | 2005-09-09 | 2007-04-26 | Kenji Yamada | Adapter for allowing both online and offline training of a text to text system |
US7624020B2 (en) | 2005-09-09 | 2009-11-24 | Language Weaver, Inc. | Adapter for allowing both online and offline training of a text to text system |
US10319252B2 (en) | 2005-11-09 | 2019-06-11 | Sdl Inc. | Language capability assessment and training apparatus and techniques |
US8943080B2 (en) | 2006-04-07 | 2015-01-27 | University Of Southern California | Systems and methods for identifying parallel documents and sentence fragments in multilingual document collections |
US8886518B1 (en) | 2006-08-07 | 2014-11-11 | Language Weaver, Inc. | System and method for capitalizing machine translated text |
US8433556B2 (en) | 2006-11-02 | 2013-04-30 | University Of Southern California | Semi-supervised training for statistical word alignment |
US9122674B1 (en) | 2006-12-15 | 2015-09-01 | Language Weaver, Inc. | Use of annotations in statistical machine translation |
US8468149B1 (en) | 2007-01-26 | 2013-06-18 | Language Weaver, Inc. | Multi-lingual online community |
US8615389B1 (en) | 2007-03-16 | 2013-12-24 | Language Weaver, Inc. | Generation and exploitation of an approximate language model |
US8831928B2 (en) | 2007-04-04 | 2014-09-09 | Language Weaver, Inc. | Customizable machine translation service |
US8825466B1 (en) | 2007-06-08 | 2014-09-02 | Language Weaver, Inc. | Modification of annotated bilingual segment pairs in syntax-based machine translation |
US8990064B2 (en) | 2009-07-28 | 2015-03-24 | Language Weaver, Inc. | Translating documents based on content |
US8676563B2 (en) | 2009-10-01 | 2014-03-18 | Language Weaver, Inc. | Providing human-generated and machine-generated trusted translations |
US8380486B2 (en) | 2009-10-01 | 2013-02-19 | Language Weaver, Inc. | Providing machine-generated translations and corresponding trust levels |
US10984429B2 (en) | 2010-03-09 | 2021-04-20 | Sdl Inc. | Systems and methods for translating textual content |
US10417646B2 (en) | 2010-03-09 | 2019-09-17 | Sdl Inc. | Predicting the cost associated with translating textual content |
US20110275037A1 (en) * | 2010-05-07 | 2011-11-10 | King Abdulaziz City For Science And Technology | System and method of transliterating names between different languages |
US8433557B2 (en) * | 2010-05-07 | 2013-04-30 | Technology Development Center, King Abdulaziz City For Science And Technology | System and method of transliterating names between different languages |
US11003838B2 (en) | 2011-04-18 | 2021-05-11 | Sdl Inc. | Systems and methods for monitoring post translation editing |
US8694303B2 (en) | 2011-06-15 | 2014-04-08 | Language Weaver, Inc. | Systems and methods for tuning parameters in statistical machine translation |
US8886515B2 (en) | 2011-10-19 | 2014-11-11 | Language Weaver, Inc. | Systems and methods for enhancing machine translation post edit review processes |
US8942973B2 (en) | 2012-03-09 | 2015-01-27 | Language Weaver, Inc. | Content page URL translation |
US10261994B2 (en) | 2012-05-25 | 2019-04-16 | Sdl Inc. | Method and system for automatic management of reputation of translators |
US10402498B2 (en) | 2012-05-25 | 2019-09-03 | Sdl Inc. | Method and system for automatic management of reputation of translators |
US9152622B2 (en) | 2012-11-26 | 2015-10-06 | Language Weaver, Inc. | Personalized machine translation via online adaptation |
US9213694B2 (en) | 2013-10-10 | 2015-12-15 | Language Weaver, Inc. | Efficient online domain adaptation |
Also Published As
Publication number | Publication date |
---|---|
EP1251490A1 (fr) | 2002-10-23 |
CA2384518A1 (fr) | 2002-10-16 |
US20030040909A1 (en) | 2003-02-27 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
US7107215B2 (en) | Determining a compact model to transcribe the arabic language acoustically in a well defined basic phonetic study | |
El-Imam | Phonetization of Arabic: rules and algorithms | |
Young-Scholten et al. | The role of orthographic input in second language German: Evidence from naturalistic adult learners’ production | |
Masmoudi et al. | A Corpus and Phonetic Dictionary for Tunisian Arabic Speech Recognition. | |
JP2001296880A (ja) | Method and apparatus for generating multiple plausible pronunciations of proper names | |
Chan et al. | Development of a Cantonese-English code-mixing speech corpus. | |
Alsharhan et al. | Improved Arabic speech recognition system through the automatic generation of fine-grained phonetic transcriptions | |
Ashour | Major differences between Arabic and English pronunciation systems: a contrastive analysis study | |
Alotaibi et al. | Study on pharyngeal and uvular consonants in foreign accented Arabic for ASR | |
Oo et al. | Burmese speech corpus, finite-state text normalization and pronunciation grammars with an application to text-to-speech | |
Al-Anzi et al. | Synopsis on Arabic speech recognition | |
Salor et al. | Turkish speech corpora and recognition tools developed by porting SONIC: Towards multilingual speech recognition | |
Iyanda et al. | Development of a Yorúbà Text-to-Speech System Using Festival | |
Marasek et al. | Multi-level annotation in SpeeCon Polish speech database | |
TWI759003B (zh) | Training method for speech recognition model | |
Dropuljić et al. | Development of acoustic model for Croatian language using HTK | |
Reddy et al. | Kannada phonemes to speech dictionary: statistical approach | |
Nagoev et al. | Phonetic-acoustic database of highly accented Russian speech | |
Hernández-Mena et al. | Creating a grammar-based speech recognition parser for Mexican Spanish using HTK, compatible with CMU Sphinx-III system | |
Garcia et al. | A bisaya text-to-speech (TTS) system utilizing rule-based algorithm and concatenative speech synthesis | |
Zitouni et al. | OrienTel: speech-based interactive communication applications for the mediterranean and the Middle East | |
Silamu et al. | HMM-based uyghur continuous speech recognition system | |
JP2001282098A (ja) | Foreign language learning device, foreign language learning method, and medium | |
Sultanmuradovna | BOOK OF ENGLISH PHONETICS | |
Upadhyay et al. | Garhwali speech database |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
AS | Assignment | Owner name: SAKHR SOFTWARE COMPANY, EGYPT. Free format text: ASSIGNMENT OF ASSIGNORS INTEREST; ASSIGNOR: GHALI, MIKHAIL E.; REEL/FRAME: 012057/0701. Effective date: 20010426 |
REMI | Maintenance fee reminder mailed | |
FPAY | Fee payment | Year of fee payment: 4 |
SULP | Surcharge for late payment | |
REMI | Maintenance fee reminder mailed | |
LAPS | Lapse for failure to pay maintenance fees | |
STCH | Information on status: patent discontinuation | Free format text: PATENT EXPIRED DUE TO NONPAYMENT OF MAINTENANCE FEES UNDER 37 CFR 1.362 |
FP | Lapsed due to failure to pay maintenance fee | Effective date: 20140912 |