EP1668629A1 - Letter to sound conversion for synthesized pronounciation of a text segment - Google Patents
Letter to sound conversion for synthesized pronounciation of a text segment
Info
- Publication number
- EP1668629A1 (application EP04784356A)
- Authority
- EP
- European Patent Office
- Prior art keywords
- sub
- word
- text
- words
- speech synthesis
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Granted
Classifications
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
- G10L13/00—Speech synthesis; Text to speech systems
- G10L13/08—Text analysis or generation of parameters for speech synthesis out of text, e.g. grapheme to phoneme translation, prosody generation or stress or intonation determination
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
- G10L13/00—Speech synthesis; Text to speech systems
- G10L13/06—Elementary speech units used in speech synthesisers; Concatenation rules
Definitions
- the present invention relates generally to Text-To-Speech (TTS) synthesis.
- the invention is particularly useful for letter to sound conversion for synthesized pronunciation of a text segment.
- BACKGROUND OF THE INVENTION: Text-to-Speech (TTS) conversion, often referred to as concatenated text-to-speech synthesis, allows electronic devices to receive an input text string and provide a converted representation of the string in the form of synthesized speech.
- a device that may be required to synthesize speech originating from a non-deterministic number of received text strings will have difficulty in providing high quality realistic synthesized speech.
- a method for text to speech synthesis including: receiving a text string and selecting at least one word therefrom; segmenting the word into sub-words, the sub-words forming a sub-word sequence with at least one of the sub-words comprising at least two letters; identifying phonemes for the sub-words; concatenating the phonemes into a phoneme sequence; and performing speech synthesis on the phoneme sequence.
- the sub-word sequence is determined by analysis of possible sub-words that could comprise the word
- each one of the possible sub-words has an associated predefined weight.
- the sub-words with the maximum combined weights that form the selected word are chosen to provide the sub-word sequence.
- the sub-word sequence is suitably determined from analysis of a Direct Acyclic Graph.
- the identifying of phonemes uses a phoneme identifier table comprising phonemes corresponding to at least one said sub-word.
- the identifier table also comprises a position relevance indicator that indicates the relevance of the position of the sub-word in the word. There may also suitably be a phoneme weight associated with the position relevance indicator.
- Fig. 1 is a schematic block diagram of an electronic device in accordance with the present invention
- Fig. 2 is a flow diagram illustrating a method for text to speech synthesis
- Fig. 3 illustrates a Direct Acyclic Graph (DAG)
- Fig. 4 is part of a mapping table that maps symbols to phonemes
- Fig. 5 is part of a phoneme identifier table
- Fig. 6 is part of a vowel pair table.
- an electronic device 100, in the form of a radio-telephone, comprises a device processor 102 operatively coupled by a bus 103 to a user interface 104 that is typically a touch screen, or alternatively a display screen and keypad.
- the electronic device 100 also has an utterance corpus 106, a speech synthesizer 110, Non Volatile memory 120, Read Only Memory 118 and Radio communications module 116 all operatively coupled to the processor 102 by the bus 103.
- the speech synthesizer 110 has an output coupled to drive a speaker 112.
- the corpus 106 includes representations of words or phonemes and associated sampled, digitized and processed utterance waveforms PUWs.
- the Non-Volatile memory 120 stores, in use, text for Text-To-Speech (TTS) synthesis (the text may be received by module 116 or otherwise).
- the waveform utterance corpus comprises sampled and digitized utterance waveforms in the form of phonemes and stress/emphasis of prosodic features.
- the radio frequency communications unit 116 is typically a combined receiver and transmitter having a common antenna.
- the radio frequency communications unit 116 has a transceiver coupled to the antenna via a radio frequency amplifier.
- the transceiver is also coupled to a combined modulator/demodulator that couples the communications unit 116 to the processor 102.
- the non-volatile memory 120 stores a user programmable phonebook database Db, and the Read Only Memory 118 stores operating code (OC) for the device processor 102.
- a step 220 of receiving a text string TS from the memory 120 is performed.
- the text string TS may have originated from a text message received by module 116 or by any other means.
- Step 230 provides for selecting at least one word from the text string TS, and a segmenting step 240 provides for segmenting the word into sub-words, the sub-words forming a sub-word sequence with at least one of the sub-words comprising at least two letters.
- An identifying step 250 then provides for identifying phonemes for the sub-words.
- a concatenating step 260 then provides for concatenating the phonemes into a phoneme sequence.
- the sub-word sequence is determined by analysis of all possible sub-words that could comprise the selected word. For instance, referring briefly to the Direct Acyclic Graph (DAG) of Fig. 3, if the selected word is "mention", then the DAG is constructed with all possible sub-words that could comprise the selected word "mention". Each sub-word has a pre-defined weight WT; for example, as shown, the sub-words "ment", "men" and "tion" have respective weights 88, 86 and 204.
- the concatenating step 260 traverses the DAG and selects the sub-words with the maximum combined (summed) weights WT that form the selected word. In the case of the word "mention", the sub-words "men" and "tion" would be selected (an illustrative sketch of this weighted segmentation follows this list).
- the step 250 of identifying phonemes uses two tables stored in memory 120: one table, part of which is illustrated in Fig. 4, is a mapping table;
- the other table is a phoneme identifier table PIT, part of which is illustrated in Fig. 5.
- the phoneme identifier table PIT comprises a sub-word field; a phoneme weight field; position relevance field(s) or indicators; and phoneme identifier field(s).
- the first line is aa 120 A_C, where aa is the sub-word, 120 is the phoneme weight, the letter A is the position relevance, and "C" is the phoneme identifier corresponding to the sub-word aa (an illustrative lookup sketch based on this table follows this list).
- the position relevance may be labeled as: A with the meaning relevant for all positions; I with the meaning relevant for sub-words at the beginning of a word; M with the meaning relevant for sub-words in the middle of a word; and F with the meaning relevant for sub-words at the end of a word.
- a short morpheme-like string is always preferable; for instance, the word "seeing" will be segmented as s ee
- affix: if one short string is a prefix or suffix of a long string, we add its occurring time to the long string, but other sub-strings are not considered
- ambiguity: one morpheme-like string can correspond to multiple phoneme strings; for instance, en can be pronounced as ehn or axn.
- the morpheme-like string can correspond to more than one phoneme string.
- we choose the phoneme string with the maximal occurring time and calculate the ratio r as follows: r = max_u{N_{u,k}} / Σ_u{N_{u,k}} (3), where u is the string index and k is the position index; if r < θ (θ is a threshold, e.g. 0.7), we exclude this morpheme-like string (an illustrative sketch of this check follows this list).
- the method 200 next effects a step 265 of performing stress or emphasis assignment on the phonemes that represent vowels.
- This step 265 identifies vowels from the phonemes identified in the previous step 250. Essentially, this step 265 searches a relative strength/weakness vowel pair table stored in the memory 120. Part of this vowel pair table is illustrated in Fig. 6.
- the stress weights are determined by using a training lexicon. Each entry in this lexicon has a word and its corresponding pronunciation, including stress, syllable boundaries and letter-to-phoneme alignment. Based on this lexicon, stress was determined by statistical analysis. In this regard, stress reflects the strong/weak relationship between vowels. To generate the required data, a statistical analysis of all entries in the lexicon was therefore conducted. Specifically, within the scope of a word, if vowel v_i is stressed and vowel v_j is unstressed, we assign one point for the pair (v_i, v_j) and zero points for the pair (v_j, v_i). If both are unstressed, the point is also zero (an illustrative sketch of this pairwise counting follows this list).
- a test step 270 is then performed to determine if there are any more words in the text string TS that need to be processed. If yes, the method 200 returns to step 230; otherwise, speech synthesis on the phoneme sequence is performed at step 280.
- the speech synthesis is effected by the synthesizer 110 on the phoneme sequence for each of the words.
- the method 200 then ends at an end step 290.
- the stress may be primary, secondary or no stress, as appropriate.
- the stress assignment of the vowels is also used to provide an improved synthesized speech quality by appropriate stress emphasis.
- the present invention improves, or at least alleviates problems with, determining sounds and vowel stress/emphasis depending on other adjacent letters and position in a text segment to be synthesized.
- the detailed description provides a preferred exemplary embodiment only, and is not intended to limit the scope, applicability, or configuration of the invention. Rather, the detailed description of the preferred exemplary embodiment provides those skilled in the art with an enabling description for implementing a preferred exemplary embodiment of the invention. It should be understood that various changes may be made in the function and arrangement of elements without departing from the spirit and scope of the invention as set forth in the appended claims.
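
The following is a minimal sketch, in Python, of the weighted sub-word segmentation described above: choosing the sub-word sequence with the maximum combined weight, as in the "mention" example. The weights for "ment", "men" and "tion" are taken from the description; every other lexicon entry, the function name segment_word, and the dynamic-programming formulation (equivalent to traversing the Direct Acyclic Graph of Fig. 3) are illustrative assumptions, not the patent's actual implementation.

```python
def segment_word(word, subword_weights):
    """Return the sub-word sequence covering `word` with the maximum summed weight.

    Dynamic programming over letter positions visits the same paths a traversal
    of the DAG of possible sub-words would, which keeps the sketch compact.
    """
    n = len(word)
    best = [None] * (n + 1)          # best[i] = (score, sub-word sequence) covering word[:i]
    best[0] = (0, [])
    for i in range(n):
        if best[i] is None:
            continue
        score, seq = best[i]
        for j in range(i + 1, n + 1):
            sub = word[i:j]
            if sub in subword_weights:
                cand = (score + subword_weights[sub], seq + [sub])
                if best[j] is None or cand[0] > best[j][0]:
                    best[j] = cand
    return best[n][1] if best[n] else list(word)   # fall back to single letters

# Weights for "ment", "men" and "tion" follow the description; the rest are assumed.
weights = {"ment": 88, "men": 86, "tion": 204, "ion": 50, "me": 10,
           "m": 1, "e": 1, "n": 1, "t": 1, "i": 1, "o": 1}
print(segment_word("mention", weights))   # -> ['men', 'tion']
```

With these weights, "men" + "tion" scores 290 against 138 for "ment" + "ion", so the sequence ["men", "tion"] is selected, matching the outcome described for step 260.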
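
Below is a minimal sketch of how the phoneme identifier table (PIT) lookup of step 250 might work, assuming the A/I/M/F position-relevance codes and the sample entry "aa 120 A_C" described above. The additional table rows, the candidate-selection rule (highest phoneme weight among entries whose relevance matches the sub-word's position), and all names are assumptions for illustration only.

```python
# Hypothetical PIT rows: (sub-word, phoneme weight, position relevance, phoneme identifiers).
PIT = [
    ("aa",   120, "A", ["C"]),               # from the description: relevant at all positions
    ("men",   86, "I", ["M", "EH", "N"]),    # assumed entry: relevant at word start
    ("tion", 204, "F", ["SH", "AX", "N"]),   # assumed entry: relevant at word end
]

def position_code(index, count):
    """Map a sub-word's position in its word to one of the I/M/F codes."""
    if index == 0:
        return "I"
    if index == count - 1:
        return "F"
    return "M"

def phonemes_for(subwords):
    """Concatenate phonemes for a sub-word sequence using the PIT.

    Entries whose relevance is 'A' or matches the sub-word's position are
    candidates; the entry with the highest phoneme weight wins.
    """
    result = []
    for idx, sub in enumerate(subwords):
        pos = position_code(idx, len(subwords))
        candidates = [(w, ph) for (s, w, rel, ph) in PIT
                      if s == sub and rel in ("A", pos)]
        if candidates:
            result.extend(max(candidates)[1])
    return result

print(phonemes_for(["men", "tion"]))   # -> ['M', 'EH', 'N', 'SH', 'AX', 'N']
```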
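
A short sketch of the ambiguity check of equation (3). Because the equation is only partially legible in the published text, the denominator is reconstructed here as the total occurrence count over all pronunciations of the morpheme-like string; that reconstruction, the threshold handling, and the example counts for "en" are assumptions.

```python
THETA = 0.7   # threshold value quoted in the description

def keep_morpheme(pronunciation_counts):
    """Keep a morpheme-like string only if one pronunciation clearly dominates.

    `pronunciation_counts` maps each candidate phoneme string to its occurrence
    count (N_{u,k} for phoneme-string index u at position k).
    """
    total = sum(pronunciation_counts.values())
    r = max(pronunciation_counts.values()) / total
    return r >= THETA

print(keep_morpheme({"eh n": 40, "ax n": 60}))       # r = 0.6 < 0.7 -> False (excluded)
print(keep_morpheme({"sh ax n": 95, "t iy n": 5}))   # r = 0.95     -> True  (kept)
```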
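
Finally, a minimal sketch of the pairwise vowel statistics used to build the relative strength/weakness vowel pair table of Fig. 6, following the scoring rule described above: one point to the (stressed, unstressed) pair, zero otherwise. The lexicon representation, the vowel symbols, and the dictionary-of-counts output are illustrative assumptions.

```python
from collections import defaultdict
from itertools import permutations

def vowel_pair_points(lexicon):
    """lexicon: iterable of per-word vowel lists, each vowel given as (symbol, is_stressed)."""
    points = defaultdict(int)
    for vowels in lexicon:
        for (vi, si), (vj, sj) in permutations(vowels, 2):
            if si and not sj:
                points[(vi, vj)] += 1   # vi stressed, vj unstressed: one point
            # all other stress combinations contribute zero points
    return points

# Assumed toy training lexicon: the vowels of each word with their stress marks.
lexicon = [
    [("EH", True), ("AX", False)],   # e.g. a word with a stressed EH and unstressed AX
    [("IY", True), ("IH", False)],
]
print(dict(vowel_pair_points(lexicon)))
# -> {('EH', 'AX'): 1, ('IY', 'IH'): 1}
```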
Landscapes
- Engineering & Computer Science (AREA)
- Computational Linguistics (AREA)
- Health & Medical Sciences (AREA)
- Audiology, Speech & Language Pathology (AREA)
- Human Computer Interaction (AREA)
- Physics & Mathematics (AREA)
- Acoustics & Sound (AREA)
- Multimedia (AREA)
- Machine Translation (AREA)
- Document Processing Apparatus (AREA)
Abstract
Description
Claims
Applications Claiming Priority (2)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CNB031327095A CN1308908C (en) | 2003-09-29 | 2003-09-29 | Transformation from characters to sound for synthesizing text paragraph pronunciation |
PCT/US2004/030468 WO2005034083A1 (en) | 2003-09-29 | 2004-09-17 | Letter to sound conversion for synthesized pronounciation of a text segment |
Publications (3)
Publication Number | Publication Date |
---|---|
EP1668629A1 (en) | 2006-06-14 |
EP1668629A4 EP1668629A4 (en) | 2007-01-10 |
EP1668629B1 EP1668629B1 (en) | 2009-03-11 |
Family
ID=34398362
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
EP04784356A Ceased EP1668629B1 (en) | 2003-09-29 | 2004-09-17 | Letter-to-sound conversion for synthesized pronunciation of a text segment |
Country Status (6)
Country | Link |
---|---|
EP (1) | EP1668629B1 (en) |
KR (1) | KR100769032B1 (en) |
CN (1) | CN1308908C (en) |
DE (1) | DE602004019949D1 (en) |
RU (1) | RU2320026C2 (en) |
WO (1) | WO2005034083A1 (en) |
Families Citing this family (11)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US8234116B2 (en) | 2006-08-22 | 2012-07-31 | Microsoft Corporation | Calculating cost measures between HMM acoustic models |
KR100935014B1 (en) * | 2008-01-29 | 2010-01-06 | 고려대학교 산학협력단 | Method for prediction of symptom corresponding to analysis of coloring patterns in art therapy assessment and medium of recording its program |
US9472182B2 (en) * | 2014-02-26 | 2016-10-18 | Microsoft Technology Licensing, Llc | Voice font speaker and prosody interpolation |
RU2606312C2 (en) * | 2014-11-27 | 2017-01-10 | Роман Валерьевич Мещеряков | Speech synthesis device |
CN105895075B (en) * | 2015-01-26 | 2019-11-15 | 科大讯飞股份有限公司 | Improve the method and system of synthesis phonetic-rhythm naturalness |
CN105895076B (en) * | 2015-01-26 | 2019-11-15 | 科大讯飞股份有限公司 | A kind of phoneme synthesizing method and system |
RU2692051C1 (en) | 2017-12-29 | 2019-06-19 | Общество С Ограниченной Ответственностью "Яндекс" | Method and system for speech synthesis from text |
CN109002454B (en) * | 2018-04-28 | 2022-05-27 | 陈逸天 | Method and electronic equipment for determining spelling partition of target word |
CN109376358B (en) * | 2018-10-25 | 2021-07-16 | 陈逸天 | Word learning method and device based on historical spelling experience and electronic equipment |
US12094447B2 (en) | 2018-12-13 | 2024-09-17 | Microsoft Technology Licensing, Llc | Neural text-to-speech synthesis with multi-level text information |
CN112786002B (en) * | 2020-12-28 | 2022-12-06 | 科大讯飞股份有限公司 | Voice synthesis method, device, equipment and storage medium |
Family Cites Families (8)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US5748840A (en) * | 1990-12-03 | 1998-05-05 | Audio Navigation Systems, Inc. | Methods and apparatus for improving the reliability of recognizing words in a large database when the words are spelled or spoken |
KR100236961B1 (en) * | 1997-07-23 | 2000-01-15 | 정선종 | Method for word grouping by its vowel-consonant structure |
US6347295B1 (en) * | 1998-10-26 | 2002-02-12 | Compaq Computer Corporation | Computer method and apparatus for grapheme-to-phoneme rule-set-generation |
JP2002535728A (en) * | 1999-01-05 | 2002-10-22 | コーニンクレッカ フィリップス エレクトロニクス エヌ ヴィ | Speech recognition device including sub-word memory |
KR100373329B1 (en) * | 1999-08-17 | 2003-02-25 | 한국전자통신연구원 | Apparatus and method for text-to-speech conversion using phonetic environment and intervening pause duration |
US6634300B2 (en) * | 2000-05-20 | 2003-10-21 | Baker Hughes, Incorporated | Shaped charges having enhanced tungsten liners |
US8744835B2 (en) * | 2001-03-16 | 2014-06-03 | Meaningful Machines Llc | Content conversion method and apparatus |
US7143353B2 (en) * | 2001-03-30 | 2006-11-28 | Koninklijke Philips Electronics, N.V. | Streaming video bookmarks |
- 2003
- 2003-09-29 CN CNB031327095A patent/CN1308908C/en not_active Expired - Fee Related
- 2004
- 2004-09-17 KR KR1020067006095A patent/KR100769032B1/en active IP Right Grant
- 2004-09-17 DE DE602004019949T patent/DE602004019949D1/en not_active Expired - Lifetime
- 2004-09-17 WO PCT/US2004/030468 patent/WO2005034083A1/en active Application Filing
- 2004-09-17 EP EP04784356A patent/EP1668629B1/en not_active Ceased
- 2004-09-17 RU RU2006114705/09A patent/RU2320026C2/en not_active IP Right Cessation
Patent Citations (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
EP0689192A1 (en) * | 1994-06-22 | 1995-12-27 | International Business Machines Corporation | A speech synthesis system |
US6064960A (en) * | 1997-12-18 | 2000-05-16 | Apple Computer, Inc. | Method and apparatus for improved duration modeling of phonemes |
US20020184030A1 (en) * | 2001-06-04 | 2002-12-05 | Hewlett Packard Company | Speech synthesis apparatus and method |
Non-Patent Citations (4)
Title |
---|
BULYKO I ET AL: "Efficient integrated response generation from multiple targets using weighted finite state transducers" COMPUTER SPEECH AND LANGUAGE, ELSEVIER, LONDON, GB, vol. 16, no. 3-4, July 2002 (2002-07), pages 533-550, XP004418857 ISSN: 0885-2308 * |
JON R W YI ET AL: "A FLEXIBLE, SCALABLE FINITE-STATE TRANSDUCER ARCHITECTURE FOR CORPUS-BASED CONCATENATIVE SPEECH SYNTHESIS1" INTERNATIONAL CONFERENCE ON SPOKEN LANGUAGE PROCESSING (ICSLP), vol. 3, 16 October 2000 (2000-10-16), pages 322-325, XP007011011 * |
MACCHI M: "Issues in text-to-speech synthesis" INTELLIGENCE AND SYSTEMS, 1998. PROCEEDINGS., IEEE INTERNATIONAL JOINT SYMPOSIA ON ROCKVILLE, MD, USA 21-23 MAY 1998, LOS ALAMITOS, CA, USA,IEEE COMPUT. SOC, US, 21 May 1998 (1998-05-21), pages 318-325, XP010288887 ISBN: 0-8186-8548-4 * |
See also references of WO2005034083A1 * |
Also Published As
Publication number | Publication date |
---|---|
RU2320026C2 (en) | 2008-03-20 |
CN1604184A (en) | 2005-04-06 |
KR20060056404A (en) | 2006-05-24 |
CN1308908C (en) | 2007-04-04 |
DE602004019949D1 (en) | 2009-04-23 |
EP1668629A4 (en) | 2007-01-10 |
KR100769032B1 (en) | 2007-10-22 |
WO2005034083A1 (en) | 2005-04-14 |
EP1668629B1 (en) | 2009-03-11 |
RU2006114705A (en) | 2007-11-10 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
EP1668631A1 (en) | Identifying natural speech pauses in a text string | |
KR100769033B1 (en) | Method for synthesizing speech | |
US6505158B1 (en) | Synthesis-based pre-selection of suitable units for concatenative speech | |
EP1168299B1 (en) | Method and system for preselection of suitable units for concatenative speech | |
JP4473193B2 (en) | Mixed language text speech synthesis method and speech synthesizer | |
US6029132A (en) | Method for letter-to-sound in text-to-speech synthesis | |
WO2005059894A1 (en) | Multi-lingual speech synthesis | |
EP1668629B1 (en) | Letter-to-sound conversion for synthesized pronunciation of a text segment | |
KR100593757B1 (en) | Foreign language studying device for improving foreign language studying efficiency, and on-line foreign language studying system using the same | |
KR20150105075A (en) | Apparatus and method for automatic interpretation | |
JPH05143093A (en) | Method and apparatus for forming model of uttered word | |
EP1668630B1 (en) | Improvements to an utterance waveform corpus | |
JP3655808B2 (en) | Speech synthesis apparatus, speech synthesis method, portable terminal device, and program recording medium | |
JP2000056789A (en) | Speech synthesis device and telephone set | |
JP3366253B2 (en) | Speech synthesizer | |
JP3626398B2 (en) | Text-to-speech synthesizer, text-to-speech synthesis method, and recording medium recording the method | |
JP2015060038A (en) | Voice synthesizer, language dictionary correction method, language dictionary correction computer program | |
KR200412740Y1 (en) | Foreign language studying device for improving foreign language studying efficiency, and on-line foreign language studying system using the same | |
JPH09237096A (en) | Kanji (chinese character) explaining method and device | |
JP5301376B2 (en) | Speech synthesis apparatus and program | |
Görmez et al. | TTTS: Turkish text-to-speech system | |
CN114327090A (en) | Japanese input method and related device and equipment | |
KR20050006936A (en) | Method of selective prosody realization for specific forms in dialogical text for Korean TTS system | |
JP2006284700A (en) | Voice synthesizer and voice synthesizing processing program | |
Gakuru | Development of a kenyan english text to speech system: A method of developing a TTS for a previously undefined english dialect |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PUAI | Public reference made under article 153(3) epc to a published international application that has entered the european phase |
Free format text: ORIGINAL CODE: 0009012 |
|
17P | Request for examination filed |
Effective date: 20060323 |
|
AK | Designated contracting states |
Kind code of ref document: A1 Designated state(s): DE FR GB IT |
|
DAX | Request for extension of the european patent (deleted) | ||
RBV | Designated contracting states (corrected) |
Designated state(s): DE FR GB IT |
|
A4 | Supplementary search report drawn up and despatched |
Effective date: 20061212 |
|
RIC1 | Information provided on ipc code assigned before grant |
Ipc: G10L 13/06 20060101AFI20061206BHEP |
|
17Q | First examination report despatched |
Effective date: 20070918 |
|
RTI1 | Title (correction) |
Free format text: LETTER-TO-SOUND CONVERSION FOR SYNTHESIZED PRONUNCIATION OF A TEXT SEGMENT |
|
GRAP | Despatch of communication of intention to grant a patent |
Free format text: ORIGINAL CODE: EPIDOSNIGR1 |
|
GRAS | Grant fee paid |
Free format text: ORIGINAL CODE: EPIDOSNIGR3 |
|
GRAA | (expected) grant |
Free format text: ORIGINAL CODE: 0009210 |
|
AK | Designated contracting states |
Kind code of ref document: B1 Designated state(s): DE FR GB IT |
|
REG | Reference to a national code |
Ref country code: GB Ref legal event code: FG4D |
|
REF | Corresponds to: |
Ref document number: 602004019949 Country of ref document: DE Date of ref document: 20090423 Kind code of ref document: P |
|
PLBE | No opposition filed within time limit |
Free format text: ORIGINAL CODE: 0009261 |
|
STAA | Information on the status of an ep patent application or granted ep patent |
Free format text: STATUS: NO OPPOSITION FILED WITHIN TIME LIMIT |
|
26N | No opposition filed |
Effective date: 20091214 |
|
REG | Reference to a national code |
Ref country code: GB Ref legal event code: 732E Free format text: REGISTERED BETWEEN 20101014 AND 20101020 |
|
REG | Reference to a national code |
Ref country code: FR Ref legal event code: TP |
|
REG | Reference to a national code |
Ref country code: FR Ref legal event code: PLFP Year of fee payment: 13 |
|
REG | Reference to a national code |
Ref country code: FR Ref legal event code: PLFP Year of fee payment: 14 |
|
PGFP | Annual fee paid to national office [announced via postgrant information from national office to epo] |
Ref country code: FR Payment date: 20171229 Year of fee payment: 14 Ref country code: DE Payment date: 20171130 Year of fee payment: 14 |
|
PGFP | Annual fee paid to national office [announced via postgrant information from national office to epo] |
Ref country code: GB Payment date: 20171228 Year of fee payment: 14 |
|
PGFP | Annual fee paid to national office [announced via postgrant information from national office to epo] |
Ref country code: IT Payment date: 20171228 Year of fee payment: 14 |
|
REG | Reference to a national code |
Ref country code: DE Ref legal event code: R119 Ref document number: 602004019949 Country of ref document: DE |
|
GBPC | Gb: european patent ceased through non-payment of renewal fee |
Effective date: 20180917 |
|
PG25 | Lapsed in a contracting state [announced via postgrant information from national office to epo] |
Ref country code: DE Free format text: LAPSE BECAUSE OF NON-PAYMENT OF DUE FEES Effective date: 20190402 Ref country code: IT Free format text: LAPSE BECAUSE OF NON-PAYMENT OF DUE FEES Effective date: 20180917 |
|
PG25 | Lapsed in a contracting state [announced via postgrant information from national office to epo] |
Ref country code: FR Free format text: LAPSE BECAUSE OF NON-PAYMENT OF DUE FEES Effective date: 20180930 |
|
PG25 | Lapsed in a contracting state [announced via postgrant information from national office to epo] |
Ref country code: GB Free format text: LAPSE BECAUSE OF NON-PAYMENT OF DUE FEES Effective date: 20180917 |