WO1997034292A1 - Method and device at speech-to-speech translation - Google Patents

Info

Publication number
WO1997034292A1
Authority
WO
WIPO (PCT)
Prior art keywords
speech
language
fundamental tone
translation
translated
Prior art date
Application number
PCT/SE1997/000205
Other languages
French (fr)
Inventor
Bertil Lyberg
Original Assignee
Telia Ab
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Telia Ab filed Critical Telia Ab
Publication of WO1997034292A1 publication Critical patent/WO1997034292A1/en

Classifications

    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L15/00 Speech recognition
    • G10L15/08 Speech classification or search
    • G10L15/18 Speech classification or search using natural language modelling
    • G10L15/1807 Speech classification or search using natural language modelling using prosody or stress
    • G10L15/1815 Semantic context, e.g. disambiguation of the recognition hypotheses based on word meaning
    • G10L13/00 Speech synthesis; Text to speech systems
    • G10L13/02 Methods for producing synthetic speech; Speech synthesisers
    • G10L13/04 Details of speech synthesis systems, e.g. synthesiser structure or memory management

Abstract

The present invention relates to a method and device at speech-to-speech translation. A given speech in a first language is recognized in a speech recognition equipment (A). The speech recognition equipment produces a text which is transferred to a translator (B) for translation into a second language. Parallel to these procedures, fundamental tone information is assembled for the first speech. The fundamental tone information influences the prosody generation (G), which influences a text-to-speech converter (C). From the text-to-speech converter a speech in a second language is obtained, the synthesis of which is essentially in accordance with the synthesis of the first speech.

Description

TITLE OF THE INVENTION:
Method and device at speech-to-speech translation.
TECHNICAL FIELD
The present invention relates to producing, from a given natural speech in a first language, a corresponding speech in a second language. The speech in the second language is produced artificially.
PRIOR ART
Attempts to translate between different languages have been made previously. For instance, there exist devices which translate a given text between different languages. However, different interpretations of a text can occur, which makes the translator's work more difficult.
Other examples of translation are from one speech to another in different languages. In this case the complexity is higher, because recognition of the first language is a difficulty in itself. Further difficulties arise if the translated speech is to be reproduced with the voice and characteristics of the original speaker.
In patent document 9301596-4, a device for improved understanding of speech in artificial translation from one language into another is described. The invention includes an analyzing unit which analyses the duration and the fundamental tone of the speech in the first language. A prosody interpreting unit determines, on the basis of the analysis and of information regarding the characteristics of the language, prosody-characteristic information in the first language, which is used by a prosody generating unit for the second language to control the speech synthesis. A speech synthesis device accordingly produces stresses in the speech translated into the second language which, from a linguistic point of view, correspond to the stresses in the first language.
DESCRIPTION OF THE INVENTION
TECHNICAL PROBLEM
In translation of speech between different languages, it is desirable that the characteristics of the speech in the first language are transferred to the second language in the translation. These characteristics are of vital importance for the identification of the speaker of the produced speech. If characteristics are lacking, the produced speech can, on the one hand, be difficult to understand, and, on the other hand, give conflicting signals in the speech and in the characteristics of the speech. The prosodic information content of the speech shall consequently be transferable with its meaning principally maintained. Further, it is desirable that the voice of the original speaker be reproduced in a lifelike way in the second language.
Further, there is a need for methods and devices which can be used for direct translation between conversing persons. This can, for instance, relate to persons who are communicating over a telecommunications network. Other fields which need translation are, for instance, persons in authority, physicians, etc., who shall communicate with immigrants in different situations. Interpretation problems may arise especially if the person with whom the communication is made speaks a less frequent language, or if the language in itself is well known but a dialect which is difficult to understand is used. The supply of interpreters is furthermore limited, so distance interpretation may sometimes be necessary. In such situations, the interpreter can lose much information in ways of expression and body language which are of importance for the interpretation.
It is further desirable at translation to obtain a characteristic in the translated speech which corresponds to the speaker's voice and reproduces his/her state of mind. In the devices and methods which are known, the translated speech is represented by an artificial voice, the characteristics of which do not correspond to those of the first speaker. For an artificial voice rendering a speaker's verbal presentation, it is important that the speaker's voice characteristics are in all essentials carried over into the second language. The presentation shall then, in the translated sentences, be correspondent in the respective languages. The possibilities for real identification of the person with whom one is talking will thereby increase considerably. The following invention intends to solve said problems.
THE SOLUTION
The present invention relates to a method and device at speech-to-speech translation. A given speech in a first language is recognized in a speech recognition equipment, A. The speech recognition equipment produces a text which is transferred to a translator, B, for translation into a second language. In parallel with these procedures, fundamental tone information for the first speech is produced. The fundamental tone information influences the prosody generation, G, which influences a text-to-speech converter, C. From the text-to-speech converter, a speech in a second language is obtained, the synthesis of which is essentially in accordance with the synthesis of the first speech. The device relates to speech-to-speech translation where a first speech is given. The first speech is given in a first language. The given speech is recognized and translated into a second language. The fundamental tone information in the first language is translated to the second language, whereby the second speech is produced with a pitch and fundamental tone dynamics corresponding to those of the first speech. The information produced in this way thus conveys essentially the same message as the original information in the first speech. The fundamental tone of the first speech is normalized and its sentence accents are extracted. This information indicates, on the one hand, the characteristics of the speaker's voice and, on the other hand, which parts of the speech are emphasized. The accents further decide which shades of meaning in the translation can be decisive for the interpretation of the speech. The normalization means that the fundamental tone variation of the speech is divided by the fundamental tone declination of the speech. From the normalized fundamental tone curve, the dynamics of the speech can be gathered. Further, the sentence accents in the incoming speech are classified. The locations of said sentence accents in the second language are determined.
The sentence accents are consequently translated into the second language, whereby an accentuation corresponding to that of the first language is obtained. The sentence accent information and the fundamental tone information, i.e. fundamental tone declination and fundamental tone dynamics, are transferred to a prosody generator. In the prosody generator, a written translation of the speech is combined with said other information. This information is then utilized in the text-to-speech conversion, whereby a speech is produced with a pitch of the voice and an intonation in the second language which agree well with the speech the person would have produced in the second language; a part of the speaker's identity is thereby transferred.
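The normalization described above can be illustrated in code. The following is a hypothetical sketch, not part of the patent: the function name, the linear model of the fundamental tone declination, and the example contour are all assumptions made for illustration. It divides the fundamental tone variation by a fitted declination line and takes the maximum of the quotient as the fundamental tone dynamics (cf. claim 7).

```python
import numpy as np

def normalize_fundamental_tone(f0, times):
    """Divide the fundamental tone variation by its declination (sketch).

    f0    : fundamental tone values in Hz for voiced frames
    times : corresponding frame times in seconds
    """
    # Assumption: model the declination as a straight line fitted to the
    # whole F0 contour (the patent does not specify a particular model).
    slope, intercept = np.polyfit(times, f0, deg=1)
    declination = slope * times + intercept

    # Normalized curve: variation divided by declination (cf. Fig. 3).
    normalized = f0 / declination

    # Cf. claim 7: the fundamental tone dynamics is the maximum of the
    # normalized variation; the declination itself indicates the pitch.
    dynamics = normalized.max()
    return normalized, dynamics

# Hypothetical contour: 180 Hz falling toward 120 Hz over two seconds,
# with a sentence-accent peak around t = 1 s.
t = np.linspace(0.0, 2.0, 200)
f0 = 180.0 - 30.0 * t + 25.0 * np.exp(-((t - 1.0) ** 2) / 0.02)
curve, dyn = normalize_fundamental_tone(f0, t)
```

The declination then characterizes the pitch of the voice and the maximum of the quotient its dynamics; both values are later handed to the prosody generator.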
ADVANTAGES
The present invention allows a speech produced by a speaker in a first language to be presented with the voice characteristics of the speaker. To a listener, this means that the translated speech is experienced as directly spoken by the first speaker. The utilization of the sentence accents of the first speech and their translation into the second speech further implies that the characteristics of the second speech are preserved, as well as the intonation in the translation. The present invention consequently provides an instrument whereby a given speech, on translation into a second language, is given a corresponding characteristic in the second language.
The invention makes it possible for two persons to talk to each other in their mother tongues. Such systems are of current interest in telecommunications, physician/patient communication, etc.
DESCRIPTION OF FIGURES
Fig. 1 shows the invention in the form of a block diagram.
Fig. 2 shows a diagram of the fundamental tone variation superimposed on the fundamental tone declination.
Fig. 3 shows a curve of the fundamental tone variation divided by the fundamental tone declination.
DETAILED EMBODIMENT
In the following, the invention is described on the basis of the figures and the terms therein.
Speech recognition equipment has long been well known to the expert in the speech recognition field. The fundamental functions of speech recognition equipment can be found in books as well as in periodicals. A first speech, speech 1, representing speech from a person, is received by a speech recognition equipment, A, which converts the speech into a text string. The speech recognition equipment evaluates the different interpretations which can exist with regard to the interpretation of the speech. The selection of the most probable interpretation can be made in different ways, for instance by probability calculus, interpretation of previous sequences in the speech, linguistic selection methods, etc. The text string which has been produced in the speech recognition equipment, A, is then transferred to a translator, B, which translates the given speech into a text string in the second language. In the translator, B, the fundamental characteristics of the second language are added to the translated speech. These fundamental characteristics consist of the normal accents and pitches of the language. In order for the translated speech to give the impression that it is produced by the person in question, it is required that the person's voice characteristics are transferred to the second speech. It is further required that the intonation of the first language is translated into the second language in order to preserve the meaning. Information regarding these voice characteristics is obtained by fundamental tone extraction. In parallel with the speech recognition in A, the fundamental tone of the speech, speech 1, is extracted in a fundamental tone extractor, D. The fundamental tone is a combination of fundamental tone declination and fundamental tone variation, Fig. 2. These components are separated from each other in E. A normalization of the fundamental tone then takes place.
The normalization means that the variation of the fundamental tone is divided by the declination of the fundamental tone, Fig. 3. This information indicates the fundamental tone dynamics of the speaker in the first speech. The sentence accents in the first speech are further determined. The information regarding the sentence accents is transferred to a sentence accent translator, F, which also receives information regarding the translation from the translator, B. The specific sentence accents which have been identified for the first language are now translated into the second language; i.e., the sentence accents are placed in the second language with regard to the characteristics of the second language. The translation of the sentence accents is then returned to the translator for linguistic control. The linguistic control includes modifying the accentuations to the usage of the second language. The text string modified in this way is then transferred to a text-to-speech converter, C, and to a prosody generator, G. The prosody generator further receives information from the sentence accent translator, F, and fundamental tone information from E. In the prosody generator, a prosody adapted to the second language is then generated. The information from the prosody generator, G, is then transferred to the text-to-speech converter for generation of a speech, speech 2, the synthesis of which essentially corresponds to the synthesis of the first speech.
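The flow through blocks A-G described above can be summarized in code. This is a minimal sketch under assumptions: the patent prescribes no concrete implementations, so the recognizer, translator, accent classifier, and synthesizer below are placeholder stand-ins, and the data structures are invented for illustration only.

```python
from dataclasses import dataclass

# Placeholder stand-ins for the blocks in Fig. 1; a real system would use
# actual recognition, translation, and synthesis components.

def recognize(speech):                  # A: speech recognition -> text string
    return speech["text"]

def translate(text):                    # B: translation into the second language
    return {"hello": "hej"}.get(text, text)

def extract_f0(speech):                 # D/E: declination and variation of F0
    return speech["declination"], speech["variation"]

def classify_accents(speech):           # sentence accents in the first speech
    return speech["accents"]

def place_accents(accents, text):       # F: accent locations in language 2
    return accents

def linguistic_control(text, accents):  # F -> B: accentuation check
    return text

def synthesize(text, prosody):          # C: text-to-speech conversion
    return {"text": text, "prosody": prosody}

@dataclass
class Prosody:
    declination: float   # pitch trend of the first speech
    dynamics: float      # normalized fundamental tone dynamics
    accents: list        # sentence accent positions in language 2

def translate_speech(speech_1):
    """Sketch of Fig. 1: speech 1 in language 1 -> speech 2 in language 2."""
    text_1 = recognize(speech_1)                   # A
    text_2 = translate(text_1)                     # B
    declination, variation = extract_f0(speech_1)  # D, E
    dynamics = max(variation) / declination        # normalization (cf. claim 7)
    accents_2 = place_accents(classify_accents(speech_1), text_2)  # F
    text_2 = linguistic_control(text_2, accents_2)
    prosody = Prosody(declination, dynamics, accents_2)  # G
    return synthesize(text_2, prosody)             # C

speech_1 = {"text": "hello", "declination": 150.0,
            "variation": [140.0, 175.0, 130.0], "accents": [1]}
speech_2 = translate_speech(speech_1)
```

The structure mirrors the block diagram: recognition and fundamental tone extraction run in parallel on speech 1, and both streams meet again in the prosody generator before synthesis.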
The invention is not restricted to the embodiment shown above as an example, or to the parts of the following patent claims, but may be subject to modifications within the frame of the idea of the invention.

Claims

PATENT CLAIMS
1. Method at speech-to-speech translation, where a first speech, representing a first language, is recognized and translated into a speech in a second language, c h a r a c t e r i z e d in that the fundamental tone information of the first speech is translated into the second language, and the second speech is produced with a pitch and a fundamental tone dynamics which are in accordance with the first speech.
2. Method according to patent claim 1, c h a r a c t e r i z e d in that the fundamental tone of the first speech is normalized and that the sentence accents of the first speech are extracted.
3. Method according to patent claim 1 or 2, c h a r a c t e r i z e d in that the sentence accents are translated into the second language.
4. Method according to any of the previous patent claims, c h a r a c t e r i z e d in that information regarding the pitch and fundamental tone dynamics of the first speech is transferred to a prosody generator.
4. Method according to any of the previous patent claims, c h a r a c t e r i z e d in that information regarding the pitch and fundamental tone dynamics of the first speech is transferred to a prosody generator.
5. Method according to any of the previous patent claims, c h a r a c t e r i z e d in that the first speech is transformed to a first text which is translated into a second text in the second language.
6. Method according to any of the previous patent claims, c h a r a c t e r i z e d in that the sentence accent translation influences the prosody presentation which influences the presentation of the second speech.
7. Method according to any of the previous patent claims, c h a r a c t e r i z e d in that the fundamental tone dynamics of the incoming voice is given by the maximum of the fundamental tone variation of the first speech, divided by the fundamental tone declination of the first speech, where the fundamental tone declination indicates the pitch of the first speech.
8. Device at speech-to-speech translation, where a first speech, representing a first language, is recognized and translated into a second speech in a second language, c h a r a c t e r i z e d in that the fundamental tone information of the first speech is translated into the second language, whereby the second speech is produced with a pitch and a fundamental tone dynamics corresponding to the first language.
9. Device according to patent claim 8, c h a r a c t e r i z e d in that the fundamental tone of the first speech is normalized and that the sentence accents are extracted.
10. Device according to patent claim 8 or 9, c h a r a c t e r i z e d in that the sentence accent information from the first speech is translated into the second language.
11. Device according to any of the patent claims 8-10, c h a r a c t e r i z e d in that the sentence accent information is arranged to influence the translation from the first language into the second language.
12. Device according to any of the patent claims 8-11, c h a r a c t e r i z e d in that the information regarding the pitch and the fundamental tone dynamics of the first speech is transferred to a prosody generator.
13. Device according to any of the patent claims 8-12, c h a r a c t e r i z e d in that the first speech is transformed to a text in the second language in a translator.
14. Device according to any of the patent claims 8-13, c h a r a c t e r i z e d in that the prosody generator is influenced by the text and the sentence accent translation.
15. Device according to any of the patent claims 8-14, c h a r a c t e r i z e d in that the prosody generator is arranged to influence a text-to-speech converter which is arranged to produce the second speech from the text.
PCT/SE1997/000205 1996-03-13 1997-02-11 Method and device at speech-to-speech translation WO1997034292A1 (en)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
SE9600959A SE9600959L (en) 1996-03-13 1996-03-13 Speech-to-speech translation method and apparatus
SE9600959-2 1996-03-13

Publications (1)

Publication Number Publication Date
WO1997034292A1 true WO1997034292A1 (en) 1997-09-18

Family

ID=20401770

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/SE1997/000205 WO1997034292A1 (en) 1996-03-13 1997-02-11 Method and device at speech-to-speech translation

Country Status (2)

Country Link
SE (1) SE9600959L (en)
WO (1) WO1997034292A1 (en)

Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
EP0624865A1 (en) * 1993-05-10 1994-11-17 Telia Ab Arrangement for increasing the comprehension of speech when translating speech from a first language to a second language
EP0664537A2 (en) * 1993-11-03 1995-07-26 Telia Ab Method and arrangement in automatic extraction of prosodic information

Cited By (16)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO1998043235A2 (en) * 1997-03-25 1998-10-01 Telia Ab (Publ) Device and method for prosody generation at visual synthesis
WO1998043236A3 (en) * 1997-03-25 1998-12-23 Telia Ab Method of speech synthesis
WO1998043235A3 (en) * 1997-03-25 1998-12-23 Telia Ab Device and method for prosody generation at visual synthesis
US6385580B1 (en) 1997-03-25 2002-05-07 Telia Ab Method of speech synthesis
US6389396B1 (en) 1997-03-25 2002-05-14 Telia Ab Device and method for prosody generation at visual synthesis
WO1998043236A2 (en) * 1997-03-25 1998-10-01 Telia Ab (Publ) Method of speech synthesis
EP1014277A1 (en) * 1998-12-22 2000-06-28 Nortel Networks Corporation Communication system and method employing automatic language identification
ES2180392A1 (en) * 2000-09-26 2003-02-01 Crouy-Chanel Pablo Grosschmid System, device, and installation of mechanized simultaneous language interpretation
DE10107749A1 (en) * 2001-02-16 2002-08-29 Holger Ostermann Worldwide international communication using a modular communication arrangement with speech recognition, translation capability, etc.
WO2002084643A1 (en) * 2001-04-11 2002-10-24 International Business Machines Corporation Speech-to-speech generation system and method
US7461001B2 (en) 2001-04-11 2008-12-02 International Business Machines Corporation Speech-to-speech generation system and method
US7805307B2 (en) 2003-09-30 2010-09-28 Sharp Laboratories Of America, Inc. Text to speech conversion system
EP3491642A4 (en) * 2016-08-01 2020-04-08 Speech Morphing Systems, Inc. Method to model and transfer prosody of tags across languages
US20220084500A1 (en) * 2018-01-11 2022-03-17 Neosapience, Inc. Multilingual text-to-speech synthesis
US11769483B2 (en) * 2018-01-11 2023-09-26 Neosapience, Inc. Multilingual text-to-speech synthesis
WO2021208531A1 (en) * 2020-04-16 2021-10-21 北京搜狗科技发展有限公司 Speech processing method and apparatus, and electronic device

Also Published As

Publication number Publication date
SE9600959D0 (en) 1996-03-13
SE9600959L (en) 1997-09-14

Similar Documents

Publication Publication Date Title
EP0624865B1 (en) Arrangement for increasing the comprehension of speech when translating speech from a first language to a second language
CN112435650B (en) Multi-speaker and multi-language voice synthesis method and system
US7124082B2 (en) Phonetic speech-to-text-to-speech system and method
EP0749109A3 (en) Speech recognition for tonal languages
JP2005502102A (en) Speech-speech generation system and method
US20070088547A1 (en) Phonetic speech-to-text-to-speech system and method
JP3616250B2 (en) Synthetic voice message creation method, apparatus and recording medium recording the method
CN108364632A (en) A kind of Chinese text voice synthetic method having emotion
US20070203703A1 (en) Speech Synthesizing Apparatus
WO1997034292A1 (en) Method and device at speech-to-speech translation
EP0664537B1 (en) Method and arrangement in automatic extraction of prosodic information
CN115762471A (en) Voice synthesis method, device, equipment and storage medium
JPH0580791A (en) Device and method for speech rule synthesis
US11783813B1 (en) Methods and systems for improving word discrimination with phonologically-trained machine learning models
Smith et al. Clinical applications of speech synthesis
Banerjee et al. Prosody Labelled Dataset for Hindi
Banerjee et al. Prosody Labelled Dataset for Hindi using Semi-Automated Approach
Zovato et al. Interplay between pragmatic and acoustic level to embody expressive cues in a Text to Speech system
Rizk et al. Arabic text to speech synthesizer: Arabic letter to sound rules
Kuo et al. An NN-based approach to prosody generation for English word spelling in English-Chinese bilingual TTS
KR20240075980A (en) Voice synthesizer learning method using synthesized sounds for disentangling language, pronunciation/prosody, and speaker information
JP2001166787A (en) Voice synthesizer and natural language processing method
Islam Development of a Bangla text to speech converter
JP2578876B2 (en) Text-to-speech device
KR19980065482A (en) Speech synthesis method to change the speaking style

Legal Events

Date Code Title Description
AK Designated states

Kind code of ref document: A1

Designated state(s): JP NO US

AL Designated countries for regional patents

Kind code of ref document: A1

Designated state(s): AT BE CH DE DK ES FI FR GB GR IE IT LU MC NL PT SE

DFPE Request for preliminary examination filed prior to expiration of 19th month from priority date (pct application filed before 20040101)
121 Ep: the epo has been informed by wipo that ep was designated in this application
NENP Non-entry into the national phase

Ref country code: JP

Ref document number: 97532500

Format of ref document f/p: F

122 Ep: pct application non-entry in european phase