EP1271469A1 - Method for generating personality patterns and method for synthesizing speech - Google Patents
Method for generating personality patterns and method for synthesizing speech
- Publication number
- EP1271469A1 (application EP01115216A)
- Authority
- EP
- European Patent Office
- Prior art keywords
- speech
- features
- anyone
- acoustical
- synthesizing
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Withdrawn
Classifications
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
- G10L13/00—Speech synthesis; Text to speech systems
- G10L13/02—Methods for producing synthetic speech; Speech synthesisers
- G10L13/027—Concept to speech synthesisers; Generation of natural phrases from machine-based concepts
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
- G10L13/00—Speech synthesis; Text to speech systems
- G10L13/02—Methods for producing synthetic speech; Speech synthesisers
- G10L13/033—Voice editing, e.g. manipulating the voice of the synthesiser
Definitions
- the present invention relates to a method for generating personality patterns and to a method for synthesizing speech.
- man-machine dialogue systems are increasingly employed to ensure easy and reliable use by a human user.
- These man-machine dialogue systems are enabled to receive and consider users' utterances, in particular orders and/or inquiries, and to react and respond in an appropriate way.
- current speech synthesis systems involved in such man-machine dialogue systems suffer from a lack of personality and naturalness.
- although the systems are enabled to deal with the context of the situation in an appropriate way, the prepared and output speech of the dialogue system often sounds monotonic, machine-like, and not embedded into the particular situation.
- the object is achieved by a method for generating personality patterns, in particular for synthesizing speech, with the features of claim 1. Furthermore, the object is achieved by a method for synthesizing speech according to the characterizing features of claim 11.
- a system and a computer program product for carrying out the inventive methods are the subject-matter of claims 14 and 15, respectively. Preferred embodiments of the inventive methods are within the scope of the dependent subclaims.
- a speech input is received and/or preprocessed.
- acoustical and/or non-acoustical speech features are extracted.
- a personality pattern is generated and/or stored.
- online input speech and/or speech of a speech data base for at least one given speaker are used for receiving said speech input.
- a speech data base enables a system involving the inventive method to generate the personality patterns in advance of an application. That means that, before the system is applied, for example in a speech synthesizing unit, a speech model for a single speaker or for a variety of speakers can be constructed.
- alternatively, the personality patterns can be generated during the application in a speech synthesizing unit in a real-time or online manner, so as to adapt a speech output generated in a dialogue system during the application and/or during the dialogue with the user.
- within the class of prosodic features, pitch, pitch range, intonation attitude, loudness, speaking rate, phone duration, speech element duration features, and/or the like can be employed.
- as voice quality features, phonation type, articulation manner, voice timbre features, and/or the like can be employed.
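As an illustration of how such prosodic features could be obtained from a speech frame, the following sketch estimates pitch by autocorrelation and loudness as RMS energy. Both estimators are assumptions chosen for this example only; the text does not prescribe particular algorithms.

```python
# Illustrative prosodic feature extraction (assumed estimators, not the
# patented method): autocorrelation pitch and RMS loudness for one frame.
import numpy as np

def estimate_pitch(frame, sample_rate, fmin=60.0, fmax=400.0):
    """Rough F0 estimate for a voiced frame via the autocorrelation peak."""
    frame = frame - frame.mean()
    # one-sided autocorrelation: index = lag in samples
    ac = np.correlate(frame, frame, mode="full")[len(frame) - 1:]
    lag_min = int(sample_rate / fmax)   # shortest plausible period
    lag_max = int(sample_rate / fmin)   # longest plausible period
    lag = lag_min + int(np.argmax(ac[lag_min:lag_max]))
    return sample_rate / lag

def rms_loudness(frame):
    """RMS energy as a simple loudness proxy."""
    return float(np.sqrt(np.mean(frame ** 2)))
```

Speaking rate and duration features would additionally require a segmentation of the signal into speech units, as discussed further below.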
- in accordance with a further advantageous embodiment of the present invention, contextual features and/or the like may also be important.
- syntactical, grammatical, semantical features, and/or the like can be used as contextual features.
- a process of speech recognition is preferably carried out within the inventive method.
- a process of speaker identification and/or adaptation can be performed, in particular so as to increase the matching rate of the feature extraction and/or of the recognition rate of the process of speech recognition.
- in the inventive method for synthesizing speech, in particular for a man-machine dialogue system, the inventive method for generating personality patterns is employed.
- the method for generating personality patterns is essentially carried out in a preprocessing step, in particular based on a speech data base or the like.
- the method for generating personality patterns can be carried out and/or continued in a continuous, real-time, or online manner. This enables a system involving said method for synthesizing speech to adapt its speech output in accordance with the received input during the dialogue.
- Both the method for generating personality patterns and the method for synthesizing speech can be configured to create a personality pattern or a speech output which is in some sense complementary to the personality pattern or character assigned to the speaker of the speech input. For instance, in the case of an emergency call system for activating ambulance or fire alarm services, the speaker of the speech input might be excited and/or confused. It might therefore be necessary to calm the speaking person down, and this can be achieved by creating a personality pattern for the speech synthesis reflecting a strong, confident, and safe character. Additionally, it might also be possible to construct personality patterns for the synthesized speech output which reflect a gender complementary to that of the speaker of the speech input, i.e. in the case of a male speaker the system might respond as a female speaker, so as to make the dialogue most convenient for the speaking person.
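The complementary-personality idea described above can be sketched as a simple mapping from the detected state and gender of the speaker to a persona for the synthesized reply. The state labels and persona fields below are invented for this illustration; the text leaves them unspecified.

```python
# Illustrative sketch of a "complementary" persona choice (labels assumed).
def complementary_persona(detected_state, detected_gender):
    """Pick a synthesis persona that counters the detected speaker state."""
    persona = {"character": "neutral", "gender": detected_gender}
    if detected_state in ("excited", "confused"):
        # e.g. an emergency-call system answers with a calm, confident voice
        persona["character"] = "calm-confident"
    # respond with the opposite gender to make the dialogue more convenient
    persona["gender"] = "female" if detected_gender == "male" else "male"
    return persona
```

For example, an excited male caller would be answered by a calm, confident female voice under this mapping.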
- a computer program product comprising computer program means which is adapted to perform and/or to realize the inventive method for generating personality patterns and/or for synthesizing speech and/or the steps thereof when it is executed on a computer, a digital signal processing means, and/or the like.
- both his relevant voice quality features and his speech itself - as described by any units, such as words, syllables, diphones, sentences, and/or the like - are automatically extracted according to the invention. Information about preferred sentence structure and word usage is also extracted and used to create a speech synthesis system with those characteristics in a completely unsupervised way.
- the proposed methods can be used to mimic the actual speaker talking to the device but also to equip the device with some different personalities, e. g. gathered from the speaking style of famous people, movie stars, or the like. This can be very attractive for potential customers.
- the proposed system can be used not only to mimic speaker's behavior but more generally to control the dialogue depending on changing speaking style and emotions of the human partner.
- the collection of features describing the speaker's personality can be done on different levels during the conversation between the human and a dialogue unit.
- the speech signal has to be recorded and segmented into phones, diphones, and/or other speech units or speech elements, depending on the speech synthesis method used in the system.
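Given a phone-level segmentation, diphone units of the kind mentioned above can be derived by pairing adjacent phones, as in this minimal sketch (the `sil` padding label for utterance boundaries is an assumption):

```python
# Derive diphone units from a phone sequence, e.g. for unit-selection
# synthesis. Boundary padding with "sil" (silence) is an assumption.
def diphones(phones):
    padded = ["sil"] + list(phones) + ["sil"]
    # each unit spans the transition between two adjacent phones
    return [f"{a}-{b}" for a, b in zip(padded, padded[1:])]
```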
- Prosodic features like pitch, pitch range, attitude of sentence intonation (monotonous or affected), loudness, speaking rate, durations of phones, and/or the like can be collected to characterize the speaker's prosody.
- Voice quality features like phonation type, articulation manner, voice timbre, and/or the like can be automatically extracted from the collected speech data.
- Speaker identification, or a speaker identification module, is necessary for a proper function of the system.
- the system can also collect all the words recognized from the utterances spoken by the speaker and generate and evaluate statistics on their usage. This can be used to find the most frequent phrases and words used by a given speaker, and/or the like. Syntactic information gathered from the recognized phrases can also enhance the quality of the personality description.
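The word-usage statistics described above can be sketched with a plain frequency counter over recognized utterances. The text does not prescribe a particular statistic, so this is an illustrative minimum.

```python
# Illustrative word-usage statistics over recognized utterances
# (whitespace tokenization is an assumption; a recognizer would
# normally supply its own token stream).
from collections import Counter

def usage_statistics(recognized_utterances):
    """Count word usage across all recognized utterances of a speaker."""
    words = Counter()
    for utterance in recognized_utterances:
        words.update(utterance.lower().split())
    return words

def most_frequent(stats, n=3):
    """Return the n most frequently used words."""
    return [w for w, _ in stats.most_common(n)]
```

Such counts can then bias the text generation of the dialogue system toward the speaker's preferred vocabulary.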
- the dialogue system can adjust parameters and units of acoustic output - for example the synthesized waveforms or the like - and modes of text generation to suit the recognized speaker's characteristics.
- the parameterized personality can be stored for future use or can be preprogrammed in the dialogue device.
- the information can be used to recognize speakers and to change the personality of the system depending on the user's preference or mood, for example in case of a system with a built-in emotion recognition engine.
- the personality can be changed according to the user's wish, according to a preprogrammed sequence, or depending on the speaker's changing style and emotions.
- the main advantage of such a system is the possibility to adapt the dialogue to the given speaker, make the dialogue more attractive, and/or the like.
- the possibility to mimic certain speakers or to switch between different personalities or speaking styles can be very entertaining and attractive for the user.
- FIG. 1 shows a preferred embodiment of the inventive method for synthesizing speech, employing an embodiment of the inventive method for generating personality patterns from a given received speech input SI.
- in a first step S1, the speech input SI is received.
- in a first section S10 of the inventive method for synthesizing speech, non-acoustical features are extracted from the received speech input SI.
- in a second section S20, acoustical features are extracted from the received speech input SI.
- the sections S10 and S20 can be performed in parallel or sequentially on a given device or apparatus.
- for extracting non-acoustical features from the speech input SI, in a first step S11 speech parameters are extracted from said speech input SI.
- in a second step S12, the speech input SI is fed into a speech recognizer to analyze the content and the context of the received speech input SI.
- in a third step S13, contextual features are extracted from said speech input SI; in particular, syntactical, semantical, grammatical, and statistical information on particular speech elements is obtained.
- the second section S20 of the inventive method for synthesizing speech consists of three steps S21, S22, and S23 which can be performed independently of each other.
- prosodic features are extracted from the received speech input SI.
- Said prosodic features may comprise features of pitch, pitch range, intonation attitude, loudness, speaking rate, speech element duration, and/or the like.
- voice quality features are extracted from the given received speech input SI, for instance phonation type, articulation manner, voice timbre features, and/or the like.
- the non-acoustical features and the acoustical features obtained from sections S10 and S20 are merged in a following postprocessing step S30 to detect, model, and store a personality pattern PP for the given speaker.
- the data describing the personality pattern PP for the current speaker are fed into a following step S40, which includes the steps of speech synthesis, text generation, and dialogue management, from which a responsive speech output SO is generated and then output in a final step S50.
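The overall flow of FIG. 1 (input step S1, sections S10/S20, merge step S30, synthesis step S40, output step S50) can be sketched as a control-flow skeleton. The feature extractors and the synthesizer are stubs supplied by the caller; only the ordering follows the figure.

```python
# Control-flow skeleton of the pipeline of FIG. 1 (stub extractors;
# a real system would plug in recognizer- and signal-based extractors).
def synthesize_response(speech_input,
                        extract_non_acoustical,   # section S10 (steps S11-S13)
                        extract_acoustical,       # section S20 (steps S21-S23)
                        synthesize):              # steps S40/S50
    # S10 and S20 may run in parallel; here they run sequentially.
    non_acoustical = extract_non_acoustical(speech_input)
    acoustical = extract_acoustical(speech_input)
    # S30: merge both feature sets into the personality pattern PP
    personality_pattern = {**non_acoustical, **acoustical}
    # S40: generate the responsive speech output SO from SI and PP
    return synthesize(speech_input, personality_pattern)
```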
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
EP01115216A EP1271469A1 (fr) | 2001-06-22 | 2001-06-22 | Procédé de génération de caractéristiques de personnalité et procédé de synthèse de la parole |
Publications (1)
Publication Number | Publication Date |
---|---|
EP1271469A1 true EP1271469A1 (fr) | 2003-01-02 |
Family
ID=8177799
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
EP01115216A Withdrawn EP1271469A1 (fr) | 2001-06-22 | 2001-06-22 | Procédé de génération de caractéristiques de personnalité et procédé de synthèse de la parole |
Country Status (1)
Country | Link |
---|---|
EP (1) | EP1271469A1 (fr) |
Patent Citations (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US5278943A (en) * | 1990-03-23 | 1994-01-11 | Bright Star Technology, Inc. | Speech animation and inflection system |
WO1999012324A1 (fr) * | 1997-09-02 | 1999-03-11 | Jack Hollins | Systeme de conversation au moyen de langage naturel simulant la voix de personnalites connues et active par une carte de telephone |
US6144938A (en) * | 1998-05-01 | 2000-11-07 | Sun Microsystems, Inc. | Voice user interface with personality |
Non-Patent Citations (2)
Title |
---|
JANET E. CAHN: "The Generation of Affect in Synthesized Speech", JOURNAL OF THE AMERICAN VOICE I/O SOCIETY, vol. 8, July 1990 (1990-07-01), pages 1 - 19, XP002183399, Retrieved from the Internet <URL:http://www.media.mit.edu/~cahn/masters-thesis.htm> [retrieved on 20011120] * |
KLASMEYER ET AL: "The perceptual importance of selected voice quality parameters", ACOUSTICS, SPEECH, AND SIGNAL PROCESSING, 1997. ICASSP-97., 1997 IEEE INTERNATIONAL CONFERENCE ON MUNICH, GERMANY 21-24 APRIL 1997, LOS ALAMITOS, CA, USA,IEEE COMPUT. SOC, US, 21 April 1997 (1997-04-21), pages 1615 - 1618, XP010226301, ISBN: 0-8186-7919-0 * |
Cited By (16)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US7873390B2 (en) | 2002-12-09 | 2011-01-18 | Voice Signal Technologies, Inc. | Provider-activated software for mobile communication devices |
WO2004068466A1 (fr) * | 2003-01-24 | 2004-08-12 | Voice Signal Technologies, Inc. | Procede et appareil de synthese prosodique mimetique |
US8768701B2 (en) | 2003-01-24 | 2014-07-01 | Nuance Communications, Inc. | Prosodic mimic method and apparatus |
CN1742321B (zh) * | 2003-01-24 | 2010-08-18 | 语音信号科技公司 | 韵律模仿合成方法和装置 |
WO2005081508A1 (fr) * | 2004-02-17 | 2005-09-01 | Voice Signal Technologies, Inc. | Procedes et appareil de personnalisation remplacable d'interfaces multimodales integrees |
US8285549B2 (en) | 2007-05-24 | 2012-10-09 | Microsoft Corporation | Personality-based device |
US8131549B2 (en) * | 2007-05-24 | 2012-03-06 | Microsoft Corporation | Personality-based device |
AU2008256989B2 (en) * | 2007-05-24 | 2012-07-19 | Microsoft Technology Licensing, Llc | Personality-based device |
EP2147429A4 (fr) * | 2007-05-24 | 2011-10-19 | Microsoft Corp | Dispositif basé sur la personnalité |
EP2147429A1 (fr) * | 2007-05-24 | 2010-01-27 | Microsoft Corporation | Dispositif basé sur la personnalité |
WO2014024399A1 (fr) * | 2012-08-10 | 2014-02-13 | Casio Computer Co., Ltd. | Dispositif de commande de reproduction de contenu, procédé de commande de reproduction de contenu et programme associé |
US9363378B1 (en) | 2014-03-19 | 2016-06-07 | Noble Systems Corporation | Processing stored voice messages to identify non-semantic message characteristics |
US9865281B2 (en) | 2015-09-02 | 2018-01-09 | International Business Machines Corporation | Conversational analytics |
US9922666B2 (en) | 2015-09-02 | 2018-03-20 | International Business Machines Corporation | Conversational analytics |
US11074928B2 (en) | 2015-09-02 | 2021-07-27 | International Business Machines Corporation | Conversational analytics |
CN110751940A (zh) * | 2019-09-16 | 2020-02-04 | 百度在线网络技术(北京)有限公司 | 一种生成语音包的方法、装置、设备和计算机存储介质 |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
| PUAI | Public reference made under article 153(3) EPC to a published international application that has entered the European phase | ORIGINAL CODE: 0009012 |
| AK | Designated contracting states | Kind code of ref document: A1; Designated state(s): AT BE CH CY DE DK ES FI FR GB GR IE IT LI LU MC NL PT SE TR |
| AX | Request for extension of the European patent | AL; LT; LV; MK; RO; SI |
| AKX | Designation fees paid | |
| REG | Reference to a national code | Ref country code: DE; Ref legal event code: 8566 |
| STAA | Information on the status of an EP patent application or granted EP patent | STATUS: THE APPLICATION IS DEEMED TO BE WITHDRAWN |
| 18D | Application deemed to be withdrawn | Effective date: 2003-07-03 |