JPH09152884A

JPH09152884A - Speech synthesizing device

Info

Publication number: JPH09152884A
Application number: JP7312222A
Authority: JP
Inventors: Hiroyuki Fujimoto; 博之藤本; Kazuya Sako; 和也佐古; Shoji Fujimoto; 昇治藤本; Ikue Takahashi; 育恵高橋
Original assignee: Denso Ten Ltd
Current assignee: Denso Ten Ltd
Priority date: 1995-11-30
Filing date: 1995-11-30
Publication date: 1997-06-10

Abstract

PROBLEM TO BE SOLVED: To learn a user's speech pattern and synthesize natural-sounding speech by providing a speech recognition part and a feature parameter storage part that learns speech information and updates feature parameters. SOLUTION: A feature parameter storage part learns speech information from a speech recognition part 7 and updates the feature parameters of the speech to be synthesized. Its phoneme dictionary part 3 is provided with a phoneme dictionary update part 3A, a RAM that stores phonemes uttered by the user into a microphone 6. Its prosody rule part 4 is likewise provided with a prosody rule update part 4A, which stores the intonation and accent of the speech uttered by the user into the microphone 6, obtained through a filter bank 8. Speech uttered for speech recognition is input to the speech recognition part 7 and its feature parameters are analyzed; when the device is in the parameter update mode, the feature parameters are updated, so that speech is synthesized using the user's feature parameters.

Description

Detailed Description of the Invention

[0001]

[Technical field of the invention] The present invention relates to a rule-based speech synthesizer that converts a sequence expressed in discrete symbols, such as characters or phonetic symbols, into continuous speech, and in particular to a speech synthesizer that learns a user's speech pattern to enable natural speech synthesis.

[0002]

[Prior art] FIG. 7 is a diagram showing an example of a conventional speech synthesizer. The same components are denoted by the same reference numerals or symbols throughout the drawings. As shown in FIG. 7, the speech synthesizer comprises: a speech synthesis unit 2 that synthesizes speech from character and symbol string information supplied by a navigation unit 1, which is mounted in a vehicle and provides route guidance; a phoneme dictionary unit 3, formed of a ROM (Read Only Memory), that stores speech phonemes to be supplied to the speech synthesis unit 2; a prosody rule unit 4, formed of a ROM, that stores speech intonation, accent, and the like to be supplied to the speech synthesis unit 2; and a speaker 5 that outputs the speech synthesized by the speech synthesis unit 2. The phoneme dictionary unit 3 and the prosody rule unit 4 together constitute a feature parameter storage unit that stores feature parameters.

[0003] The phoneme dictionary unit 3 stores phonemes such as vowels and consonants. FIG. 8 is a diagram showing the intonation and accent stored in the prosody rule unit 4. As shown in FIG. 8, the prosody rule unit 4 stores an intonation in which the pitch frequency decreases over time and an accent in which the change of the pitch frequency over time forms a trapezoid.
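As a rough illustration of the two stored patterns, the sketch below models a pitch contour as a linearly declining intonation plus one trapezoidal accent. Adding the two components together, and all names and numeric values (`trapezoid`, `pitch_contour`, 200 Hz, and so on), are assumptions made for illustration; the text only describes the shapes shown in FIG. 8.

```python
def trapezoid(t, start, rise, hold, fall, height):
    """Trapezoidal accent: ramps up, holds at `height`, ramps down (Hz)."""
    if t < start:
        return 0.0
    if t < start + rise:
        return height * (t - start) / rise
    if t < start + rise + hold:
        return height
    if t < start + rise + hold + fall:
        return height * (1.0 - (t - start - rise - hold) / fall)
    return 0.0


def pitch_contour(t, base_hz, slope_hz_per_s, accent):
    """Declining intonation line plus one accent (assumed to be additive)."""
    return base_hz + slope_hz_per_s * t + trapezoid(t, **accent)


# Hypothetical values: 200 Hz at t = 0, falling 20 Hz/s, one 30 Hz accent.
accent = {"start": 0.2, "rise": 0.1, "hold": 0.3, "fall": 0.1, "height": 30.0}
contour = [pitch_contour(i * 0.1, 200.0, -20.0, accent) for i in range(10)]
```

Under this assumed model, selecting a different stored pattern amounts to swapping the slope or the `accent` dictionary.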

[0004] The prosody rule unit 4 holds several types of intonation and accent, one of which can be selected according to the user's preference.

[0005]

[Problems to be solved by the invention] However, although the above speech synthesizer allows the user to select an intonation and accent according to preference, the range of selection is limited. The user therefore has little freedom of choice, and the intonation and accent do not always match the user's preference sufficiently. In addition, because the phonemes are also supplied in a fixed, one-sided manner, they too may not match the user's preference.

[0006] In view of the above problems, an object of the present invention is therefore to provide a speech synthesizer that can provide intonation, accent, and phonemes according to the user's preference.

[0007]

[Means for solving the problems] To solve the above problems, the present invention provides a speech synthesizer having the following configuration. Namely, a speech synthesizer that synthesizes speech from character and symbol strings is provided with: a system that responds to speech information with the character and symbol strings; a speech recognition unit that recognizes the speech information; and a feature parameter storage unit that holds feature parameters for synthesizing the speech and that learns the speech information to update the feature parameters. This device learns the user's utterance pattern, so that feature parameters matching the user's preference can be selected, which enables natural speech synthesis.

[0008] The feature parameter storage unit updates the prosodic information, namely the intonation and accent, extracted from the speech information. By this means, the prosodic information of the user's utterances is learned, and synthesized speech with a prosody suited to the user's language environment can be generated. The extracted intonation and accent held in the feature parameter storage unit are used separately for synthesis. By this means, as above, the prosodic information of the user's utterances is learned, and synthesized speech with a prosody suited to the user's language environment can be generated.

[0009] Specifically, frequency analysis is performed on the speech information, and the intonation and accent prosody to be updated are updated on the basis of the change in pitch frequency over time. The feature parameter storage unit also updates the data of the phoneme dictionary with phonemes extracted from the speech information. By this means, the user's uttered speech can be registered, so that synthesis in the user's own voice becomes possible and synthesized speech with a prosody suited to the user's language environment can be generated.
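One way to picture the frequency analysis is a bank of narrow band detectors, where the band with the most energy in each frame is taken as that frame's pitch. The sketch below uses the Goertzel algorithm as a stand-in for each band-pass filter. This substitution, the band centers, the frame length, and all function names are assumptions; the text does not specify the filter design, and in real speech the strongest band is not always the fundamental.

```python
import math


def band_power(samples, sample_rate, freq_hz):
    """Goertzel algorithm: signal power in a narrow band around freq_hz."""
    coeff = 2.0 * math.cos(2.0 * math.pi * freq_hz / sample_rate)
    s_prev, s_prev2 = 0.0, 0.0
    for x in samples:
        s = x + coeff * s_prev - s_prev2
        s_prev2, s_prev = s_prev, s
    return s_prev ** 2 + s_prev2 ** 2 - coeff * s_prev * s_prev2


def estimate_pitch(samples, sample_rate, band_centers_hz):
    """Return the band center with the most energy (crude pitch estimate)."""
    return max(band_centers_hz,
               key=lambda f: band_power(samples, sample_rate, f))


def pitch_track(samples, sample_rate, frame_len, band_centers_hz):
    """One pitch estimate per frame, i.e. the change in pitch over time."""
    return [
        estimate_pitch(samples[i:i + frame_len], sample_rate, band_centers_hz)
        for i in range(0, len(samples) - frame_len + 1, frame_len)
    ]


def tone(freq_hz, n, sample_rate):
    """Hypothetical test signal: n samples of a pure sine wave."""
    return [math.sin(2 * math.pi * freq_hz * k / sample_rate) for k in range(n)]


RATE = 8000
signal = tone(150, 800, RATE) + tone(125, 800, RATE)  # falling pitch
track = pitch_track(signal, RATE, 800, [100, 125, 150, 175, 200])
```

The resulting per-frame track is the kind of pitch-over-time data from which intonation and accent could then be derived.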

[0010]

[Embodiments of the invention] Embodiments of the present invention are described below with reference to the drawings. FIG. 1 is a diagram showing a speech synthesizer according to the present invention. Compared with the configuration of FIG. 7, the speech synthesizer additionally includes a microphone 6, a speech recognition unit 7 that is connected to the microphone 6 and outputs speech recognition results to the navigation unit 1, and a filter bank 8 that is connected to the microphone 6, consists of a group of band-pass filters, and analyzes the pitch frequency. Furthermore, so that the speech information from the speech recognition unit 7 can be learned and the feature parameters of the speech to be synthesized can be updated, the phoneme dictionary unit 3 of the feature parameter storage unit has a phoneme dictionary update unit 3A, a RAM (Random Access Memory) that stores phonemes uttered by the user into the microphone 6. Likewise, the prosody rule unit 4 is provided with a prosody rule update unit 4A, a RAM that stores the intonation and accent of speech uttered by the user into the microphone 6, obtained via the filter bank 8.
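The pairing of a fixed ROM with a writable RAM update unit can be sketched as a read-only mapping overlaid by a mutable one. The class name `FeatureParameterStore`, its methods, and the rule that a learned entry takes precedence over the factory entry on lookup are all assumptions for illustration; the text states only that the update units 3A and 4A store the user's data.

```python
from collections import ChainMap
from types import MappingProxyType


class FeatureParameterStore:
    """Read-only factory data (ROM) overlaid by learned user data (RAM)."""

    def __init__(self, rom_data):
        self.rom = MappingProxyType(dict(rom_data))  # cannot be modified
        self.ram = {}                                # update unit (3A or 4A)
        self._view = ChainMap(self.ram, self.rom)    # RAM entries win

    def learn(self, key, value):
        """Store a user-derived entry in the RAM update unit."""
        self.ram[key] = value

    def get(self, key):
        """Look up an entry, preferring learned data over factory data."""
        return self._view[key]


# Hypothetical contents; real entries would be waveform or pitch data.
phoneme_dict = FeatureParameterStore({"a": "factory /a/", "k": "factory /k/"})
phoneme_dict.learn("a", "user /a/")
```

With this layout, phonemes the user has not yet registered still resolve to the factory data, and clearing `ram` restores the original voice.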

[0011] When the user says "Please find a nearby gas station" into the microphone 6, the speech recognition unit 7 recognizes this speech and instructs the navigation unit 1 to search for gas stations. As a result of the search, the navigation unit 1 outputs a symbol string such as "There is a ○○ gas station 100 m ahead," which the speech synthesis unit 2 synthesizes into speech and outputs from the speaker 5.

[0012] The navigation unit 1 is only an example; any other system may be used as long as it responds to the speech information from the speech recognition unit 7 with the character and symbol strings. FIG. 2 is a diagram showing the pitch period 1/f analyzed by the filter bank 8 of FIG. 1. As shown in FIG. 2, the filter bank 8 uses the user's utterances to the speech recognition unit 7 to measure the change in pitch frequency over time and to obtain the intonation and accent, which are stored in the prosody rule update unit 4A. The result is also used to determine the user's speaking rate (morae per second) and voice pitch.
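A minimal sketch of how a measured pitch contour might be turned into the quantities named here: a least-squares line is fitted to the contour and treated as the intonation, the residual around that line is treated as the accent, and speaking rate and mean pitch are computed directly. The line-plus-residual decomposition, the function names, and the dictionary keys are assumptions; the text does not say how intonation and accent are separated.

```python
from statistics import mean


def fit_line(times, values):
    """Least-squares fit values ~ slope * t + intercept (needs 2+ distinct times)."""
    t_bar, v_bar = mean(times), mean(values)
    sxx = sum((t - t_bar) ** 2 for t in times)
    sxy = sum((t - t_bar) * (v - v_bar) for t, v in zip(times, values))
    slope = sxy / sxx
    return slope, v_bar - slope * t_bar


def analyze_prosody(times, pitch_hz, mora_count):
    """Split a pitch contour into assumed intonation and accent components."""
    slope, intercept = fit_line(times, pitch_hz)
    accent = [f - (slope * t + intercept) for t, f in zip(times, pitch_hz)]
    duration = times[-1] - times[0]
    return {
        "intonation_slope_hz_per_s": slope,      # overall rise or fall
        "intonation_intercept_hz": intercept,    # fitted pitch at t = 0
        "accent_residual_hz": accent,            # local deviation from line
        "speaking_rate_mora_per_s": mora_count / duration,
        "mean_pitch_hz": mean(pitch_hz),         # voice pitch
    }


# Hypothetical contour: pitch falling steadily from 200 Hz over 3 s, 6 morae.
result = analyze_prosody([0, 1, 2, 3], [200, 190, 180, 170], mora_count=6)
```

For this perfectly linear example the accent residual is zero everywhere; a real utterance would leave local bumps that could be compared with the trapezoidal accent of FIG. 8.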

[0013] For the phoneme dictionary update unit 3A, the user is specially asked to utter several tens of phonemes, and the results are stored. In this way, by making use of the speech recognition unit 7, speech is synthesized with the intonation and accent of the user's own utterances instead of uniform speech synthesis, so synthesized speech with a prosody suited to the user's language environment can be generated. Furthermore, by registering phonemes uttered by the user, synthesis in the user's own voice becomes possible.

[0014] FIG. 3 is a first flowchart showing a series of basic operations of the speech synthesizer of the present invention. In step S1, speech is input to the speech recognition unit 7. In step S2, it is determined whether parameter updating has ended. Whether to update parameters, and which to select, is decided by the user. The parameters include a plurality of intonations, accents, speaking rates, voice pitches, and the like.

[0015] In step S3, feature parameters of the input speech are extracted. In step S4, the feature parameters of the synthesized speech are updated, and the process returns to step S2. In step S5, speech recognition processing is performed. In step S6, navigation processing is performed.

[0016] In step S7, speech synthesis processing is performed on the navigation instruction using the updated parameters. Thus, according to the present invention, speech uttered for speech recognition is input to the speech recognition unit 7 and, at the same time, its feature parameters are analyzed. If the device is in the parameter update mode at that time, the feature parameters of the speech are updated, and speech synthesis is performed using the user's feature parameters. In other words, the user's utterance pattern is learned, enabling more natural speech synthesis.
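The FIG. 3 flow can be sketched as a single function, with each processing block passed in as a callable. The branch at step S2 (continue to S3 while updating has not ended, otherwise go on to S5) is inferred from the step descriptions rather than stated outright, and every name below, including `handle_utterance` and the stand-in lambdas, is hypothetical.

```python
def handle_utterance(audio, params, update_finished,
                     extract_features, recognize, navigate, synthesize):
    """One pass through the FIG. 3 flow for a single utterance (S1)."""
    while not update_finished():                # S2: has updating ended?
        params.update(extract_features(audio))  # S3 + S4, then back to S2
    command = recognize(audio)                  # S5: speech recognition
    reply_text = navigate(command)              # S6: navigation processing
    return synthesize(reply_text, params)       # S7: synthesis with params


# Hypothetical stand-ins for the real processing blocks.
answers = iter([False, True])  # the user updates once, then stops
params = {"pitch_hz": 120.0}
speech = handle_utterance(
    audio="find a gas station",
    params=params,
    update_finished=lambda: next(answers),
    extract_features=lambda audio: {"pitch_hz": 180.0},
    recognize=lambda audio: "search:gas_station",
    navigate=lambda command: "There is a gas station 100 m ahead",
    synthesize=lambda text, p: (text, p["pitch_hz"]),
)
```

The point of the arrangement is that the same utterance serves two purposes: it drives recognition and, when update mode is on, it also supplies the feature parameters used for the reply.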

[0017] FIG. 4 is a second flowchart showing a modification of FIG. 3. Steps S1, S2, S5, S6, and S7 are the same. In step S8, prosodic information of the input speech is extracted. In step S9, the intonation rule is updated. In step S10, the accent rule is updated.

[0018] Thus, according to the present invention, speech uttered for speech recognition is input to the speech recognition unit 7 and, at the same time, its prosodic information is extracted. If the device is in the parameter update mode at that time, the intonation rule, accent rule, and other contents of the prosody rule unit 4 are updated, and speech synthesis is performed with the updated parameters. In other words, the prosodic information of the user's utterances is learned, so that synthesized speech with a prosody suited to the user's language environment can be generated.

[0019] FIG. 5 is a third flowchart showing a modification of FIG. 4. Steps S1, S2, S5, S6, S7, and S8 are the same. In step S11, it is determined whether the intonation is to be updated. If the determination is YES, the process proceeds to step S12; if NO, it proceeds to step S13.

[0020] In step S12, the intonation rule is updated. In step S13, it is determined whether the accent is to be updated. If the determination is YES, the process proceeds to step S14; if NO, it returns to step S2. In step S14, the accent rule and other contents of the prosody rule unit 4 are updated, and the process returns to step S2.
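The branching in steps S11 to S14 amounts to two independent switches, one for intonation and one for accent. The sketch below shows only that branch; the function name, the dictionary layout, and the string placeholders for rule data are assumptions.

```python
def update_prosody_rules(rules, extracted, update_intonation, update_accent):
    """FIG. 5 branch: update intonation and accent rules independently."""
    if update_intonation:                              # S11
        rules["intonation"] = extracted["intonation"]  # S12
    if update_accent:                                  # S13
        rules["accent"] = extracted["accent"]          # S14
    return rules                                       # then back to S2


# Hypothetical rule values; `extracted` stands for the output of step S8.
rules = {"intonation": "factory decline", "accent": "factory trapezoid"}
extracted = {"intonation": "user decline", "accent": "user trapezoid"}
update_prosody_rules(rules, extracted,
                     update_intonation=True, update_accent=False)
```

Keeping the two switches separate lets a user adopt, for example, their own intonation while retaining the factory accent pattern.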

[0021] Thus, according to the present invention, speech uttered for speech recognition is input to the speech recognition unit 7 and, at the same time, its prosodic information is extracted. If the intonation and the accent are each in update mode at that time, the intonation rule and the accent rule in the prosody rule unit 4 are updated respectively, and speech synthesis is performed with the updated parameters. In other words, the prosodic information of the user's utterances is learned, so that synthesized speech with a prosody suited to the user's language environment can be generated.

[0022] FIG. 6 is a fourth flowchart showing the updating of phoneme data. Steps S1, S5, S6, and S7 are the same. In step S15, it is determined whether phoneme registration has ended. If the determination is YES, the process proceeds to step S5; if NO, it proceeds to step S16. In step S16, the user's phonemes are registered, and the process returns to step S15.
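The S15/S16 loop can be sketched as registering one phoneme per iteration until no prompts remain. The prompt list, the `record` callable, and the string placeholders for audio data are hypothetical; the text does not describe how the end of registration is detected.

```python
def register_phonemes(prompts, record, dictionary):
    """FIG. 6 loop: register user phonemes until none remain."""
    remaining = list(prompts)
    while remaining:                       # S15: registration ended? NO
        label = remaining.pop(0)
        dictionary[label] = record(label)  # S16: register one phoneme
    return dictionary                      # S15: YES, continue at S5


# Hypothetical recorder; a real one would capture audio from microphone 6.
user_dict = register_phonemes(
    prompts=["a", "i", "u"],
    record=lambda label: f"user recording of /{label}/",
    dictionary={"a": "factory /a/", "k": "factory /k/"},
)
```

Phonemes that are never prompted, such as `"k"` here, keep their existing entries, so a partial registration session still leaves a complete dictionary.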

[0023] Thus, according to the present invention, speech uttered for speech recognition is input to the speech recognition unit 7 and, if the device is in the user voice registration mode, the phoneme dictionary unit 3 is updated and synthesis is performed in the user's voice. In other words, making the user's uttered speech registrable makes it possible to synthesize speech in the user's own voice.

[Brief description of the drawings]

FIG. 1 is a diagram showing a speech synthesizer according to the present invention.

FIG. 2 is a diagram showing the pitch period 1/f analyzed by the filter bank 8 of FIG. 1.

FIG. 3 is a first flowchart showing a series of basic operations of the speech synthesizer of the present invention.

FIG. 4 is a second flowchart showing a modification of FIG. 3.

FIG. 5 is a third flowchart showing a modification of FIG. 4.

FIG. 6 is a fourth flowchart showing the updating of phoneme data.

FIG. 7 is a diagram showing an example of a conventional speech synthesizer.

FIG. 8 is a diagram showing the intonation and accent stored in the prosody rule unit 4.

[Explanation of symbols]

1 ... Navigation unit
2 ... Speech synthesis unit
3 ... Phoneme dictionary unit
3A ... Phoneme dictionary update unit
4 ... Prosody rule unit
4A ... Prosody rule update unit
5 ... Speaker
6 ... Microphone
7 ... Speech recognition unit
8 ... Filter bank

Continuation of the front page
(72) Inventor: Ikue Takahashi, c/o Fujitsu Ten Limited, 1-2-28 Goshodori, Hyogo-ku, Kobe-shi, Hyogo

Claims

[Claims]

1. A speech synthesizer for synthesizing speech from character and symbol strings, comprising: a system that responds to speech information with the character and symbol strings; a speech recognition unit that recognizes the speech information; and a feature parameter storage unit that holds feature parameters for synthesizing the speech and that learns the speech information to update the feature parameters.

2. The speech synthesizer according to claim 1, wherein the feature parameter storage unit updates the prosodic information of intonation and accent extracted from the speech information.

3. The speech synthesizer according to claim 2, wherein the extracted intonation and accent held in the feature parameter storage unit are used separately for synthesis.

4. The speech synthesizer according to claim 2, wherein frequency analysis of the speech information is performed and the intonation and accent prosody to be updated are updated based on the change in pitch frequency over time.

5. The speech synthesizer according to claim 1, wherein the feature parameter storage unit updates the data of the phoneme dictionary with phonemes extracted from the speech information.