JPH10149188A

JPH10149188A - Text voice synthesizer

Info

Publication number: JPH10149188A
Application number: JP8310890A
Authority: JP
Inventors: Hiroyuki Fujimoto; 博之藤本; Toshitaka Yamato; 俊孝大和; Osamu Ishikawa; 修石川
Original assignee: Denso Ten Ltd
Current assignee: Denso Ten Ltd
Priority date: 1996-11-21
Filing date: 1996-11-21
Publication date: 1998-06-02
Anticipated expiration: 2016-11-21
Also published as: JP3192981B2

Abstract

PROBLEM TO BE SOLVED: To achieve near-natural speech synthesis for a limited set of sentence examples. SOLUTION: A text-to-speech synthesizer that synthesizes arbitrary sentences into speech by rule is provided with a word dictionary unit 64 that stores a large number of words and gives some of them identification attributes distinguishing them from other words, a sentence example pattern table 66 holding a plurality of sentence example patterns each consisting of a sequence of identification attributes, and a control unit 65 that collates the word string obtained from a language processing analysis unit 63 against the sentence example patterns and selects either inset speech synthesis or rule-based synthesis. The device that performs the inset speech synthesis is provided with an intonation-bearing inset table 73, consisting of connecting words that join a plurality of words and the intonation of the inset word string, and an inset-type prosody generation unit 74 that generates a pitch pattern from the intonation of the inset word string and connects speech-unit waveforms according to this pitch pattern.

Description

DETAILED DESCRIPTION OF THE INVENTION

[0001]

BACKGROUND OF THE INVENTION 1. Field of the Invention The present invention relates to a text-to-speech synthesizer, and more particularly to producing near-natural synthesized speech for a limited set of sentence examples.

[0002]

2. Description of the Related Art In FM data multiplex broadcasting, character information data is multiplexed onto the broadcast. In recent years, the VICS (road traffic information communication system) service has begun, which multiplexes traffic congestion data for use by vehicle navigation systems. VICS traffic congestion information includes character information along with map information. On the receiver side, this traffic congestion information is displayed visually.

[0003] From the standpoint of driving a vehicle, it is preferable that the character information in this traffic congestion information not only be displayed visually but also be synthesized into speech so that it can be heard. Because the character information is text of mixed kanji and kana, it can be converted to speech by a text-to-speech synthesizer. In such a synthesizer, the mixed kanji-kana text is analyzed into a phonetic character string consisting of phoneme symbols and prosodic symbols, and prosody is generated for that string, including the duration of each phoneme and a pitch pattern in which the accent is superimposed on the intonation. The phoneme waveforms are then connected according to these prosodic rules to synthesize speech.

[0004]

PROBLEMS TO BE SOLVED BY THE INVENTION However, because the rules for connecting the phoneme units described above are complex, the resulting pitch pattern deviates from a natural pitch pattern, and the speech tends not to sound natural. If the synthesized speech for traffic information sounds unnatural, converting the traffic information from visual to auditory form fails to deliver its value.

[0005] Accordingly, in view of the above problem, an object of the present invention is to provide a speech synthesizer capable of making the synthesized speech of the character information in traffic information close to natural speech.

[0006]

MEANS FOR SOLVING THE PROBLEMS To solve the above problem, the present invention provides a text-to-speech synthesizer that synthesizes arbitrary sentences into speech by rule, comprising: a language processing analysis unit that converts the sentence into a phonetic character string and a word identification attribute string; a word dictionary unit that stores a large number of words for the conversion into the phonetic character string and, in order to obtain the word identification attribute string, gives some of the words an identification attribute distinguishing them from other words; a sentence example pattern table holding sentence example patterns each consisting of a sequence of the identification attributes; and a control unit that determines whether the word identification attribute string obtained from the language processing analysis unit matches a sentence example pattern in the sentence example pattern table. When it matches, the control unit causes inset speech synthesis to be performed. When it does not match, the control unit causes a pitch pattern to be generated based on the intonation of the phonetic character string obtained by the language processing analysis unit and the rule synthesis to be performed using this pitch pattern. The device that performs the inset speech synthesis comprises: an inset sentence example table holding a plurality of inset sentence example patterns, each consisting of a plurality of connecting words that join the words so that the corresponding words contained in the identification attribute sequence are inset to form one sentence, together with an intonation specific to that sentence; and an inset-type prosody generation unit that generates a pitch pattern based on the intonation held in the inset sentence example table and uses this pitch pattern to connect the waveforms of the speech units constituting the phonetic character string of the inset word string.

With this means, when the sentence examples of the traffic character information are limited, as in VICS, it is easy to obtain an announcer's intonation for those limited sentence examples. Consequently, speech with a more natural intonation is obtained for the character information than with language processing analysis.

[0007]

DESCRIPTION OF THE PREFERRED EMBODIMENTS Embodiments of the present invention are described below with reference to the drawings. FIG. 1 is a diagram showing a text-to-speech synthesizer according to an embodiment of the present invention. The device is an FM receiver mounted on a vehicle. As shown in the figure, FM broadcasts are output from a receiver 2, connected to an antenna 1, via a switch 3 and a speaker 4. A decoder 2A in the receiver 2 separates character data such as news, VICS character data, and map data from the received signal. A display 5 displays the map data and character data separated by the decoder 2A. An inset-combined text-to-speech synthesis unit 6 receives the character data and synthesizes it into speech by either the conventional text-to-speech method or the inset speech synthesis method, and the synthesized speech is output via the switch 3 and the speaker 4.

[0008] FIGS. 2 and 3 are diagrams showing the configuration of the inset-combined text-to-speech synthesis unit 6 of FIG. 1. As shown in FIG. 2, a character data normalization unit 61 receives text of mixed kanji and kana, converts it into normalized character data, and outputs it to a language processing analysis unit 63. A conversion table 62 is used to convert mixed kanji-kana text into normalized character data and is referenced by the character data normalization unit 61. The language processing analysis unit 63 receives the normalized character data, segments it into a sequence of words, and identifies parts of speech and inflected forms. In addition to producing prosodic information such as pause settings, readings, and accent settings, it analyzes the attributes of specific words, described later, and outputs (kana) phonetic characters annotated with attributes and prosodic symbols. A word dictionary unit 64 holds from several tens of thousands to over a hundred thousand words and records grammatical information, readings, accent types, and the like. The specific words mentioned above are assigned type codes such as A, B, C, D, and so on. The word dictionary unit 64 is referenced by the language processing analysis unit 63.

[0009] Using the sentence example pattern table 66, the control unit 65 determines whether the attribute- and prosodic-symbol-annotated (kana) characters output from the language processing analysis unit 63 constitute a phonetic character string for rule synthesis or an inset-format phonetic character string. If they are judged to be a phonetic character string for rule synthesis, ordinary text-to-speech synthesis is controlled as follows. A rule-synthesis prosody generation unit 67 generates the intonation from the pause positions in the phonetic character string for rule synthesis and generates a pitch pattern by superimposing the accent on the intonation. A prosody table 68 describes the rules for setting pauses, accents, and the like, and is referenced by the rule-synthesis prosody generation unit 67. A phoneme duration generation unit 69 sets the duration of each phoneme constituting the phonetic characters so as to give the utterance natural timing (rhythm). A phoneme length table 70 describes the intrinsic duration rules of each phoneme, taking the influence of adjacent phonemes into account, and is referenced by the phoneme duration generation unit 69. When a waveform connection unit 71 connects the phonemes constituting the phonetic character string for rule synthesis to synthesize continuous speech, characteristic parameters such as the phoneme durations and the pitch pattern are interpolated between phonemes, and the interpolated continuous speech is output as PCM data. A phoneme dictionary unit 72 stores phoneme waveform data as synthesis units and is referenced by the waveform connection unit 71.
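
As a rough illustration of the rule-synthesis step described above (intonation derived from pause positions, with the accent superimposed to form the pitch pattern), the following Python sketch may help. It is only a sketch under stated assumptions: the specification gives no numeric values, so the constants, the linear decline within a phrase, the function name `rule_pitch_pattern`, and the accent flags in the sample input are all hypothetical.

```python
# Hypothetical constants; the specification gives no numeric values.
BASE_HZ = 140.0     # assumed pitch at the start of each phrase
DECLINE_HZ = 4.0    # assumed fall per mora within a phrase
ACCENT_HZ = 25.0    # assumed boost added on accented morae


def rule_pitch_pattern(morae):
    """Return one pitch value (Hz) per non-pause mora.

    morae: list of (kana, accented) tuples; kana == "、" marks a pause.
    The intonation is a declining line that restarts after each pause,
    and the accent is added on top of it.
    """
    pattern = []
    position = 0  # mora index since the last pause
    for kana, accented in morae:
        if kana == "、":
            position = 0  # intonation restarts at a pause
            continue
        intonation = BASE_HZ - DECLINE_HZ * position
        accent = ACCENT_HZ if accented else 0.0
        pattern.append(intonation + accent)
        position += 1
    return pattern


# Accent flags here are arbitrary and only exercise the function.
morae = [("タ", False), ("マ", True), ("ツ", False), ("、", False),
         ("ニ", True), ("キ", False), ("ロ", False)]
pitch = rule_pitch_pattern(morae)
```

In a real synthesizer the pause and accent rules would come from the prosody table 68 rather than from fixed constants.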

[0010] Next, the description returns to the word dictionary unit 64. The character information in traffic information uses a limited set of sentence forms, as in the following example, and is written in a simplified form suited to visual display: 『第二神明道路の西行きは、玉津付近で、渋滞２ｋｍ』 ("Daini-Shinmei Road westbound, near Tamatsu, congestion 2 km"). In the normalized character data supplied to the language processing analysis unit 63, 「第二神明道路」 is converted to the kana 「ダイニシンメードーロ」 and denotes a road name; 「西行き」 is converted to 「ニシユキ」 and denotes a direction; 「玉津」 is converted to 「タマツ」 and denotes a place name; 「渋滞」 is converted to 「ジュータイ」 and denotes a degree; and 「２ｋｍ」 is converted to 「ニキロ」 and denotes a distance.

[0011] In this case, the word dictionary unit 64 contains words such as ダイニシンメードーロ, ニシユキ, タマツ, ジュータイ, and ニキロ, and assigns attribute codes to them. For example, words denoting a road name are given the attribute code "A". Likewise, words denoting a direction are given "B", a place name "C", a degree "D", and a distance "E". As further attributes, words denoting a cause are given "F", an incident "G", a date "H", a time of day "I", a duration "J", a regulation "K", a speed "L", a caution "M", a means of transportation "N", an entrance or exit "O", an area "P", a warning "Q", and so on.

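The attribute coding described above can be sketched as a small lookup in Python. This is a minimal illustration, not the patent's implementation: the dictionary layout, the English category labels, and the function name `attribute_string` are assumptions, and the toy dictionary holds only the five words from the example sentence.

```python
# Attribute codes A to Q as listed in the description (English labels assumed).
ATTRIBUTE_CODES = {
    "road name": "A", "direction": "B", "place name": "C", "degree": "D",
    "distance": "E", "cause": "F", "incident": "G", "date": "H",
    "time of day": "I", "duration": "J", "regulation": "K", "speed": "L",
    "caution": "M", "means of transportation": "N", "entrance/exit": "O",
    "area": "P", "warning": "Q",
}

# Toy word dictionary: kana reading -> category (hypothetical layout).
WORD_DICTIONARY = {
    "ダイニシンメードーロ": "road name",
    "ニシユキ": "direction",
    "タマツ": "place name",
    "ジュータイ": "degree",
    "ニキロ": "distance",
}


def attribute_string(readings):
    """Return the attribute codes of the coded words, joined with '+'.

    Words without an attribute (for example particles) are skipped.
    """
    codes = []
    for reading in readings:
        category = WORD_DICTIONARY.get(reading)
        if category is not None:
            codes.append(ATTRIBUTE_CODES[category])
    return "+".join(codes)


example = ["ダイニシンメードーロ", "ノ", "ニシユキ", "タマツ", "ジュータイ", "ニキロ"]
result = attribute_string(example)  # "A+B+C+D+E"
```

Only some words carry an attribute, which matches the description's statement that the codes are given to specific words rather than to the whole dictionary.
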
[0012] The sentence example pattern table holds word strings for the limited sentence examples, prepared as follows.

Word string                                                          Attribute string   Sentence example number
route name + direction + place name + degree + distance              A+B+C+D+E          1
route name + place name + direction + degree + distance              A+C+B+D+E          1
route name + place name + direction + distance + degree              A+C+B+E+D          1
route name + direction + place name + distance + degree              A+B+C+E+D          1
route name + direction + place name + place name + degree + distance A+B+C+C+D+E        2
route name + direction + place name + place name + distance + degree A+B+C+C+E+D        2
route name + direction + cause + place name + distance + degree      A+B+F+C+E+D        3
route name + direction + cause + place name + degree + distance      A+B+F+C+D+E        3
...

When one sentence composed of attribute- and prosodic-symbol-annotated kana characters is input, the control unit 65 collates it against the sentence example pattern table 66. If the attribute string of the words matches none of the sentence examples in the sentence example pattern table 66, rule-synthesis prosody generation is selected for the subsequent pitch generation. If it matches one of the sentence examples, inset-type prosody generation is selected and an inset-format phonetic character string is generated.
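
The collation performed by the control unit can be sketched as an exact-match lookup on the attribute sequence. The Python below is a minimal sketch under assumptions: the table contents are the eight rows listed in this paragraph, while the tuple-keyed dictionary, the function name `select_synthesis`, and the return values `"inset"` and `"rule"` are hypothetical choices made for illustration.

```python
# Attribute sequences from the sentence example pattern table,
# mapped to their sentence example numbers.
SENTENCE_PATTERNS = {
    ("A", "B", "C", "D", "E"): 1,
    ("A", "C", "B", "D", "E"): 1,
    ("A", "C", "B", "E", "D"): 1,
    ("A", "B", "C", "E", "D"): 1,
    ("A", "B", "C", "C", "D", "E"): 2,
    ("A", "B", "C", "C", "E", "D"): 2,
    ("A", "B", "F", "C", "E", "D"): 3,
    ("A", "B", "F", "C", "D", "E"): 3,
}


def select_synthesis(attributes):
    """Choose the synthesis path for one sentence.

    attributes: sequence of attribute codes in sentence order.
    Returns ("inset", sentence_number) on an exact match,
    otherwise ("rule", None).
    """
    number = SENTENCE_PATTERNS.get(tuple(attributes))
    if number is None:
        return ("rule", None)
    return ("inset", number)


matched = select_synthesis(["A", "B", "C", "D", "E"])   # ("inset", 1)
unmatched = select_synthesis(["A", "B"])                # ("rule", None)
```

A hash lookup keyed on the whole sequence keeps the check to a single step per sentence, which suits a small, fixed set of patterns.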

[0013] The inset-format phonetic character string is formed from the sentence number and the readings of the words to be inset. Although the sentence example pattern table 66 holds four examples under sentence example number 1 above, they carry the same information and differ only in word order, so the table may instead hold just one of them. This reduces the required memory capacity.

[0014] The inset sentence example table 73 of FIG. 3 corresponds to the sentence examples in the sentence example pattern table 66 and consists of the phonetic character strings of the connecting words (particles, auxiliary verbs, verbs, and the like) that join the words making up each sentence example. For the topmost example of sentence example number 1, it consists of 「ノ」 (no), 「ハ、」 (wa,), 「フキンデ、」 (fukinde,), and 「デス。」 (desu.). The inset-type prosody generation unit 74 insets each word of the attribute string making up the inset-format phonetic character string between the phonetic character strings of the connecting words of the corresponding sentence example in the inset sentence example table 73. For sentence example 1 above, this yields an inset phonetic character string of the form 『Ａ「ノ」Ｂ「ハ、」Ｃ「フキンデ、」ＤＥ「デス。」』.
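
The inset step for sentence example 1 can be sketched as filling slots in a template. This is an illustrative Python sketch only: the template layout with `("slot", ...)` and `("text", ...)` entries and the function name `fill_template` are assumptions, while the connecting words and the readings come from the description.

```python
# Template for the topmost example of sentence example number 1:
# A ノ B ハ、 C フキンデ、 D E デス。
TEMPLATE_1 = [
    ("slot", "A"), ("text", "ノ"),
    ("slot", "B"), ("text", "ハ、"),
    ("slot", "C"), ("text", "フキンデ、"),
    ("slot", "D"), ("slot", "E"), ("text", "デス。"),
]


def fill_template(template, words):
    """Inset word readings between the connecting words.

    words: dict mapping an attribute code to the kana reading to inset.
    This simple form assumes each attribute occurs once in the template.
    """
    parts = []
    for kind, value in template:
        parts.append(words[value] if kind == "slot" else value)
    return "".join(parts)


words = {
    "A": "ダイニシンメードーロ",
    "B": "ニシユキ",
    "C": "タマツ",
    "D": "ジュータイ",
    "E": "ニキロ",
}
sentence = fill_template(TEMPLATE_1, words)
# ダイニシンメードーロノニシユキハ、タマツフキンデ、ジュータイニキロデス。
```

Patterns that repeat an attribute, such as A+B+C+C+D+E, would need the words supplied in order rather than as a dictionary.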

[0015] Because the character information in traffic information is composed for visual display, the words are inset between the phonetic character strings of connecting words such as particles, auxiliary verbs, and verbs to complete a full sentence, converting it from a form suited to sight into one suited to hearing. To build the inset sentence example table 73, suitable words are inset between the connecting words of each sentence example to form sentences, an announcer reads each sentence aloud, the intonation is analyzed, and the result is stored for each sentence. For the accents and pauses of the inset words, the inset-type prosody generation unit 74 refers to a prosody table 75 having the same structure as the prosody table 68. For the intonation of the whole sentence, it refers to the inset sentence example table 73. It then superimposes the accent on the intonation to form the pitch pattern. As described above, the waveform connection unit 71 sets the phoneme durations for the phonemes of the inset phonetic character string, applies the pitch pattern from the inset-type prosody generation unit 74, and connects the phoneme waveforms.
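
The inset-type pitch generation can be sketched as follows: a stored, announcer-derived intonation contour is looked up by sentence number, and the accent of the inset words is superimposed on it. This Python sketch rests on several assumptions that the specification does not state: the contour values, the linear resampling of the contour to the sentence length, the fixed accent boost, and the names `STORED_INTONATION`, `resample`, and `inset_pitch_pattern` are all hypothetical.

```python
# Hypothetical stored intonation contour (Hz) per sentence example number.
STORED_INTONATION = {1: [150.0, 140.0, 130.0, 120.0]}


def resample(contour, n):
    """Linearly interpolate a contour to n points (assumed approach)."""
    if n == 0:
        return []
    if n == 1:
        return [contour[0]]
    out = []
    for i in range(n):
        pos = i * (len(contour) - 1) / (n - 1)
        lo = int(pos)
        hi = min(lo + 1, len(contour) - 1)
        frac = pos - lo
        out.append(contour[lo] * (1 - frac) + contour[hi] * frac)
    return out


def inset_pitch_pattern(sentence_number, accents, accent_hz=20.0):
    """Superimpose accents on the stored sentence intonation.

    accents: one boolean per mora, True where the mora is accented.
    """
    contour = resample(STORED_INTONATION[sentence_number], len(accents))
    return [f0 + (accent_hz if accented else 0.0)
            for f0, accented in zip(contour, accents)]


# Arbitrary accent flags, used only to exercise the function.
accents = [False, True, False, False, False, True, False]
pitch = inset_pitch_pattern(1, accents)
```

The difference from the rule-synthesis path is the source of the intonation: here it is stored per sentence rather than computed from pause positions.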

[0016] Thus, according to the present invention, the pitch pattern is set for the limited sentence examples of traffic information, so natural synthesized speech is obtained. For news other than traffic information, for example 『大手私鉄も１万円割る、賃上げ９９５０円、３．３８％』 ("Major private railways also fall below 10,000 yen: wage increase of 9,950 yen, 3.38%"), it is difficult to anticipate sentence examples in advance, so speech is synthesized by conventional rule synthesis rather than by inset synthesis.

[0017] In the configuration of FIGS. 2 and 3, the temporary buffer and the input and output buffers may be shared between rule synthesis and inset synthesis. This makes it possible to use the two types of algorithm without increasing data memory. Furthermore, the sentence example pattern table 66 and the inset sentence example table 73 may be separated from the speech synthesis program and placed in flash memory or the like. This makes it simple to upgrade the sentence example database.

[Brief description of the drawings]

FIG. 1 is a diagram showing a text-to-speech synthesizer according to an embodiment of the present invention.

FIG. 2 is a diagram showing the configuration of the inset-combined text-to-speech synthesis unit 6 of FIG. 1.

FIG. 3 is a diagram showing the configuration of the inset-combined text-to-speech synthesis unit 6 of FIG. 1.

[Explanation of symbols]

6: Inset-combined text-to-speech synthesis unit
61: Character data normalization unit
62: Conversion table
63: Language processing analysis unit
64: Word dictionary unit
65: Control unit
66: Sentence example pattern table
67: Rule-synthesis prosody generation unit
68, 75: Prosody table
69: Phoneme duration generation unit
70: Phoneme length table
71: Waveform connection unit
72: Phoneme dictionary unit
73: Inset table with intonation
74: Inset-type prosody generation unit

Claims

[Claims]

1. A text-to-speech synthesizer that synthesizes an arbitrary sentence into speech by rule, comprising: a language processing analysis unit that converts the sentence into a phonetic character string and a word identification attribute string; a word dictionary unit that stores a large number of words for conversion into the phonetic character string and in which, in order to obtain the word identification attribute string, some of the words are given an identification attribute distinguishing them from other words; a sentence example pattern table holding sentence example patterns each consisting of a sequence of the identification attributes; and a control unit that determines whether the word identification attribute string obtained from the language processing analysis unit matches a sentence example pattern in the sentence example pattern table and performs control such that inset speech synthesis is performed when it matches and, when it does not match, a pitch pattern is generated based on the intonation of the phonetic character string obtained by the language processing analysis unit and the rule synthesis is performed using that pitch pattern, wherein the device that performs the inset speech synthesis comprises: an inset sentence example table holding a plurality of inset sentence example patterns, each consisting of a plurality of connecting words that join the words so that the corresponding words contained in the sequence of identification attributes are inset to form one sentence, and an intonation specific to that sentence; and an inset-type prosody generation unit that generates a pitch pattern based on the intonation held in the inset sentence example table and uses the pitch pattern to connect the waveforms of the speech units constituting the phonetic character string of the inset word string.

2. The text-to-speech synthesizer according to claim 1, wherein, in the word dictionary unit, the words given an identification attribute distinguishing them from other words are words denoting a road name, direction, place name, degree, distance, cause, incident, date, time of day, duration, regulation, speed, caution, means of transportation, entrance or exit, area, warning, and the like.

3. The text-to-speech synthesizer according to claim 1, wherein the inset sentence example table forms the intonation information specific to each sentence example from intonation produced by an announcer.

4. The text-to-speech synthesizer according to claim 1, wherein the control unit refers to the sentence example pattern table and outputs the corresponding unique sentence example number.

5. The text-to-speech synthesizer according to claim 1, wherein, for a plurality of similar word strings that consist of the same words in a different order, the sentence example pattern table holds any one of them.

6. The text-to-speech synthesizer according to claim 1, wherein the rule synthesis and the inset speech synthesis share a common temporary buffer and common input and output buffers.

7. The text-to-speech synthesizer according to claim 1, wherein the sentence example pattern table and the inset table with intonation are held in flash memory.

8. The text-to-speech synthesizer according to claim 1, wherein the word dictionary unit represents, as a code, the identification attribute that distinguishes some of the words from other words.