JP2003005776A

JP2003005776A - Voice synthesizing device

Info

Publication number: JP2003005776A
Application number: JP2001188509A
Authority: JP
Inventors: Keiko Inagaki; 敬子稲垣
Original assignee: NEC Corp
Current assignee: NEC Corp
Priority date: 2001-06-21
Filing date: 2001-06-21
Publication date: 2003-01-08

Abstract

PROBLEM TO BE SOLVED: To eliminate the need for two dictionaries and to avoid selecting words whose written forms are not normally used. SOLUTION: A dictionary 3 stores, for each written form of a word, the written form, its reading, its accent, its part-of-speech information, and information on how frequently the written form appears in kanji-kana mixed sentences. A morphological analysis means 1 receives an input sentence and uses the dictionary 3 to output word-segmented sentences, in which the input sentence is divided into units of written word forms. When it receives more than one word-segmented sentence, a word candidate selection means 2 uses the frequency information in the dictionary 3 to select the sentence divided into the set of written word forms with the highest appearance frequency. A speech waveform production means uses the dictionary 3 to assign a reading and accent to each written word form in the selected sentence, producing pronunciation information; generates, from the pronunciation information, prosody information representing the intonation and rhythm of the whole word-segmented sentence; and generates a speech waveform based on the pronunciation information and the prosody information.

Description

Detailed Description of the Invention

[0001]

BACKGROUND OF THE INVENTION

1. Field of the Invention

The present invention relates to a speech synthesizer, and more particularly to a speech synthesizer that generates a speech waveform corresponding to a sentence written in mixed kanji and kana.

[0002]

2. Description of the Related Art

Conventionally, speech synthesizers of this type are used to read an input kanji-kana mixed sentence aloud with the correct reading and accent for each word and with correct intonation and rhythm.

[0003] Referring to FIG. 6, a block diagram of a conventional speech synthesizer, the conventional device comprises: a dictionary that stores, for each written form of a word, the written form, the reading and accent of the word, and its part-of-speech information; morphological analysis means that receives a kanji-kana mixed sentence (the input sentence), uses the dictionary to create word-segmented sentences in which the input is divided into units of written word forms, and, when more than one word-segmented sentence is created, selects and outputs one of them, for example the one containing the fewest independent words; and speech waveform production means that receives the word-segmented sentence from the morphological analysis means, uses the dictionary to assign a reading and accent to each written word form in it, and generates a speech waveform based on the resulting pronunciation information and on prosody information, generated from that pronunciation information, that represents the intonation and rhythm of the whole word-segmented sentence.

Various methods exist for the morphological analysis performed by the morphological analysis means. For example, the "left longest match method" matches the input against the dictionary from the beginning of the sentence and divides it into a word sequence in order while checking grammatical connection relations. It is used for morphological analysis in many Japanese speech synthesizers.
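To make the left longest match idea concrete, the following is a minimal sketch in Python. It is a simplification under stated assumptions: it only performs the greedy dictionary match and omits the grammatical connection check mentioned above, and the function name and the tiny lexicon are hypothetical.

```python
def left_longest_match(text, lexicon):
    """Greedily split text into the longest lexicon entries, left to right."""
    max_len = max((len(word) for word in lexicon), default=1)
    words = []
    pos = 0
    while pos < len(text):
        # Try the longest possible piece first, then shorter ones.
        for length in range(min(max_len, len(text) - pos), 0, -1):
            piece = text[pos:pos + length]
            if piece in lexicon:
                break
        else:
            # No entry matched: emit one character as an unknown word.
            piece = text[pos]
        words.append(piece)
        pos += len(piece)
    return words


# Hypothetical lexicon containing a rarely used variant written form.
lexicon = {"そう信", "そう", "信じ", "じ"}
print("／".join(left_longest_match("そう信じ", lexicon)))
```

Because the greedy match commits to the longest entry at each position, a rarely used variant such as 「そう信」 can win over the more natural split, which is the kind of result the invention aims to avoid.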

[0004] When the morphological analysis means uses the dictionary to divide a kanji-kana mixed sentence into units of written word forms and create a word-segmented sentence, any word whose written form is not in the dictionary is handled as an unknown word and given a reading. Because improving this unknown-word reading process is highly effective in reducing the reading error rate of a speech synthesizer, unknown-word processing has been studied extensively. In particular, conventional speech synthesizers assume kanji-kana mixed input text, so their reading error rate for character strings consisting only of kana has been very high. To avoid this, the technique described in Japanese Patent Laid-Open No. 2001-5479, for example, applies a dictionary specialized for alphanumerics and kana to character strings that contain no kanji, aiming to maintain reading accuracy even for kana-only strings. This method, however, cannot cope with input that contains even a few kanji.

[0005] Another way to improve reading accuracy for text containing many kana strings is to register, for a single word, many variant written forms (variant notations) that share its pronunciation and meaning. For example, 「そうしん」, 「ソウシン」, 「そう信」, 「送しん」, and 「ソーシン」 are registered as variants of 「送信」 ("transmission") so that reading accuracy is maintained whether the input is a kanji-kana mixed sentence, a hiragana sentence, a katakana sentence, or a colloquial sentence. Some of these variants are rarely seen in ordinary kanji-kana mixed text. When morphological analysis uses a dictionary in which many variants are registered, the input may be divided into a word sequence that is not normally used. For example, with a dictionary in which 「はかり」 is registered as a variant of the noun 「秤」 ("scale"), analyzing the sentence 「トラックはかりに衝突しても。。。」 ("Even if the truck were to collide ...") may parse 「はかりに」 as 「秤に」 ("on the scale") written with the variant 「はかり」, rather than as 「は仮に」 (the topic particle followed by "hypothetically"). Both interpretations are valid as morphological analyses, so morphological analysis alone cannot decide between them. Dividing the sentence into the correct words requires considering its meaning. Semantic analysis, however, demands a great deal of effort for dictionary maintenance and system construction, and no method for realizing it has been established, so no speech synthesizer currently includes it. As a result, the more variants are added to the dictionary, the more the analysis accuracy for ordinary kanji-kana sentences tends to fall, and finding a method for selecting the best word-segmented sentence has become an important problem.

[0006]

PROBLEMS TO BE SOLVED BY THE INVENTION

In the conventional speech synthesizer described above, when the morphological analysis means creates multiple word-segmented sentences, it selects, for example, the one with the fewest independent words, so there is no assurance that the correct word-segmented sentence has been chosen. In addition, a kanji-kana mixed dictionary and an alphanumeric-kana dictionary are prepared, with the former used to analyze kanji-kana mixed sentences and the latter used for kana-only sentences. This maintains the analysis accuracy for kana-only sentences but gives no consideration to kanji-kana mixed sentences. Consequently, registering many variant written forms in the kanji-kana mixed dictionary risks the selection of words whose written forms are not normally used.

[0007] An object of the present invention is to eliminate these drawbacks of the prior art by providing a speech synthesizer that reliably selects the correct word-segmented sentence, does not require two dictionaries of different character (a kanji-kana mixed dictionary and an alphanumeric-kana dictionary), and carries no risk of selecting words whose written forms are not normally used.

[0008]

MEANS FOR SOLVING THE PROBLEMS

A first speech synthesizer of the present invention comprises: morphological analysis means that receives a kanji-kana mixed sentence, divides it into units of written word forms, and outputs the resulting word-segmented sentences; word candidate selection means that receives the word-segmented sentences corresponding to the kanji-kana mixed sentence from the morphological analysis means and, when there are more than one, selects from among them the word-segmented sentence divided into the set of written word forms with the highest frequency of appearance; and speech waveform production means that assigns the reading and accent of each word to its written form in the word-segmented sentence selected by the word candidate selection means to form pronunciation information for that sentence, generates from the pronunciation information prosody information representing the intonation and rhythm of the whole word-segmented sentence, and generates a speech waveform based on the pronunciation information and the prosody information.

[0009] A second speech synthesizer of the present invention comprises: morphological analysis means that receives a kanji-kana mixed sentence, divides it into units of written word forms, and outputs the resulting word-segmented sentences; word candidate selection means that receives the word-segmented sentences corresponding to the kanji-kana mixed sentence from the morphological analysis means and, when there are more than one, selects from among them the word-segmented sentence whose frequency information receives the highest evaluation over the whole sentence, the frequency information being predetermined for each written word form in the sentence and indicating how often that form appears in kanji-kana mixed sentences; and speech waveform production means that assigns the reading and accent of each word to its written form in the word-segmented sentence selected by the word candidate selection means to form pronunciation information for that sentence, generates from the pronunciation information prosody information representing the intonation and rhythm of the whole word-segmented sentence, and generates a speech waveform based on the pronunciation information and the prosody information.

[0010] A third speech synthesizer of the present invention comprises: a dictionary that stores, for each written form of a word, the written form, the reading and accent of the word, the part-of-speech information of the word, and frequency information indicating how often the written form appears in kanji-kana mixed sentences; morphological analysis means that receives a kanji-kana mixed sentence and uses the dictionary to output word-segmented sentences in which the kanji-kana mixed sentence is divided into units of written word forms; word candidate selection means that receives the word-segmented sentences corresponding to the kanji-kana mixed sentence from the morphological analysis means and, when there are more than one, uses the frequency information in the dictionary to select from among them the word-segmented sentence divided into the set of written word forms with the highest frequency of appearance; and speech waveform production means that uses the dictionary to assign the reading and accent of each word to its written form in the word-segmented sentence selected by the word candidate selection means to form pronunciation information for that sentence, generates from the pronunciation information prosody information representing the intonation and rhythm of the whole word-segmented sentence, and generates a speech waveform based on the pronunciation information and the prosody information.

[0011] The third speech synthesizer of the present invention also comprises: the dictionary, which stores, for each written form of a word, multiple pieces of frequency information corresponding to the fields of kanji-kana mixed sentences; and the word candidate selection means, which receives field information indicating the field of the kanji-kana mixed sentence and, using the frequency information in the dictionary that corresponds to this field information, selects from among the multiple word-segmented sentences received from the morphological analysis means the word-segmented sentence divided into the set of written word forms with the highest frequency of appearance.

[0012] A fourth speech synthesizer of the present invention comprises: a dictionary that stores, for each written form of a word, the written form, the reading and accent of the word, the part-of-speech information of the word, and frequency information indicating how often the written form appears in kanji-kana mixed sentences; morphological analysis means that receives a kanji-kana mixed sentence and uses the dictionary to output word-segmented sentences in which the kanji-kana mixed sentence is divided into units of written word forms; word candidate selection means that receives the word-segmented sentences corresponding to the kanji-kana mixed sentence from the morphological analysis means and, when there are more than one, selects from among them the word-segmented sentence for which the frequency information in the dictionary corresponding to each written word form in the sentence receives the highest evaluation over the whole sentence; and speech waveform production means that uses the dictionary to assign the reading and accent of each word to its written form in the word-segmented sentence selected by the word candidate selection means to form pronunciation information for that sentence, generates from the pronunciation information prosody information representing the intonation and rhythm of the whole word-segmented sentence, and generates a speech waveform based on the pronunciation information and the prosody information.

[0013] The fourth speech synthesizer of the present invention also comprises: the dictionary, which stores, for each written form of a word, multiple pieces of frequency information corresponding to the fields of kanji-kana mixed sentences; and the word candidate selection means, which receives field information indicating the field of the kanji-kana mixed sentence and, using the frequency information in the dictionary that corresponds to this field information, selects from among the multiple word-segmented sentences received from the morphological analysis means the word-segmented sentence for which the frequency information corresponding to each written word form in the sentence receives the highest evaluation over the whole sentence.
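As an illustration of field-dependent frequency information, the sketch below stores one frequency value per field for each written form and picks the candidate with the larger total for the field supplied with the input. Everything concrete here is an assumption made for the example: the field names, the frequency values, the use of a simple sum as the evaluation, and the function name are not taken from the patent text.

```python
# Hypothetical table: written form -> {field name: frequency level 1..5}.
FIELD_FREQ = {
    "は":     {"general": 5, "weighing": 5},
    "かりに": {"general": 4, "weighing": 2},
    "はかり": {"general": 1, "weighing": 5},
    "に":     {"general": 5, "weighing": 5},
}


def select_for_field(candidates, field_freq, field):
    """Return the candidate whose words have the largest total frequency in the field."""
    def total(candidate):
        return sum(field_freq[word][field] for word in candidate)
    return max(candidates, key=total)


# Two segmentations of the ambiguous string 「はかりに」.
candidates = [["は", "かりに"], ["はかり", "に"]]
for field in ("general", "weighing"):
    chosen = select_for_field(candidates, FIELD_FREQ, field)
    print(field, "／".join(chosen))
```

With these made-up values, general text favors 「は／かりに」, while text from a field about weighing favors 「はかり／に」, which is the kind of switch the field information is meant to enable.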

[0014]

DESCRIPTION OF THE PREFERRED EMBODIMENTS

Embodiments of the present invention will now be described with reference to the drawings.

[0015] FIG. 1 is a block diagram showing one embodiment of the speech synthesizer of the present invention.

[0016] The embodiment shown in FIG. 1 comprises: a dictionary 3 that stores, for each written form of a word, the written form, the reading and accent of the word, its part-of-speech information, and frequency information indicating how often the written form appears in kanji-kana mixed sentences; morphological analysis means 1 that receives a kanji-kana mixed sentence and uses the dictionary 3 to output word-segmented sentences in which the input is divided into units of written word forms; word candidate selection means 2 that receives the word-segmented sentences corresponding to the kanji-kana mixed sentence from the morphological analysis means 1 and, when there are more than one, uses the frequency information in the dictionary 3 to select the one divided into the set of written word forms with the highest frequency of appearance, that is, the one for which the frequency information in the dictionary 3 corresponding to each written word form in the sentence receives the highest evaluation over the whole sentence; and speech waveform production means that uses the dictionary 3 to assign a reading and accent to each written word form in the word-segmented sentence selected by the word candidate selection means 2, thereby forming pronunciation information, generates from that pronunciation information prosody information representing the intonation and rhythm of the whole word-segmented sentence, and generates a speech waveform based on the pronunciation information and the prosody information.

[0017] The speech waveform production means comprises: pronunciation information generation means 4 that uses the dictionary 3 to assign a reading and accent to each written word form in the word-segmented sentence selected by the word candidate selection means 2, thereby generating the pronunciation information of the sentence; prosody information generation means 5 that generates, from the pronunciation information generated by the pronunciation information generation means 4, prosody information representing the intonation and rhythm of the whole word-segmented sentence; and speech waveform generation means 6 that generates a speech waveform from the pronunciation information (for example, the reading and accent of each word) and the prosody information (for example, intonation and rhythm), using speech waveform data 7, which is speech divided into fine units for synthesis.

[0018] The word candidate selection means 2 selects the word-segmented sentence for which the total of the frequency information over the whole sentence is largest, the frequency information being a predetermined numerical value that indicates how often a written word form appears in kanji-kana mixed sentences.
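The selection rule in this paragraph can be written compactly. The symbols are notation introduced here for illustration and do not appear in the patent: C is the set of candidate word-segmented sentences, w ranges over the written word forms in a candidate S, and f(w) is the frequency value stored in the dictionary 3 for w.

```latex
\hat{S} = \operatorname*{arg\,max}_{S \in C} \; \sum_{w \in S} f(w)
```

The text does not state how a tie between candidates with equal totals is resolved.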

[0019] The operation of the speech synthesizer of this embodiment will now be described in detail with reference to FIGS. 2 and 3.

[0020] FIG. 2 shows an example of multiple word-segmented sentences corresponding to a kanji-kana mixed sentence, together with their frequency information. It shows part of the input sentence (「そう信じ」), the corresponding parts of two word-segmented sentences (「そう信／じ」 and 「そう／信じ」), the frequency of each written form (2 for 「そう信」, 1 for 「じ」, 5 for 「そう」, and 5 for 「信じ」), and the total for each candidate.

[0021] FIG. 3 shows an example of the contents of the dictionary. For each written form of a word, the dictionary stores the written form, the reading and accent of the word (the accent falls at the position marked with an apostrophe '), the part-of-speech information, and the frequency information. The frequency information (labeled "appearance frequency" in the figure) is a numerical value indicating how often the written form appears in text. The value is assigned, for example, as follows: 1 for a written form that is almost never used; 2 when other written forms are more common but this one is used more often than a level-1 form; 3 when it appears more often than level 2 but less often than level 4; 4 when it is not the most frequent form but appears more often than level 3; and 5 when it is clearly used more often than the other written forms.
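One possible in-memory shape for a dictionary record is sketched below. The class name, field names, and helper are hypothetical, and the readings, accent positions, and part-of-speech labels in the sample entries are placeholders rather than the contents of FIG. 3. Only the two frequency values (2 for 「そう信」, 5 for 「信じ」) come from the description of FIG. 2.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DictionaryEntry:
    surface: str    # written form of the word
    reading: str    # reading, with "'" marking the accent position
    pos: str        # part-of-speech information
    frequency: int  # 1 (almost never used) to 5 (clearly the most used)

    def __post_init__(self):
        if not 1 <= self.frequency <= 5:
            raise ValueError("frequency must be between 1 and 5")


def split_accent(reading):
    """Return (reading without the mark, index of the mark or None)."""
    index = reading.find("'")
    if index == -1:
        return reading, None
    return reading[:index] + reading[index + 1:], index


# Placeholder entries; readings and accent marks are illustrative only.
entries = [
    DictionaryEntry("そう信", "ソ'ーシン", "noun", 2),
    DictionaryEntry("信じ", "シンジ", "verb", 5),
]
for entry in entries:
    print(entry.surface, split_accent(entry.reading), entry.frequency)
```

Keeping the accent mark inside the reading string mirrors the apostrophe convention described for FIG. 3, and the range check keeps the five-level scale explicit.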

[0022] In FIG. 1, the dictionary 3 is prepared in advance. It stores, for each written form of a word, the written form, the reading and accent of the word, its part-of-speech information, and frequency information indicating how often the written form appears in kanji-kana mixed sentences. The morphological analysis means 1 receives a kanji-kana mixed sentence (for example, 「田中氏はそう信じ、疑わなかった。」, "Mr. Tanaka believed so and did not doubt it.") and uses the dictionary 3 to create word-segmented sentences in which this input sentence is divided into units of written word forms. Here, two candidate segmentations of 「そう信じ」 (that is, two word-segmented sentences) are obtained: 「そう信／じ」 (meaning 「送信時」, "at the time of transmission") and 「そう／信じ」 (「そう」, "so", followed by the continuative form of 「信じる」, "to believe"). The word candidate selection means 2 receives the two word-segmented sentences from the morphological analysis means 1, looks up the frequency of each word of the two candidates (「そう信／じ」 and 「そう／信じ」) in the dictionary 3, and selects the candidate with the larger total. As the segmentations, frequency information, and totals in FIG. 2 show, the frequency total is 3 for candidate 1 and 10 for candidate 2, so candidate 2 is selected, and only the word-segmented sentence containing the segmentation 「そう／信じ」 is passed to the pronunciation information generation means 4. The pronunciation information generation means 4 uses the dictionary 3 to assign a reading and accent to each written word form in the word-segmented sentence selected by the word candidate selection means 2, thereby generating the pronunciation information of the sentence.
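The worked example can be reproduced with a short sketch. The frequency values are those given for FIG. 2. The exhaustive enumeration of segmentations is a stand-in chosen for illustration, not the morphological analysis method of the patent, and all function names are hypothetical.

```python
# Frequency values for the written forms, as listed for FIG. 2.
FREQ = {"そう信": 2, "じ": 1, "そう": 5, "信じ": 5}


def segmentations(text, lexicon):
    """Yield every way to split text into entries of lexicon."""
    if not text:
        yield []
        return
    for end in range(1, len(text) + 1):
        head = text[:end]
        if head in lexicon:
            for rest in segmentations(text[end:], lexicon):
                yield [head] + rest


def total_frequency(candidate, freq):
    """Sum the stored frequency values of the words in one candidate."""
    return sum(freq[word] for word in candidate)


candidates = list(segmentations("そう信じ", FREQ))
for candidate in candidates:
    print("／".join(candidate), total_frequency(candidate, FREQ))

best = max(candidates, key=lambda c: total_frequency(c, FREQ))
print("selected:", "／".join(best))
```

The two candidates total 3 (2 + 1) and 10 (5 + 5), so 「そう／信じ」 is selected, matching the result described above.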

[0023] The prosody information generation means 5 generates, from the pronunciation information generated by the pronunciation information generation means 4, prosody information representing the intonation and rhythm of the whole word-segmented sentence. That is, based on the pronunciation information of the input sentence generated by the pronunciation information generation means 4, it generates the rhythm and intonation patterns to be used when the input sentence is read aloud. The prosody information generation means 5 performs duration control, which determines rhythm, and pitch pattern control, which determines intonation. One model for duration control is based on controlling the interval between vowel centers so that it is kept constant, where the vowel center is the time at which the spectrum is stable and characteristic of that vowel. In this model, the inter-vowel durations in the synthesized speech are estimated statistically from information such as the speaking rate, the type of vowel, the type of the preceding phoneme, and the accent position. For pitch pattern control, one method estimates a normalized pitch pattern from the part-of-speech information of each accent phrase in the sentence and its position within the sentence, and then corrects the shape of the pattern according to the accent type. The normalized pitch pattern is obtained by taking, for each accent phrase in the sentence, the pitch frequency at the center of the first vowel, the peak frequency, and the pitch frequency at the center of the last vowel, and dividing each of these values by the peak pitch frequency of the first accent phrase of the sentence.
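The normalization described at the end of this paragraph amounts to a single division per value. The sketch below assumes each accent phrase is given as three pitch values in hertz (first vowel center, peak, last vowel center). The sample values and the function name are made up for the example.

```python
def normalize_pitch_pattern(phrases):
    """Divide every pitch value by the peak of the sentence-initial accent phrase.

    phrases: list of (first_vowel_hz, peak_hz, last_vowel_hz), one per accent phrase.
    """
    reference = phrases[0][1]  # peak pitch of the first accent phrase
    return [tuple(value / reference for value in phrase) for phrase in phrases]


# Hypothetical measurements for a sentence with two accent phrases.
phrases_hz = [(120.0, 150.0, 105.0), (110.0, 135.0, 90.0)]
pattern = normalize_pitch_pattern(phrases_hz)
for phrase in pattern:
    print(tuple(round(value, 3) for value in phrase))
```

By construction the peak of the first accent phrase normalizes to 1.0, so the pattern describes the contour relative to that peak rather than in absolute pitch.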

【００２４】音声波形生成手段６は、発音情報生成手段
４が生成した発音情報と韻律情報生成手段５が生成した
韻律情報とを基にして音声を音声合成するための細かな
単位に分割したデータである音声波形データ７を用いて
音声波形を生成する。すなわち、音声波形生成手段６
は、韻律情報生成手段５が生成したリズムやイントネー
ションパタンにしたがって、音声波形データ７にあらか
じめ蓄えておいた音声を編集して入力文に対する音声信
号を生成する。音声波形データ７には、大量の単語や文
を発声した音声を、合成するための細かい単位に分割し
て蓄積しておく。合成する単位としては、例えば子音と
母音の前半部分（ＣＶ）、母音の後半部分と子音（Ｖ
Ｃ）の単位を用いることができる。音声波形生成手段６
は、単位音声の前後の音素環境と合成したい音声の音素
環境が出来るだけ一致するように音声波形データ７より
単位音声を選択する。選択された単位音声は、韻律情報
生成手段５が生成したリズムやイントネーションパタン
に合わせて、継続時間長の調節やピッチ周波数の変更を
行いながら単位音声を編集する。単位音声どうしを接続
する際の方法や接続する位置は、あらかじめそれぞれの
音素の性質に合わせて決めておく。また接続点では、そ
れぞれの単位音声が滑らかにつながるように補間処理を
行う。このようにして編集された音声データを元に実際
の音声波形を生成する方式としては、例えば波形編集方
式がある。The voice waveform generating means 6 generates a voice waveform based on the pronunciation information generated by the pronunciation information generating means 4 and the prosody information generated by the prosody information generating means 5, using the voice waveform data 7, which is data obtained by dividing speech into fine units for speech synthesis. That is, the voice waveform generating means 6 edits the speech stored in advance in the voice waveform data 7 according to the rhythm and intonation pattern generated by the prosody information generating means 5, and generates a speech signal for the input sentence. The voice waveform data 7 stores speech recorded from a large number of uttered words and sentences, divided into fine units for synthesis. As the synthesis units, for example, a unit consisting of a consonant and the first half of a vowel (CV) and a unit consisting of the second half of a vowel and a consonant (VC) can be used. The voice waveform generating means 6 selects unit voices from the voice waveform data 7 so that the phoneme environment before and after each unit voice matches the phoneme environment of the speech to be synthesized as closely as possible. Each selected unit voice is edited by adjusting its duration and changing its pitch frequency to fit the rhythm and intonation pattern generated by the prosody information generating means 5. The method of connecting unit voices and the connection positions are determined in advance according to the characteristics of each phoneme. At the connection points, interpolation is performed so that the unit voices join smoothly. One method for generating the actual voice waveform from the speech data edited in this way is, for example, the waveform editing method.
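
The selection of unit voices by phoneme environment can be sketched as follows. This is a simplified Python illustration under stated assumptions, not the method of the patent: the `UnitVoice` record, the `context_score` measure (one point for each matching neighbouring phoneme), and the sample inventory are all hypothetical. The source only states that the environments should match as closely as possible.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class UnitVoice:
    """One stored synthesis unit (hypothetical record layout)."""
    label: str         # unit identity, e.g. a CV unit such as "ka"
    prev_phoneme: str  # phoneme that preceded the unit in the recording
    next_phoneme: str  # phoneme that followed the unit in the recording
    waveform_id: int   # index into the stored waveform data


def context_score(unit: UnitVoice, prev_phoneme: str, next_phoneme: str) -> int:
    """Assumed similarity measure: one point per matching neighbour."""
    return int(unit.prev_phoneme == prev_phoneme) + int(unit.next_phoneme == next_phoneme)


def select_unit(
    inventory: List[UnitVoice], label: str, prev_phoneme: str, next_phoneme: str
) -> UnitVoice:
    """Pick the stored unit whose recorded context best matches the target context."""
    candidates = [u for u in inventory if u.label == label]
    if not candidates:
        raise KeyError(f"no stored unit for {label!r}")
    # max() keeps the first candidate when several share the best score.
    return max(candidates, key=lambda u: context_score(u, prev_phoneme, next_phoneme))


# Illustrative inventory; labels and contexts are made up for the example.
inventory = [
    UnitVoice("ka", "a", "i", 0),
    UnitVoice("ka", "o", "s", 1),
    UnitVoice("ka", "o", "i", 2),
]
best = select_unit(inventory, "ka", "o", "s")
```

A real system would likely use a graded phonetic distance rather than exact matches, and would also weigh how well the unit's duration and pitch fit the target prosody.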

【００２５】以上の説明では、辞書３に、単語の表記毎
に一つの頻度情報をそれぞれ記憶しておき、形態素解析
手段１により、入力した漢字かな混じり文をこの辞書３
を用いて単語の表記の単位に分割した単語単位分割文に
して出力し、単語候補選択手段２により、形態素解析手
段１より漢字かな混じり文に対応する単語単位分割文を
受けこの受けた単語単位分割文が複数あるときにこの複
数の単語単位分割文のうちから、辞書３内のこの頻度情
報を使用して最も出現頻度の高い単語の表記の組に分割
された単語単位分割文を選択するようにしたが、漢字か
な混じり文の分野（例えば、電子メール、新聞、技術論
文、住所録等。）に対応する複数の頻度情報を単語の表
記毎に記憶した、分野別頻度情報を備えた辞書の記憶内
容の一例を示す図である図４に示す辞書９を備え、分野
別頻度情報を使用した音声合成装置の一つの実施の形態
を示すブロック図である図５に示すようにして、単語候
補選択手段８により、漢字かな混じり文の分野を示す分
野情報を受け（この情報は、本装置の外部よりユーザの
指示により受けても良く、また、入力した漢字かな混じ
り文を調べてこの入力文の分野を決め、この決めた分野
情報を受けても良い。）この分野情報に対応する頻度情
報をこの辞書９より得て、この得た頻度情報を使用して
形態素解析手段１より受けた複数の単語単位分割文のう
ちから、最も出現頻度の高い単語の表記の組に分割され
た単語単位分割文を選択するようにしてもよい。In the above description, the dictionary 3 stores one piece of frequency information for each word notation; the morpheme analysis means 1 uses the dictionary 3 to divide the input kanji/kana mixed sentence into units of word notation and outputs the resulting word unit divided sentences; and the word candidate selecting means 2 receives from the morpheme analysis means 1 the word unit divided sentences corresponding to the kanji/kana mixed sentence and, when a plurality of them are received, uses the frequency information in the dictionary 3 to select the word unit divided sentence divided into the set of word notations having the highest appearance frequency. Alternatively, a dictionary 9 may be provided that stores, for each word notation, a plurality of pieces of frequency information corresponding to the fields of kanji/kana mixed sentences (for example, e-mail, newspapers, technical papers, and address books), as shown in FIG. 4, which shows an example of the stored contents of a dictionary having field-specific frequency information. In that case, as shown in FIG. 5, which is a block diagram of an embodiment of a speech synthesizer using field-specific frequency information, the word candidate selecting means 8 receives field information indicating the field of the kanji/kana mixed sentence (this information may be supplied from outside the device by a user's instruction, or may be determined by examining the input kanji/kana mixed sentence), obtains the frequency information corresponding to this field information from the dictionary 9, and uses it to select, from among the plurality of word unit divided sentences received from the morpheme analysis means 1, the word unit divided sentence divided into the set of word notations having the highest appearance frequency.
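
The field-specific variant can be sketched in Python as below. This is an assumption-laden illustration rather than the contents of FIG. 4: the dictionary entries, the field names `newspaper` and `address_book`, the frequency values, and the function `select_by_field` are hypothetical, and the score of a candidate is assumed to be the sum of the field-specific frequencies of its words.

```python
from typing import Dict, List

# Hypothetical stand-in for dictionary 9: word notation -> field -> frequency.
DICTIONARY_9: Dict[str, Dict[str, int]] = {
    "東京": {"newspaper": 50, "address_book": 80},
    "都": {"newspaper": 20, "address_book": 60},
    "東": {"newspaper": 30, "address_book": 10},
    "京都": {"newspaper": 45, "address_book": 70},
}


def field_score(words: List[str], field: str, dictionary: Dict[str, Dict[str, int]]) -> int:
    """Assumed evaluation: total of the field-specific frequencies of the words."""
    return sum(dictionary.get(word, {}).get(field, 0) for word in words)


def select_by_field(
    candidates: List[List[str]], field: str, dictionary: Dict[str, Dict[str, int]]
) -> List[str]:
    """Choose the candidate segmentation that scores highest for the given field."""
    if not candidates:
        raise ValueError("at least one word unit divided sentence is required")
    return max(candidates, key=lambda words: field_score(words, field, dictionary))


# Two candidate segmentations of the same input, as morpheme analysis might output.
candidates = [["東京", "都"], ["東", "京都"]]
for_newspaper = select_by_field(candidates, "newspaper", DICTIONARY_9)
for_address_book = select_by_field(candidates, "address_book", DICTIONARY_9)
```

With these made-up numbers the same input is segmented differently depending on the field, which is the effect the field information is meant to achieve.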

【００２６】[0026]

【発明の効果】以上説明したように、本発明の音声合成
装置によれば、単語の表記，単語の読みとアクセント，
単語の品詞情報及び単語の表記が漢字かな混じり文中に
出現する頻度を示す頻度情報を単語の表記毎に記憶した
辞書を備え、形態素解析手段により、漢字かな混じり文
を受け辞書を用いてこの漢字かな混じり文を単語の表記
の単位に分割した単語単位分割文を出力し、単語候補選
択手段により、形態素解析手段より漢字かな混じり文に
対応する単語単位分割文を受けこの受けた単語単位分割
文が複数あるときにこの複数の単語単位分割文のうちか
ら、辞書内の頻度情報を使用して最も出現頻度の高い単
語の表記の組に分割された単語単位分割文を選択し、音
声波形発生手段により、単語候補選択手段により選択し
た単語単位分割文内のそれぞれの単語の表記に対して辞
書を使用して単語の読みとアクセントをそれぞれ付与し
て単語単位分割文の発音情報とし、この単語単位分割文
の発音情報に基づいて単語単位分割文全体のイントネー
ションとリズムとを示す韻律情報を生成し、発音情報と
韻律情報とに基づいて音声波形を生成するため、単語候
補選択手段により、辞書内の頻度情報を使用して最も出
現頻度の高い単語の表記の組に分割された単語単位分割
文を選択するので、定かに正しい単語単位分割文を選ぶ
ことができ、また、単語の表記が漢字かな混じり文中に
出現する頻度を示す頻度情報を単語の表記毎に記憶した
辞書を備えたので、漢字かな混じり辞書と英数仮名辞書
との性質の異なる二つの辞書が必要とならず、更に、単
語の表記が漢字かな混じり文中に出現する頻度を示す頻
度情報を単語の表記毎に記憶した辞書を備え、単語候補
選択手段により、辞書内の頻度情報を使用して最も出現
頻度の高い単語の表記の組に分割された単語単位分割文
を選択するので、通常はあまり使われない表記の単語が
選ばれるおそれがない。As described above, the speech synthesizer of the present invention includes a dictionary that stores, for each word notation, the notation of the word, the reading and accent of the word, the part-of-speech information of the word, and frequency information indicating the frequency with which the word notation appears in kanji/kana mixed sentences. The morpheme analysis means receives a kanji/kana mixed sentence and, using the dictionary, outputs word unit divided sentences obtained by dividing the kanji/kana mixed sentence into units of word notation. The word candidate selecting means receives from the morpheme analysis means the word unit divided sentences corresponding to the kanji/kana mixed sentence and, when a plurality of them are received, uses the frequency information in the dictionary to select the word unit divided sentence divided into the set of word notations having the highest appearance frequency. The voice waveform generating means uses the dictionary to assign the reading and accent of each word to its notation in the word unit divided sentence selected by the word candidate selecting means, thereby forming the pronunciation information of the word unit divided sentence; generates, based on this pronunciation information, prosody information indicating the intonation and rhythm of the entire word unit divided sentence; and generates a voice waveform based on the pronunciation information and the prosody information. Because the word candidate selecting means uses the frequency information in the dictionary to select the word unit divided sentence divided into the set of word notations having the highest appearance frequency, the correct word unit divided sentence can be selected reliably. In addition, because the dictionary stores, for each word notation, frequency information indicating the frequency with which the notation appears in kanji/kana mixed sentences, two dictionaries of different natures, a kanji/kana mixed dictionary and an alphanumeric/kana dictionary, are not required. Furthermore, because the selection is based on this frequency information, there is no risk of selecting a word whose notation is not normally used.
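
The overall flow summarized above (divide the input into word notations with one dictionary, then choose among the resulting candidates by frequency) can be sketched in Python as follows. This is a simplified illustration under stated assumptions: the dictionary here holds only a hypothetical frequency per notation (the dictionary in the text also stores reading, accent, and part of speech), the functions `segmentations` and `select_most_frequent` are names invented for the example, and the evaluation is taken to be the total of the frequency values, as claim 9 describes.

```python
from typing import Dict, List

# Hypothetical stand-in for dictionary 3: word notation -> appearance frequency.
DICTIONARY_3: Dict[str, int] = {"東京": 80, "都": 60, "東": 10, "京都": 70}


def segmentations(text: str, dictionary: Dict[str, int]) -> List[List[str]]:
    """Enumerate every way to split `text` into notations found in the dictionary."""
    if not text:
        return [[]]
    results: List[List[str]] = []
    for end in range(1, len(text) + 1):
        word = text[:end]
        if word in dictionary:
            for rest in segmentations(text[end:], dictionary):
                results.append([word] + rest)
    return results


def select_most_frequent(candidates: List[List[str]], dictionary: Dict[str, int]) -> List[str]:
    """Choose the candidate whose words have the largest total frequency."""
    if not candidates:
        raise ValueError("the input could not be divided with this dictionary")
    return max(candidates, key=lambda words: sum(dictionary[w] for w in words))


candidates = segmentations("東京都", DICTIONARY_3)
chosen = select_most_frequent(candidates, DICTIONARY_3)
```

The exhaustive recursion is only suitable for short inputs. A practical morpheme analyzer would use a lattice and dynamic programming, and would also apply part-of-speech connection rules that this sketch omits.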

【００２７】また、漢字かな混じり文の分野に対応する
複数の頻度情報を単語の表記毎に記憶した辞書を備え、
単語候補選択手段により、漢字かな混じり文の分野を示
す分野情報を受けこの分野情報に対応する頻度情報をこ
の辞書より得て、この得た頻度情報を使用して、形態素
解析手段より受けた複数の単語単位分割文のうちから、
最も出現頻度の高い単語の表記の組に分割された単語単
位分割文を選択するようにしたため、漢字かな混じり文
の分野に応じた頻度情報により単語単位分割文を選択す
るので、より正しい読み上げが可能となる。In addition, when a dictionary is provided that stores, for each word notation, a plurality of pieces of frequency information corresponding to the fields of kanji/kana mixed sentences, the word candidate selecting means receives field information indicating the field of the kanji/kana mixed sentence, obtains the frequency information corresponding to this field information from the dictionary, and uses it to select, from among the plurality of word unit divided sentences received from the morpheme analysis means, the word unit divided sentence divided into the set of word notations having the highest appearance frequency. Because the word unit divided sentence is selected using frequency information suited to the field of the kanji/kana mixed sentence, the text can be read aloud more accurately.

[Brief description of drawings]

【図１】本発明の音声合成装置の一つの実施の形態を示
すブロック図である。FIG. 1 is a block diagram showing an embodiment of a speech synthesizer of the present invention.

【図２】漢字かな混じり文に対応する複数の単語単位分
割文とこららの頻度情報の一例を示す図である。FIG. 2 is a diagram showing an example of a plurality of word-unit divided sentences corresponding to a kanji / kana mixed sentence and their frequency information.

【図３】辞書の記憶内容の一例を示す図である。FIG. 3 is a diagram showing an example of stored contents of a dictionary.

【図４】分野別頻度情報を備えた辞書の記憶内容の一例
を示す図である。FIG. 4 is a diagram showing an example of stored contents of a dictionary having field-specific frequency information.

【図５】分野別頻度情報を使用した音声合成装置の一つ
の実施の形態を示すブロック図である。FIG. 5 is a block diagram showing an embodiment of a speech synthesizer using frequency information for each field.

【図６】従来の音声合成装置のブロック図である。FIG. 6 is a block diagram of a conventional speech synthesizer.

[Explanation of symbols]

１ 形態素解析手段　２ 単語候補選択手段　３ 辞書　４ 発音情報生成手段　５ 韻律情報生成手段　６ 音声波形生成手段　７ 音声波形データ　８ 単語候補選択手段　９ 辞書
1 Morpheme analysis means
2 Word candidate selecting means
3 Dictionary
4 Pronunciation information generating means
5 Prosody information generating means
6 Voice waveform generating means
7 Voice waveform data
8 Word candidate selecting means
9 Dictionary

Claims

[Claims]

1. A speech synthesizer comprising: morpheme analysis means that receives a kanji/kana mixed sentence and outputs a word unit divided sentence obtained by dividing the kanji/kana mixed sentence into units of word notation; word candidate selecting means that receives from the morpheme analysis means the word unit divided sentence corresponding to the kanji/kana mixed sentence and, when a plurality of word unit divided sentences are received, selects from among them the word unit divided sentence divided into the set of word notations having the highest appearance frequency; and voice waveform generating means that assigns the reading and accent of each word to the notation of that word in the word unit divided sentence selected by the word candidate selecting means to form pronunciation information of the word unit divided sentence, generates, based on the pronunciation information of the word unit divided sentence, prosody information indicating the intonation and rhythm of the entire word unit divided sentence, and generates a voice waveform based on the pronunciation information and the prosody information.

2. A speech synthesizer comprising: morpheme analysis means that receives a kanji/kana mixed sentence and outputs a word unit divided sentence obtained by dividing the kanji/kana mixed sentence into units of word notation; word candidate selecting means that receives from the morpheme analysis means the word unit divided sentence corresponding to the kanji/kana mixed sentence and, when a plurality of word unit divided sentences are received, selects from among them the word unit divided sentence for which predetermined frequency information, each piece of which corresponds to the notation of a word in the word unit divided sentence and indicates the frequency with which that notation appears in the kanji/kana mixed sentence, has the highest evaluation over the entire word unit divided sentence; and voice waveform generating means that assigns the reading and accent of each word to the notation of that word in the word unit divided sentence selected by the word candidate selecting means to form pronunciation information of the word unit divided sentence, generates, based on the pronunciation information of the word unit divided sentence, prosody information indicating the intonation and rhythm of the entire word unit divided sentence, and generates a voice waveform based on the pronunciation information and the prosody information.

3. The speech synthesizer according to claim 1 or 2, wherein the voice waveform generating means comprises: pronunciation information generating means that assigns the reading and accent of each word to the notation of that word in the word unit divided sentence selected by the word candidate selecting means to generate the pronunciation information of the word unit divided sentence; prosody information generating means that generates, from the pronunciation information generated by the pronunciation information generating means, the prosody information indicating the intonation and rhythm of the entire word unit divided sentence; and voice waveform generating means that generates a voice waveform based on the pronunciation information and the prosody information of the word unit divided sentence, using voice waveform data, which is data obtained by dividing speech into fine units for speech synthesis.

4. A speech synthesizer comprising: a dictionary that stores, for each notation of a word, the notation of the word, the reading and accent of the word, the part-of-speech information of the word, and frequency information indicating the frequency with which the notation of the word appears in a kanji/kana mixed sentence; morpheme analysis means that receives the kanji/kana mixed sentence and, using the dictionary, outputs a word unit divided sentence obtained by dividing the kanji/kana mixed sentence into units of the notation of the word; word candidate selecting means that receives from the morpheme analysis means the word unit divided sentence corresponding to the kanji/kana mixed sentence and, when a plurality of word unit divided sentences are received, uses the frequency information in the dictionary to select from among them the word unit divided sentence divided into the set of word notations having the highest appearance frequency; and voice waveform generating means that uses the dictionary to assign the reading and accent of each word to the notation of that word in the word unit divided sentence selected by the word candidate selecting means to form pronunciation information of the word unit divided sentence, generates, based on the pronunciation information of the word unit divided sentence, prosody information indicating the intonation and rhythm of the entire word unit divided sentence, and generates a voice waveform based on the pronunciation information and the prosody information.

5. A speech synthesizer comprising: a dictionary that stores, for each notation of a word, the notation of the word, the reading and accent of the word, the part-of-speech information of the word, and frequency information indicating the frequency with which the notation of the word appears in a kanji/kana mixed sentence; morpheme analysis means that receives the kanji/kana mixed sentence and, using the dictionary, outputs a word unit divided sentence obtained by dividing the kanji/kana mixed sentence into units of the notation of the word; word candidate selecting means that receives from the morpheme analysis means the word unit divided sentence corresponding to the kanji/kana mixed sentence and, when a plurality of word unit divided sentences are received, selects from among them the word unit divided sentence for which the frequency information in the dictionary, each piece of which corresponds to the notation of a word in the word unit divided sentence, has the highest evaluation over the entire word unit divided sentence; and voice waveform generating means that uses the dictionary to assign the reading and accent of each word to the notation of that word in the word unit divided sentence selected by the word candidate selecting means to form pronunciation information of the word unit divided sentence, generates, based on the pronunciation information of the word unit divided sentence, prosody information indicating the intonation and rhythm of the entire word unit divided sentence, and generates a voice waveform based on the pronunciation information and the prosody information.

6. The speech synthesizer according to claim 4, wherein the dictionary stores, for each notation of the word, a plurality of pieces of the frequency information corresponding to fields of the kanji/kana mixed sentence, and the word candidate selecting means uses the frequency information in the dictionary that corresponds to field information indicating the field of the kanji/kana mixed sentence to select, from among the plurality of word unit divided sentences received from the morpheme analysis means, the word unit divided sentence divided into the set of word notations having the highest appearance frequency.

7. The speech synthesizer according to claim 5, wherein the dictionary stores, for each notation of the word, a plurality of pieces of the frequency information corresponding to fields of the kanji/kana mixed sentence, and the word candidate selecting means uses the frequency information in the dictionary that corresponds to field information indicating the field of the kanji/kana mixed sentence to select, from among the plurality of word unit divided sentences received from the morpheme analysis means, the word unit divided sentence for which the frequency information, each piece of which corresponds to the notation of a word in the word unit divided sentence, has the highest evaluation over the entire word unit divided sentence.

8. The speech synthesizer according to claim 4, 5, 6 or 7, wherein the voice waveform generating means comprises: pronunciation information generating means that uses the dictionary to assign the reading and accent of each word to the notation of that word in the word unit divided sentence selected by the word candidate selecting means to generate the pronunciation information of the word unit divided sentence; prosody information generating means that generates, from the pronunciation information generated by the pronunciation information generating means, the prosody information indicating the intonation and rhythm of the entire word unit divided sentence; and voice waveform generating means that generates a voice waveform based on the pronunciation information and the prosody information of the word unit divided sentence, using voice waveform data, which is data obtained by dividing speech into fine units for speech synthesis.

9. The speech synthesizer according to claim 5, 6, 7 or 8, wherein the word candidate selecting means selects the word unit divided sentence for which the total, over the entire word unit divided sentence, of the frequency information, which is a predetermined numerical value indicating the frequency with which the notation of the word appears in the kanji/kana mixed sentence, is the largest.