JPH06337691A

JPH06337691A - Sound rule synthesizer

Info

Publication number: JPH06337691A
Application number: JP5126818A
Authority: JP
Inventors: Yoshinori Shiga; 芳則志賀
Original assignee: Toshiba Corp
Current assignee: Toshiba Corp
Priority date: 1993-05-28
Filing date: 1993-05-28
Publication date: 1994-12-06

Abstract

PURPOSE: To automatically correct the accent type of a word in the dictionary simply by having the user correctly utter a word that was synthesized with an erroneous accent type. CONSTITUTION: The text to be read aloud by speech synthesis is stored in a storage unit 1, and the phoneme information used for the speech synthesis is stored in a buffer 10. When the accent type of a word is wrong during read-out and the user utters that word correctly, the speech is captured by a speech input unit 12 and analyzed by an input speech analysis unit 13 to obtain its phoneme information. A matching unit 14 detects the portion of the phoneme information in the buffer 10 that is similar to this phoneme information, and a word extraction unit 15 extracts the word corresponding to the similar portion from the text in the storage unit 1. A dictionary changing unit 17 then changes the accent information for the extracted word in a Japanese dictionary 3 to the accent type estimated from the input speech by an accent type estimation unit 16.

Description

Detailed Description of the Invention

[0001]

[Field of Industrial Application] The present invention relates to a rule-based speech synthesizer that reads input text aloud as synthesized speech, and in particular to a rule-based speech synthesizer suited to correcting accent assignment errors made during speech synthesis.

[0002]

[Prior Art] In this type of rule-based speech synthesizer, rule-based synthesis of Japanese, for example, proceeds as follows. First, linguistic processing such as morphological analysis is applied to the input sentence (text) of mixed kanji and kana while consulting a language analysis dictionary (Japanese dictionary), and the reading and accent type of each morpheme are obtained from that dictionary. The accent for the sentence as a whole is then determined using, for example, accent shift rules for compound words.

[0003] From the reading obtained in this way, phoneme parameters consisting of, for example, cepstrum parameters are generated; from the accent information, prosody parameters consisting of a pitch pattern are generated.

[0004] Speech is then synthesized by feeding a synthesis filter, for example an LMA (log magnitude approximation) filter, with a source signal and with filter coefficients derived from the phoneme parameters. The source signal is a pulse train whose period is given by the prosody parameters in voiced sections, and random noise in unvoiced sections.

[0005]

[Problem to Be Solved by the Invention] With the conventional rule-based speech synthesizer described above, when a word is read with the wrong accent type during text read-out, the user has to interrupt synthesis, look up the word among the headwords of the language analysis dictionary, and change the accent type in that entry. Changing the accent type of even a single word this way takes considerable time and is extremely inefficient.

[0006] Accordingly, a first object of the present invention is to provide a rule-based speech synthesizer in which, when a word is read with the wrong accent type during text read-out, the accent type of that word in the language analysis dictionary is changed automatically simply by the user uttering the word with the correct accent type.

[0007] A second object of the present invention is to provide a rule-based speech synthesizer that, when the word uttered by the user is not present in the language analysis dictionary, automatically registers that word and its accent type in the dictionary.

[0008]

[Means for Solving the Problem] To solve the above problems, the present invention provides a rule-based speech synthesizer that has a dictionary (language analysis dictionary) containing word accent information, performs language analysis on input text using the dictionary, generates a reading and an accent according to the analysis result, generates phoneme information from the reading and prosody information from the accent, and synthesizes arbitrary speech according to this information. The synthesizer is characterized by comprising: text storage means for storing the input text; phoneme information storage means for storing the phoneme information used for speech synthesis; speech input means for inputting speech; input speech analysis means for analyzing the input speech to obtain its phoneme information; phoneme information matching means for detecting, within the phoneme information stored in the phoneme information storage means, the portion similar to the phoneme information of the input speech obtained by the input speech analysis means; word extraction means for extracting the word corresponding to the detected similar portion from the input text stored in the text storage means; accent type estimation means for estimating the accent type of the input speech; and dictionary changing means for changing the accent information in the dictionary for the word extracted by the word extraction means according to the estimated accent type.

[0009]

[Operation] In the above configuration, when a word is read with the wrong accent type during read-out of the input text, for example because of a dictionary registration error, and the user utters that word correctly, the speech is input through the speech input means and its phoneme information is obtained by the input speech analysis means. The accent type of the input speech is estimated by the accent type estimation means.

[0010] Meanwhile, the input text that was read aloud by speech synthesis is stored in the text storage means, and the phoneme information used for the speech synthesis is stored in the phoneme information storage means.

[0011] When the input speech analysis means has obtained the phoneme information of the input speech, the phoneme information matching means detects the portion similar to it within the phoneme information stored in the phoneme information storage means. Once this similar portion is detected, the word extraction means extracts the corresponding word from the input text stored in the text storage means.

[0012] The dictionary changing means changes the accent information in the dictionary for the extracted word according to the accent type estimated by the accent type estimation means. If the extracted word is not registered in the dictionary, the dictionary changing means registers the word together with the estimated accent type.

[0013] With the above configuration, when a word is read with the wrong accent type during text read-out, the user can have the accent type of that word in the dictionary containing accent information (language analysis dictionary) changed automatically simply by uttering the word correctly; if the word is not in the dictionary, the word and its accent type are registered automatically. The user therefore only has to let the synthesizer read the required sentences and point out (utter), with the correct pronunciation, each word that was read with the wrong accent. Even a user without specialist knowledge can change and register accent dictionary entries very simply and efficiently.

[0014]

[Embodiment] An embodiment of the present invention is described below with reference to the drawings. FIG. 1 is a block diagram showing an embodiment of a Japanese rule-based speech synthesizer to which the present invention is applied.

[0015] The Japanese rule-based speech synthesizer of FIG. 1 comprises an input text storage unit 1 that stores the input sentences (text) of mixed kanji and kana as they arrive, a language analysis unit 2 that performs language analysis on the input sentence to generate reading and accent information, and a Japanese dictionary (language analysis dictionary) 3 used by the language analysis unit 2 during language analysis. In addition to grammatical information such as part of speech, the Japanese dictionary 3 holds the reading and accent type of each word.

[0016] The synthesizer of FIG. 1 also comprises a phoneme length determination unit 4 that determines the duration (phoneme length) of each phoneme from the reading information supplied by the language analysis unit 2, a phoneme parameter generation unit 5 that generates phoneme parameters consisting of cepstrum parameters from the reading information supplied by the language analysis unit 2 and the phoneme lengths determined by the phoneme length determination unit 4, and a speech segment memory 6. The speech segment memory 6 holds speech segments corresponding to all phonemes.

[0017] The synthesizer of FIG. 1 also comprises: a prosody parameter generation unit 7 that generates prosody parameters consisting of a pitch pattern and voiced/unvoiced information from the accent type information supplied by the language analysis unit 2 and the phoneme lengths determined by the phoneme length determination unit 4; a synthesis filter 8 that generates a source signal from the prosody parameters generated by the prosody parameter generation unit 7 and synthesizes speech using the phoneme parameters generated by the phoneme parameter generation unit 5 as filter coefficients; a speaker 9 that outputs the speech synthesized by the synthesis filter 8; and a phoneme parameter buffer 10. The phoneme parameters generated by the phoneme parameter generation unit 5 are stored in the phoneme parameter buffer 10 as they are generated.

[0018] The synthesizer of FIG. 1 also comprises: a speech input unit 12 that captures the user's speech through a microphone 11; an input speech analysis unit 13 that analyzes the input speech and converts it into phoneme parameters (input speech phoneme parameters) of the same kind as the phoneme parameters generated during synthesis (synthesized phoneme parameters), here cepstrum parameters; a matching unit 14 that matches the synthesized phoneme parameters stored in the phoneme parameter buffer 10 against the input speech phoneme parameters and detects the most similar portion of the synthesized phoneme parameters; and a word extraction unit 15. The word extraction unit 15 extracts the word corresponding to the portion detected by the matching unit 14 from the mixed kanji and kana sentence stored in the input text storage unit 1.

[0019] The synthesizer of FIG. 1 further comprises an accent type estimation unit 16 that estimates the accent type of the input speech, and a dictionary changing unit 17 that changes the accent type registered in the Japanese dictionary 3 for the word extracted by the word extraction unit 15 to the accent type detected by the accent type estimation unit 16. If the word is not registered in the Japanese dictionary 3, the dictionary changing unit 17 registers the word, together with the accent type detected by the accent type estimation unit 16, in the Japanese dictionary 3.

[0020] The operation of the Japanese rule-based speech synthesizer configured as in FIG. 1 is described next. Suppose first that a sentence (text) of mixed kanji and kana to be read aloud has been input to the device.

[0021] This input sentence (input text) is stored in the input text storage unit 1 as it arrives and is also passed to the language analysis unit 2. The language analysis unit 2 segments the input sentence into morphemes by performing well-known language analysis processing such as morphological analysis while consulting the Japanese dictionary 3. It then assigns a reading and an accent type to each morpheme by reference to the Japanese dictionary 3, and determines the accent for the whole sentence using, for example, accent shift rules for compound words.

[0022] The phoneme length determination unit 4 then calculates the duration, that is, the phoneme length, of each phoneme contained in the reading supplied by the language analysis unit 2. The phoneme parameter generation unit 5 reads from the speech segment memory 6 the speech segment for each phoneme in that reading, stretches or compresses each segment to match the phoneme length calculated by the phoneme length determination unit 4, concatenates the segments, and applies processing such as smoothing so that no distortion arises at the joins, thereby generating the phoneme parameters. The phoneme parameters generated by the phoneme parameter generation unit 5 are stored in the phoneme parameter buffer 10 as they are generated.
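
The stretching of a speech segment to a target phoneme length can be illustrated with a minimal sketch. The patent does not say how segments are stretched; linear interpolation between parameter frames is assumed here purely for illustration, and the function name `stretch_frames` and the list-of-lists frame layout are hypothetical.

```python
def stretch_frames(frames, target_len):
    """Resample a segment (list of parameter vectors) to target_len frames.

    Assumption: linear interpolation between neighbouring frames; the
    source only says segments are stretched or compressed.
    """
    if target_len <= 0 or not frames:
        return []
    if len(frames) == 1 or target_len == 1:
        # Nothing to interpolate between: repeat the first frame.
        return [list(frames[0]) for _ in range(target_len)]
    out = []
    scale = (len(frames) - 1) / (target_len - 1)
    for t in range(target_len):
        pos = t * scale                      # fractional position in the source
        lo = int(pos)
        hi = min(lo + 1, len(frames) - 1)
        w = pos - lo                         # interpolation weight toward hi
        out.append([(1 - w) * a + w * b for a, b in zip(frames[lo], frames[hi])])
    return out
```

Concatenating the stretched segments of successive phonemes would give the parameter sequence that is both sent to the synthesis filter and saved in the phoneme parameter buffer; smoothing at the joins is omitted from this sketch.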

[0023] Meanwhile, the prosody parameter generation unit 7 generates prosody parameters consisting of a pitch pattern and voiced/unvoiced information from the accent type information supplied by the language analysis unit 2 and the phoneme length information calculated by the phoneme length determination unit 4.

[0024] The phoneme parameters generated by the phoneme parameter generation unit 5 and the prosody parameters generated by the prosody parameter generation unit 7 are passed to the synthesis filter 8. The synthesis filter 8 is, for example, an LMA (log magnitude approximation) filter. Based on the pitch pattern and voiced/unvoiced information that make up the prosody parameters, it generates impulses at the pitch period in voiced sections and white noise in unvoiced sections and uses these as the source (source signal), and it synthesizes speech using the phoneme parameters consisting of cepstrum parameters passed from the phoneme parameter generation unit 5 as filter coefficients. The speech synthesized by the synthesis filter 8 is output from the speaker 9.
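
The source signal described for the synthesis filter 8 (pitch-period impulses in voiced sections, white noise in unvoiced sections) can be sketched as follows. This covers only the excitation, not the LMA filter itself. The frame layout `(voiced, f0_hz)`, the sample rate, the frame length, and the noise gain are assumptions made for the example and do not come from the patent.

```python
import random

def make_excitation(frames, sample_rate=8000, frame_len=80, seed=0):
    """Build a source signal from per-frame prosody parameters.

    frames: list of (voiced: bool, f0_hz: float), one entry per frame.
    Returns a list of samples: unit impulses spaced one pitch period apart
    in voiced frames, low-level Gaussian noise in unvoiced frames.
    """
    rng = random.Random(seed)
    out = []
    next_pulse = 0  # absolute sample index at which the next impulse fires
    for i, (voiced, f0) in enumerate(frames):
        start = i * frame_len
        for n in range(start, start + frame_len):
            if voiced and f0 > 0:
                if n >= next_pulse:
                    out.append(1.0)
                    next_pulse = n + round(sample_rate / f0)
                else:
                    out.append(0.0)
            else:
                out.append(0.1 * rng.gauss(0.0, 1.0))
                next_pulse = n + 1  # fire immediately when voicing resumes
    return out
```

With two voiced frames at 100 Hz followed by one unvoiced frame, the default settings give impulses 80 samples apart and then 80 samples of noise.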

[0025] Suppose that the sentence (text) "明日の天気は雨でしょう。" ("Tomorrow's weather will be rain.") is read aloud in synthesized speech by the synthesizer of FIG. 1 in this way. In the Tokyo dialect, the word "雨" (ame, "rain") in this sentence is normally pronounced with accent "type 1" (head-high type), as shown in FIG. 2(b). Suppose, however, that because of a registration error the accent type of "雨" in the Japanese dictionary 3 of the synthesizer of FIG. 1 is "type 0" (flat type), as shown in FIG. 3(a). The synthesizer of FIG. 1 then reads "雨" with "type 0", as shown in FIG. 2(a), when it reads the sentence aloud.

[0026] In such a case, with the synthesizer of FIG. 1, if the user hears the synthesized speech, judges that the accent is wrong, and correctly utters the word in question ("雨", "rain"), the registration error in the Japanese dictionary 3 is corrected automatically, as described below.

[0027] Suppose the user, on hearing the synthesized speech for the sentence "明日の天気は雨でしょう。" ("Tomorrow's weather will be rain."), judges that the accent of "雨" ("rain") is wrong and utters "雨" into the microphone 11 connected to the speech input unit 12 with the correct accent shown in FIG. 2(b), that is, the "type 1" accent.

[0028] The speech "雨" ("rain") uttered by the user is input to the speech input unit 12 through the microphone 11. The speech input unit 12 digitizes the input speech and passes it to the input speech analysis unit 13 and the accent type estimation unit 16.

[0029] The input speech analysis unit 13 then performs cepstrum analysis on the digitized input speech to obtain cepstrum parameters and passes them to the matching unit 14.
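
As a rough illustration of cepstrum analysis, the sketch below computes the real cepstrum of one frame as the inverse DFT of the log magnitude spectrum. The patent does not specify the analysis procedure; the naive DFT, the absence of windowing, and the function name `real_cepstrum` are assumptions chosen to keep the example self-contained.

```python
import cmath
import math

def real_cepstrum(frame, n_coef):
    """Return the first n_coef real-cepstrum coefficients of one frame.

    Naive O(N^2) DFT for clarity; a practical analyzer would window the
    frame and use an FFT.
    """
    N = len(frame)
    log_mag = []
    for k in range(N):
        s = sum(frame[n] * cmath.exp(-2j * math.pi * k * n / N) for n in range(N))
        log_mag.append(math.log(abs(s) + 1e-12))  # small floor avoids log(0)
    # The log magnitude spectrum of a real frame is even, so its inverse
    # DFT reduces to a cosine sum.
    return [
        sum(log_mag[k] * math.cos(2 * math.pi * k * n / N) for k in range(N)) / N
        for n in range(n_coef)
    ]
```

Applying this frame by frame to the digitized utterance would yield a sequence of cepstrum vectors of the same kind as those held in the phoneme parameter buffer 10, which is what makes the two sequences directly comparable.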

[0030] On receiving the cepstrum parameters of the input speech (input speech phoneme parameters) from the input speech analysis unit 13, the matching unit 14 searches the cepstrum parameters stored in the phoneme parameter buffer 10 (the synthesized phoneme parameters generated by the phoneme parameter generation unit 5 and used for speech synthesis) and detects the portion most similar to the received cepstrum parameters of the user's utterance "雨" ("rain"), for example by DP (dynamic programming) matching.

[0031] The word extraction unit 15 then determines which word in the mixed kanji and kana sentence (input text) stored in the input text storage unit 1 corresponds to the portion that the matching unit 14 found to be similar to the user's utterance "雨" ("rain"), and extracts that word, "雨". The word extraction unit 15 passes the character code of the extracted word "雨" to the dictionary changing unit 17.
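
The patent does not describe how a detected parameter span is mapped back to a word. A minimal sketch, assuming the synthesizer keeps a hypothetical table of the frame range occupied by each word of the stored text, is to pick the word with the largest frame overlap:

```python
def extract_word(span, word_spans):
    """Pick the word whose frame range overlaps the detected span most.

    span: (start, end) inclusive frame indices from the matching step.
    word_spans: list of (word, start, end) built during synthesis; this
    table is an assumption, not something the patent specifies.
    """
    s, e = span
    best_word, best_overlap = None, 0
    for word, ws, we in word_spans:
        overlap = min(e, we) - max(s, ws) + 1
        if overlap > best_overlap:
            best_word, best_overlap = word, overlap
    return best_word
```

Choosing by overlap tolerates a detected span that spills a few frames into a neighbouring word, which can be expected when the user's speaking rate differs from the synthesized speech.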

[0032] Meanwhile, the accent type estimation unit 16 performs pitch analysis on the digitized input speech "雨" ("rain") from the speech input unit 12 to obtain its pitch frequency contour. From this contour, the accent type estimation unit 16 estimates the accent type of the input speech "雨" and passes that accent type (here "type 1") to the dictionary changing unit 17.
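
The estimation rule itself is not given in the patent. As a hedged sketch, assume the pitch contour has already been reduced to one mean F0 value per mora; the accent type can then be taken as the position of the mora after which the pitch falls most sharply, or 0 if no fall exceeds a threshold. The function name and the 2-semitone threshold are illustrative assumptions.

```python
import math

def estimate_accent_type(mora_f0, min_drop_semitones=2.0):
    """Estimate a Tokyo-style accent type from per-mora mean F0 (Hz).

    Returns n if the largest sufficient pitch fall follows the n-th mora
    (1-based), or 0 (flat type) if no fall reaches the threshold.
    """
    best_type, best_drop = 0, 0.0
    for i in range(len(mora_f0) - 1):
        drop = 12.0 * math.log2(mora_f0[i] / mora_f0[i + 1])  # fall in semitones
        if drop >= min_drop_semitones and drop > best_drop:
            best_type, best_drop = i + 1, drop
    return best_type
```

For a two-mora word such as "雨" uttered high-low, this returns 1; for a low-high contour it returns 0. A word whose fall occurs only on a following particle cannot be distinguished from type 0 when uttered in isolation, which is a limitation of this simplified rule.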

[0033] The dictionary changing unit 17 looks up the word "雨" ("rain") passed from the word extraction unit 15 in the Japanese dictionary 3. If the word is registered as shown in FIG. 3(a), the unit changes the "type 0" in the accent type field for "雨" to the "type 1" estimated by the accent type estimation unit 16, as shown in FIG. 3(b).

[0034] Because the accent type of the word "雨" ("rain") in the Japanese dictionary 3 has thus been changed from "type 0" to "type 1", the next time the synthesizer of FIG. 1 reads an input sentence aloud, "雨" is synthesized with the correct accent type, "type 1".

[0035] If the word extracted by the word extraction unit 15 is not registered in the Japanese dictionary 3 (an unknown word or a compound word), the dictionary changing unit 17 newly registers the word in the Japanese dictionary 3 together with the accent type estimated by the accent type estimation unit 16.
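
The two behaviours of the dictionary changing unit 17, changing an existing entry and registering a missing one, can be sketched with an in-memory mapping. The entry layout (`reading`, `accent_type`) and the function name `update_accent` are hypothetical; the real dictionary also holds grammatical information that this sketch ignores.

```python
def update_accent(dictionary, word, accent_type, reading=None):
    """Change the accent type of a word, or register the word if absent.

    dictionary: dict mapping headword -> {"reading": ..., "accent_type": ...}.
    Returns "changed" or "registered" to indicate which path was taken.
    """
    entry = dictionary.get(word)
    if entry is None:
        # Unknown word: create a new entry with the estimated accent type.
        dictionary[word] = {"reading": reading, "accent_type": accent_type}
        return "registered"
    entry["accent_type"] = accent_type
    return "changed"
```

For the example in the text, the entry for "雨" would go from accent type 0 to accent type 1, and a word missing from the dictionary would be added with whatever accent type was estimated from the utterance.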

[0036] An embodiment of the present invention has been described above, but the present invention is not limited to that embodiment. For example, the embodiment uses cepstrum parameters as the synthesis parameters, but other parameters such as LPC (linear prediction coefficients) may be used.

[0037] The matching unit 14 applies DP matching to match the phoneme information (phoneme parameters), but the method is not limited to this. Nor is the present invention limited to using synthesis parameters as the phoneme information for matching, as in the embodiment. The input speech may instead be converted into a phoneme sequence by speech recognition in the input speech analysis unit 13 and then matched against the reading information output by the language analysis unit 2 to estimate the word in question.
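
For the symbolic variant, in which a recognized phoneme sequence is matched against the reading, one possible technique is approximate substring matching by edit distance. The patent does not name a method for this variant; the algorithm choice, the unit edit costs, and the name `best_substring_match` are assumptions.

```python
def best_substring_match(query, text):
    """Find the substring of text with the smallest edit distance to query.

    query, text: sequences of phoneme symbols (strings work too).
    Returns (start, end_exclusive, distance).
    """
    m = len(text)
    prev = [0] * (m + 1)              # an empty query matches anywhere at cost 0
    prev_start = list(range(m + 1))   # where the match ending at j began
    for i in range(1, len(query) + 1):
        cur = [i] + [0] * m
        cur_start = [0] * (m + 1)
        for j in range(1, m + 1):
            sub = prev[j - 1] + (query[i - 1] != text[j - 1])
            dele = prev[j] + 1        # query symbol with no counterpart in text
            ins = cur[j - 1] + 1      # extra text symbol inside the match
            cur[j] = min(sub, dele, ins)
            if cur[j] == sub:
                cur_start[j] = prev_start[j - 1]
            elif cur[j] == dele:
                cur_start[j] = prev_start[j]
            else:
                cur_start[j] = cur_start[j - 1]
        prev, prev_start = cur, cur_start
    end = min(range(m + 1), key=lambda j: prev[j])
    return prev_start[end], end, prev[end]
```

Allowing a nonzero distance lets the match survive recognition errors in the user's utterance, at the cost of possible ambiguity when the text contains similar-sounding words.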

[0038] Furthermore, the present invention is not limited to speech synthesis of Japanese text and can be applied to rule-based speech synthesizers for synthesizing (reading aloud) text in other languages such as English. In short, the present invention can be modified in various ways without departing from its gist.

[0039]

[Effects of the Invention] As described above, according to the present invention, the input text read aloud by speech synthesis and the phoneme information used for the synthesis are both stored; the speech uttered by the user is analyzed to obtain its phoneme information and to estimate its accent type; the portion similar to that phoneme information is detected within the stored phoneme information; the word corresponding to the similar portion is extracted from the stored input text; and the accent information in the dictionary (language analysis dictionary) for the extracted word is changed to the estimated accent type or, if the word is not in the dictionary, the word and the estimated accent type are registered in the dictionary. The following effects are thereby obtained.

[0040] (1) When a word is read with the wrong accent type during read-out of the input text (sentences), for example because of a dictionary registration error, and the user utters the word correctly, the accent type in the dictionary of the misread word is changed automatically to the correct accent type estimated from the user's utterance.

[0041] (2) When the word uttered by the user is not present in the dictionary, the word and its accent type are registered in the dictionary automatically.

(3) From (1) and (2), the user only has to let the rule-based speech synthesizer read the required sentences and point out (utter), with the correct pronunciation, each word read with the wrong accent. Even a user without specialist knowledge can change and register accent dictionary entries very simply and efficiently.

[Brief description of drawings]

FIG. 1 is a block diagram showing an embodiment of a Japanese rule-based speech synthesizer to which the present invention is applied.

FIG. 2 is a diagram contrasting the accent of a word in the erroneous synthesized speech with the accent of the same word as uttered by the user.

FIG. 3 is a diagram showing example contents of the Japanese dictionary 3 before and after the change, corresponding to FIG. 2.

[Explanation of symbols]

1 ... input text storage unit (text storage means), 2 ... language analysis unit, 3 ... Japanese dictionary (language analysis dictionary), 4 ... phoneme length determination unit, 5 ... phoneme parameter generation unit, 6 ... speech segment memory, 7 ... prosody parameter generation unit, 8 ... synthesis filter, 9 ... speaker, 10 ... phoneme parameter buffer (phoneme information storage means), 11 ... microphone, 12 ... speech input unit, 13 ... input speech analysis unit, 14 ... matching unit, 15 ... word extraction unit, 16 ... accent type estimation unit, 17 ... dictionary changing unit.

Claims

[Claims]

1. A rule-based speech synthesizer that has a dictionary containing word accent information, performs language analysis on input text using the dictionary, generates a reading and an accent according to the analysis result, generates phoneme information from the reading and prosody information from the accent, and synthesizes arbitrary speech according to this information, the synthesizer comprising: text storage means for storing the input text; phoneme information storage means for storing the phoneme information used for the speech synthesis; speech input means for inputting speech; input speech analysis means for analyzing the speech input by the speech input means to obtain phoneme information of the input speech; phoneme information matching means for detecting, within the phoneme information stored in the phoneme information storage means, a portion similar to the phoneme information of the input speech obtained by the input speech analysis means; word extraction means for extracting, from the input text stored in the text storage means, a word corresponding to the similar portion detected by the phoneme information matching means; accent type estimation means for estimating the accent type of the speech input by the speech input means; and dictionary changing means for changing the accent information in the dictionary for the word extracted by the word extraction means according to the accent type estimated by the accent type estimation means.

2. The rule-based speech synthesizer according to claim 1, wherein, when the word extracted by the word extraction means is not registered in the dictionary, the dictionary changing means registers the word and the accent type estimated by the accent type estimation means in the dictionary.