JPH07160290A

JPH07160290A - Sound synthesizing system

Info

Publication number: JPH07160290A
Application number: JP5329588A
Authority: JP
Inventors: Takahiro Nomura; 隆裕野村; Yotaro Hachitsuka; 陽太郎八塚
Original assignee: Kokusai Denshin Denwa KK
Current assignee: KDDI Corp
Priority date: 1993-12-02
Filing date: 1993-12-02
Publication date: 1995-06-23

Abstract

PURPOSE: To obtain synthesized speech with excellent clarity and naturalness, and to shorten synthesis processing time, by automatically adding or updating stored speech waveforms and their corresponding attribute information in an information storage section. CONSTITUTION: An attribute information storage section 3 and a waveform information storage section 5 manage speech waveforms using phonological information and accent information as an index. During speech synthesis, it is determined whether a stored speech waveform and its attribute information should be added, and if so they are added. In addition, a connected word information storage section 8, which stores connected words in association with their phonological information, is searched for connected words that have no corresponding stored speech waveform, and a speech waveform for each such connected word is generated from the stored speech waveforms and their attribute information. Speech waveforms input from outside, together with their attribute information, are likewise stored as stored speech waveforms and attribute information. The stored speech waveforms and their attribute information are thus added and updated automatically.

Description

Detailed Description of the Invention

[0001]

[Field of Industrial Application] The present invention relates to a speech synthesis method, and more particularly to a method that synthesizes speech by editing, concatenating, and modifying speech units.

[0002]

[Prior Art] In general, a speech synthesis method takes phonological information and prosodic information as input and outputs the corresponding synthesized speech waveform. Here, phonological information is defined as the phonetic transcription generated from the kana reading of written text such as a word or a passage. For example, the written form 「経験」 ("experience") has the kana reading 「ケイケン」, which is pronounced 「ケーケン」 and transcribed as "keeken". Prosodic information is defined as the boundaries of accent phrases and breath groups, the accent information that results from combining words and phrases, and intonation information. This prosodic information is generated from the accents of the written text and from the results of morphological and syntactic analysis, among other sources. Such speech synthesis methods fall mainly into two classes: waveform editing methods and rule-based synthesis methods. The conventional methods are described below with reference to the drawings.

[0003] First, the conventional waveform editing method is described. This method stores speech uttered in advance by a person in units such as sentences, words, or phrases, and synthesizes a speech waveform by reading out and splicing together the stored speech that corresponds to the input phonological information or written-text information. The stored speech waveforms are managed using, as an index, the phonological information or written-text information of each synthesis unit (a sentence, word, phrase, or the like).

[0004] FIG. 5 is a basic block diagram of the waveform editing method. In FIG. 5, 51 is an input terminal, 52 is a synthesis unit decomposition section, 53 is a speech waveform editing section, 54 is a speech waveform storage section, and 55 is an output terminal. The input terminal 51 passes the input phonological information to the synthesis unit decomposition section 52. The synthesis unit decomposition section 52 decomposes the phonological information from the input terminal 51 into synthesis units in a fixed manner and outputs them to the speech waveform editing section 53. The speech waveform storage section 54 stores speech waveforms cut out, in synthesis units, from speech uttered in advance by a person. The speech waveform editing section 53 outputs each synthesis unit to the speech waveform storage section 54, which returns the speech waveform corresponding to that synthesis unit. The speech waveform editing section 53 splices the speech waveforms from the speech waveform storage section 54 and outputs the result as the synthesized speech waveform via the output terminal 55. Instead of storing the cut-out waveforms themselves, the speech waveform storage section 54 may store the feature parameters of those waveforms. In that case, the speech waveform editing section 53 synthesizes speech waveforms from the feature parameters and then splices them. The speech waveform editing section 53 may also shape the waveform at the joins to improve synthesized speech quality.
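The lookup-and-splice flow of FIG. 5 can be sketched roughly as follows. This is a minimal illustration, not the patented implementation: the names `WAVEFORM_STORE`, `decompose`, and `edit_waveform` are hypothetical, the sample values are toy numbers, the units other than "keeken" are invented, and splitting the input on spaces is an assumed stand-in for the fixed decomposition performed by section 52.

```python
from typing import Dict, List

# Hypothetical store (section 54): synthesis unit -> recorded samples.
# Real entries would hold audio or feature parameters, not toy lists.
WAVEFORM_STORE: Dict[str, List[float]] = {
    "keeken": [0.1, 0.2, 0.1],
    "ga": [0.3, 0.0],
    "aru": [0.2, 0.4, 0.2],
}


def decompose(phonological_input: str) -> List[str]:
    """Split the input into synthesis units in a fixed manner (assumed: on spaces)."""
    return phonological_input.split()


def edit_waveform(phonological_input: str) -> List[float]:
    """Look up each unit's stored waveform and splice the results in order."""
    output: List[float] = []
    for unit in decompose(phonological_input):
        if unit not in WAVEFORM_STORE:
            # The limitation of this method: an unstored unit cannot be synthesized.
            raise KeyError(f"no stored waveform for unit {unit!r}")
        output.extend(WAVEFORM_STORE[unit])
    return output
```

Calling `edit_waveform("keeken ga aru")` returns the three stored sample lists joined end to end, while any input containing a unit missing from the store raises `KeyError`, which mirrors the inability to synthesize anything outside the stored combinations.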

[0005] Because this conventional method splices and outputs the stored speech waveforms as they are, it has the advantage of good synthesized speech quality. However, it has the drawback that it cannot synthesize any waveform other than combinations of the speech waveforms stored in the speech waveform storage section 54. Moreover, synthesizing arbitrary speech waveforms would require storing an enormous number of speech waveforms in the speech waveform storage section 54, so synthesizing arbitrary speech waveforms is practically impossible.

[0006] Next, the conventional rule-based synthesis method is described. This method stores in advance the feature parameters of speech waveforms of roughly phoneme or syllable size, together with rules on the prosodic information used to connect them, and, based on the input phonological and prosodic information, uses those rules to synthesize a speech waveform from the feature parameters. The stored feature parameters are managed using, as an index, the phonological information of each synthesis unit of roughly phoneme or syllable size.

[0007] FIG. 6 is a basic block diagram of the rule-based synthesis method. In FIG. 6, 61 and 62 are input terminals, 63 is a synthesis unit decomposition section, 64 is a feature parameter storage section, 65 is a speech waveform synthesis section, 66 is a prosodic rule storage section, and 67 is an output terminal.

[0008] The input terminal 61 passes the input phonological information to the synthesis unit decomposition section 63. The feature parameter storage section 64 stores the feature parameters of the speech waveforms that correspond to synthesis units of roughly syllable size (for example CV, VC, VCV, or CVC, where C denotes a consonant and V a vowel). The synthesis unit decomposition section 63 decomposes the phonological information from the input terminal 61 into synthesis units and outputs them to the feature parameter storage section 64. The feature parameter storage section 64 returns the feature parameters corresponding to each synthesis unit to the synthesis unit decomposition section 63, which forwards them to the speech waveform synthesis section 65. The prosodic rule storage section 66 stores prosodic control rules that associate prosodic information with prosodic parameters. The input terminal 62 passes the input prosodic information to the speech waveform synthesis section 65. The speech waveform synthesis section 65 outputs the prosodic information from the input terminal 62 to the prosodic rule storage section 66, which returns the prosodic control parameters corresponding to that prosodic information. The speech waveform synthesis section 65 generates synthesized speech using the feature parameters from the synthesis unit decomposition section 63 and the prosodic control parameters from the prosodic rule storage section 66, and outputs it via the output terminal 67.

[0009] Because this conventional method uses units of roughly phoneme or syllable size as synthesis units and stores all of their feature parameters and prosodic control parameters, it has the advantage that arbitrary speech waveforms can be synthesized. In other words, once the feature parameter storage section 64 has been built, its contents in principle never need to be updated. However, the waveform has a very large number of joins and the correction of the discontinuities at those joins is imperfect, so the method has the drawbacks that synthesized speech quality degrades and the correction processing time increases.

[0010] As a conventional method other than the two above, a speech synthesis method using compound speech units has been proposed. This method stores compound speech units of various lengths and structures, ranging from the phoneme level through syllables, morphemes, words, and phrases up to sentences, and synthesizes speech waveforms using the compound speech units and the partial units contained in them. The description of this method does not specifically address how the speech waveforms of the compound speech units and their partial units are stored. In one specific example given in that description, however, those speech waveforms are managed using phonological information as an index.

[0011] FIG. 7 is a schematic block diagram of this conventional method. In FIG. 7, 71 is an input terminal, 72 is a speech unit selection section, 73 is a speech unit file, 74 is a speech control knowledge file, 75 is a unit joining section, 76 is a synthesized waveform generation section, and 77 is an output terminal. The input terminal 71 passes the input phonological and prosodic information to the speech unit selection section 72. The speech unit file 73 stores a variety of speech unit sequences collected in advance. The speech control knowledge file 74 stores rules and tables that give concrete synthesis conditions such as extraction environment, fundamental frequency, and phoneme duration. Based on these two pieces of information, the speech unit selection section 72 selects from the speech unit file 73 a speech unit sequence suited to the synthesis. The selection criteria are the concrete conditions, such as the desired extraction environment, fundamental frequency, and phoneme duration of the synthesis unit, derived from the speech control knowledge file 74 on the basis of the phonological environment and prosodic information at the point being synthesized. The speech unit sequence selected by the speech unit selection section 72 is passed to the unit joining section 75, where the selected units are joined to one another. The joined speech units are passed to the synthesized waveform generation section 76, which generates a synthesized waveform from the synthesis unit sequence obtained by the unit joining section 75. The generated speech is output to the output terminal 77. In one specific example of this conventional method, only phonological information is used as the index into the speech unit file 73 when the speech unit selection section 72 selects speech units.

[0012] An advantage of this conventional method is that, by using not only the stored speech waveforms themselves but also portions of them, it achieves clearer speech than the rule-based synthesis method and can synthesize a wider range of arbitrary speech waveforms than the waveform editing method. However, when the desired speech waveform itself is not stored in the speech unit file 73, that waveform must be composed from the speech units in the file. Because the speech unit file 73 stores many compound speech units of various lengths and structures, ranging from the phoneme level through syllables, morphemes, words, and phrases up to sentences, the number of speech unit sequences that the speech unit selection section 72 retrieves from the file 73 depends on the number of combinations of the selected speech units. When the number of combinations becomes enormous, the time needed to select a speech unit sequence also becomes long, which makes the method difficult to implement in systems that require real-time speech synthesis.
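To give a feel for how quickly the number of candidate unit sequences can grow, the following toy model counts the ways a target transcription can be covered by contiguous units drawn from an inventory. It is only an illustration under stated assumptions: one character stands for one phoneme, the functions `count_unit_sequences` and `all_substrings` are hypothetical, and the source does not specify how the prior-art method actually enumerates its candidates.

```python
from typing import Set


def count_unit_sequences(target: str, inventory: Set[str]) -> int:
    """Count the ways to cover `target` with contiguous units from `inventory`."""
    # ways[i] = number of unit sequences that spell exactly target[:i]
    ways = [0] * (len(target) + 1)
    ways[0] = 1  # the empty prefix is covered by the empty sequence
    for end in range(1, len(target) + 1):
        for start in range(end):
            if target[start:end] in inventory:
                ways[end] += ways[start]
    return ways[len(target)]


def all_substrings(text: str) -> Set[str]:
    """Worst case: every contiguous piece of `text` is available as a unit."""
    return {
        text[i:j]
        for i in range(len(text))
        for j in range(i + 1, len(text) + 1)
    }
```

In this worst case a target of n symbols admits 2^(n-1) sequences, so `count_unit_sequences("keeken", all_substrings("keeken"))` gives 32, and every additional symbol doubles the count. Real inventories are sparser, but the trend suggests why selection time can become a problem for real-time use.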

[0013]

[Problems to Be Solved by the Invention] As described above, each conventional method has a drawback: the first emphasizes synthesized speech quality but cannot synthesize arbitrary speech waveforms; the second can synthesize arbitrary speech waveforms but yields perceptibly degraded speech quality; and the third can synthesize a comparatively wide range of good-quality speech waveforms but requires an enormous amount of synthesis processing time.

[0014] In view of these problems of the conventional methods and of the downward trend in memory cost, an object of the present invention is to provide a speech synthesis method that, on the premise that a certain amount of memory is used, shortens synthesis processing time while still delivering synthesized speech of good clarity and naturalness and synthesizing a wider range of arbitrary speech waveforms than the waveform editing method.

[0015]

[Means for Solving the Problems] The present invention is characterized by a speech synthesis method that has an information storage section which stores and manages stored speech waveforms, cut out in advance in units of roughly word or phrase size, in association with attribute information consisting of the phonological information of each stored speech waveform; that collates input information, which is the input phonological information corresponding to the desired speech either as it is or divided into multiple parts, against synthesis unit attribute information, which is all or part of the attribute information; that retrieves the stored speech waveforms corresponding to the synthesis unit attribute information; and that synthesizes the speech by splicing the stored speech waveforms together. In this method, first, it is determined during synthesis of the speech whether to add a stored speech waveform and its corresponding attribute information, and if addition is decided they are added to the information storage section. Second, a connected word information storage section, which stores connected words in association with their phonological information, is searched for connected words that are not associated with any stored speech waveform; a speech waveform corresponding to each such connected word is generated using the stored speech waveforms and their attribute information; and the generated speech waveform and its attribute information are added to the information storage section. Third, a speech waveform input from outside and its attribute information are added to the information storage section, or used to update it, as a stored speech waveform and its attribute information. In this way the stored speech waveforms and their corresponding attribute information in the information storage section are added or updated automatically.

[0016]

[Embodiments] Embodiments of the present invention are described in detail below with reference to the drawings.

[0017] FIG. 1 is a schematic block diagram of one embodiment of the speech synthesis method of the present invention. In FIG. 1, 1 is an input terminal, 2 is a synthesis unit selection section, 3 is an attribute information storage section, 4 is a speech synthesis section, 5 is a waveform information storage section, 6 is an output terminal, 7 is a stored information update section, 8 is a connected word information storage section, 9 is an information update determination section, and 10 is an input terminal. In the present invention, the attribute information storage section 3 and the waveform information storage section 5 manage speech waveforms using phonological information and accent information as an index, and synthesized speech is generated by selecting all or part of the stored waveforms and splicing them together while taking into account the context before and after each join. The invention is characterized in that, in such a scheme, the contents of the attribute information storage section 3 and the waveform information storage section 5 are updated automatically. The stored contents are updated not only during speech synthesis but also when speech is not being synthesized. First, during speech synthesis, a newly synthesized speech waveform is registered in the attribute information storage section 3 and the waveform information storage section 5 together with its phonological information, accent information, and so on. Second, when speech is not being synthesized and a speech waveform, phonological information, accent information, and cut-out point information (information on the points at which the waveform corresponding to the phonological and accent information of a given speech waveform is cut out) are input from the input terminal 10, the stored information in sections 3 and 5 is searched; if the input is new it is registered, and if it is already registered the entry is updated to whichever waveform has the higher speech quality. Third, during intervals between speech synthesis, or when no information is input from the input terminal 10, the connected word information storage section 8, which is also used for purposes such as text analysis, is searched, and speech waveforms are synthesized and registered for connected words that are not yet registered in sections 3 and 5. The invention is described in detail below for each of these three cases.
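One way to picture the two stores of FIG. 1 is as a pair of dictionaries linked by an address. The sketch below is an assumption-laden illustration rather than the patent's data layout: the class names `AttributeRecord`, `WaveformRecord`, and `Stores`, the string encoding of accent information, and the sequential address allocation are all hypothetical.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional, Tuple

IndexKey = Tuple[str, str]  # (phonological information, accent information)


@dataclass
class AttributeRecord:        # one entry of the attribute information storage section 3
    phonemes: str             # phonetic transcription, e.g. "keeken"
    accent: str               # accent pattern; this string encoding is an assumption
    is_original: bool         # True: original recording, False: composite waveform
    sources: List[IndexKey]   # for composites: which stored entries were combined
    address: int              # address of the matching waveform information


@dataclass
class WaveformRecord:         # one entry of the waveform information storage section 5
    samples: List[float]      # the waveform itself (or its feature parameters)
    cut_points: List[int]     # cut-out point information


@dataclass
class Stores:
    attributes: Dict[IndexKey, AttributeRecord] = field(default_factory=dict)
    waveforms: Dict[int, WaveformRecord] = field(default_factory=dict)

    def register(self, phonemes: str, accent: str, samples: List[float],
                 cut_points: List[int], is_original: bool,
                 sources: Optional[List[IndexKey]] = None) -> int:
        key = (phonemes, accent)
        if key in self.attributes:
            raise ValueError(f"{key} is already registered")
        address = len(self.waveforms)  # next free address in this toy store
        self.waveforms[address] = WaveformRecord(samples, cut_points)
        self.attributes[key] = AttributeRecord(
            phonemes, accent, is_original, sources or [], address)
        return address

    def lookup(self, phonemes: str, accent: str) -> Optional[WaveformRecord]:
        record = self.attributes.get((phonemes, accent))
        return None if record is None else self.waveforms[record.address]
```

Keeping the index and the audio in separate stores means a lookup by phonological and accent information touches only the small attribute records, and the waveform is fetched by address once a match is found.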

[0018] First, the update of stored information during speech synthesis is described.

[0019] The input terminal 1 passes the input phonological, prosodic, and morphological information to the synthesis unit selection section 2. The attribute information storage section 3 stores and manages, using phonological information (phonetic transcription) and accent information (information on accent strength) as an index, the attribute information for stored speech waveforms cut out in advance in units of roughly word or phrase size. This attribute information consists of the waveform's phonological information, its accent information, waveform generation information indicating whether the waveform is an original waveform or a composite waveform (and, if composite, which existing stored waveforms it was synthesized from), and the address information of the waveform's waveform information held in the waveform information storage section 5. The synthesis unit selection section 2 first receives the phonological, prosodic, and morphological information from the input terminal 1 and extracts accent information from the prosodic information. It then divides the phonological, accent, and morphological information into accent phrases (sequences of independent words and attached words that carry a single main accent) and outputs the phonological and accent information of each accent phrase to the attribute information storage section 3. The attribute information storage section 3 selects attribute information that contains all or part of the phonological and accent information of the input accent phrase. This whole or partial accent phrase is defined as a "synthesis unit"; it is the smallest unit from which the desired speech waveform is built. Section 3 then outputs the attribute information containing each synthesis unit, together with the extraction position information of that synthesis unit, to the synthesis unit selection section 2 as the attribute information of the synthesis unit. The synthesis unit selection section 2 compares the phonological and accent information of the accent phrase with that of the synthesis unit returned by section 3, and extracts the difference together with a small part of what immediately precedes and follows it. It then outputs the phonological and accent information of the extracted portion to the attribute information storage section 3 again. Section 3 again selects attribute information containing all or part of the input phonological and accent information and outputs it to section 2 as the attribute information of a synthesis unit. By repeating this exchange, the accent phrase comes to be composed of one or more synthesis units. The synthesis unit selection section 2 outputs the accent phrase's phonological, accent, and morphological information, together with the attribute information and synthesis order of its constituent synthesis units, to the information update determination section 9. The exchange between sections 2 and 3 is repeated until synthesis units have been selected for all of the phonological and accent information obtained from the input terminal 1. Once they have, section 2 outputs the attribute information of the synthesis units, their synthesis order, and the prosodic information from the input terminal 1 to the speech waveform synthesis section 4. The waveform information storage section 5 stores and manages waveform information (that is, the speech waveform itself or its feature parameters, plus cut-out point information), using as an index the address information that is one of the outputs of the synthesis unit selection section 2. The waveform information in section 5 corresponds one-to-one with the attribute information in the attribute information storage section 3. The speech waveform synthesis section 4 extracts the address information and cut-out point information of the waveform information from the attribute information supplied by section 2 and outputs them to the waveform information storage section 5. Section 5 retrieves the waveform information on the basis of that address information and outputs it to the speech waveform synthesis section 4, which generates the speech waveform of each synthesis unit from it. Section 4 then edits the generated waveforms, inserting pauses at breath group boundaries, adding sentence-level intonation, interpolating the speech waveform between synthesis units, and so on as needed. The resulting synthesized speech waveform is output via the output terminal 6.
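The repeated exchange between the synthesis unit selection section 2 and the attribute information storage section 3 can be approximated by a greedy longest-match loop. The sketch below is a deliberate simplification and not the procedure as claimed: it matches on phonological strings only (accent information is ignored), it scans left to right instead of re-querying the difference together with its neighbouring context, and the names `SelectedUnit` and `select_units` as well as the example entries other than "keeken" are hypothetical.

```python
from typing import List, NamedTuple


class SelectedUnit(NamedTuple):
    entry: str    # phonological index of the stored waveform the unit comes from
    offset: int   # extraction position of the unit inside that entry
    length: int   # number of symbols taken from the entry


def select_units(accent_phrase: str, stored_entries: List[str]) -> List[SelectedUnit]:
    """Cover `accent_phrase` with whole or partial stored entries, longest match first."""
    units: List[SelectedUnit] = []
    pos = 0
    while pos < len(accent_phrase):
        best = None
        # Try the longest remaining piece first, then progressively shorter ones.
        for length in range(len(accent_phrase) - pos, 0, -1):
            piece = accent_phrase[pos:pos + length]
            for entry in stored_entries:
                offset = entry.find(piece)
                if offset >= 0:
                    best = SelectedUnit(entry, offset, length)
                    break
            if best is not None:
                break
        if best is None:
            raise LookupError(f"no stored entry covers {accent_phrase[pos]!r}")
        units.append(best)
        pos += best.length  # continue with the part that is still uncovered
    return units
```

With stored entries `["keeken", "ga", "gakusee"]`, the phrase `"keekenga"` is covered by the whole entry "keeken" followed by "ga", and `"kenga"` is covered by the last three symbols of "keeken" (offset 3) followed by "ga". Preferring long matches keeps the number of joins low, which is the property the patent relies on for speech quality.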

【００２０】一方、情報更新判定部９は、合成単位選択
部２からのアクセント句の音韻情報／アクセント情報／
形態素情報と、それを構成する合成単位の属性情報（波
形生成情報を含む）及び合成順序に基づき、属性情報蓄
積部３及び波形情報蓄積部５の内容を更新するか否かを
判定する。尚、更新の基準の例としては、アクセント句
が原波形から抽出された合成単位のみから構成されてい
る場合や、アクセント句から助詞や助動詞を除いた語が
未登録の場合等が考えられる。ここで便宜上、アクセン
ト句もしくはその一部で構成される登録・更新の単位
を、「連単語」と定義する。更新する場合には、蓄積情
報更新部７へ連単語の音韻情報／アクセント情報と、そ
れを構成する合成単位の属性情報及び合成順序とを出力
し、更新を要請する。尚、更新しない場合には何も行わ
ない。蓄積情報更新部７は、音声波形合成部４と同様
に、情報更新判定部９からの各種情報を基に、波形情報
蓄積部５へアクセスし、その連単語に対する音声波形を
合成し、その合成波形を波形情報蓄積部５へ出力し、蓄
積及びアドレス情報の返送を要請する。波形情報蓄積部
５は、その合成波形を新たな波形として所定のフォーマ
ットに従って蓄積し、アドレス情報を付与する。そし
て、そのアドレス情報を蓄積情報更新部７へ出力する。
その後、蓄積情報更新部７は、受信したアドレス情報と
共に、連単語の音韻情報／アクセント情報、及び複合波
形を示す波形生成情報を属性情報蓄積部３へ出力し、そ
の連単語の属性情報として蓄積することを要請する。Meanwhile, the information update determination unit 9 decides whether to update the contents of the attribute information storage unit 3 and the waveform information storage unit 5. The decision is based on the phoneme information / accent information / morpheme information of the accent phrase received from the synthesis unit selection unit 2, the attribute information (including the waveform generation information) of the synthesis units that make up the phrase, and the synthesis order. Examples of update criteria are that the accent phrase is composed only of synthesis units extracted from original waveforms, or that the word obtained by removing particles and auxiliary verbs from the accent phrase is not yet registered. For convenience, a unit of registration or update consisting of an accent phrase or part of one is defined here as a "connected word". When an update is to be made, the information update determination unit 9 outputs the phoneme information / accent information of the connected word, the attribute information of its constituent synthesis units, and the synthesis order to the stored information update unit 7 and requests the update. When no update is to be made, nothing is done. Like the speech waveform synthesis unit 4, the stored information update unit 7 accesses the waveform information storage unit 5 based on the information from the information update determination unit 9, synthesizes the speech waveform of the connected word, outputs the synthesized waveform to the waveform information storage unit 5, and requests that it be stored and that address information be returned. The waveform information storage unit 5 stores the synthesized waveform as a new waveform in a predetermined format, assigns address information to it, and outputs that address information to the stored information update unit 7.
The stored information update unit 7 then outputs the phoneme information / accent information of the connected word and waveform generation information indicating a composite waveform, together with the received address information, to the attribute information storage unit 3 and requests that they be stored as the attribute information of the connected word.
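
The first example criterion above can be sketched as follows. This is only an illustrative sketch, not the implementation described in the patent: the class and function names are hypothetical, and the encoding in which an original waveform carries its own attribute information ID as waveform generation information is taken from the worked example later in this description.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class UnitAttribute:
    """Attribute information of one synthesis unit (field names are hypothetical)."""
    attr_id: str      # attribute information ID
    phonemes: str     # phoneme information
    accent: str       # accent information
    address: str      # address of the waveform in the waveform store
    generation: str   # waveform generation information


def is_original(unit):
    # Assumed encoding: an original waveform stores its own ID as generation info.
    return unit.generation == unit.attr_id


def should_register(units):
    """Example criterion: register only if every unit comes from an original waveform."""
    return bool(units) and all(is_original(u) for u in units)
```

The second example criterion (the word without particles and auxiliary verbs is not yet registered) would need morpheme information and is not shown here.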

【００２１】Next, the update of stored information when a speech waveform, phoneme information, accent information, and cut-out point information are input from input terminal 10 outside of speech synthesis is described. Explanations that overlap with the above are omitted, and only the operations that differ are described in detail.

【００２２】When a speech waveform, phoneme information, accent information, and cut-out point information are input from input terminal 10, the stored information update unit 7 first outputs the phoneme information / accent information to the attribute information storage unit 3 and asks whether attribute information to be updated exists. The attribute information storage unit 3 searches for such attribute information, namely an entry that exactly matches the phoneme information / accent information from the stored information update unit 7 and whose waveform generation information indicates a composite waveform. If such an entry exists, the attribute information storage unit 3 changes its waveform generation information to indicate an original waveform and outputs its address information to the stored information update unit 7. The stored information update unit 7 outputs the speech waveform from input terminal 10 (or the feature parameters obtained by analyzing it) and the cut-out point information, together with that address information, to the waveform information storage unit 5 and requests that the waveform information be updated. The waveform information storage unit 5 replaces the stored waveform and cut-out point information at the given address with the newly input speech waveform and cut-out point information.

【００２３】If, on the other hand, no attribute information exists that exactly matches the phoneme information / accent information from the stored information update unit 7 and whose waveform generation information indicates a composite waveform, the attribute information storage unit 3 notifies the stored information update unit 7 accordingly. The stored information update unit 7 then outputs the speech waveform (or the feature parameters obtained by analyzing it) and the cut-out point information to the waveform information storage unit 5 as waveform information, and requests storage and the return of address information. The waveform information storage unit 5 stores this waveform information as a new entry and outputs its address information to the stored information update unit 7. The stored information update unit 7 outputs the phoneme information / accent information from input terminal 10, together with the received address information, to the attribute information storage unit 3 as attribute information and requests registration. The attribute information storage unit 3 registers this attribute information as a new entry whose waveform generation information indicates an original waveform.
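
The two cases above (replace an existing composite entry, or add a new original entry) can be sketched as a single routine. This is a minimal sketch under stated assumptions: the `Store` class, its in-memory layout, the integer addresses, and the `ORIGINAL` / `COMPOSITE` markers are all hypothetical and merely stand in for the attribute information storage unit 3 and the waveform information storage unit 5.

```python
from dataclasses import dataclass

ORIGINAL = "original"     # hypothetical marker for an original waveform
COMPOSITE = "composite"   # hypothetical marker for a composite waveform


@dataclass
class AttributeRecord:
    phonemes: str
    accent: str
    address: int
    generation: str


class Store:
    def __init__(self):
        self.attributes = []     # stands in for attribute information storage unit 3
        self.waveforms = {}      # stands in for waveform information storage unit 5
        self._next_address = 1

    def _find_composite(self, phonemes, accent):
        # Exact match on phoneme/accent information AND composite generation info.
        for rec in self.attributes:
            if (rec.phonemes, rec.accent, rec.generation) == (phonemes, accent, COMPOSITE):
                return rec
        return None

    def store_external(self, phonemes, accent, waveform, cut_points):
        rec = self._find_composite(phonemes, accent)
        if rec is not None:
            # Matching composite entry: mark it original and overwrite its waveform.
            rec.generation = ORIGINAL
            self.waveforms[rec.address] = (waveform, cut_points)
            return rec.address
        # No matching composite entry: store the waveform and register a new entry.
        address = self._next_address
        self._next_address += 1
        self.waveforms[address] = (waveform, cut_points)
        self.attributes.append(AttributeRecord(phonemes, accent, address, ORIGINAL))
        return address
```

Note that, as in the text, the lookup only matches composite entries, so an externally supplied waveform never overwrites an entry that is already an original waveform.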

【００２４】Finally, the update of stored information between speech synthesis operations, or when nothing is input from input terminal 10, is described. Explanations that overlap with the above are omitted, and only the operations that differ are described in detail. The connected word information storage unit 8 holds the phoneme information, accent information, and so on of each connected word, registered in advance. It often also serves as the dictionary used for word identification, morphological analysis, and the like during text analysis. The stored information update unit 7 accesses the connected word information storage unit 8 according to an arbitrary rule and obtains the phoneme information / accent information of a connected word registered there. It then outputs that phoneme information / accent information to the attribute information storage unit 3 and asks whether the connected word is already registered. The attribute information storage unit 3 checks whether corresponding attribute information is registered and outputs the result to the stored information update unit 7. This process is repeated until a connected word with no stored attribute information is found. Once a connected word that is not registered in the attribute information storage unit 3 is obtained, the stored information update unit 7, like the synthesis unit selection unit 2, accesses the attribute information storage unit 3 and selects one or more synthesis units that make up the connected word. Then, like the speech waveform synthesis unit 4, it accesses the waveform information storage unit 5 based on the attribute information of those synthesis units and the synthesis order obtained from the attribute information storage unit 3, and synthesizes the speech waveform corresponding to the connected word. It outputs the synthesized waveform to the waveform information storage unit 5 and requests storage and the return of address information. The waveform information storage unit 5 stores the synthesized waveform, assigns address information to it, and outputs that address information to the stored information update unit 7. The stored information update unit 7 outputs the phoneme information / accent information of the connected word and waveform generation information indicating a composite waveform, together with the address information from the waveform information storage unit 5, to the attribute information storage unit 3 and requests that they be stored as the attribute information of the connected word.
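
The idle-time procedure above can be sketched as a scan over the dictionary followed by one synthesize-and-register step. This is only a sketch under stated assumptions: the function names, the `(word, (phonemes, accent))` dictionary layout, and the three callables passed in (unit selection, waveform synthesis, waveform storage) are hypothetical placeholders for the units described in the text.

```python
def find_unregistered(dictionary, registered):
    """Return the first (word, key) whose (phonemes, accent) key is not registered."""
    for word, key in dictionary:
        if key not in registered:
            return word, key
    return None


def idle_update(dictionary, registered, select_units, synthesize, store_waveform):
    """Register one unregistered connected word as a composite waveform.

    dictionary     : list of (word, (phonemes, accent)) pairs (assumed layout)
    registered     : dict keyed by (phonemes, accent), standing in for unit 3
    select_units   : callable choosing synthesis units, like unit 2
    synthesize     : callable building a waveform from units, like unit 4
    store_waveform : callable storing a waveform and returning its address
    """
    found = find_unregistered(dictionary, registered)
    if found is None:
        return None  # every connected word is already registered
    word, (phonemes, accent) = found
    units = select_units(phonemes, accent)
    waveform = synthesize(units)
    address = store_waveform(waveform)
    registered[(phonemes, accent)] = {"address": address, "generation": "composite"}
    return word
```

Passing the three operations in as callables keeps the sketch independent of how unit selection and waveform concatenation are actually carried out.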

【００２５】The following describes other embodiments of the present invention, and matters that do not define the present invention, focusing on the differences from the embodiment above.

【００２６】In the embodiment above, the phoneme information / accent information and the waveform information (including the speech waveform itself) were described as being stored and managed separately in the attribute information storage unit 3 and the waveform information storage unit 5. This was done to make clear that the synthesis unit selection unit 2 does not take in information on the speech waveform itself when selecting synthesis units. The phoneme information / accent information and the waveform information may, however, be stored and managed in a single storage unit. In that case, the storage unit outputs attribute information in response to phoneme information / accent information from the synthesis unit selection unit 2, and waveform information in response to address information from the speech waveform synthesis unit 4.

【００２７】In the embodiment above, the form of the speech waveform information stored in the waveform information storage unit 5 is not a matter that defines the present invention. It is not limited to the speech waveform itself and may instead be feature parameters of the waveform.

【００２８】In the embodiment above, the synthesis unit selection unit 2 divides the phoneme information / accent information / morpheme information into accent phrases. This is not a matter that defines the present invention; the division is not limited to accent phrases and may use larger or smaller units.

【００２９】In the embodiment above, phoneme information and accent information are used as the index of the attribute information in the attribute information storage unit 3. Prosodic information may be used in place of accent information, or only part of this information may be used. The same applies to the information output from the stored information update unit 7 to the attribute information storage unit 3, the information input to input terminal 10, and the information stored in the connected word information storage unit 8.

【００３０】In the embodiment above, morpheme information is input from input terminal 1 because it is used in one of the example criteria of the information update determination unit 9. If it is not used as a criterion in the information update determination unit 9, it need not be input from input terminal 1.

【００３１】In the embodiment above, the synthesis unit selection unit 2 and the stored information update unit 7 compare the phoneme information / accent information of the accent phrase with the phoneme information / accent information of a synthesis unit from the attribute information storage unit 3, and extract the difference together with the portions immediately before and after it. A method that does not take the immediately adjacent portions into account is also possible.

【００３２】The three processes for updating stored information described above are now explained concretely with reference to the drawings.

【００３３】First, the process of updating stored information during speech synthesis is described concretely.

【００３４】FIG. 2 shows a concrete example of the process of updating stored information during speech synthesis in one embodiment of the present invention. The update criterion used in this example is that the accent phrase is composed only of synthesis units extracted from original waveforms. In FIG. 2, (a) shows that the written form of the unregistered connected word is 「音声合成」 ("speech synthesis"); (b) shows that the kana reading of (a) is 「オンセイゴウセイ」; (c) shows that the phoneme information (phonetic notation) of (a) is "onseegoosee"; (d) shows that the accent information of (a) is "LHHHHLLL"; (e) shows the attribute information stored in the attribute information storage unit 3 before the update; (f) shows the phoneme information / accent information of the words that contain the selected synthesis units; and (g) shows the attribute information stored in the attribute information storage unit 3 after the update. In the phoneme information of (c), "ee" denotes a lengthened "e". The accent information of (d) is expressed in two levels for simplicity, where "L" denotes a low-pitched phoneme and "H" a high-pitched phoneme. The first through fifth rows of (e) and (g) show the attribute information ID, phoneme information, accent information, address information, and waveform generation information, respectively. In (f), the underlined portions are the synthesis units.

【００３５】The process of updating the stored information is described step by step below.
1) The synthesis unit selection unit 2 accesses the attribute information storage unit 3 and, as shown in FIG. 2(f), selects the following synthesis units for the accent phrase 「音声合成」, which has phoneme information "onseegoosee" and accent information "LHHHHLLL": "on" / "LH" from 「音節」 ("syllable"), "see" / "HH" from 「金星」 ("Venus"), and "goosee" / "HLLL" from 「光合成」 ("photosynthesis").
2) The synthesis unit selection unit 2 sends the phoneme information / accent information of the synthesized accent phrase, the attribute information of each synthesis unit, and the synthesis order to the information update determination unit 9. Note that the accent phrase 「音声合成」 in FIG. 2(a) is itself a connected word.
3) The information update determination unit 9 compares the attribute information ID with the waveform generation information in the attribute information of each synthesis unit in FIG. 2(e). Because the two are identical for every synthesis unit (that is, every synthesis unit was extracted from an original waveform), it determines that an update is possible.
4) The information update determination unit 9 sends the phoneme information / accent information of the connected word 「音声合成」, the attribute information of each synthesis unit, and the synthesis order to the stored information update unit 7.
5) The stored information update unit 7 accesses the waveform information storage unit 5 based on the information from the information update determination unit 9 and synthesizes the speech waveform of the connected word 「音声合成」.
6) The stored information update unit 7 sends the speech waveform of the connected word 「音声合成」 to the waveform information storage unit 5.
7) The waveform information storage unit 5 registers the speech waveform of the connected word 「音声合成」 and assigns address information to it.
8) The waveform information storage unit 5 sends that address information to the stored information update unit 7.
9) The stored information update unit 7 sends the phoneme information / accent information of the connected word 「音声合成」, the attribute information of each synthesis unit, and the synthesis order, together with the received address information, to the attribute information storage unit 3.
10) As shown in FIG. 2(g), the attribute information storage unit 3 generates and registers, from the received information, the phoneme information "onseegoosee", accent information "LHHHHLLL", address information "0014", and waveform generation information "0001 (1-2), 0002 (3-4), 0003 (3-6)" of the connected word 「音声合成」, and assigns attribute information ID "0004" to this attribute information.
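
The waveform generation information "0001 (1-2), 0002 (3-4), 0003 (3-6)" can be read as a list of source entries with position ranges. The sketch below assumes, without confirmation from the text, that the ranges count morae from 1 and are inclusive, and that the three source words are romanized as onsetsu, kinsee, and koogoosee in the notation of FIG. 2(c). The function names and the `SOURCES` table are hypothetical. Under these assumptions the ranges reproduce the phoneme string of the connected word.

```python
# Assumed mora sequences of the three source words (romanization is an assumption).
SOURCES = {
    "0001": ["o", "n", "se", "tsu"],            # 音節 (onsetsu)
    "0002": ["ki", "n", "se", "e"],             # 金星 (kinsee)
    "0003": ["ko", "o", "go", "o", "se", "e"],  # 光合成 (koogoosee)
}


def parse_generation(info):
    """Parse '0001 (1-2), 0002 (3-4)' into [('0001', 1, 2), ('0002', 3, 4)]."""
    parts = []
    for item in info.replace(" ", "").split(","):
        attr_id, span = item.rstrip(")").split("(")
        first, last = span.split("-")
        parts.append((attr_id, int(first), int(last)))
    return parts


def expand(info, sources):
    """Concatenate the referenced mora ranges (1-based, inclusive) into a phoneme string."""
    morae = []
    for attr_id, first, last in parse_generation(info):
        morae.extend(sources[attr_id][first - 1:last])
    return "".join(morae)


print(expand("0001 (1-2), 0002 (3-4), 0003 (3-6)", SOURCES))  # prints onseegoosee
```

If the ranges in the figure count some other unit, only the `SOURCES` table would need to change; the parsing is independent of that choice.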

【００３６】Next, the process of updating stored information when a speech waveform, phoneme information, accent information, and cut-out point information are input from input terminal 10 outside of speech synthesis is described concretely. This update follows one of two different processes, update process 1 or update process 2, depending on whether the phoneme information / accent information of the input speech waveform already exists in the attribute information storage unit 3.

【００３７】FIG. 3 shows a concrete example, in one embodiment of the present invention, of the process of updating stored information (update process 1) when a speech waveform, phoneme information, accent information, and cut-out point information are input from input terminal 10 outside of speech synthesis and the corresponding waveform information is already stored. In FIG. 3, (a) shows that the written form of the speech waveform input from input terminal 10 is 「音声合成」 ("speech synthesis"); (b) shows that the kana reading of (a) is 「オンセイゴウセイ」; (c) shows that the phoneme information (phonetic notation) of (a) is "onseegoosee"; and (d) shows that the accent information of (a) is "LHHHHLLL". The information in (c) and (d), the speech waveform, and the cut-out point information are entered manually through input terminal 10. Also in FIG. 3, (e) shows the attribute information stored in the attribute information storage unit 3 before the update; (f) shows the phoneme information / accent information of the words that contain the selected synthesis units; and (g) shows the attribute information stored in the attribute information storage unit 3 after update process 1. The other notational conventions in FIG. 3 are the same as in FIG. 2.

【００３８】Update process 1 is described step by step below.
1) The stored information update unit 7 outputs the phoneme information "onseegoosee" / accent information "LHHHHLLL" from input terminal 10 to the attribute information storage unit 3.
2) Because the received phoneme information / accent information exists, the attribute information storage unit 3 changes the waveform generation information of that attribute information to its attribute information ID (which indicates an original waveform) and sends its address information "0014" to the stored information update unit 7.
3) The stored information update unit 7 outputs the speech waveform and the cut-out point information, together with the received address information "0014", to the waveform information storage unit 5.
4) The waveform information storage unit 5 replaces the waveform information at the received address "0014" with the new waveform information.

【００３９】FIG. 4 shows a concrete example, in one embodiment of the present invention, of the process of updating stored information (update process 2) when a speech waveform, phoneme information, accent information, and cut-out point information are input from input terminal 10 outside of speech synthesis and the corresponding waveform information is not stored. In FIG. 4, (a) shows that the written form of the speech waveform input from input terminal 10 is 「音声合成」 ("speech synthesis"); (b) shows that the kana reading of (a) is 「オンセイゴウセイ」; (c) shows that the phoneme information (phonetic notation) of (a) is "onseegoosee"; and (d) shows that the accent information of (a) is "LHHHHLLL". The information in (c) and (d), the speech waveform, and the cut-out point information are entered manually through input terminal 10. Also in FIG. 4, (e) shows the attribute information stored in the attribute information storage unit 3 before the update; (f) shows the phoneme information / accent information of the words that contain the selected synthesis units; and (g) shows the attribute information stored in the attribute information storage unit 3 after update process 2. The other notational conventions in FIG. 4 are the same as in FIG. 2.

【００４０】Update process 2 is described step by step below.
1) The stored information update unit 7 outputs the phoneme information "onseegoosee" / accent information "LHHHHLLL" from input terminal 10 to the attribute information storage unit 3.
2) Because the received phoneme information / accent information does not exist, the attribute information storage unit 3 notifies the stored information update unit 7 accordingly.
3) On receiving this notification, the stored information update unit 7 sends the input speech waveform of the connected word 「音声合成」 to the waveform information storage unit 5.
4) The waveform information storage unit 5 registers the speech waveform of the connected word 「音声合成」 and assigns it address information "0014".
5) The waveform information storage unit 5 sends the address information "0014" to the stored information update unit 7.
6) The stored information update unit 7 sends the phoneme information / accent information of the connected word 「音声合成」, together with the received address information "0014", to the attribute information storage unit 3.
7) As shown in FIG. 4(g), the attribute information storage unit 3 generates and registers, from the received information, the phoneme information "onseegoosee", accent information "LHHHHLLL", address information "0014", and waveform generation information "0004" (which indicates an original waveform) of the connected word 「音声合成」, and assigns attribute information ID "0004" to this attribute information.

【００４１】Finally, the process of updating stored information between speech synthesis operations, or when no speech waveform, phoneme information, accent information, or cut-out point information is input from input terminal 10, is described concretely. The figure for this explanation would be identical to FIG. 2, except that (a) now denotes the written form 「音声合成」 of a connected word that is stored in the connected word information storage unit 8 but not registered in the attribute information storage unit 3. FIG. 2 is therefore reused to avoid duplicating the figure.

【００４２】The process of updating the stored information is described step by step below.
1) The stored information update unit 7 accesses the connected word information storage unit 8 and the attribute information storage unit 3 and finds the connected word 「音声合成」, which is not registered in the attribute information storage unit 3.
2) At the same time, the stored information update unit 7 obtains the phoneme information "onseegoosee" and accent information "LHHHHLLL" of the connected word 「音声合成」 from the connected word information storage unit 8.
3) Like the synthesis unit selection unit 2, the stored information update unit 7 accesses the attribute information storage unit 3 and, as shown in FIG. 2(f), selects the following synthesis units for the connected word 「音声合成」: "on" / "LH" from 「音節」 ("syllable"), "see" / "HH" from 「金星」 ("Venus"), and "goosee" / "HLLL" from 「光合成」 ("photosynthesis").
4) The stored information update unit 7 accesses the waveform information storage unit 5 based on the phoneme information and accent information of the connected word 「音声合成」, the attribute information of each synthesis unit, and the synthesis order, and synthesizes the speech waveform of the connected word 「音声合成」.
5) The stored information update unit 7 sends the speech waveform of the connected word 「音声合成」 to the waveform information storage unit 5.
6) The waveform information storage unit 5 registers the speech waveform of the connected word 「音声合成」 and assigns it address information "0014".
7) The waveform information storage unit 5 sends the address information "0014" to the stored information update unit 7.
8) The stored information update unit 7 sends the phoneme information / accent information of the connected word 「音声合成」, the attribute information of each synthesis unit, and the synthesis order, together with the received address information "0014", to the attribute information storage unit 3.
9) As shown in FIG. 2(g), the attribute information storage unit 3 generates and registers, from the received information, the phoneme information "onseegoosee", accent information "LHHHHLLL", address information "0014", and waveform generation information "0001 (1-2), 0002 (3-4), 0003 (3-6)" of the connected word 「音声合成」, and assigns attribute information ID "0004" to this attribute information.

【００４３】In the concrete examples of the three update processes above, accent information is written in two levels. This does not limit the present invention, which is also applicable to notations with three or more levels.

【００４４】In the concrete example above, the update criterion used in the information update determination unit 9 is that all synthesis units were extracted from original waveforms. This criterion is only one example and does not limit the present invention. Another possible criterion is, for example, that the word obtained by removing particles and auxiliary verbs from the accent phrase is not registered in the attribute information storage unit 3.

[0045] As explained above, the present invention performs waveform synthesis using phoneme information and accent information, by concatenating whole speech waveforms of words, phrases, and the like, or portions cut out of such waveforms. It can therefore generate a wider range of arbitrary speech waveforms than the waveform editing method while maintaining good, highly intelligible speech quality. In addition, the information used for waveform synthesis is updated and extended automatically: newly synthesized waveforms are registered during speech synthesis, newly input speech waveforms are registered while speech synthesis is not in progress, and speech waveforms for the connected words in the connected-word information storage section 8 are synthesized and registered. As a result, the speech synthesis processing time can be shortened.
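One of the update routes described above, generating waveforms for connected words that do not yet have one, might be sketched as follows. This is a minimal Python illustration under assumptions made here: both stores are plain dictionaries, waveforms are lists of samples, and `concat_per_phoneme` is a hypothetical stand-in for the actual synthesis step. Accent information and cut-out points are omitted.

```python
def concat_per_phoneme(phonemes, store):
    """Hypothetical stand-in synthesizer: join the stored waveform of each symbol.

    Raises KeyError if a symbol has no stored waveform.
    """
    return [sample for p in phonemes for sample in store[p]]


def fill_missing_connected_words(connected_words, store, synthesize=concat_per_phoneme):
    """Generate and register waveforms for connected words that lack one.

    connected_words: dict mapping a connected word to its phoneme string
    store:           dict mapping a phoneme string to a stored waveform
    Returns the list of connected words whose waveform was newly registered.
    """
    added = []
    for word, phonemes in connected_words.items():
        if phonemes in store:
            continue  # already associated with a stored waveform
        try:
            waveform = synthesize(phonemes, store)
        except KeyError:
            continue  # cannot be built from what is stored; leave it for later
        store[phonemes] = waveform
        added.append(word)
    return added
```

Because the function only reads the connected-word store and writes to the waveform store, it could run whenever no synthesis request is pending, which matches the idle-time updating described in the text.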

[0046]

[Effects of the Invention] As described in detail above, the present invention relates to a speech synthesis method in which stored speech waveforms, cut out in advance in units of roughly a word or a phrase, are managed in association with attribute information consisting of their phoneme information. Input phoneme information and the like corresponding to the desired speech is divided into one or more pieces of divided input information, each piece is collated with synthesis unit attribute information, which is all or part of the attribute information, and the stored speech waveforms corresponding to that synthesis unit attribute information are retrieved and concatenated to synthesize the desired speech. In this method, stored speech waveforms and their attribute information are added and updated automatically by operations such as the following: determining during speech synthesis whether to add a stored speech waveform and its attribute information, and adding them when so determined; searching the connected-word information storage section, which stores connected words in association with their phoneme information and the like, for connected words not associated with a stored speech waveform, and generating the corresponding speech waveforms from the stored speech waveforms and their attribute information; and storing a speech waveform and attribute information input from outside as a stored speech waveform and its attribute information. The speech synthesis processing time can therefore be shortened, and the effect is extremely significant.
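The reduction in processing time can be illustrated with a small sketch of the retrieve, concatenate, and register cycle. The Python below is a simplified illustration, not the method of the specification: the greedy longest-match split, the dictionary store keyed by phoneme string, and the unconditional registration of the result are all assumptions made here, and accent information is ignored.

```python
def split_longest_match(phonemes, store):
    """Greedy left-to-right split of the input into the longest stored units."""
    units, i = [], 0
    while i < len(phonemes):
        for j in range(len(phonemes), i, -1):
            if phonemes[i:j] in store:
                units.append(phonemes[i:j])
                i = j
                break
        else:
            raise KeyError(f"no stored unit covers {phonemes[i:]!r}")
    return units


def synthesize(phonemes, store):
    """Concatenate stored waveforms and register the result as a new stored unit.

    Returns (waveform, units), where units lists the stored units that were joined.
    """
    units = split_longest_match(phonemes, store)
    waveform = [sample for u in units for sample in store[u]]
    if len(units) > 1:
        # Register the joined waveform so that the next request for the same
        # phoneme string is served by a single lookup.
        store[phonemes] = waveform
    return waveform, units
```

With a store holding waveforms for `"ko"`, `"n"`, and `"nichiwa"`, the first call for `"konnichiwa"` joins three units. A second call for the same input finds the registered waveform directly and uses one unit, which is the kind of saving the paragraph above describes.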

[Brief description of drawings]

FIG. 1 is a schematic block diagram of an embodiment of the speech synthesis system of the present invention.

FIG. 2 is a diagram showing a specific example of the process of updating stored information during speech synthesis in an embodiment of the present invention, and a specific example of the process of updating stored information either between speech synthesis operations or when no speech waveform, phoneme information, accent information, and cut-out point information are input from the input terminal 10.

FIG. 3 is a diagram showing a specific example of the process of updating stored information (update process 1) in an embodiment of the present invention when, while speech synthesis is not in progress, a speech waveform, phoneme information, accent information, and cut-out point information are input from the input terminal 10 and the corresponding waveform information is already stored.

FIG. 4 is a diagram showing a specific example of the process of updating stored information (update process 2) in an embodiment of the present invention when, while speech synthesis is not in progress, a speech waveform, phoneme information, accent information, and cut-out point information are input from the input terminal 10 and the corresponding waveform information is not yet stored.

FIG. 5 is a basic block diagram of the waveform editing method, which is one of the conventional methods.

FIG. 6 is a basic block diagram of the rule-based synthesis method, which is one of the conventional methods.

FIG. 7 is a schematic block diagram of a speech synthesis method using composite speech units, which is one of the conventional methods.

[Explanation of symbols]

1 Input terminal
2 Synthesis unit selection section
3 Attribute information storage section
4 Speech waveform synthesis section
5 Waveform information storage section
6 Output terminal
7 Stored information update section
8 Connected-word information storage section
9 Information update determination section
10 Input terminal
51 Input terminal
52 Synthesis unit decomposition section
53 Speech waveform editing section
54 Speech waveform storage section
55 Output terminal
61 Input terminal
62 Input terminal
63 Synthesis unit decomposition section
64 Feature parameter storage section
65 Speech waveform synthesis section
66 Prosody rule storage section
67 Output terminal
71 Input terminal
72 Speech unit selection section
73 Speech unit file
74 Speech control knowledge file
75 Unit combining section
76 Synthesized waveform generation section
77 Output terminal

Claims

[Claims]

1. A speech synthesis method in which an information storage section stores and manages stored speech waveforms, cut out in advance in units of roughly a word or a phrase, in association with attribute information consisting of the phoneme information of each stored speech waveform; input information, which is the input phoneme information corresponding to a desired speech either as it is or divided into a plurality of pieces, is collated with synthesis unit attribute information that is all or part of the attribute information; the stored speech waveforms corresponding to the synthesis unit attribute information are retrieved; and the stored speech waveforms are concatenated to synthesize the speech, the method being characterized in that: it is determined during synthesis of the speech whether to add a stored speech waveform and the attribute information corresponding to that stored speech waveform, and, when addition is determined, the stored speech waveform and its corresponding attribute information are added to the information storage section; an information storage section that stores connected words in association with the phoneme information of the connected words is searched for connected words not associated with a stored speech waveform, a speech waveform corresponding to each such connected word is generated using the stored speech waveforms and their attribute information, and the generated speech waveform and its attribute information are added to the information storage section; and a speech waveform input from outside and the attribute information of that speech waveform are added to, or used to update, the information storage section as a stored speech waveform and its attribute information; whereby the stored speech waveforms and their corresponding attribute information are automatically added to or updated in the information storage section.