JPS63169700A - Unit voice editing type rule synthesizer

Info

Publication number: JPS63169700A
Application number: JP62002247A
Authority: JP
Inventors: 松尾　則子; 伏木田　勝信
Original assignee: NEC Corp
Current assignee: NEC Corp
Priority date: 1987-01-07
Filing date: 1987-01-07
Publication date: 1988-07-13

Abstract

(57) [Abstract] This publication contains application data from before electronic filing, so no abstract data is recorded.

Description

DETAILED DESCRIPTION OF THE INVENTION (Field of Industrial Application) The present invention relates to a rule-based speech synthesizer, and more particularly to a rule-based speech synthesizer that generates a synthesized speech waveform from an input phoneme symbol string.

(Prior Art) Conventionally, rule-based speech synthesizers have used a method that edits and synthesizes speech waveforms taking CV, VC, and similar segments as the unit speech. For example, a technique such as that described in Speech Study Group Material 882-06 (April 1982), "Arbitrary word synthesis method using pitch-synchronous interpolation of CV and VC waveforms," is known.

(Problem to Be Solved by the Invention) The conventional method did not take into account the influence of the preceding and following unit speech segments (coarticulation). As a result, individual sounds were clear, but the synthesized speech sounded unnatural. An obvious way to address this drawback is to cut out from natural speech in advance, for each CV and VC, a plurality of unit speech waveforms corresponding to various coarticulation conditions, and to use those. With that approach, however, the pitch and the spectral envelope may become discontinuous depending on how the unit speech waveforms are combined.

(Means for Solving the Problems) The rule-based speech synthesizer of the present invention comprises: means for decomposing an input phoneme string representing the speech to be synthesized into a string of unit speech names such as CV and VC; means for calculating pitch frequency data from the input phoneme string according to pitch rules; means for selecting, from the unit speech name string, a set of a plurality of unit speech segments having mutually different pitch patterns and formant patterns (referred to as an allophone subset); a memory for storing boundary data, such as pitch and formants, of each unit speech segment; a memory for storing each unit speech waveform; means for selecting unit speech names from the selected allophone subsets, using the boundary data, such that the sum of the distances between adjacent unit speech segments within a certain range (for example, one word) is minimized; and means for editing the unit speech waveforms according to the selected unit speech names and synthesizing speech.

(Operation) This method obtains allophone subsets from the string of unit speech names such as CV and VC generated from the input phoneme string and from the pitch frequency data generated by the pitch rules. From these subsets it selects the optimal unit speech sequence, namely the one for which the sum of the distances between adjacent unit speech segments (a function of pitch, formants, and so on) is smallest within a given speech interval (for example, one word). To select this optimal sequence, the sum of the distances is computed for every possible combination of the unit speech segments contained in the allophone subsets, and the combination with the smallest sum is chosen. By incorporating the influence of the preceding and following phonemes in this way, and by optimizing the continuity of pitch and formants between unit speech segments, synthesized speech with high clarity and smooth sound quality is obtained.
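The exhaustive selection described in this section can be sketched in Python as follows. This is only an illustrative sketch, not the patented circuit: the source says the distance is "a function of pitch, formants, and so on" without giving its form, so the weighted absolute difference used in `boundary_distance`, the dictionary keys (`start_pitch`, `end_pitch`, `start_formants`, `end_formants`), and both function names are assumptions introduced here.

```python
from itertools import product


def boundary_distance(left, right, pitch_weight=1.0, formant_weight=1.0):
    """Distance between the end of `left` and the start of `right`.

    Assumed form: weighted sum of absolute pitch and formant differences.
    The source does not specify the actual distance function.
    """
    pitch_term = abs(left["end_pitch"] - right["start_pitch"])
    formant_term = sum(
        abs(a - b) for a, b in zip(left["end_formants"], right["start_formants"])
    )
    return pitch_weight * pitch_term + formant_weight * formant_term


def select_best_sequence(allophone_subsets):
    """Try every combination (one candidate per subset) and keep the one
    whose summed boundary distance over the interval is smallest."""
    best_cost = float("inf")
    best_sequence = None
    for combo in product(*allophone_subsets):
        cost = sum(boundary_distance(a, b) for a, b in zip(combo, combo[1:]))
        if cost < best_cost:
            best_cost, best_sequence = cost, combo
    if best_sequence is None:
        raise ValueError("at least one allophone subset is empty")
    return list(best_sequence), best_cost
```

Each element of `allophone_subsets` holds the candidates for one position in the word. Because every combination is evaluated, the number of evaluations is the product of the subset sizes, which is consistent with limiting the search to a short interval such as one word.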

(Embodiment) FIG. 1 is a block diagram showing an embodiment that realizes the principle of the present invention. The phoneme string input from input terminal 1 is supplied to the unit speech name string decomposition circuit 2 and the pitch rule circuit 3. Circuit 2 decomposes it into a string of unit speech names such as CV and VC, and circuit 3 determines the pitch frequency value (with some allowable width) of each unit speech segment according to the phoneme string. Based on the outputs of circuits 2 and 3, the allophone subset selection circuit 4 selects allophone subsets, each of which is a set of a plurality of unit speech segments. The optimum value selection circuit 5 uses the pitch and formant data at the boundaries of each unit speech segment, stored in the boundary condition storage memory 6, to select from the allophone subsets chosen by circuit 4 so that the sum of the distances between adjacent unit speech segments is minimized within a speech interval (for example, one word). Using the optimal selection made by circuit 5 and the unit speech waveforms stored in the unit speech waveform memory 7, the speech synthesis circuit 8 edits and synthesizes the unit speech waveforms and outputs the result from output terminal 9. The method of editing and synthesizing unit speech waveforms is described in detail in the aforementioned reference, Speech Study Group Material 882-06 (April 1982), "Arbitrary word synthesis method using pitch-synchronous interpolation of CV and VC waveforms," so its explanation is omitted here.
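The front end of this block diagram (circuits 2 and 4) might look roughly like the Python sketch below. The source does not describe how the decomposition or the subset selection is actually performed, so everything here is an assumption made for illustration: the romanized five-vowel inventory, the `CV:`/`VC:` naming of units, the `pitch_hz` field, and the reading of "pitch frequency value with some width" as a simple tolerance filter.

```python
VOWELS = set("aiueo")  # assumed vowel inventory for romanized Japanese


def decompose_to_units(phonemes):
    """Split a phoneme list into CV and VC unit names (role of circuit 2).

    Assumed scheme: every consonant-vowel pair gives a CV unit and every
    vowel-consonant pair gives a VC unit, so neighbouring units share a
    phoneme. Vowel-vowel and consonant-consonant pairs are skipped here.
    """
    units = []
    for left, right in zip(phonemes, phonemes[1:]):
        if left not in VOWELS and right in VOWELS:
            units.append("CV:" + left + right)
        elif left in VOWELS and right not in VOWELS:
            units.append("VC:" + left + right)
    return units


def select_allophone_subset(inventory, unit_name, target_pitch_hz, width_hz):
    """Keep stored variants of a unit whose pitch lies within the allowed
    width around the rule-generated pitch (assumed role of circuit 4)."""
    return [
        candidate
        for candidate in inventory.get(unit_name, [])
        if abs(candidate["pitch_hz"] - target_pitch_hz) <= width_hz
    ]
```

For example, under these assumptions `decompose_to_units(list("sakura"))` yields the unit names for "sa", "ak", "ku", "ur", and "ra" in order, and each name would then be looked up in a hypothetical `inventory` dictionary of stored variants.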

(Effects of the Invention) As explained above, the present invention uses pitch, formants, and other features that have been optimized within a certain range (for example, one word) on the basis of the string of unit speech names such as CV and VC, the pitch rules, and the allophone subsets. This yields synthesized speech that incorporates the influence of the preceding and following phonemes, with high clarity and smooth sound quality.

[Brief explanation of the drawing]

FIG. 1 is a block diagram showing an embodiment of the present invention.
1: input terminal
2: unit speech name string decomposition circuit
3: pitch rule circuit
4: allophone subset selection circuit
5: allophone subset optimum value selection circuit
6: boundary condition storage memory
7: unit speech waveform memory
8: speech synthesis circuit

Claims

[Claims]

A rule-based speech synthesizer comprising: means for decomposing an input phoneme string representing speech to be synthesized into a string of unit speech names such as CV and VC; means for calculating pitch frequency data from the input phoneme string according to pitch rules; means for selecting allophone subsets according to the unit speech name string and the pitch frequency data; a memory for storing boundary data, such as pitch and formants, of each unit speech segment; a memory for storing each unit speech waveform; means for selecting unit speech names from the allophone subsets, using the boundary data, such that the sum of the distances between adjacent unit speech segments within a certain speech interval (for example, one word) is minimized; and means for editing the unit speech waveforms and synthesizing speech according to the selected unit speech names.