JPS58134697A

JPS58134697A - Waveform editting type voice synthesizer

Info

Publication number: JPS58134697A
Application number: JP57017368A
Authority: JP
Inventors: 幸夫三留; 伏木田　勝信
Original assignee: Nippon Electric Co Ltd
Current assignee: NEC Corp
Priority date: 1982-02-05
Filing date: 1982-02-05
Publication date: 1983-08-10

Abstract

(57) [Summary] This publication contains application data from before electronic filing, so no abstract data is recorded.

Description

DETAILED DESCRIPTION OF THE INVENTION The present invention relates to a waveform editing type speech synthesis device, and more particularly to a speech synthesis device of the type that edits unit speech waveforms whose units are phoneme chains such as consonant-vowel and vowel-consonant combinations.

Speech is a very important means of conveying information from various machines to humans, and many speech synthesis devices and voice response devices have been proposed.

When the number of sentences or phrases to be synthesized is limited, synthesis can be realized easily by storing the natural speech of those sentences or phrases and outputting it as needed.

When the desired sentences are composed of an unspecified, large number of words and phrases, however, it is impossible to store natural speech waveforms for all of them.

In such cases, the conventional approach has been to store natural speech waveforms for phonological units smaller than a word, such as single syllables or phoneme chains (consonant-vowel or vowel-consonant combinations, etc.), and to synthesize the speech of the target sentence by combining and editing these unit speech waveforms.

This method has the problem that a large-capacity storage device is required in order to give words a pitch accent and to give sentences natural intonation.

A conventional example that requires relatively little storage capacity is one that uses analysis-synthesis. In this approach, the natural speech waveforms of the editing units, such as single syllables, are analyzed, and the parameters representing their spectral shape and the parameters representing pitch, voiced/unvoiced state, and so on are extracted and stored. During synthesis, the transfer function of a synthesis filter is controlled on the basis of the parameters representing spectral shape. For voiced speech, an impulse train whose period is given by the pitch parameter is input to the synthesis filter; for unvoiced speech, random noise is input to the synthesis filter. The desired speech waveform is generated in this way.
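The analysis-synthesis scheme described above can be sketched roughly as follows. This is only an illustration under stated assumptions: the publication does not specify the form of the synthesis filter, so the all-pole recursion, the function names (`excitation`, `all_pole_filter`, `synthesize_frame`), and the coefficient values used here are hypothetical.

```python
import random


def excitation(num_samples, voiced, pitch_period=80, seed=0):
    """Return the filter input: an impulse train when voiced, noise when unvoiced."""
    if voiced:
        # One unit impulse every `pitch_period` samples.
        return [1.0 if n % pitch_period == 0 else 0.0 for n in range(num_samples)]
    rng = random.Random(seed)  # seeded so the sketch is reproducible
    return [rng.uniform(-1.0, 1.0) for _ in range(num_samples)]


def all_pole_filter(x, coeffs):
    """Assumed filter form: y[n] = x[n] - sum_k coeffs[k-1] * y[n-k]."""
    y = []
    for n, xn in enumerate(x):
        acc = xn
        for k, a in enumerate(coeffs, start=1):
            if n - k >= 0:
                acc -= a * y[n - k]
        y.append(acc)
    return y


def synthesize_frame(num_samples, voiced, pitch_period, coeffs):
    """Drive the synthesis filter with the excitation chosen by the voicing flag."""
    return all_pole_filter(excitation(num_samples, voiced, pitch_period), coeffs)


# Hypothetical one-coefficient filter; a real system would use stored parameters.
voiced_frame = synthesize_frame(200, True, 50, [-0.9])
unvoiced_frame = synthesize_frame(200, False, 50, [-0.9])
```

In this sketch, changing the pitch only means changing `pitch_period`, which is why such systems can vary accent and intonation without storing additional waveforms.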

With this method, not only is less storage capacity needed than when natural speech waveforms are stored, but the pitch, which governs intonation and accent, can also be changed easily just by changing the parameter that represents pitch. However, speech synthesized by this method has poor clarity and is not suited to applications that demand high-quality synthesized speech.

An object of the present invention is to provide a speech synthesis device of the type that edits and synthesizes unit speech waveforms whose units are phoneme chains such as consonant-vowel and vowel-consonant combinations, and that can produce synthesized speech with relatively natural accent and intonation using a relatively small storage capacity.

According to the present invention, there is provided a waveform editing type speech synthesis device of the type that edits unit speech waveforms whose units are phoneme chains such as consonant-vowel and vowel-consonant combinations, characterized by comprising storage means for storing the unit speech waveforms, conversion means for converting the pitch of the unit speech waveforms, and editing means for editing the pitch-converted waveforms.

Next, the present invention is described in detail with reference to the drawings. FIG. 1 shows a block diagram of one embodiment of the present invention.

First, when a signal requesting the start of synthesis is input from the synthesis request input terminal 107 to the control circuit 105, the control circuit 105 instructs the character string analysis circuit 102, via the first control signal transmission path 109, to input a character string and begin analyzing it.

In accordance with the control signal sent from the control circuit 105, the character string analysis circuit 102 takes the character string input from the character string input terminal 106, which consists of a character string representing the sentence to be synthesized and a control character string representing accent and intonation, and converts it, by referring to an internal table, into synthesis data information consisting of the data numbers of the unit speech waveforms needed for synthesis and, where necessary, the pitch values to convert to. It sends this synthesis data information to the control circuit 105 via the synthesis data information transmission path 113.
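The table lookup performed by the character string analysis circuit 102 might be pictured as in the sketch below. The input notation is purely an assumption made for illustration: the publication does not define the control characters, so the space-separated unit labels, the `@` suffix carrying a pitch value, and the contents of `UNIT_TABLE` are all hypothetical.

```python
from typing import List, NamedTuple, Optional


class SynthesisItem(NamedTuple):
    data_number: int               # which stored unit speech waveform to use
    pitch_value: Optional[float]   # pitch to convert to, or None to use the unit as stored


# Hypothetical internal table: unit label -> data number of the stored waveform.
UNIT_TABLE = {"ka": 0, "as": 1, "sa": 2, "a": 3}


def analyze(text: str) -> List[SynthesisItem]:
    """Convert an annotated string into synthesis data information."""
    items = []
    for token in text.split():
        # Assumed notation: "label" or "label@pitch".
        label, _, pitch = token.partition("@")
        if label not in UNIT_TABLE:
            raise KeyError(f"no stored unit waveform for {label!r}")
        items.append(SynthesisItem(UNIT_TABLE[label], float(pitch) if pitch else None))
    return items


info = analyze("ka@140 as sa@120 a")
```

Each resulting item carries exactly the two pieces of information named in the text: a data number and, only where conversion is needed, a pitch value.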

Next, the control circuit 105 receives the synthesis data information sent from the character string analysis circuit 102 and, via the second control signal transmission path 110, indicates to the unit speech waveform storage circuit 101 the data number of each unit speech waveform and where that waveform is to be transferred. That is, when a unit speech waveform is to be used as is, it is sent to the waveform editing circuit 104 via the first unit speech waveform transmission path 114; when its pitch needs to be converted, it is sent to the pitch conversion circuit 103 via the second unit speech waveform transmission path 115. In addition, when the pitch needs to be converted, the control circuit 105 sends the target pitch value to the pitch conversion circuit 103 via the third control signal transmission path 111, causing it to convert the pitch of the unit speech waveform sent from the unit speech waveform storage circuit 101.

In accordance with the control signal sent from the control circuit 105, the pitch conversion circuit 103 converts the pitch of the unit speech waveform sent from the unit speech waveform storage circuit 101 and sends the result to the waveform editing circuit 104 via the pitch-converted waveform transmission path 116.

Pitch conversion can be realized, for example, by expanding or contracting the waveform along the time axis. One specific method is described later.

Next, the control circuit 105 sends a control signal to the waveform editing circuit 104 via the fourth control signal transmission path 112. When a unit speech waveform is to be used as is, the waveform editing circuit 104 is made to take in the waveform sent from the unit speech waveform storage circuit 101; when a pitch-converted waveform is to be used, it is made to take in the waveform sent from the pitch conversion circuit 103. The waveforms are then edited, and once all waveforms have been input and edited, the synthesized speech is output from the synthesized speech output terminal 108.
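The routing performed by the control circuit 105 can be summarized in software terms as below. This is a minimal sketch, not the circuit itself: "editing" is assumed here to be plain concatenation, the pitch converter is passed in as a callable, and the names `synthesize`, `unit_store`, and `convert_pitch` are hypothetical.

```python
def synthesize(items, unit_store, convert_pitch):
    """Assemble output speech from (data_number, pitch_value) pairs.

    items         -- synthesis data information; pitch_value is None when the
                     stored waveform is to be used as is
    unit_store    -- mapping from data number to stored waveform (role of circuit 101)
    convert_pitch -- callable(waveform, pitch_value) -> waveform (role of circuit 103)
    """
    output = []
    for data_number, pitch_value in items:
        waveform = unit_store[data_number]
        if pitch_value is not None:
            # Route through the pitch converter (paths 115 and 116).
            waveform = convert_pitch(waveform, pitch_value)
        # Otherwise the waveform goes straight to editing (path 114).
        output.extend(waveform)  # assumed editing step: simple concatenation
    return output
```

A real editing stage would presumably also smooth the joins between units; the publication does not describe that step, so it is left out here.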

Here, the principle of a method for expanding and contracting a waveform by digital signal processing is explained with reference to FIG. 2.

FIG. 2(a) shows, as solid vertical lines m drawn from the time axis, the sample values obtained when approximately one period of a waveform with pitch period T_p0 is sampled at sampling period T_s. To convert the pitch, sample values are first obtained by interpolating between these samples at every interpolation period T_I. In FIG. 2(a), the interpolated sample values are shown as broken vertical lines n drawn from the time axis. Next, when these interpolated sample values n are laid out at intervals of the sampling period T_s, the pitch period of the waveform is converted to T_p1, as shown in FIG. 2(b).

If the interpolation period T_I is α times the sampling period T_s, that is, if

T_I = α · T_s    (1)

holds, then the converted pitch period T_p1 is 1/α of the pitch period T_p0 of the original waveform, as given by equation (2).

T_p1 = T_p0 / α    (2)

The pitch frequency is therefore multiplied by α. If α is smaller than 1 the pitch frequency falls, and if α is larger than 1 the pitch frequency rises.
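The interpolate-and-respace procedure of FIG. 2 can be illustrated with the following sketch. The text does not say which interpolation method is used, so linear interpolation is an assumption here, as are the function name `convert_pitch` and the choice to stop at the last stored sample.

```python
import math


def convert_pitch(samples, alpha):
    """Read `samples` at intervals of alpha sample periods (T_I = alpha * T_s).

    The returned values are meant to be played back at the original sampling
    period T_s, so the pitch period becomes T_p0 / alpha.
    Linear interpolation is an assumption of this sketch.
    """
    if alpha <= 0:
        raise ValueError("alpha must be positive")
    if not samples:
        return []
    last = len(samples) - 1
    out = []
    k = 0
    while k * alpha <= last:
        pos = k * alpha                              # position in original sample units
        i = min(int(pos), last - 1) if last > 0 else 0
        frac = pos - i                               # 0.0 .. 1.0 between samples i and i+1
        nxt = samples[min(i + 1, last)]
        out.append((1.0 - frac) * samples[i] + frac * nxt)
        k += 1
    return out


# A sine wave with a pitch period of 8 samples, read with alpha = 2.
original = [math.sin(2 * math.pi * n / 8) for n in range(33)]
raised = convert_pitch(original, 2.0)   # pitch period becomes 8 / 2 = 4 samples
```

With α = 2 the stored 8-sample period becomes a 4-sample period, so the pitch frequency doubles, in line with equation (2). Because the waveform is stretched or compressed along the time axis, its duration also scales by 1/α.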

Note that the above is only one example of a pitch conversion method; other methods can also be used.

According to the present invention, synthesized speech with relatively natural accent and intonation can be obtained by changing the pitch, even though the storage capacity for the unit speech waveforms is relatively small.

[Brief explanation of the drawing]

FIG. 1 is a block diagram of one embodiment of the present invention, and FIG. 2 is a diagram explaining an example of a method for converting the pitch of a speech waveform.

In FIG. 1: 101, unit speech waveform storage circuit; 102, character string analysis circuit; 103, pitch conversion circuit; 104, waveform editing circuit; 105, control circuit; 106, character string input terminal; 107, synthesis request input terminal; 108, synthesized speech output terminal.

Claims

[Claims]

1. A waveform editing type speech synthesis device of the type that edits unit speech waveforms whose units are phoneme chains such as consonant-vowel and vowel-consonant combinations, comprising: storage means for storing the unit speech waveforms; conversion means for converting the pitch of the unit speech waveforms; and editing means for editing the pitch-converted waveforms.