JPS6032720Y2

JPS6032720Y2 - speech synthesizer

Info

Publication number: JPS6032720Y2
Application number: JP1981024176U
Authority: JP
Inventors: 謙二高畠
Original assignee: 日本電気株式会社
Priority date: 1981-02-23
Filing date: 1981-02-23
Publication date: 1985-09-30
Also published as: JPS57137000U

Description

[Detailed description of the invention] The present invention relates to a speech synthesizer based on phoneme synthesis.

In general, a single sound in Japanese, English, or other languages has a waveform with a consonant part C at the front followed by a vowel part V, as shown in Fig. 1.

A sound consisting only of a vowel has a waveform made up of the vowel part V alone.

The vowel part of a single sound sampled in a predetermined frequency band is known to contain a representative vowel waveform (with a predetermined pitch period), and this representative vowel waveform is repeated within the vowel part.

Phoneme synthesizers are conventionally known that synthesize speech by quantizing such a representative vowel waveform at a predetermined sampling period, storing it in a memory, and repeatedly reading the quantized data out of the memory and converting it into an analog signal.

However, because conventional phoneme synthesizers synthesize speech essentially by repeating the representative vowel waveform at regular intervals, the synthesized speech sounds unnatural.

The object of the present invention is to provide a speech synthesizer based on phoneme synthesis that produces synthesized speech close to natural speech.

According to the present invention, a speech synthesizer is obtained that comprises: a storage unit holding data groups obtained by quantizing, at predetermined time intervals, a first repetitive waveform having a predetermined period that appears within a single-sound waveform, and at least one second repetitive waveform whose period is close to that period and differs from it and from that of any other second waveform; means for randomly reading the quantized data of the first and second repetitive waveforms from the storage unit; and means for creating a speech signal based on the quantized data read out.

The present invention focuses on the fact that, for example, while human speech travels from the sound source through the vocal tract and is emitted from the oral cavity, its pitch period (the period of the repetitive waveform mentioned above) fluctuates.

To give this fluctuation to the synthesized speech, the present invention prepares second repetitive waveforms whose periods are close to that of the representative first repetitive waveform (phoneme) and differ from one another, and stores the digital data obtained by quantizing these waveforms in a memory.

As a result, by reading out the quantized data representing the prepared first and second repetitive waveforms and performing phoneme synthesis based on them, synthesized speech close to natural speech, incorporating pitch-period fluctuation, can be obtained.

Furthermore, by reading out the first and second repetitive waveforms in an irregular order, a person who regularly listens to the synthesized sound can be made to feel as if a human were speaking.

An embodiment of the present invention is described in detail below with reference to the drawings.

Fig. 2 is a schematic diagram of a first repetitive waveform having a predetermined pitch T1, serving as the representative phoneme contained in a single vowel. Figs. 3a to 3c are schematic diagrams of a group of second repetitive waveforms having pitches T2, T3, and T4 that are close to the pitch T1 of the first repetitive waveform and differ from one another.

Fig. 4 is a block diagram of the main part of the speech synthesizer used in this embodiment.

It comprises a memory (for example, a ROM) 1 in which groups of digital data, obtained by quantizing the first and second repetitive waveforms shown in Figs. 2 and 3 at a sampling frequency of, for example, 10 kHz, are stored in predetermined areas; a read control unit 2 that controls the reading of data from the memory 1; a processing unit 3 that performs synthesis processing based on the data read out; and an output unit 4 that converts the synthesized digital speech into an analog signal and outputs it to a speaker 5.

The quantized data of the first and second repetitive waveforms, a1 to an (period T1), b1 to bn (period T2), c1 to cn (period T3), and d1 to dn (period T4) (see Figs. 2 and 3), are arranged waveform by waveform in a predetermined address space in the memory 1. The data are normalized to, for example, 8 bits per sampling point, of which 1 bit is a sign bit indicating the polarity of the amplitude. They are read out in any desired order by address signals output from the read control unit 2 and sent to the processing unit 3.
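The 8-bit sample format with one sign bit can be illustrated with a short sketch. The source states only the word width and the presence of a sign bit; placing the sign in the top bit, treating 1 as negative, and scaling by 127 are assumptions made here for illustration, and `decode_sample` and `encode_sample` are hypothetical names.

```python
def decode_sample(byte):
    """Decode an 8-bit sign-magnitude value to a float in [-1.0, 1.0]."""
    if not 0 <= byte <= 0xFF:
        raise ValueError("expected an 8-bit value")
    magnitude = byte & 0x7F        # lower 7 bits: amplitude magnitude
    negative = bool(byte & 0x80)   # top bit: sign (assumed 1 = negative)
    value = magnitude / 127.0
    return -value if negative else value


def encode_sample(x):
    """Encode a float in [-1.0, 1.0] as an 8-bit sign-magnitude value."""
    x = max(-1.0, min(1.0, x))     # clamp to the normalized range
    magnitude = round(abs(x) * 127)
    return (0x80 | magnitude) if x < 0 else magnitude
```

With 7 magnitude bits, each polarity has 127 nonzero amplitude steps, which is the resolution available for the normalized waveform before any envelope or volume scaling.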

Suppose, for example, that control is performed so that the waveform of period T1 is read out twice in succession, then the waveform of period T4 is read out three times in succession, then the waveform of period T2 is read out twice, and then the waveform of period T1 is read out. The data sequence input to the processing unit 3 is then a1 to an, a1 to an, d1 to dn, d1 to dn, d1 to dn, b1 to bn, b1 to bn, a1 to an, and so on.
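The read order in this example can be sketched as follows. This is a toy model rather than the circuit itself: the dictionary `memory`, the function `read_sequence`, and the choice of four samples per waveform are all illustrative assumptions, and the samples are labels instead of amplitudes so that the order is easy to inspect.

```python
N = 4  # samples per stored waveform in this toy example (assumption)

# Hypothetical stand-in for the memory: one list of samples per stored waveform.
memory = {
    "a": [f"a{i}" for i in range(1, N + 1)],  # period T1
    "b": [f"b{i}" for i in range(1, N + 1)],  # period T2
    "c": [f"c{i}" for i in range(1, N + 1)],  # period T3
    "d": [f"d{i}" for i in range(1, N + 1)],  # period T4
}


def read_sequence(schedule):
    """Concatenate stored waveforms according to (name, repeat_count) pairs."""
    stream = []
    for name, count in schedule:
        for _ in range(count):
            stream.extend(memory[name])
    return stream


# T1 twice, T4 three times, T2 twice, then T1 again.
schedule = [("a", 2), ("d", 3), ("b", 2), ("a", 1)]
stream = read_sequence(schedule)
```

Each stored waveform is always read from its first sample to its last; only the choice of which waveform comes next varies.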

The processing unit 3 then performs the necessary synthesis processing, for example multiplication by envelope data of the vowel waveform (that is, envelope data connecting the maximum-amplitude points of the successive repetitive waveforms) and multiplication by amplitude scaling data to take the volume (sound intensity) into account, and transmits the result to the D/A conversion circuit of the output unit 4. The vowel waveform shown in Fig. 5 is thereby input to the speaker 5.
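The two multiplications performed in the processing unit can be sketched as below. The source does not say how finely the envelope is sampled; this sketch assumes one envelope gain per pitch period, and `apply_envelope_and_volume` is a hypothetical name.

```python
def apply_envelope_and_volume(periods, envelope, volume):
    """Scale each pitch period by its envelope gain and by the overall volume.

    periods:  list of lists of normalized samples, one list per pitch period
    envelope: one gain per period (assumption: envelope sampled once per period)
    volume:   overall amplitude scaling factor
    """
    if len(periods) != len(envelope):
        raise ValueError("need exactly one envelope gain per period")
    out = []
    for samples, gain in zip(periods, envelope):
        out.extend(s * gain * volume for s in samples)
    return out


period = [0.0, 1.0, 0.0, -1.0]  # toy normalized waveform
shaped = apply_envelope_and_volume([period, period], [0.5, 1.0], volume=0.5)
```

Keeping the stored waveforms normalized and applying the gains at synthesis time means the same stored data can serve any loudness or envelope shape.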

As is clear from this analog speech waveform, the repetitive waveforms contained in a single syllable appear in succession with differing pitches, so a speech signal with the subtle frequency fluctuation that arises in the human vocal tract can be synthesized.

The speech heard through the speaker therefore sounds to the listener as if a human were pronouncing it.

Furthermore, as shown in Fig. 6, the read control unit 2 may be configured to include a random address generation unit 6, a type designation unit 7 that designates the type of vowel (for example, あ, い, う, え, お in Japanese, or a, i, u, e, o in English), and a waveform designation unit 8 that designates the sampling data within a repetitive waveform.

In this case, the address space in the memory 9 is divided, for Japanese for example, into five tables, one for each vowel from あ to お, and it suffices for each table to contain a plurality of repetitive waveforms of different pitches.

During synthesis, the type designation unit 7 designates the type of vowel to be synthesized (the upper bits of the address), the contents of the random address unit 6 designate any one of the group of waveforms of different pitches (the middle bits of the address), and the waveform designation unit 8 performs control so that the sampling data within that waveform are read out sequentially.

If a counter that counts a number corresponding to the number of samples is used as the waveform designation unit 8, and the contents of the random address unit 6 are changed each time the counter finishes counting, repetitive waveforms of arbitrary pitch can be read out one after another in succession.
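The three-field address and the counter-driven change of waveform can be sketched as follows. The source gives only the ordering of the fields (vowel type in the upper bits, randomly chosen waveform in the middle bits, sample position in the lower bits); the bit widths, the names `make_address` and `address_stream`, and the use of Python's `random.Random` as the random source are assumptions for illustration.

```python
import random

# Hypothetical field widths; the source specifies only the field order.
VOWEL_BITS = 3    # upper bits: vowel type (five vowels fit in 3 bits)
VARIANT_BITS = 2  # middle bits: which stored pitch variant (4 per vowel here)
SAMPLE_BITS = 7   # lower bits: sample index within one waveform


def make_address(vowel, variant, sample):
    """Pack the three fields into one memory address."""
    if not 0 <= vowel < (1 << VOWEL_BITS):
        raise ValueError("vowel out of range")
    if not 0 <= variant < (1 << VARIANT_BITS):
        raise ValueError("variant out of range")
    if not 0 <= sample < (1 << SAMPLE_BITS):
        raise ValueError("sample out of range")
    return (vowel << (VARIANT_BITS + SAMPLE_BITS)) | (variant << SAMPLE_BITS) | sample


def address_stream(vowel, samples_per_waveform, n_waveforms, rng):
    """Yield addresses for n_waveforms consecutive pitch periods of one vowel."""
    for _ in range(n_waveforms):
        # A new variant is chosen each time the sample counter has run out.
        variant = rng.randrange(1 << VARIANT_BITS)
        for sample in range(samples_per_waveform):  # the sample counter
            yield make_address(vowel, variant, sample)


addresses = list(address_stream(vowel=2, samples_per_waveform=4,
                                n_waveforms=3, rng=random.Random(0)))
```

Because the variant field changes only when the sample counter wraps, every waveform is read from start to finish before the next one is selected.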

Moreover, if the random address unit 6 changes its contents at random according to the number of repetitive waveforms within one vowel, for example by using a random number generation program or a polynomial counter, the arrangement of the repetitive waveforms changes with every synthesis. Even a regular listener therefore perceives a subtle difference in the sound each time, and can accept it as a natural sound without any sense of resistance.
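A polynomial counter is commonly realized as a linear feedback shift register (LFSR). The following is a minimal sketch under that reading; the 4-bit width, the tap positions, and the mapping from register state to waveform choice are assumptions, not details from the source, and `lfsr4_step` and `variant_sequence` are hypothetical names.

```python
def lfsr4_step(state):
    """Advance a 4-bit linear feedback shift register by one step."""
    if not 1 <= state <= 0xF:
        raise ValueError("state must be a nonzero 4-bit value")
    feedback = (state ^ (state >> 1)) & 1  # XOR of the two lowest bits
    return (state >> 1) | (feedback << 3)


def variant_sequence(seed, count, n_variants=4):
    """Derive a pseudo-random sequence of waveform choices from the LFSR."""
    state = seed
    picks = []
    for _ in range(count):
        state = lfsr4_step(state)
        picks.append(state % n_variants)
    return picks
```

With these taps the register visits all 15 nonzero states before repeating. If the register state is carried over from one synthesis to the next rather than reset, each utterance starts at a different point in the cycle, which gives the change in arrangement described above.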

This has the effect of giving the listener a strong sense of psychological stability.

Since the timbre of a vowel is determined by the shape of the repetitive waveform, the first and second repetitive waveforms described above may be prepared by extracting similar waveforms of different pitches from each vowel. Alternatively, they may be prepared by taking one representative phoneme, varying its pitch within a predetermined range (that is, a range sufficient to convey a subtle pitch fluctuation), and approximating the waveform for each pitch.
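One way to derive pitch variants from a single representative waveform is to resample one period to a slightly different number of samples. The source says only that the waveform is approximated for each pitch; linear interpolation is a technique substituted here for illustration, and `resample_period` is a hypothetical name.

```python
def resample_period(samples, new_length):
    """Stretch or shrink one period to new_length samples.

    Uses linear interpolation and treats the waveform as periodic, so the
    sample after the last one wraps around to the first.
    """
    if new_length <= 0:
        raise ValueError("new_length must be positive")
    n = len(samples)
    out = []
    for k in range(new_length):
        pos = k * n / new_length       # fractional position in the original
        i = int(pos)
        frac = pos - i
        nxt = samples[(i + 1) % n]     # wrap around at the end of the period
        out.append(samples[i] * (1 - frac) + nxt * frac)
    return out


base = [0.0, 1.0, 0.0, -1.0]           # toy representative waveform
longer = resample_period(base, 8)      # same shape, twice the period
```

At a fixed readout rate, a period stored with more samples plays at a lower pitch and one with fewer samples at a higher pitch, while the overall shape, and hence the timbre, stays similar.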

Furthermore, the pitch of the repetitive waveforms of the synthesized speech may be changed by changing the memory access speed or the synthesis processing speed using a control signal such as a clock signal.
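The relation between readout rate and pitch is simple arithmetic, sketched here with assumed numbers: a stored period of 100 samples is a hypothetical value, and `pitch_hz` is an illustrative name.

```python
def pitch_hz(read_rate_hz, samples_per_period):
    """Pitch frequency when one stored period is read out at the given rate."""
    if samples_per_period <= 0:
        raise ValueError("samples_per_period must be positive")
    return read_rate_hz / samples_per_period


nominal = pitch_hz(10_000, 100)  # 10 kHz readout of a 100-sample period
raised = pitch_hz(10_200, 100)   # the same data read 2 % faster
```

Under these assumptions, raising the clock by 2 % raises the pitch by the same 2 % without touching the stored data.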

When a repetitive waveform is sampled at intervals of, for example, 100 μs, it is desirable to set its sampling start point and sampling end point to coincide with the end point of the preceding waveform and the start point of the following waveform, so that no discontinuity arises in the waveform.

For example, the amplitude may be set to zero at both the start and the end.
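Whether a set of stored waveforms can be joined in any order without a jump can be checked with a small sketch. The function names and sample values below are illustrative assumptions.

```python
def starts_and_ends_at_zero(samples, tolerance=0):
    """True if the first and last samples are zero within the tolerance."""
    return abs(samples[0]) <= tolerance and abs(samples[-1]) <= tolerance


def max_jump_at_joins(waveforms):
    """Largest absolute step between the end of one waveform and the start of the next."""
    return max(
        (abs(nxt[0] - prev[-1]) for prev, nxt in zip(waveforms, waveforms[1:])),
        default=0,
    )


good = [0, 5, -5, 0]   # begins and ends at zero amplitude
bad = [3, 5, -5, -2]   # begins and ends away from zero
```

If every stored waveform begins and ends at zero, the step at each join is zero regardless of the order in which the waveforms are read.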

Furthermore, the data quantized and stored in the memory may use whichever data storage format is most suitable, such as PCM or ADPCM.

Although an example has been presented in which the pitch of the repetitive waveform is varied to represent frequency fluctuation in the vocal tract, waveforms with the same pitch but varied amplitude, waveforms whose shape itself has been moderately deformed, or waveforms optimally modified in a combination of these, including pitch, may instead be prepared in the memory. Alternatively, amplitude modification, waveform distortion processing, or the like may be performed at synthesis time.

Address control for the memory may be implemented in either hardware or software.

Finally, the present invention is fully applicable not only to single vowels but also to synthesizing consonants preceded by a noise signal and to synthesizing whole words or sentences.

[Brief description of the drawings]

Fig. 1 is a waveform diagram of a single sound (noise + vowel); Fig. 2 is a diagram of the representative repetitive waveform having a predetermined pitch within a vowel; Figs. 3a, 3b, and 3c are waveform diagrams of mutually different pitches, each approximating the waveform of Fig. 2; Fig. 4 is a block diagram of the main part of an embodiment of the present invention; Fig. 5 is an analog waveform diagram of part of a synthesized vowel part; and Fig. 6 is a block diagram of the main part of another embodiment.

1, 9: memory; 2: read control unit; 3: processing unit; 4: output unit; 5: speaker; 6: random address unit; 7: type designation unit; 8: waveform designation unit.

Claims

[Scope of the utility model registration claim]

A speech synthesizer characterized by comprising: means for generating sampling data of a first speech signal waveform having a predetermined pitch and sampling data of at least one second speech signal waveform that has a waveform shape similar to the first speech signal waveform and a different pitch; means for successively reading the sampling data of the first and second speech signal waveforms in an arbitrary order; and means for creating an analog speech signal based on the sampling data read out.