JPH035598B2

Info

Publication number: JPH035598B2
Application number: JP56168459A
Authority: JP
Inventors: Noboru Sonehara
Original assignee: Sanyo Electric Co Ltd
Current assignee: Sanyo Electric Co Ltd
Priority date: 1981-10-20
Filing date: 1981-10-20
Publication date: 1991-01-25
Also published as: JPS5868799A

Description

[Detailed Description of the Invention] The present invention relates to a speech synthesis device capable of synthesizing speech.

Digital speech synthesizers employing the PARCOR method are currently the mainstream among speech synthesis devices of this type. In the PARCOR method, during speech analysis, the partial autocorrelation coefficients (PARCOR coefficients) of the speech waveform and the sound source information are derived as parameters for each frame period (10 msec) over which the speech can be regarded as stationary. Taking K₁ as the PARCOR coefficient between two adjacent sample values of a speech waveform sampled at about 8 kHz, K₂ as the PARCOR coefficient between two sample values two samples apart, and Kn as the PARCOR coefficient between two sample values n samples apart, ten PARCOR coefficients K₁, K₂, ..., K₁₀ are used, for example. The sound source information, on the other hand, consists of a pitch coefficient P and an amplitude coefficient A of the sound source. The pitch coefficient P indicates the voiced/unvoiced decision obtained from the residual signal left after the PARCOR coefficients K₁, K₂, ..., K₁₀ have been extracted and, for voiced speech, the fundamental period. One such group of coefficients K₁ to K₁₀, P, and A constitutes one frame, so 10 seconds of speech yields a continuous sequence of 1000 frames. The frame sequence thus obtained for each utterance is stored, for a large number of utterances, in a parameter ROM (read-only memory).
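The frame arithmetic in this paragraph can be checked with a short sketch. The constant and function names are hypothetical, and the sampling rate is taken as exactly 8 kHz even though the text only says "about 8 kHz".

```python
FRAME_PERIOD_MS = 10   # frame period stated in the text
SAMPLE_RATE_HZ = 8000  # assumption: "about 8 kHz" taken as exactly 8 kHz


def frame_count(duration_ms: int) -> int:
    """Number of whole analysis frames in an utterance of the given length."""
    return duration_ms // FRAME_PERIOD_MS


def samples_per_frame() -> int:
    """Number of waveform samples summarized by one coefficient group."""
    return SAMPLE_RATE_HZ * FRAME_PERIOD_MS // 1000


print(frame_count(10_000))   # frames in 10 seconds of speech
print(samples_per_frame())   # samples covered by each frame
```

Under these assumptions each frame of twelve coefficients (K₁ to K₁₀, P, A) stands in for 80 waveform samples.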

Fig. 1 shows the configuration of a conventional PARCOR speech synthesis device, and Fig. 2 shows an example of the contents of the parameter ROM used in it. In Fig. 1, 1 is a parameter ROM in which each address holds one byte (8 bits). According to Fig. 2, 6 bits are allocated to the pitch coefficient P, 5 bits to the amplitude coefficient A, and 37 bits to the PARCOR coefficients K₁ to K₁₀, so that one frame occupies 48 bits, that is, a memory area of six addresses. Such a frame sequence is stored for each utterance. 2 is an address table that holds, for each utterance, the starting address of its frame sequence in the parameter ROM 1. 3 is an address counter, which reads a starting address of the parameter ROM 1 from the address table 2 and, by specifying that starting address, sequentially reads out the frame sequence of the desired utterance stored in the parameter ROM 1. 4 is a synthesis circuit that resynthesizes the speech waveform from the frame-unit coefficient group A, P, K₁₀, K₉, ..., K₁ read out of the parameter ROM 1 every frame period. From the pitch coefficient P and the amplitude coefficient A it creates a sound source consisting of either voiced pitch pulses of period P and amplitude A or unvoiced noise of amplitude A, and it obtains the speech waveform by passing this source through a digital filter that simulates the vocal tract characteristics and whose coefficients are controlled by the PARCOR coefficients K₁ to K₁₀. 5 is an amplifier that amplifies the speech signal from the synthesis circuit 4 and drives a speaker 6. 7 is a control unit, which accesses the address counter 3 and sends a start signal to the synthesis circuit 4.
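To make the conventional 48-bit layout concrete, here is a minimal sketch of packing one frame into six ROM bytes. Only the 6-bit P, the 5-bit A, the 37-bit total for K₁ to K₁₀, and the field order P, A, K₁₀, ..., K₁ come from the text. The per-coefficient split in `K_WIDTHS`, the big-endian bit order, and all function names are assumptions made for illustration.

```python
# Hypothetical split of the 37 PARCOR bits; the text gives only the total.
K_WIDTHS = {1: 5, 2: 5, 3: 4, 4: 4, 5: 4, 6: 3, 7: 3, 8: 3, 9: 3, 10: 3}

# Field order as read by the synthesis circuit: P, A, K10, K9, ..., K1.
FIELDS = [("P", 6), ("A", 5)] + [(f"K{i}", K_WIDTHS[i]) for i in range(10, 0, -1)]
assert sum(width for _, width in FIELDS) == 48  # 6 + 5 + 37 bits


def pack_conventional(values: dict) -> bytes:
    """Pack one frame into 6 bytes, first field in the most significant bits."""
    acc = 0
    for name, width in FIELDS:
        v = values[name]
        if not 0 <= v < (1 << width):
            raise ValueError(f"{name}={v} does not fit in {width} bits")
        acc = (acc << width) | v
    return acc.to_bytes(6, "big")


def unpack_conventional(data: bytes) -> dict:
    """Inverse of pack_conventional; valid only if data starts on a frame boundary."""
    acc = int.from_bytes(data, "big")
    out = {}
    for name, width in reversed(FIELDS):
        out[name] = acc & ((1 << width) - 1)
        acc >>= width
    return out
```

Nothing in these six bytes marks where a frame begins, so `unpack_conventional` returns meaningless fields if it is handed bytes that straddle two frames.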

A speech synthesis device configured as above is normally used to utter passages lasting from several seconds to several minutes. When the user wants to hear such a passage from partway through, however, the address counter 3 must start reading the corresponding frame sequence in the parameter ROM 1 from partway through. What matters here is that, at the moment the synthesis circuit 4 is started, the address of the parameter ROM 1 specified by the address counter 3 must be the first address of a frame. As Fig. 2 makes clear, the synthesis circuit 4 is built to receive each frame's coefficient group in the order pitch coefficient P, amplitude coefficient A, PARCOR coefficients K₁₀, K₉, ..., K₁, so if reading starts from any address other than a frame's first address, speech synthesis becomes entirely impossible.

Accordingly, in a conventional device such as that of Fig. 1, if the first address of each frame in the frame sequences of the parameter ROM 1 is stored in the table 2, the address counter 3 can follow the table 2 and specify only frame-start addresses at start time. This, however, requires a larger memory capacity for the address table 2, which makes the speech synthesis device as a whole larger and is disadvantageous in cost.

The present invention has been made in view of this point and is described in detail below. Fig. 3 shows one embodiment of the speech synthesis device of the invention, and Fig. 4 shows the contents of the parameter ROM used in it. In Fig. 3, 3 to 7 denote the address counter through the control unit, as in the conventional device of Fig. 1. 1' is a parameter ROM in which each address holds one byte. As shown in Fig. 4, the seven bits from the second bit to the eighth bit of each address are used, and one frame's coefficient group, that is, the 6-bit pitch coefficient P, the 5-bit amplitude coefficient A, and the 37 bits of PARCOR coefficients K₁ to K₁₀, is stored in every seven consecutive addresses. A logic "1" is written as a special code in the first bit of the first address of each frame, and a logic "0" is written in the first bit of every other address. 8 is a gate circuit, which passes the start signal from the control unit 7 to the synthesis circuit 4 when the first bit of the address of the parameter ROM 1' indicated by the address counter 3 is logic "1".
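The seven-address layout can be sketched as follows. Seven addresses with seven payload bits each give 49 bits, one more than the 48-bit coefficient group. The text does not say which physical bit is the "first bit" or where the spare bit goes, so treating the first bit as the most significant bit and padding at the end of the frame are both assumptions, as are the function names.

```python
MARK = 0x80  # assumption: the "first bit" of an address is its most significant bit


def pack_marked(payload48: int) -> bytes:
    """Spread a 48-bit coefficient group over 7 bytes, 7 payload bits per byte.

    The first byte carries the frame-start flag (logic "1"); the other six
    carry logic "0" in that position.
    """
    if not 0 <= payload48 < (1 << 48):
        raise ValueError("payload must fit in 48 bits")
    bits49 = payload48 << 1  # assumption: the one spare bit pads the end
    chunks = [(bits49 >> (7 * (6 - i))) & 0x7F for i in range(7)]
    chunks[0] |= MARK
    return bytes(chunks)


def unpack_marked(data: bytes) -> int:
    """Recover the 48-bit coefficient group from one correctly aligned frame."""
    if len(data) != 7 or not data[0] & MARK or any(b & MARK for b in data[1:]):
        raise ValueError("not a frame-aligned 7-byte group")
    bits49 = 0
    for b in data:
        bits49 = (bits49 << 7) | (b & 0x7F)
    return bits49 >> 1
```

Because the flag is set only on a frame's first byte, `unpack_marked` can reject a misaligned read instead of silently decoding garbage.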

In a speech synthesis device configured in this way, a parameter ROM 1' storing many frame sequences as shown in Fig. 4 is used to utter a passage. To produce speech from partway through the passage, the control unit 7 increments the address counter 3 at high speed, every few μsec, and keeps reading the contents of each address of the parameter ROM 1' in turn. When a start command for beginning speech synthesis arrives at the desired moment, the control unit 7 generates a start signal and sends it to the gate circuit 8. The gate circuit then passes this start signal to the synthesis circuit 4 at the next moment a logic "1" is obtained from the first bit of an address of the parameter ROM 1'. Having received the start signal, the synthesis circuit 4 takes in the second through eighth bits of each address of the parameter ROM 1' from that moment on, that is, from the moment the contents of a frame's first address are being read from the parameter ROM 1'.
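The gating behavior can be modeled in software as a scan for the next address whose flag bit is set. This is only a sketch of the logic, not of the hardware: the flag is assumed to be the most significant bit of each byte, and the function names are hypothetical.

```python
from typing import Iterator, Optional

FRAME_LEN = 7  # addresses per frame in parameter ROM 1'
MARK = 0x80    # assumption: the frame-start flag is the most significant bit


def next_frame_start(rom: bytes, address: int) -> Optional[int]:
    """Address at which the gate would pass the start signal.

    `address` is where the address counter happens to be when the start
    command arrives; the gate waits for the next byte whose flag is set.
    """
    for a in range(address, len(rom)):
        if rom[a] & MARK:
            return a
    return None


def frames_from(rom: bytes, address: int) -> Iterator[bytes]:
    """Yield whole 7-byte frames beginning at the next frame boundary."""
    start = next_frame_start(rom, address)
    while start is not None and start + FRAME_LEN <= len(rom):
        yield rom[start:start + FRAME_LEN]
        start += FRAME_LEN
```

For example, with three frames stored back to back from address 0, a start command arriving while the counter is at address 9 begins synthesis at address 14, the start of the third frame.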

The start signal output by the gate circuit 8 is also fed back to the control unit 7, which, on receiving it, changes the period at which the address counter 3 is incremented. That is, the seven addresses of one frame in the parameter ROM 1' are specified in rapid succession, and a pause of about 10 msec is inserted before the seven addresses of the next frame are specified in rapid succession, so that the seven addresses of one frame are specified every frame period (10 msec). As a result, the frame-unit coefficient group is correctly delivered from the parameter ROM 1' to the synthesis circuit 4 every frame period, in the order pitch coefficient P, amplitude coefficient A, PARCOR coefficients K₁₀, K₉, ..., K₁. The synthesis circuit 4 synthesizes the speech waveform from these coefficient groups, and the synthesized speech is emitted through the amplifier 5 and the speaker 6 from the desired point partway through.
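The two-rate addressing described here, a fast burst over seven addresses followed by a pause until the next frame period, can be sketched as a schedule of read times. The 2 μs step is a hypothetical value chosen for illustration, since the text gives only "several μsec", and the function name is likewise an assumption.

```python
from typing import Iterator, Tuple

FRAME_LEN = 7             # addresses per frame
FRAME_PERIOD_US = 10_000  # 10 msec frame period from the text
STEP_US = 2               # hypothetical: the text says only "several μsec"


def address_schedule(start_address: int, n_frames: int) -> Iterator[Tuple[int, int]]:
    """Yield (time in μs, ROM address) pairs after the start signal is fed back.

    Each frame is read as a burst of 7 consecutive addresses, and the next
    burst begins one frame period after the previous one began.
    """
    for f in range(n_frames):
        burst_start = f * FRAME_PERIOD_US
        for i in range(FRAME_LEN):
            yield burst_start + i * STEP_US, start_address + f * FRAME_LEN + i


for t, addr in address_schedule(start_address=14, n_frames=2):
    print(t, addr)
```

With these numbers a burst finishes within 14 μs, so the idle time before the next burst is very nearly the full 10 msec, consistent with the pause of about 10 msec in the text.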

As another embodiment of the speech synthesis device of the invention, a memory layout may be adopted in which the whole of the first address of each frame in the parameter ROM is given over to the special code, and the coefficient group is stored in the six addresses from the second address onward. In this case, for example, the 8-bit pattern "00000000", which does not occur in the six addresses from the second address onward, is used as the special code. If a decoder that recognizes this special code when it is read from the parameter ROM passes the start signal to the synthesis circuit, the synthesis circuit can take in the coefficient group correctly from the second address of the frame.
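This second layout can be sketched as a scan for a sentinel byte. The code relies on the property stated in the text that "00000000" never occurs among a frame's six data addresses; how the stored coefficients are constrained to guarantee that is not described, so it is simply assumed here. The function names are hypothetical.

```python
from typing import Optional

SPECIAL = 0x00  # special code occupying the whole first address of a frame
DATA_LEN = 6    # data addresses following the special code


def resync(rom: bytes, address: int) -> Optional[int]:
    """Address of the first data byte of the next frame at or after `address`."""
    for a in range(address, len(rom)):
        if rom[a] == SPECIAL:  # the decoder recognizes the special code
            return a + 1
    return None


def read_frame(rom: bytes, address: int) -> Optional[bytes]:
    """Six data bytes of the next complete frame, or None if there is none."""
    data_start = resync(rom, address)
    if data_start is None or data_start + DATA_LEN > len(rom):
        return None
    return rom[data_start:data_start + DATA_LEN]
```

Compared with the one-bit flag, this layout keeps all eight bits of each data address available for coefficients, at the price of reserving one byte value.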

As is clear from the above description, in the speech synthesis device of the invention a special code is written at the first address of each parameter group stored across a plurality of addresses of the parameter memory, and the read control circuit supplies parameter groups to the synthesis circuit from the moment the special code is first read from the parameter memory after a start command is received. The parameter groups are therefore delivered to the synthesis circuit accurately, one group at a time, with no risk that the tail of a preceding parameter group and the head of the following one are taken in together as a single parameter group.

Accordingly, when long synthesized speech is to be uttered, synthesized speech can be obtained from any desired point simply by writing the same special code into the parameter memory when it is created, without providing a large address table, so the speech synthesis device can be made smaller and cheaper.
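A rough memory comparison illustrates the trade-off. The six-address and seven-address frame sizes come from the text. The two-byte width of an address-table entry is a hypothetical figure, and the sketch leaves out the per-utterance start table that both schemes would still need.

```python
def table_scheme_bytes(n_frames: int, addr_bytes: int = 2) -> int:
    """6-byte frames plus one address-table entry for every frame start.

    addr_bytes is a hypothetical entry width; the text does not give one.
    """
    return n_frames * 6 + n_frames * addr_bytes


def marker_scheme_bytes(n_frames: int) -> int:
    """7-byte frames that carry their own start marker; no per-frame table."""
    return n_frames * 7


n = 6000  # one minute of speech at one frame per 10 msec
print(table_scheme_bytes(n), marker_scheme_bytes(n))
```

Under these assumptions the marker costs one extra byte per frame, against two bytes per frame for a table entry.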

[Brief Description of the Drawings]

Fig. 1 is a block diagram showing the configuration of a conventional speech synthesis device, Fig. 2 is a schematic diagram showing the contents of the parameter ROM used in the conventional device, Fig. 3 is a block diagram showing the configuration of the speech synthesis device of the invention, and Fig. 4 is a schematic diagram showing the contents of the parameter ROM used in the device of the invention.

1, 1': parameter ROM; 3: address counter; 4: synthesis circuit; 5: amplifier; 6: speaker; 7: control unit; 8: gate circuit.

Claims

[Scope of Claims] 1. A speech synthesis device comprising: a parameter memory that stores, at assigned addresses, parameter groups extracted from human speech for each of a large number of consecutive frame periods over which the speech can be regarded as stationary, each parameter group being stored across a plurality of addresses, with a special code marking the beginning of a parameter group stored at the first address of each group; a read control circuit that reads out the parameter groups by sequentially specifying addresses of the memory; and a synthesis circuit that synthesizes speech from the plurality of parameter groups read out by the read control circuit, characterized in that the read control circuit supplies parameter groups to the synthesis circuit, and causes the synthesis circuit to operate, from the point at which a special code is read out after a start command for beginning speech synthesis has been received.