JP2004226968A

JP2004226968A - Device and method for speech synthesis

Info

Publication number: JP2004226968A
Application number: JP2004008193A
Authority: JP
Inventors: Chao-Wen Chi; チチャオ−ウェン
Original assignee: Winbond Electronics Corp
Current assignee: Winbond Electronics Corp
Priority date: 2003-01-17
Filing date: 2004-01-15
Publication date: 2004-08-12
Also published as: GB2397737B; GB2397737A; TWI226601B; GB0328325D0; DE10356054A1; TW200414125A

Abstract

PROBLEM TO BE SOLVED: To provide a speech synthesis device and method whose processing capability is increased by resolving contention for processor instruction time.

SOLUTION: The device includes a memory, a processor, a register, a latch circuit, a timer means, and a D/A converter. The processor decodes speech signal data and outputs the decoded speech signal to the register. The latch circuit is triggered by a sampling signal output from the timer means, reads the decoded speech signal from the register at the period of the sampling signal, and transfers it to the D/A converter at fixed times. Jitter in the synthesized speech signal is thereby suppressed.

COPYRIGHT: (C)2004, JPO&NCIPI

Description

The present invention relates to a speech synthesis device and method, and more particularly to a speech synthesis device and method that use a latch circuit to prevent jitter in the speech signal.

With the rapid development of information technology and the spread of communication networks, digital speech synthesis technology has come into wide use. In electronic toys and mobile phones, for example, speech coding is commonly used for transferring speech. In particular, applying speech synthesis technology to speech compression lets the user hear the synthesized result clearly, which serves the purposes of entertainment and interaction.

FIG. 1 is a block diagram showing the configuration of a conventional speech synthesis system.

The speech synthesis system shown in FIG. 1 includes a processor 100, a register 102, a digital-to-analog (D/A) converter 104, and a speaker 106.

During operation, a clock signal 108 is input to the processor 100 and the register 102. The processor 100 decodes the speech signal data at the period of the clock signal 108 and generates a decoded signal (hereinafter, the decoded signal output by the processor is called the decoded speech signal). The register 102 is likewise triggered by the clock signal 108 and takes in the decoded speech signal output from the processor 100. The signal is then transferred in turn to the D/A converter 104 and the speaker 106.

FIG. 2 is a timing chart of the output of the speech synthesis system shown in FIG. 1.

In FIG. 2, the horizontal axis represents time and the vertical axis represents signal amplitude. T1, T2, ..., Tn denote the sampling cycles (SP) of the speech signal, and D1, D2, ..., Dn denote the decoded speech signals that the processor 100 obtains by firmware subroutine computation within the respective sampling cycles (T1, T2, ..., Tn).

In theory, taking signals D1 and D2 as an example, if the processor 100 transfers D1 and D2 in turn to the register 102 before sampling cycles T1 and T2 respectively end, D1 and D2 can be supplied to the D/A converter 104. The same applies to the other signals, such as D3.

In practice, however, in sampling cycle T2 for example, the processor 100 may receive an interrupt signal I1 from a peripheral in addition to decoding the speech data, and it must spend instruction time servicing that interrupt. As a result, the processor 100 cannot finish computing signal D2 within sampling cycle T2, and the computation slips into the next sampling cycle T3. That is, the processor 100 cannot transfer the speech signal D2 to the register 102 within sampling cycle T2, and the transfer is delayed until sampling cycle T3.

In a multimedia speech synthesis system in particular, the processor 100 may receive many interrupt signals In within a single sampling cycle. Because servicing these interrupts consumes instruction time of the processor 100, the processor 100 cannot generate the decoded speech signal within the given sampling cycle. Consequently, the D/A converter 104 cannot read the speech signal from the register 102 on schedule, the synthesized waveform of the overall speech signal is distorted, and jitter occurs in the speech signal. In other words, jitter introduces distortion or noise into the synthesized speech signal when the processor 100 synthesizes the speech data, and degrades the quality of the synthesis.
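The timing problem described above can be illustrated with a small model. This is only a sketch under stated assumptions: the function names, the 125-microsecond sampling period, the 80-microsecond decode time, and the interrupt durations are all hypothetical, and the model assumes that a sample reaches the D/A converter at the end of its sampling cycle when it is ready in time, and as soon as it is ready otherwise.

```python
def conventional_output_times(period, decode_time, interrupt_times):
    """Times at which each decoded sample reaches the D/A converter when the
    processor itself performs the transfer (conventional arrangement).

    interrupt_times[k] is the instruction time lost to interrupts while
    decoding sample k; all values here are hypothetical and in microseconds.
    """
    output_times = []
    busy_until = 0
    for k, lost in enumerate(interrupt_times):
        start = max(k * period, busy_until)      # wait for the cycle to begin
        busy_until = start + decode_time + lost  # decoding plus interrupt work
        deadline = (k + 1) * period              # end of sampling cycle k
        # On time: transfer at the cycle boundary. Late: transfer when ready.
        output_times.append(max(deadline, busy_until))
    return output_times


def intervals(times):
    """Spacing between consecutive output instants."""
    return [later - earlier for earlier, later in zip(times, times[1:])]


times = conventional_output_times(125, 80, [0, 70, 0, 0])
print(times)             # [125, 275, 375, 500]
print(intervals(times))  # [150, 100, 125]
```

In this run a single 70-microsecond interrupt in the second cycle makes the output spacing uneven (150, 100, 125 instead of a constant 125), which is the jitter the passage describes.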

How to eliminate jitter in the speech signal and generate a high-quality synthesized speech signal is therefore an important problem in this field.

One object of the present invention is to provide a speech synthesis device and method that resolve the contention for processor instruction time and increase processing capability, by using a sampling signal from a timer means to control a latch circuit that reads the decoded speech signal from a register.

Another object of the present invention is to provide a speech synthesis device and method that can prevent jitter in the synthesized speech signal, by triggering a plurality of latch circuits with a plurality of asynchronous sampling signals from a plurality of timer means and transferring the decoded speech signals at the periods of those sampling signals.

A further object of the present invention is to provide a speech synthesis device and method that reduce the storage capacity required to store the decoded speech signals, and hence the manufacturing cost of the system, by generating speech signals for a plurality of channels with different sampling frequencies from a plurality of asynchronous sampling signals supplied by a plurality of timer means.

To achieve the above objects, the speech synthesis device of the present invention generates a synthesized speech signal from speech signal data and includes: a memory that stores the speech signal data; a processor that, triggered by a clock signal, reads the speech signal data from the memory, decodes it, and generates a decoded speech signal; a register that, triggered by the clock signal, receives the decoded speech signal from the processor; a timer means that outputs a sampling signal; a latch circuit that is triggered at the period of the sampling signal output from the timer means and reads the decoded speech signal from the register; and a digital-to-analog converter that converts the decoded speech signal from the latch circuit from a digital signal to an analog signal to generate the synthesized speech signal.

Specifically, one or more timer means are provided, and the sampling signal from the timer means controls the operation of the latch circuit, so that the latch circuit reads the decoded speech signal from the register at a predetermined period and passes it to the next circuit. Because every decoded speech signal can thus be transferred in synchronization with the sampling signal, jitter in the synthesized speech signal can be eliminated.

Furthermore, the timer means and the processor operate independently, and the latch circuit is provided in the speech synthesis device as hardware, so the operation of the processor is not affected. That is, the latch circuit consumes no processor instruction time, and it can therefore take in and transfer the decoded speech signal at regular intervals at the period of the sampling signal.

If, within one execution cycle of the processor, there is enough instruction time for computing the speech signal data to produce decoded speech signals for two or more sampling periods, two or more timer means can be provided in the speech synthesis device, and the decoded speech signals can be taken in and transferred in synchronization with the sampling signal of each timer means.

The present invention is particularly suited to multi-channel speech synthesis systems that use a plurality of different sampling periods (asynchronous sampling signals). In a conventional multi-channel speech synthesis system, the processor controls the transfer of the decoded speech signals, so it must finish decoding the speech signal data of one or more speech channels within a single execution cycle. Moreover, to keep system operation simple and stable, sharing of interrupt signals between speech channels is prohibited. That is, even if a second speech channel requests to transfer its decoded speech signal using a second sampling signal while a first speech channel is transferring its decoded speech signal using a first sampling signal, the processor does not service the second channel's interrupt request until the first channel has completed its transfer.

In the present invention, by contrast, the two timer means trigger the latch circuit separately, and the latch circuit takes in the decoded speech signals from the register at the first and second sampling periods and transfers them at regular intervals, so the decoded speech signal is not delayed in any speech channel. In other words, the decoded speech signals of the channels do not affect one another, and the order of transfer is determined by the sampling period of each timer means, so jitter in a multi-channel speech synthesis system is greatly suppressed.

According to the present invention, a timer means controls a latch circuit and the latch circuit reads the decoded speech signal from the register. This resolves the contention for processor instruction time and increases the processing capability of the processor, particularly in a multimedia speech synthesis system.

In addition, a plurality of timer means generate a plurality of asynchronous sampling signals, and the latch circuit transfers the decoded speech signals in turn at the period of each sampling signal, so jitter in the synthesized speech signal can be prevented.

To overcome the problems of conventional speech synthesis devices, the invention provides a speech synthesis device and method in which a sampling signal from a timer means controls a latch circuit and causes it to actively read the decoded speech signal from the register, which resolves the contention for processor instruction time. The invention also provides a speech synthesis device and method in which a plurality of timer means generate a plurality of asynchronous sampling signals that trigger a plurality of latch circuits, and the latch circuits transfer the decoded speech signals of different channels in turn at the period of each sampling signal, which prevents jitter in the synthesized speech signal between speech channels.

Embodiments of the present invention are described below with reference to the accompanying drawings.

FIG. 3 is a block diagram showing the configuration of the speech synthesis system of the present invention.

The speech synthesis system shown in FIG. 3 generates a synthesized speech signal from speech signal data and prevents jitter in the synthesized speech signal. The speech signal data is stored in a memory.

The speech synthesis system includes a processor 200, a register 202, a latch circuit 204, a timer means 206, a digital-to-analog (D/A) converter 208, a memory 210, and a speaker 214.

The processor 200 is connected to the memory 210. Triggered by a clock signal 212, the processor 200 reads the speech signal data from the memory 210, decodes it, and generates a decoded speech signal.

The register 202 is connected to the processor 200. The register 202 is also triggered by the clock signal 212 and receives the decoded speech signal from the processor 200.

The latch circuit 204 is connected to the register 202. The latch circuit 204 is controlled by the timer means 206 and reads the decoded speech signal from the register 202.

The timer means 206 outputs a sampling signal to the latch circuit 204. The latch circuit 204 is triggered at the period of the sampling signal and actively reads the decoded speech signal from the register 202.

The timer means 206 consists, for example, of one or more counters (timers) that measure time.

The D/A converter 208 is connected to the latch circuit 204. It converts the decoded speech signal from the latch circuit 204 from a digital signal to an analog signal to generate the synthesized speech signal, and outputs the synthesized speech signal to the speaker 214.

Preferably, the latch circuit 204 has a multi-level data structure and stores multiple levels of decoded speech signals. For example, the data structure includes a FIFO (First In First Out) circuit and transfers the decoded speech signals to the D/A converter 208 in first-in, first-out order.
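A multi-level latch of this kind can be pictured as a small queue that the processor fills and the sampling signal drains. The sketch below is illustrative only: the class name `FifoLatch`, its methods, and the choice to hold the last output value when the queue is empty are assumptions, not details taken from the source.

```python
from collections import deque


class FifoLatch:
    """Hypothetical model of a multi-level latch with first-in, first-out order."""

    def __init__(self, depth):
        self.depth = depth    # number of levels the latch can hold
        self.slots = deque()  # decoded samples waiting for their sampling tick
        self.output = None    # value currently presented to the D/A converter

    def push(self, sample):
        """Processor side: store a decoded sample; return False if the latch is full."""
        if len(self.slots) >= self.depth:
            return False
        self.slots.append(sample)
        return True

    def on_sampling_tick(self):
        """Timer side: present the oldest stored sample to the D/A converter.

        Assumption: if nothing is stored, the previous output is held.
        """
        if self.slots:
            self.output = self.slots.popleft()
        return self.output


latch = FifoLatch(depth=2)
latch.push(10)
latch.push(20)
print(latch.on_sampling_tick())  # 10
print(latch.on_sampling_tick())  # 20
```

With more than one level, the processor can run ahead of the sampling signal by a few samples, which gives it slack to absorb interrupt servicing without missing a tick.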

The processor 200 is, for example, a 6502-series microcontroller, or a single-board or general-purpose central processing unit (CPU).

The speech signal data in the memory 210 is encoded, for example, by a time-domain waveform coding scheme such as adaptive differential pulse code modulation (ADPCM) or differential pulse code modulation (DPCM).

ADPCM uses digital sampling encoding to convert an analog speech signal into a digital signal. Because it exploits the fact that speech changes continuously and records the differences between adjacent samples, it needs less capacity to store the same speech than other coding schemes (for example, PCM).
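The idea of storing differences between adjacent samples can be sketched with plain, unquantized DPCM. Note that this is a deliberately simplified stand-in, not ADPCM itself: real ADPCM additionally quantizes each difference with an adaptive step size, which is where most of the storage saving comes from. The function names and sample values are hypothetical.

```python
def dpcm_encode(samples):
    """Replace each sample with its difference from the previous sample."""
    previous = 0
    differences = []
    for sample in samples:
        differences.append(sample - previous)
        previous = sample
    return differences


def dpcm_decode(differences):
    """Rebuild the samples by accumulating the stored differences."""
    previous = 0
    samples = []
    for difference in differences:
        previous += difference
        samples.append(previous)
    return samples


pcm = [100, 102, 105, 104, 101]
codes = dpcm_encode(pcm)
print(codes)               # [100, 2, 3, -1, -3]
print(dpcm_decode(codes))  # [100, 102, 105, 104, 101]
```

After the first value, the differences are much smaller than the samples themselves, so they can be stored in fewer bits.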

Specifically, one or more timer means 206 are provided for the latch circuit 204. The timer means 206 generates a sampling signal and controls the read and write operations of the latch circuit, and the latch circuit reads the decoded speech signal from the register 202 at a predetermined period and outputs it toward the speaker 214. Compared with the conventional approach, in which the processor controls the transfer of the decoded speech signal, the present invention saves a substantial amount of the instruction time available to the processor 200.

In the present invention, every decoded speech signal can be transferred in synchronization with the sampling signal, so jitter in the synthesized speech signal can be completely eliminated.

Next, operation is described for the cases where one or more timer means 206 are provided for the latch circuit 204. When a plurality of timer means 206 are provided, each timer means 206 is defined as one speech channel. In other words, a plurality of speech channels correspond to a plurality of timer means 206, a plurality of latch circuits 204, a plurality of registers 202 (or RAM: random access memory), and a plurality of firmware routines.

FIG. 4 is a timing chart for the case where one timer means 206 is provided for the latch circuit 204 in the speech synthesis system shown in FIG. 3.

In FIG. 4, the horizontal axis represents time and the vertical axis represents signal amplitude. SC denotes the input signal of the processor 200, and TC denotes the execution cycle of the input signal SC. D1 denotes the decoded speech signal computed within one execution cycle of the processor 200. SL denotes the sampling signal of the timer means 206, and TL denotes the period of the sampling signal SL.

During operation, the timer means 206 triggers the latch circuit 204 with the sampling signal SL. The latch circuit 204 thereby reads the decoded speech signal D1 from the register 202 and transfers it to the D/A converter 208 at the predetermined time P1. The D/A converter 208 converts the decoded speech signal D1 into a synthesized speech signal and outputs it to the speaker 214.

In the same way, the latch circuit 204 reads the decoded speech signals D2, ..., Dn in turn from the register 202 and transfers them to the D/A converter 208 at the predetermined times P1, ..., Pn.

In the present invention, the latch circuit 204 reads the signal from the register 202 using the sampling signal SL from the timer means 206, and the timer means 206 and the processor 200 operate independently. The latch circuit 204 therefore does not affect the operation of the processor 200; that is, it consumes no instruction time of the processor 200.

Specifically, provided that the processor 200 finishes decoding the speech signal data for a given sampling period before that sampling period of the latch circuit 204 arrives and stores the decoded speech signal in the register 202, the latch circuit 204 reads the decoded speech signals in turn at the period of the sampling signal and transfers them to the D/A converter 208 at regular intervals. In a single-channel speech synthesis system, this completely suppresses the jitter that would otherwise result from delays in transferring the decoded signal.
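The single-timer behaviour can be modelled in a few lines. This is a sketch under assumptions: the function name, the 125-microsecond period, and the processor write times are hypothetical, and the model simply treats a late write as an error rather than modelling what the hardware would output in that case.

```python
def latched_output_times(period, ready_times):
    """Instants P1, P2, ... at which the latch passes samples to the D/A converter.

    ready_times[k] is when the processor stores sample k in the register.
    The latch fires at (k + 1) * period no matter when the write happened,
    as long as the write came before the tick.
    """
    output_times = []
    for k, ready in enumerate(ready_times):
        tick = (k + 1) * period  # sampling tick generated by the timer
        if ready > tick:
            raise ValueError(f"sample {k} was not ready before its sampling tick")
        output_times.append(tick)
    return output_times


# The processor finishes at irregular times, but always before the tick.
print(latched_output_times(125, [80, 240, 330, 455]))  # [125, 250, 375, 500]
```

The output instants are evenly spaced even though the processor's write times are not, because the timer alone decides when a sample moves on to the D/A converter.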

FIG. 5 is a timing chart for the case where a plurality of timer means 206 are provided for the latch circuit 204 in the speech synthesis system shown in FIG. 3.

FIG. 5 is basically the same as FIG. 4, except that FIG. 4 shows the case of one timer means 206 whereas FIG. 5 shows the case of a plurality. For convenience of explanation, two timer means 206 are assumed and are denoted T1 and T2.

In FIG. 5, the horizontal axis represents time and the vertical axis represents signal amplitude. SC denotes the input signal of the processor 200, and TC denotes the execution cycle of the input signal SC. D11 and D21 denote two decoded speech signals computed within one execution cycle of the processor 200. SL1 denotes the sampling signal of the first timer means T1, and TL1 denotes the period of SL1. SL2 denotes the sampling signal of the second timer means T2, and TL2 denotes the period of SL2.

During operation, the first timer means T1 and the second timer means T2 trigger the latch circuit 204 with the sampling signals SL1 and SL2, respectively. The latch circuit 204 thereby reads the decoded speech signals D11 and D21 from the register 202 and transfers them to the D/A converter 208 at the predetermined times P11 and P21. The D/A converter 208 converts D11 and D21 into synthesized speech signals and outputs them to the speaker 214. In the same way, the latch circuit 204 is triggered in turn by T1 and T2, reads the decoded speech signals (D11, D21), (D12, D13, D22), ..., (D1m, D2n) in turn from the register 202, and transfers them to the D/A converter 208 at the predetermined times (P11, P21), (P12, P13, P22), ..., (P1m, P2n).
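The interleaving of transfer instants from two independent timers can be reproduced with a short sketch. The periods used here (2 and 3 time units) are assumed purely for illustration, as is the function name; the source does not give numeric values for TL1 and TL2.

```python
def trigger_schedule(periods, end_time):
    """Merge the ticks of independent timers into one ordered list of labels.

    periods[0] is the period of the first timer, periods[1] of the second,
    and so on. A label "Pcn" means the n-th tick of channel c.
    """
    events = []
    for channel, period in enumerate(periods, start=1):
        tick_time, count = period, 1
        while tick_time <= end_time:
            events.append((tick_time, channel, count))
            tick_time += period
            count += 1
    events.sort()  # order by time, then by channel number for simultaneous ticks
    return [f"P{channel}{count}" for _, channel, count in events]


print(trigger_schedule([2, 3], end_time=6))
# ['P11', 'P21', 'P12', 'P13', 'P22']
```

With these assumed periods the merged order matches the sequence in the text, (P11, P21), (P12, P13, P22): the faster channel contributes more transfers, and neither channel waits for the other.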

In the present invention, the first timer means T1, the second timer means T2, and the processor 200 operate independently, and the latch circuit 204 is installed in the speech synthesis system as hardware. The latch circuit 204 therefore does not affect the operation of the processor 200; that is, it consumes no instruction time of the processor 200. Accordingly, the latch circuit 204 can read the decoded speech signals D11 and D21 at regular intervals at the first and second sampling periods and transfer them to the D/A converter 208 at the predetermined times P11 and P21. Jitter in the synthesized speech signal is thereby completely suppressed.

In other words, if the processor 200 has enough instruction time within the execution cycle TC for computing the speech signal data to produce decoded speech signals for two or more sampling periods, two or more timer means 206 can be provided for the latch circuit 204 of the speech synthesis device, and the latch circuit 204 can read and transfer the decoded speech signals in synchronization with the sampling signal of each timer means.

The present invention is particularly suited to multi-channel speech synthesis systems that use a plurality of different sampling periods (asynchronous sampling signals). In a conventional multi-channel speech synthesis system, the processor controls the transfer of the decoded speech signals, so it must finish decoding the speech signal data of one or more speech channels within a single execution cycle. Moreover, to keep system operation simple and stable, sharing of interrupt signals between speech channels is prohibited. That is, even if a second speech channel requests to transfer its decoded speech signal using a second sampling signal while a first speech channel is transferring its decoded speech signal using a first sampling signal, the processor does not service the second channel's interrupt request until the first channel has completed its transfer. As a result, the decoded speech signal of the second speech channel is held up by the first speech channel, and jitter arises in the output synthesized speech signal.

In the present invention, by contrast, the latch circuit 204 and the processor 200 are independent, and the plurality of timer means 206 trigger the latch circuit 204 separately. The latch circuit actively reads the decoded speech signals from the register 202 at the first and second sampling periods and transfers them at regular intervals, so the decoded speech signal is not delayed in any speech channel. That is, the decoded speech signals of the channels do not affect one another, and the order of transfer is determined by the sampling period of each timer means, so jitter in the synthesized speech signal of a multi-channel speech synthesis system is greatly suppressed.

図６は、本発明の音声合成システムの動作を示すフローチャートである。 FIG. 6 is a flowchart showing the operation of the speech synthesis system of the present invention.

ステップＳ６００において、クロック信号２１２によりプロセッサ２００をトリガーし、メモリ２１０から音声信号データを読み取る。 In step S600, the processor 200 is triggered by the clock signal 212 to read audio signal data from the memory 210.

ステップＳ６０２において、プロセッサ２００は音声信号データについて復号化を行い、復号化音声信号を生成する。 At step S602, the processor 200 decodes the audio signal data to generate a decoded audio signal.

ステップＳ６０４において、クロック信号２１２によりレジスター２０２をトリガーし、レジスター２０２はプロセッサ２００から復号化音声信号を受け取る。 In step S604, register 202 is triggered by clock signal 212, and register 202 receives the decoded audio signal from processor 200.

ステップＳ６０６において、複数の計時手段２０６から出力された複数のサンプリング信号により当該サンプリング信号の各々の周期でラッチ回路２０４をトリガーし、ラッチ回路２０４は、当該複数のサンプリング信号の各々に対応して、レジスター２０２から複数チャンネルの復号化音声信号を読み取る。一つのサンプリング信号は、一つの音声チャンネルの合成音声信号に対応する。ラッチ回路２０４は各サンプリング信号に基づいて、定時的にレジスター２０２から復号音声信号を読み込んで、定時的に転送する。これによって、合成音声信号のジッター現象は抑えられる。 In step S606, the latch circuit 204 is triggered at each cycle of the sampling signal by the plurality of sampling signals output from the plurality of timing units 206, and the latch circuit 204 corresponds to each of the plurality of sampling signals. A plurality of channels of decoded audio signals are read from the register 202. One sampling signal corresponds to a synthesized audio signal of one audio channel. The latch circuit 204 reads the decoded audio signal from the register 202 on a regular basis based on each sampling signal and transfers the decoded audio signal on a regular basis. Thereby, the jitter phenomenon of the synthesized voice signal is suppressed.

In step S608, the decoded speech signals of the plurality of channels are converted from digital signals to analog signals to generate synthesized speech signals of the plurality of channels.

In step S610, the analog synthesized speech signals are output to the speaker.
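Steps S600 to S610 can be sketched in software as follows. This is only an illustrative model under stated assumptions, not the patented circuit: the `decode` function is a pass-through placeholder because the codec is not specified here, the `dac` function is an idealized 8-bit converter, and the channel numbers, timer periods, and sample values are hypothetical.

```python
def decode(sample):
    # S602 placeholder: the real decoder is not specified, so this
    # stand-in passes the stored value through unchanged.
    return sample


def dac(code, bits=8, vref=1.0):
    # S608: idealized conversion of an unsigned digital code to a level.
    return vref * code / (2 ** bits - 1)


def synthesize(memory, periods, duration):
    """Return (time, channel, level) tuples in output order.

    memory:  {channel: [encoded samples]}  (read in S600)
    periods: {channel: timer period}       (one timer per channel, S606)
    """
    # S600-S604: read, decode, and hold the decoded samples per channel.
    register = {ch: [decode(s) for s in data] for ch, data in memory.items()}

    events = []
    for ch, period in periods.items():
        for k, code in enumerate(register[ch]):
            t = k * period  # S606: the latch fires on this channel's timer
            if t > duration:
                break
            events.append((t, ch, dac(code)))  # S608: digital to analog

    # S610: samples reach the speaker in order of their timer instants.
    events.sort(key=lambda e: (e[0], e[1]))
    return events


out = synthesize(
    memory={1: [0, 255], 2: [255, 0, 255]},
    periods={1: 125, 2: 100},
    duration=250,
)
for event in out:
    print(event)
```

Each channel's output instants are multiples of its own timer period, so the two channels interleave by time without either one shifting the other.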

As described above, in the speech synthesizer of the present invention, the latch circuit controls the register, the timing means controls the latch circuit, and the latch circuit can actively read the decoded speech signal from the register. This resolves the problem of instruction-timing contention in the processor and increases the processing capability of the processor in the speech synthesis system.

In addition, the speech synthesizer of the present invention uses a plurality of timing means to generate a plurality of asynchronous sampling signals, and the latch circuit transfers the decoded speech signals in turn at the period of each sampling signal, thereby preventing jitter in the synthesized speech signal.

Furthermore, the speech synthesizer of the present invention uses the plurality of asynchronous sampling signals from the plurality of timing means to generate speech signals of a plurality of channels with different sampling frequencies, thereby reducing the storage capacity required to store the decoded speech signals and lowering the manufacturing cost of the system.
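The storage saving can be made concrete with a small calculation. It rests on an assumption that is one possible reading of the text: if all channels had to share a single sampling frequency, every channel would be held at the highest rate. The rates, duration, and sample width below are hypothetical.

```python
def storage_bytes(rate_hz, seconds, bytes_per_sample=1):
    """Bytes needed to hold one channel of samples for the given time."""
    return rate_hz * seconds * bytes_per_sample


# Hypothetical two-channel case: 8 kHz and 4 kHz, 2 seconds each.
per_channel_rates = storage_bytes(8000, 2) + storage_bytes(4000, 2)
# Assumed alternative: both channels held at the common 8 kHz rate.
single_common_rate = 2 * storage_bytes(8000, 2)

print(per_channel_rates)   # 24000
print(single_common_rate)  # 32000
```

Under these assumptions, letting the second channel run at its own lower rate cuts the stored data from 32000 to 24000 bytes.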

Although a preferred embodiment of the present invention has been described above, the present invention is not limited to this embodiment, and any modification that does not depart from the spirit of the present invention falls within its scope.

FIG. 1 is a block diagram showing the configuration of a conventional speech synthesis system.

FIG. 2 is a timing chart of the output of the speech synthesis system shown in FIG. 1.

FIG. 3 is a block diagram showing the configuration of the speech synthesis system of the present invention.

FIG. 4 is a timing chart for the speech synthesis system shown in FIG. 3 when a single timing means is used.

FIG. 5 is a timing chart for the speech synthesis system shown in FIG. 3 when a plurality of timing means are used.

FIG. 6 is a flowchart showing the operation of the speech synthesis system of the present invention.

Explanation of reference numerals

100, 200 Processor
102, 202 Register
204 Latch circuit
206 Timing means
104, 208 D/A converter
210 Memory
108, 212 Clock signal
106, 214 Speaker

Claims

A speech synthesizer for generating a synthesized speech signal from speech signal data, comprising:
a memory for storing the speech signal data;
a processor that is triggered by a clock signal to read the speech signal data from the memory and decode it to generate a decoded speech signal;
a register that is triggered by the clock signal and receives the decoded speech signal from the processor;
timing means for outputting a sampling signal;
a latch circuit that is triggered by the sampling signal output from the timing means and reads the decoded speech signal from the register; and
a digital-to-analog converter that converts the decoded speech signal from the latch circuit from a digital signal to an analog signal to generate the synthesized speech signal.

The speech synthesizer according to claim 1, wherein
the latch circuit has a multi-layer data structure for storing the decoded speech signal, and
at least one of the data structures includes a FIFO circuit.

A speech synthesis method for generating synthesized speech signals of a plurality of channels from speech signal data stored in a memory, comprising:
triggering a processor with a clock signal to read the speech signal data from the memory;
decoding, by the processor, the speech signal data to generate a decoded speech signal;
triggering a register with the clock signal to receive the decoded speech signal from the processor;
triggering a latch circuit with a plurality of sampling signals output from a plurality of timing means, at the period of each sampling signal, to read the decoded speech signals of the plurality of channels from the register in response to each of the plurality of sampling signals; and
converting the decoded speech signals of the plurality of channels from digital signals to analog signals to generate the synthesized speech signals of the plurality of channels.

The speech synthesis method according to claim 3, wherein
the latch circuit has a multi-layer data structure for storing the decoded speech signal, and
at least one of the data structures includes a FIFO circuit.