JPH0310960B2

Info

Publication number: JPH0310960B2
Application number: JP5258781A
Authority: JP
Inventors: Yutaka Yasui; Shinichiro Obara; Fumitada Itakura; Shigeki Sagayama; Noboru Kanmura
Original assignee: Fujitsu Ltd; Nippon Telegraph and Telephone Corp
Current assignee: Fujitsu Ltd; Nippon Telegraph and Telephone Corp
Priority date: 1981-04-08
Filing date: 1981-04-08
Publication date: 1991-02-14
Also published as: JPS57167098A

Description

[Detailed description of the invention]

The present invention relates to a speech synthesis device that synthesizes speech data and outputs a speech signal.

Known speech synthesis methods that parameterize speech by exploiting its characteristic features include the PARCOR (partial autocorrelation) method, the LPC (linear predictive coding) method, and the LSP (line spectrum pair) method. These methods achieve a higher degree of data compression than waveform coding methods, in which the speech waveform is encoded as PCM or the like.

In speech synthesis, for example, one set of speech data (for example, 6 bytes) is converted by a predetermined procedure in each frame period; the output interpolated for each sample period, by linear interpolation or the like, is applied to a digital filter as its coefficients; and a pulse train or white noise from a sound source section is applied to the digital filter to obtain the synthesized speech output. Normally, the speech data for the next frame is read in while a given frame is being processed. If the speech data is incomplete at that point, or could not be read in at all, synthesis proceeds from abnormal data and the speech output becomes abnormal.

An object of the present invention is to detect abnormalities in the speech data of the kind described above and thereby prevent abnormal speech output. Embodiments are described in detail below.

FIG. 1 is a block diagram of a system that performs speech synthesis under the control of a microprocessor. Reference numeral 1 denotes the microprocessor; 2, a memory storing speech data; 3, a speech synthesis circuit; 4, a crystal resonator providing the master oscillation for timing signal generation and the like; 5, a D/A converter; 6, a low-pass filter; 7, an amplifier; 8 and 10, speakers; and 9, a transformer.

Speech data is read out of the memory 2 in response to an address signal from the microprocessor 1 and applied to the speech synthesis circuit 3. Control information from the microprocessor 1 controls the start and stop of synthesis in the speech synthesis circuit 3, the selection of the frame period, and so on, and status information from the speech synthesis circuit 3 is sent to the microprocessor 1. When the digital serial output of the speech synthesis circuit 3 is used, it is converted to an analog speech signal by the D/A converter 5 and applied through the low-pass filter 6 to the amplifier 7, whose amplified output drives the speaker 8. The speech synthesis circuit 3 also contains a simple built-in D/A converter, so the speaker 10 can instead be driven through the transformer 9. In that case the speech quality is somewhat lower, because the conversion to an analog speech signal is performed by the simple D/A converter.

FIG. 2 is a block diagram of the speech synthesis circuit, shown for the case in which it is implemented as an integrated circuit. Reference numeral 11 denotes a data buffer stack, a group of registers into which the speech data D0 to D7 is set by a signal and held until it is converted; 12, an interface section; 13, a control register in which control signals are set by a signal, these control signals including a start/stop signal ST, a repeat signal RPT that stops the interpolation operation and specifies repeated synthesis from the same data, frame period setting signals T₀ and T₁, a signal MODE that selects the variable or fixed frame length mode, and a stop bit enable signal SBE that allows information within the speech data to specify the stop of synthesis; 14, a status register that outputs status signals in response to a signal SE; 15, a speech conversion section; 16, a conversion section, made up of a read-only memory (ROM) or the like, that converts speech data into filter coefficients; 17, an interpolation section that interpolates the filter coefficients supplied every frame period and outputs them every sample period; 18, a sound source section; 19, a digital filter section; 20, a simple D/A converter; and 21, a control section that generates timing signals and the like and controls each section.

The labels at the terminals denote signals. V_DD, V_SS and V_SA are power supply voltages; SOUT is a digital speech output signal, for example 16-bit serial; POUT and NOUT are analog speech output signals produced by the simple D/A converter 20, which has, for example, 8 bits; XOUT is the basic clock for internal logic operations; WSYN is a synchronization signal for SOUT; MR is a reset signal for the internal flip-flops and counters; CKST is a control signal for the crystal oscillation circuit; CLK and XTAL are the crystal resonator connection terminals; and FP is the frame pulse.

In the LSP method, one set of speech data consists, for example, of 6 bytes: a 1-bit start bit and 7 bits of pitch period data; 2 frame length designation bits and 6 bits of amplitude data; and LSP parameters of 4 bits each. The frame length can be specified as, for example, 5, 10, 20 or 40 ms.
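
The 6-byte set described above can be illustrated with a short unpacking sketch. The text gives only the field widths, so several details here are assumptions: the byte order (pitch byte first, then the frame length and amplitude byte, then four bytes of LSP parameters), the bit order (most significant bits first), and the mapping of the 2-bit frame length code onto 5, 10, 20 and 40 ms. The names `LspFrame`, `parse_frame` and `FRAME_LENGTH_MS` are hypothetical.

```python
from dataclasses import dataclass

# Assumed mapping of the 2-bit frame length code to milliseconds.
FRAME_LENGTH_MS = (5, 10, 20, 40)


@dataclass
class LspFrame:
    start: int        # 1-bit start bit
    pitch: int        # 7-bit pitch period data
    frame_ms: int     # frame length selected by the 2-bit code
    amplitude: int    # 6-bit amplitude data
    lsp: tuple        # eight 4-bit LSP parameters


def parse_frame(data: bytes) -> LspFrame:
    """Unpack one 6-byte set of speech data (assumed layout, MSB first)."""
    if len(data) != 6:
        raise ValueError("one set of speech data is 6 bytes")
    start = data[0] >> 7
    pitch = data[0] & 0x7F
    frame_ms = FRAME_LENGTH_MS[data[1] >> 6]
    amplitude = data[1] & 0x3F
    # Each remaining byte is assumed to carry two 4-bit LSP parameters.
    lsp = tuple(nibble for b in data[2:] for nibble in (b >> 4, b & 0x0F))
    return LspFrame(start, pitch, frame_ms, amplitude, lsp)
```

Under this assumed layout the four trailing bytes yield eight LSP parameters, which is consistent with the 6-byte total given in the text.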

Speech data of this kind is supplied from the memory to the data buffer stack 11 and set there by a signal (the data load pulse). The conversion section 16 converts it into filter coefficients using the LSP parameters and the amplitude data, and the coefficients are applied to the interpolation section 17. The interpolation section 17 performs linear interpolation every sample period and supplies the resulting filter coefficients to the digital filter section 19. The sound source section 18 generates a pulse train in accordance with the pitch period data in the speech data and applies it to the digital filter section 19.

The digital filter section 19 includes an adder/subtractor circuit, a multiplier circuit and the like. It operates as a speech synthesis filter by computing on the pulse train from the sound source section 18 and the filter coefficients from the interpolation section 17, and outputs a digital speech signal, for example 16-bit serial.
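
The combination of a pitch-driven pulse train and a coefficient-controlled filter can be sketched in software. This is only an illustration of the excitation-plus-filter idea: the text does not describe the internal structure of the digital filter section 19, so a plain direct-form all-pole filter is substituted here, and the function names and coefficient convention are hypothetical.

```python
def pulse_train(pitch_period, n_samples, amplitude=1.0):
    """Return n_samples of excitation with one pulse every pitch_period samples."""
    return [amplitude if i % pitch_period == 0 else 0.0 for i in range(n_samples)]


def all_pole_filter(excitation, coeffs):
    """Stand-in synthesis filter: y[n] = x[n] - sum_k coeffs[k] * y[n-1-k]."""
    history = [0.0] * len(coeffs)   # most recent output first
    output = []
    for x in excitation:
        y = x - sum(a * h for a, h in zip(coeffs, history))
        history = [y] + history[:-1]
        output.append(y)
    return output
```

For example, `all_pole_filter(pulse_train(4, 8), [-0.5])` produces a decaying response that is re-excited at every fourth sample.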

FIG. 3 is a block diagram of the main part of an embodiment of the present invention. Reference numerals 31, 32, 33, 39 and 43 denote inhibit gates; 40, an AND gate; 46 and 47, OR gates; 35, a detection and control circuit that performs data shortage detection, timing adjustment, and operation restart control; 36, a register that holds the pitch period data; 37, a sound source generation circuit; 38 and 45, adder circuits; 41, a difference value register; 44, an interpolation register; and 42, a 1/2ⁿ circuit.

As described above, the speech data D0 to D7 read from the memory is set in the data buffer stack 11 by the data load pulse DL. Normally every output of the detection and control circuit 35 is "0", so the inhibit gates 31, 32, 33 and 43 are open. The conversion read timing pulse t1 reads the speech data out of the data buffer stack 11; the amplitude data and the LSP parameters are applied to the conversion section 16, the pitch period data is applied to the register 36 of the sound source section 18, and the pitch period data update timing pulse t2 updates the contents of the register 36. The sound source generation circuit 37 outputs a pulse train in accordance with the pitch period data held in the register 36 and applies it to the digital filter section 19.

The converted output of the conversion section 16 is applied to the adder circuit 38 of the interpolation section 17, which outputs its difference from the filter coefficient of the previous frame. This difference is set in the difference value register 41 by the difference calculation timing pulse t3, and the 1/2ⁿ circuit 42 divides the register contents into per-sample-period increments. In other words, every frame period the adder circuit 38 obtains the difference between the coefficient supplied by the conversion section 16 and the coefficient of the previous frame; the 1/2ⁿ circuit divides this difference by the number of samples in one frame to give the value per sample; and the adder circuit 45 adds this value to the contents of the interpolation value register 44, so that a filter coefficient is output for every sample.
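
The difference, divide-by-2ⁿ and accumulate steps amount to ordinary linear interpolation, which the following sketch reproduces for a single coefficient. It assumes that a frame contains exactly 2ⁿ samples and uses floating point for readability where the hardware would presumably use fixed-point shifts; `interpolate_frame` is a hypothetical name.

```python
def interpolate_frame(previous, target, n):
    """Step one coefficient from `previous` to `target` over 2**n samples."""
    samples_per_frame = 1 << n
    # Difference (adder 38), divided by the sample count (1/2^n circuit 42).
    step = (target - previous) / samples_per_frame
    value = previous            # plays the role of the interpolation value register 44
    per_sample = []
    for _ in range(samples_per_frame):
        value += step           # accumulation (adder 45)
        per_sample.append(value)
    return per_sample
```

With `previous=0.0`, `target=8.0` and `n=3`, the coefficient rises by 1.0 per sample and reaches the new frame's value on the last sample of the frame.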

The detection and control circuit 35 counts the data load pulses DL and, on the basis of the timing pulse t4, determines whether the speech data for one frame period, for example 6 bytes, has been read in. If the count falls short, the speech data is incomplete, and the circuit sets each of its outputs to "1". This closes the inhibit gates 31, 32, 33 and 43 and stops the readout of the data buffer stack 11, the updating of the register 36 in the sound source section 18, and the interpolation operation. Speech is therefore not synthesized from the incomplete data; instead, the same sound, based on the speech data of the previous frame, continues to be output.
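
A software analogue of this shortage check might look like the sketch below. It is not the circuit of the embodiment: the class `FrameGate` and its methods are hypothetical, one call to `load_byte` stands in for one data load pulse, and `check` stands in for the test made at the check timing.

```python
BYTES_PER_FRAME = 6  # the example set size given in the text


class FrameGate:
    """Pass complete frames through; hold the previous frame on a shortage."""

    def __init__(self):
        self.pending = bytearray()  # bytes loaded since the last complete frame
        self.held = None            # last complete frame, reused on a shortage

    def load_byte(self, value):
        """Model one data load pulse delivering one byte."""
        self.pending.append(value)

    def check(self):
        """Model the check timing; return (frame_to_use, shortage)."""
        if len(self.pending) < BYTES_PER_FRAME:
            return self.held, True          # keep synthesizing from old data
        self.held = bytes(self.pending[:BYTES_PER_FRAME])
        del self.pending[:BYTES_PER_FRAME]
        return self.held, False
```

If only three bytes arrive before a check, the previous frame is returned with the shortage flag set, and the new frame is released at the first check after the remaining three bytes arrive.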

FIG. 4 is a block diagram of the main part of the detection and control circuit. CNT is a 3-bit binary counter; FF1 and FF2 are positive-edge-triggered flip-flops; G1 to G5 are gate circuits; DL is the data load pulse; SR is the system reset signal; M1 and M2 are operating signals that are "1" while the speech synthesis section is operating; DT is the detection timing signal; SP is the sample clock; CLK is the basic clock; REQ is the request signal; ALM is the alarm signal; RS is the output signal supplied to the inhibit gates 31, 32, 33 and 43 of FIG. 3; and RT is a timing signal that makes the request signal REQ active.

Because the operating signal M1 is normally "1", when the system reset signal SR goes to "0" the counter CNT, the flip-flop FF1 and the timing circuit T are reset and the flip-flop FF2 is set. As a result the request signal REQ becomes "0" and the alarm signal ALM becomes "1". This is the state in which the synthesis operation is stopped; timing pulses such as DT, SP and CLK are not generated, so the flip-flop FF2 and the timing circuit T do not operate.

During the synthesis operation, timing pulses such as DT, SP and CLK are generated, and the circuit operates as follows.

The data load pulses DL are counted by the counter CNT. When 5 bytes of speech data have been read in, the count is "101"; the data load pulse DL for the sixth byte then drives the output of the gate circuit G1 to "0", which sets the counter CNT to "000" and resets the flip-flop FF2. The output of the gate circuit G2 going to "1" sets the flip-flop FF1, so the request signal REQ becomes "1" and the request for speech data ends. During the speech synthesis operation the operating signal M1 is "1" and the various timing signals are generated, so after a predetermined time the timing signal RT resets the flip-flop FF1, the request signal REQ becomes "0", and the next speech data is requested. An enable signal EN is applied to the counter CNT so that it counts only while a speech data request is outstanding.

If the flip-flop FF1 is in the set state when the detection timing signal DT becomes "1", the flip-flop FF2 is set and the alarm signal ALM becomes "0", indicating an abnormal state. That is, when the count of data load pulses DL falls short, the flip-flop FF1 remains reset and the request signal REQ stays at "0", which shows that speech data has not been received and is lacking even though it is being requested.

When the flip-flop FF2 is set, the timing circuit T outputs the output signal RS at "1" at an appropriate timing. This closes the inhibit gates 31 to 33 and 43 described above and stops the conversion and interpolation operations; speech is synthesized from the speech data of the previous frame, and the synthesis of abnormal speech is prevented.

When the count of the counter CNT subsequently reaches the predetermined value, the output of the gate circuit G1 goes to "0" and the flip-flop FF2 is reset, which restores the alarm signal ALM to "1". The sample clock SP then sets the timing circuit T and the output signal RS returns to "0", so a new frame period is established, the inhibit gates 31 to 33 and 43 open, and the speech synthesis operation resumes.
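
The request and alarm handshake described above can be followed more easily in a signal-level model. The sketch below is a simplification under stated assumptions, not a gate-accurate rendering of FIG. 4: it tracks only the levels of REQ, ALM and RS (with REQ and ALM treated as active at "0", as in the description), it does not model the flip-flops or gate circuits individually, and it reopens the gates immediately on the final byte rather than on the next sample clock. The class and method names are hypothetical.

```python
class DetectionControl:
    """Signal-level model of the request/alarm handshake (simplified)."""

    def __init__(self, bytes_per_frame=6):
        self.bytes_per_frame = bytes_per_frame
        self.count = 0   # counter CNT
        self.req = 0     # "0" means speech data is being requested
        self.alm = 1     # "0" means a shortage has been detected
        self.rs = 0      # "1" means the inhibit gates are closed

    def data_load_pulse(self):
        """One DL pulse; counted only while a request is outstanding (EN)."""
        if self.req != 0:
            return
        self.count += 1
        if self.count == self.bytes_per_frame:
            self.count = 0
            self.req = 1   # the request has been satisfied
            self.alm = 1   # any alarm is cleared
            self.rs = 0    # gates reopen (simplified timing)

    def detection_timing(self):
        """DT: raise the alarm if data is still being requested."""
        if self.req == 0:
            self.alm = 0
            self.rs = 1

    def request_timing(self):
        """RT: make the request signal active for the next set of data."""
        self.req = 0
```

Driving the model with six pulses, then RT, then only three pulses before DT reproduces the shortage case: ALM drops to "0" and RS rises to "1" until the remaining three pulses arrive.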

FIG. 5 is a diagram explaining the operation. After the frame pulse FP the request signal REQ becomes "0", and speech data is read in by the data load pulses DL as described above. If the predetermined number of bytes of speech data has not been read in by the next frame pulse FP, the count of the counter CNT does not reach the predetermined value, as described above, and the alarm signal ALM becomes "0". As a result, the frame pulse FP shown by the dotted line is not output.

In period T1 of the synthesis operation M, speech data #m is converted and the difference between data #m-1 and #m is calculated; in the next period T2, the interpolation operation using this difference value and the filter operation are performed. In period T3, during which the alarm signal ALM is "0", the filter operation is performed with the immediately preceding interpolated value, and no conversion, difference calculation or interpolation is performed. The filter operation can continue in this way for an arbitrary length of time, in units of the sample period.

When speech data DATA #m+1 has been read in, the count of the counter CNT reaches the predetermined value as described above, the alarm signal ALM becomes "1", and the request signal REQ also becomes "1". After the next frame pulse FP the request signal REQ becomes "0" and speech data DATA #m+2 is read in. In period T4 of the synthesis operation M, the final step of the interpolation from data #m-1 to #m, the conversion of data #m+1, the difference calculation between data #m and #m+1, and the filter operation are performed. In the next period T5, the interpolation from data #m to #m+1 and the filter operation are performed, and normal synthesis continues from then on.

As described above, when the speech data is incomplete the conversion operation is stopped and synthesized speech based on the speech data of the previous frame is output, so abnormal speech is never synthesized. Moreover, if the device is deliberately left short of speech data, new synthesis is suspended until the data is complete, which slows the rate of synthesis; speech synthesis with an arbitrary frame period can therefore also be realized.
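
As a rough illustration of how deliberate starvation stretches the frame period, the arithmetic can be written out as follows. The sample rate is not stated in the text, so it appears here as an assumed parameter, and `effective_frame_ms` is a hypothetical helper.

```python
def effective_frame_ms(nominal_ms, hold_samples, sample_rate_hz):
    """Frame duration when the last interpolated value is held for hold_samples."""
    return nominal_ms + hold_samples * 1000.0 / sample_rate_hz
```

Assuming an 8000 Hz sample rate, holding a 20 ms frame for a further 80 sample periods gives `effective_frame_ms(20, 80, 8000)`, an effective frame period of 30 ms.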

Although the embodiment above has been described mainly in terms of the LSP method, the invention is equally applicable to the PARCOR method, the LPC method and others. The speech data may be the analysis results of relatively long words or sentences, or the analysis results of the speech segments used in, for example, the VCV method, which combines vowels (V) and consonants (C).

As explained above, the present invention is a device that reads in a fixed number of bytes of speech data every specific period, such as a designated frame period, and synthesizes speech, the device comprising: detection means for detecting, by means of the counter CNT or the like, a shortage of speech data at the time the speech data is to be converted; means for stopping the conversion operation, by stopping readout from the data buffer stack 11, when a shortage of speech data is detected; holding means for holding the interpolation result for the immediately preceding speech data when conversion of the speech data is stopped; and means for restarting the conversion operation once the speech data is complete after a shortage has been detected. Synthesis from incomplete data is thus stopped, which prevents the output of abnormal speech; in the meantime speech is synthesized from the immediately preceding interpolation result, which gives output closer to natural speech than stopping the synthesized speech output altogether. This function can also be used to synthesize speech at an arbitrary rate, which broadens the range of applications.

[Brief description of the drawings]

FIG. 1 is a block diagram of a system that performs speech synthesis under the control of a microprocessor; FIG. 2 is a block diagram of the speech synthesis circuit of an embodiment of the present invention; FIG. 3 is a block diagram of the main part of the embodiment; FIG. 4 is a block diagram of the main part of the detection and control circuit of the embodiment; and FIG. 5 is a diagram explaining the operation.

Reference numeral 1 denotes the microprocessor; 2, the memory; 3, the speech synthesis circuit; 11, the data buffer stack; 12, the interface section; 13, the control register; 14, the status register; 15, the speech synthesis section; 16, the conversion section; 17, the interpolation section; 18, the sound source section; 19, the digital filter section; 20, the D/A converter; 21, the control section; 35, the detection and control circuit; 36, the register; 37, the sound source generation circuit; 41, the difference value register; 44, the interpolation value register; and CNT, the counter.

Claims

[Claims]

1. A speech synthesis device that reads in speech data and synthesizes speech by performing conversion, interpolation, and arithmetic operations on the speech data, the device comprising: detection means for detecting a shortage of the speech data at the time the speech data is to be converted; stopping means for stopping the conversion of the speech data when a shortage of the speech data is detected; holding means for holding the interpolation result for the immediately preceding speech data when the conversion of the speech data is stopped; and restarting means for restarting the conversion operation of the speech data once the speech data is complete after a shortage of the speech data has been detected; wherein, when the detection means detects a shortage of the speech data, the stopping means stops the conversion operation of the speech data, and speech is synthesized using the interpolation result for the immediately preceding speech data held by the holding means.