JPS63199399A

JPS63199399A - Voice synthesizer

Info

Publication number: JPS63199399A
Application number: JP62031581A
Authority: JP
Inventors: 桜井　穆; 純一田村
Original assignee: Canon Inc
Current assignee: Canon Inc
Priority date: 1987-02-16
Filing date: 1987-02-16
Publication date: 1988-08-17

Abstract

(57) [Abstract] This publication contains application data from before electronic filing, so no abstract data is recorded.

Description

DETAILED DESCRIPTION OF THE INVENTION [Field of Industrial Application] The present invention relates to a speech synthesis device, and in particular to a speech synthesis device that changes the speech rate by thinning out or duplicating the feature parameters used for speech synthesis.

[Prior Art] A speech signal is nearly stationary when viewed over a short fixed time. Focusing on this point, a known method analyzes the speech signal at fixed time intervals, represents each interval by one set of feature parameters based on the analysis results, stores these sets in advance, and at synthesis time retrieves them at fixed time intervals and synthesizes them in sequence. This method is of practical value because the synthesis operation is extremely simple and sound quality degrades little. Specifically, one set of feature parameters corresponds to speech of a fixed length of time. The duration of the synthesized speech can therefore be increased or decreased by appropriately thinning out or duplicating sets of feature parameters, and attempts have been made to change the speech rate in this way. However, plosive consonants (k, t, p, b, d, g, r, etc.) are short in duration, so at most one or two sets of feature parameters are synthesized for them. Consequently, in the conventional method, if a set of feature parameters that is thinned out or duplicated happens to belong to a plosive consonant, the intelligibility of the speech is significantly impaired.
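The drawback described above can be illustrated with a minimal Python sketch. It assumes a purely periodic selection (every m-th frame), which is only one possible reading of "appropriately thinning out or duplicating"; the function names and the frame labels are hypothetical and do not come from the patent.

```python
from typing import List


def naive_thin(frames: List[str], m: int) -> List[str]:
    """Drop the first frame of every group of m frames, ignoring content."""
    return [f for idx, f in enumerate(frames) if idx % m != 0]


def naive_duplicate(frames: List[str], m: int) -> List[str]:
    """Send the first frame of every group of m frames twice, ignoring content."""
    out: List[str] = []
    for idx, f in enumerate(frames):
        out.append(f)
        if idx % m == 0:
            out.append(f)
    return out


# Hypothetical frame labels: one short burst frame between vowel frames.
frames = ["i", "i", "t_burst", "a", "a", "a"]
fast = naive_thin(frames, 2)       # the burst frame happens to be dropped
slow = naive_duplicate(frames, 2)  # the burst frame happens to be doubled
```

With this labelling the burst frame sits on a selected position, so the fast version loses it and the slow version repeats it, which is the loss of intelligibility the text describes.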

[Problems to be Solved by the Invention] The present invention eliminates the above drawback of the prior art. Its object is to provide a speech synthesis device that does not impair the intelligibility of the synthesized speech even when the speech rate is changed.

[Means for Solving the Problems] To achieve the above object, the speech synthesis device of the present invention comprises, in outline: storage means for storing feature parameters corresponding to speech of a predetermined length of time, together with availability information indicating, at least for each set of feature parameters, whether speech rate control is allowed; and speed control means which, during speech synthesis, thins out or duplicates feature parameters, targeting only those whose availability information indicates that speed control is allowed.

Alternatively, to achieve the above object, the speech synthesis device of the present invention comprises, in outline: storage means for storing feature parameters corresponding to speech of a predetermined length of time, together with multi-valued information on whether speech rate control is allowed, associated at least with each set of feature parameters; threshold setting means for setting a threshold according to the speech rate; and speed control means which, during speech synthesis, thins out or duplicates feature parameters, targeting only those whose multi-valued information is smaller than the threshold.

Preferably, in one aspect, the storage means stores the maximum multi-valued information for the feature parameters representing the burst point of a plosive consonant, and stores decreasing multi-valued information for the feature parameters that follow.

Preferably, in one aspect, the speed control means unconditionally refrains from duplicating a set of feature parameters whose multi-valued information has a predetermined sign.

Preferably, in one aspect, the threshold setting means sets a higher threshold the further the speech rate is above or below the standard rate.

[Operation] In this configuration, the storage means stores feature parameters corresponding to speech of a predetermined length of time, together with availability information (for example, binary information) indicating, at least for each set of feature parameters, whether speech rate control is allowed. During speech synthesis, the speed control means thins out or duplicates feature parameters, targeting only those whose availability information indicates that speed control is allowed.

In the alternative configuration, the storage means stores feature parameters corresponding to speech of a predetermined length of time, together with multi-valued information on whether speech rate control is allowed, associated at least with each set of feature parameters. Preferably, the storage means stores the maximum multi-valued information for the feature parameters representing the burst point of a plosive consonant, and stores decreasing multi-valued information for the feature parameters that follow. The threshold setting means sets the threshold according to the speech rate (for example, an external speech rate command).

Preferably, the threshold setting means sets a higher threshold the further the speech rate is above or below the standard rate. During speech synthesis, the speed control means thins out or duplicates feature parameters, targeting only those whose multi-valued information is smaller than the threshold. Preferably, the speed control means unconditionally refrains from duplicating a set of feature parameters whose multi-valued information has a predetermined sign.
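The threshold-based selection described above can be sketched as follows. This is a minimal sketch under stated assumptions, not the patent's implementation: the mapping from speed command to threshold (`threshold_for`), the comparison against the magnitude of the stored value, and the choice of a negative sign as the "predetermined sign" are all hypothetical. The text itself only states that the threshold rises as the rate departs from standard, that only frames whose multi-valued information is below the threshold are eligible, and that a predetermined sign rules out duplication.

```python
def threshold_for(v: int) -> int:
    """Hypothetical mapping: the further V is from 0, the higher the threshold."""
    return abs(v) + 1


def may_thin(e2: int, v: int) -> bool:
    """For a faster command (v > 0), a frame is eligible for thinning only if
    the magnitude of its multi-valued information is below the threshold."""
    return v > 0 and abs(e2) < threshold_for(v)


def may_duplicate(e2: int, v: int) -> bool:
    """For a slower command (v < 0), the frame must also not carry the
    'predetermined sign' (assumed here to be a negative value)."""
    return v < 0 and e2 >= 0 and abs(e2) < threshold_for(v)
```

With hypothetical stored values of 5 for the burst frame and 3 for the frame after it, the burst frame is never eligible for any command from -4 to 4, while the following frame becomes eligible only for the more extreme commands.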

[Description of Embodiments] Embodiments of the present invention are described in detail below with reference to the accompanying drawings.

[First Embodiment] FIG. 1 is a block diagram of the speech synthesis device of the first embodiment of the present invention. In the figure, 1 is an input terminal that receives utterance commands, speech rate commands, and the like sent from a host (not shown). 2 is a central processing unit (CPU) that utters the synthesized speech and controls its rate in accordance with the input utterance command and speech rate command. 2A is a memory (ROM) storing the control program executed by CPU 2, for example the control program of the first embodiment shown in FIG. 6 or that of the second embodiment shown in FIG. 10. In addition, 3 is a first storage device holding the sets of speech feature parameters together with the speed control availability information, 4 is an auxiliary storage device used by CPU 2, 5 is a PARCOR speech synthesizer, 6 is a D/A converter, 7 is an amplifier, and 8 is a loudspeaker for speech output.

FIGS. 2(A) to 2(C) show speech waveforms of "tai", part of the word "mitai" uttered by the same male speaker. FIG. 2(A) is the waveform when uttered carefully, FIG. 2(B) is the waveform when uttered about 1.5 times as fast, and FIG. 2(C) is the waveform when uttered about twice as fast.

FIGS. 3(A) to 3(C) show part of each speech waveform of FIGS. 2(A) to 2(C), enlarged along the time axis at the same magnification, and show the start of the sound "ta". Each division of the scale below the waveform is a frame 10 milliseconds long, and each frame represents the speech waveform of its interval by one set of feature parameters. For example, frame (a_i) in FIG. 3(A) shows the characteristics of the burst point of the consonant "t". As a comparison with frame (b_i) in FIG. 3(B) or frame (c_i) in FIG. 3(C) shows, the characteristics of the burst point of the consonant "t" hardly change even when the speech rate changes. Conversely, when the speech rate is changed artificially, thinning out or duplicating the characteristic frame at the burst point changes these characteristics markedly and impairs the intelligibility of the speech.

The same applies to the other plosive consonants (k, p, d, g, r, etc.). In the first embodiment, therefore, when the speech waveform is stored in the form of feature parameters analyzed frame by frame, speed control availability information is added to each frame, and for frames that should not be thinned out or duplicated, such as the frame at the burst point of a plosive consonant, the availability information is set to "not allowed".

FIG. 4 shows the structure of the availability information and the sets of feature parameters in the first embodiment. When the male utterance "mitai" is analyzed in frames of 10 milliseconds each, the sets of feature parameters span N frames in total, numbered 1 to N. The set of feature parameters of each frame consists of a pitch P_i (i being the frame number), an amplitude A_i, and PARCOR coefficients K_i.

Speed control availability information e is also attached to each frame.

When the availability information e is "0", speed control (thinning out or duplication) is allowed; when it is "1", speed control is not allowed.
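The per-frame record described for FIG. 4 could be modelled as below. This is a sketch for illustration only: the class and field names are hypothetical, the numeric types are assumptions, and the number of PARCOR coefficients per frame is left open because the text does not state it.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Frame:
    """One 10 ms frame: feature parameters plus availability information e."""
    pitch: float         # P_i
    amplitude: float     # A_i
    parcor: List[float]  # PARCOR coefficients K_i (count not given in the text)
    e: int               # 0 = speed control allowed, 1 = not allowed

    def __post_init__(self) -> None:
        if self.e not in (0, 1):
            raise ValueError("e must be 0 (allowed) or 1 (not allowed)")

    def speed_control_allowed(self) -> bool:
        return self.e == 0


# Hypothetical values: a burst frame is stored with e = 1 so it is protected.
burst = Frame(pitch=0.0, amplitude=0.8, parcor=[0.1, -0.2], e=1)
vowel = Frame(pitch=120.0, amplitude=0.6, parcor=[0.5, 0.3], e=0)
```

Keeping e next to the parameters means the decision to protect a frame is made once, at analysis time, and the synthesis loop only has to read a flag.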

FIG. 5(A) shows the relationship in the first embodiment between speed command V and the cycle m at which frames are thinned out or duplicated. Speed command V is "0" for standard speed; in this case CPU 2 outputs all the feature parameter sets of FIG. 4 unchanged. A faster-than-standard speed command V is expressed as a positive integer from 1 to 4. In this case CPU 2 computes m = 6 - |V| to obtain cycle m and, because the sign of V is positive, performs thinning control once per cycle m. That is, every m frames it checks the speed control availability information e, and if e is "0" (allowed), it omits the transfer of that frame's feature parameter set to the PARCOR speech synthesizer 5. A slower-than-standard speed command V is expressed as a negative integer from -1 to -4. In this case CPU 2 likewise computes m = 6 - |V| and, because the sign of V is negative, performs duplication control once per cycle m. That is, every m frames it checks the availability information e, and if e is "0" (allowed), it transfers that frame's feature parameter set to the PARCOR speech synthesizer 5 twice.
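The mapping from speed command to cycle can be written out as a short Python sketch. The function names are hypothetical. The duration ratio is not stated in the text; it is a derived figure that assumes exactly one frame is removed or added in every cycle of m frames, which holds only when each cycle contains a frame with e = 0.

```python
from fractions import Fraction
from typing import Tuple


def cycle_for_command(v: int) -> Tuple[int, str]:
    """Return (m, mode) for speed command v, with m = 6 - |v|."""
    if not -4 <= v <= 4:
        raise ValueError("speed command must be in -4..4")
    mode = "thin" if v > 0 else "duplicate" if v < 0 else "none"
    return 6 - abs(v), mode


def nominal_duration_ratio(v: int) -> Fraction:
    """Output duration divided by original duration (derived, see lead-in)."""
    m, mode = cycle_for_command(v)
    if mode == "thin":
        return Fraction(m - 1, m)  # one frame dropped per m frames
    if mode == "duplicate":
        return Fraction(m + 1, m)  # one frame added per m frames
    return Fraction(1)
```

Under that assumption, V = 4 gives m = 2 and halves the duration, while V = -4 gives m = 2 and lengthens it by half.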

FIG. 6 is a flowchart of the speed control procedure of the first embodiment. The process of FIG. 6 starts when an utterance command and speed command V are input from input terminal 1 of FIG. 1. In FIG. 6, variable j is the frame count (frame number) and takes values from 1 to N. Variable n (the cycle counter) counts the cycle m for thinning or duplication and takes values from 0 to m-1. Flag f indicates whether thinning or duplication has been completed within the current cycle: it is reset to "0" together with cycle counter n at the start of a cycle and set to "1" once thinning or duplication has been performed. During duplication, flag f is also temporarily set to "-1" as a marker that the same feature parameters are to be used twice.

<Initial Processing> Step S1 computes m = 6 - |V| to obtain cycle m. Step S2 sets frame number j to 1 so that the speech parameters (including availability information e) can be accessed starting from frame (1). Step S3 resets cycle counter n and flag f. Step S4 checks the content of speed command V.

<Standard-Speed Utterance> If step S4 finds that speed command V is "0", the utterance is at standard speed. The flow proceeds to step S11, which transfers the feature parameter set of frame j to the PARCOR speech synthesizer 5. The PARCOR speech synthesizer 5 then synthesizes speech information from the transferred set, D/A converter 6 converts the synthesized speech information into an analog signal, amplifier 7 amplifies the analog signal, and loudspeaker 8 outputs the synthesized speech.

Meanwhile, CPU 2 checks flag f in step S12; since it is not "-1", the flow proceeds to step S13 and increments frame number j by 1. Step S15 waits for approximately one frame length (10 milliseconds), and step S16 checks whether frame number j has reached the total number of frames N. If it has, output of all N frames is complete, and the flow proceeds to step S19 and ends. While N has not been reached, the flow proceeds to step S17 and increments cycle counter n by 1.

Step S18 checks whether n < m. If n < m, the cycle is still in progress, so the flow returns to step S4 and reads the next frame. Otherwise n = m (the start of the next cycle), so the flow returns to step S3 and resets cycle counter n and flag f. Thus, for standard-speed utterance, the feature parameter sets of all N frames are output unconditionally.

<Utterance Faster than Standard Speed> If step S4 finds that speed command V is not "0", the utterance is either faster or slower than standard speed. When the sign of V is positive, the utterance is faster than standard speed, and the following thinning process is performed. After step S3, the moment at which cycle counter n = 0 and flag f = 0 (the start of a cycle) is the time to check whether thinning is allowed. Step S5 checks flag f. Since f is initially "0", the flow proceeds to step S6 and reads the speed control availability information e_j of the current frame. Step S7 checks whether e_j is "0". If e_j is "0", speed control is allowed for the frame, and the flow proceeds to step S8, which checks whether the sign of V is positive. Here the sign of V is positive, so the flow proceeds to step S10 and sets flag f to "1" to declare that thinning has been completed. In step S12, flag f is not negative, so the flow proceeds to step S13 and increments frame number j by 1. Thinning is thus accomplished by advancing the frame number by one without performing step S11. Step S15 waits for a predetermined time; in this case the wait is not one frame length. When the flow next returns to step S5, flag f is no longer "0". That is, for the rest of the cycle the flow always goes from step S5 to step S11, reading out the sets of feature parameters one after another and transferring them to the PARCOR speech synthesizer 5 in the same way as described for standard-speed utterance. In this way, the availability information e_j of the first frame of each cycle m is checked, and if it is "allowed", the feature parameter set of that frame is thinned out.

However, if step S7 finds that the availability information e_j is "1", the transfer of the feature parameter set of the current frame is not thinned out. The flow proceeds to step S11 and transfers the feature parameter set of the frame to the PARCOR speech synthesizer 5. Since flag f is not set to "1" in processing this frame, the next pass through step S5 still finds f = 0. Step S6 then checks the availability information e_(j+1) of the next frame, and if it is "0", that frame is thinned out. Thus, for utterance faster than standard speed, thinning control is executed every cycle m, and if the feature parameter set of the frame in question cannot be thinned out, the following frame is thinned out according to its availability information e. As a result, the faster-than-standard speech rate is always faithfully achieved, and the important frame at the burst point of a plosive consonant is not lost.

<Utterance Slower than Standard Speed> If step S8 finds that the sign of speed command V is negative, the utterance is slower than standard speed, and the following duplication process is performed. As before, after step S3, the moment at which cycle counter n = 0 and flag f = 0 is the time to check whether duplication is allowed. Step S5 checks flag f. Since f is initially "0", the flow proceeds to step S6 and reads the speed control availability information e_j of the current frame.

Step S7 checks whether e_j is "0". If e_j is "0", speed control is allowed for the frame, and the flow proceeds to step S8, which checks whether the sign of V is positive. Here the sign of V is negative, so the flow proceeds to step S9 and temporarily sets flag f to "-1" as a marker that the same feature parameters are to be used twice. Step S11 transfers the feature parameter set to the PARCOR speech synthesizer 5 for the first time. In step S12, flag f is "-1", so the flow proceeds to step S14 and sets flag f to "1".

This declares that the extra transfer of a feature parameter set for duplication has been accounted for. Because step S13 is skipped, the frame number is not updated; that is, the feature parameters of this frame number are used twice.

From the next pass through step S5 onward, flag f is "1", so until the current cycle is complete the frame number j is updated and each feature parameter set is transferred to the PARCOR speech synthesizer 5.

However, if step S7 finds that the availability information e_j is "1", the feature parameter set of the current frame is not transferred twice. The flow proceeds to step S11, transfers the feature parameter set of the frame to the PARCOR speech synthesizer 5, and advances the frame number by one in step S13. Since flag f is not set to "1" in processing this frame, the next pass through step S5 still finds f = 0. Step S6 then checks the availability information e_(j+1) of the next frame, and if it is "0", that frame is duplicated.

Thus, for utterance slower than standard speed, duplication control is executed every cycle m, and if the feature parameter set of the frame in question cannot be duplicated, the feature parameter set of the following frame is duplicated according to that frame's availability information e. As a result, the slower-than-standard speech rate is always faithfully achieved, and the important frame at the burst point of a plosive consonant is not duplicated.

FIG. 7(A) shows the processing results under four speed commands V for the availability information e of the first embodiment, covering the eight frames from frame (i-2) to frame (i+5) at the start of the sound "ta". In the figure, "x" marks a thinned-out frame and "◎" marks a duplicated frame. Frame (i-2) is assumed to lie exactly at a multiple of cycle m for every speed.

For speed command V = 2, the calculation m = 6 - |2| gives cycle m = 4. The availability information e of the first frame (i-2) and of the fourth frame after it, (i+2), is therefore checked; both are "0" (allowed) in this case, so both frames are thinned out.

For speed command V = 3, the calculation m = 6 - |3| gives cycle m = 3. The availability information e of the first frame (i-2), of the third frame after it, (i+1), and so on is therefore checked; all are "0" in this case, so these frames are thinned out.

For speed command V = 4, the calculation m = 6 - |4| gives cycle m = 2. The availability information e of the first frame (i-2), of the second frame after it, (i), of the second frame after that, (i+2), and so on is therefore checked. In this case e is "0" for both frame (i-2) and frame (i+2), so both are thinned out. For frame (i), however, e is "1" (not allowed), so the feature parameter set of that frame is not thinned out. Instead, the availability information e of the next frame (i+1) is checked, and since it is "0", that frame is thinned out.

In this way the average speech rate is unaffected, and the feature parameters of frame (i), which represents the burst point of the plosive consonant "t", are transferred to synthesizer 5 as they are without being thinned out, so intelligible speech is synthesized.

For speed command V = -4, the calculation m = 6 - |-4| gives cycle m = 2. The availability information e of the first frame (i-2), of the second frame after it, (i), of the second frame after that, (i+2), and so on is therefore checked. Because the sign of V is negative, frames are duplicated. In this case too, e is "0" for both frame (i-2) and frame (i+2), so both are duplicated. For frame (i), however, e is "1" (not allowed), so the feature parameter set of that frame is not duplicated. Instead, the availability information e of the next frame (i+1) is checked, and since it is "0", that frame is duplicated. Here again the average speech rate is unaffected, and the feature parameters of frame (i), which represents the burst point of the plosive consonant "t", are transferred to synthesizer 5 only once without being duplicated, so the burst is not doubled and clear speech is synthesized.
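The four cases above can be reproduced with a simplified Python sketch of the first-embodiment procedure. Several points are assumptions rather than statements from the text: the function name is hypothetical, the cycle counter is taken to count source frames (which matches the frames the text says are checked), the wait steps of FIG. 6 are omitted, and frames (i-1), (i+3), (i+4), and (i+5) are assumed to have e = 0 because the text does not give their values.

```python
from typing import List, Sequence


def schedule_frames(e: Sequence[int], v: int) -> List[int]:
    """Return the indices of the frames sent to the synthesizer, in order.

    e[j] is the availability flag of frame j (0 = allowed, 1 = not allowed).
    v is the speed command: 0 standard, 1..4 faster, -1..-4 slower.
    """
    if not -4 <= v <= 4:
        raise ValueError("speed command must be in -4..4")
    if v == 0:
        return list(range(len(e)))

    m = 6 - abs(v)  # cycle length, as computed in step S1
    out: List[int] = []
    n = 0           # cycle counter (assumed to count source frames)
    done = False    # role of flag f: has this cycle's operation been done?
    for j, e_j in enumerate(e):
        if not done and e_j == 0:
            done = True
            if v < 0:
                out.extend([j, j])  # duplicate: transfer the frame twice
            # v > 0: thin out, so nothing is transferred for this frame
        else:
            out.append(j)
        n += 1
        if n == m:  # next cycle begins: reset counter and flag
            n, done = 0, False
    return out


# Index 0 is frame (i-2), so index 2 is the protected burst frame (i).
e_pattern = [0, 0, 1, 0, 0, 0, 0, 0]
fast4 = schedule_frames(e_pattern, 4)   # frame (i) kept, (i+1) thinned instead
slow4 = schedule_frames(e_pattern, -4)  # frame (i) sent once, (i+1) sent twice
```

Because the flag stays clear when a protected frame is met, the operation is deferred to the next eligible frame rather than skipped, which is why the average rate stays close to the commanded one.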

FIG. 8(A) shows the processing results under four speed commands V for the availability information e of the first embodiment together with the waveform of the original speech, covering the eight frames from frame (i-2) to frame (i+5) at the start of the sound "ta". As in FIG. 7(A), "x" marks a thinned-out frame and "◎" marks a duplicated frame. As FIG. 8(A) makes clear, the signal of frame (i), which represents the burst point of the unvoiced plosive consonant "t", is never thinned out or duplicated, whatever the speed command V.

[Second Embodiment] The block diagram of the second embodiment is the same as that of FIG. 1. The second embodiment is characterized in that the 1-bit availability information e added to each frame in the first embodiment is made multi-valued, so that frames are thinned out or duplicated adaptively according to the magnitude of speed command V, and more natural and clearer speech is synthesized and output even when the speech rate is changed.

Returning to FIG. 3(A), attention is now directed to frame (a_i), at the burst point of the voiceless plosive consonant "t", and to the next frame (a_i+1). As noted earlier, the burst-point frames (a_i), (b_i), and (c_i) show little change when the speed command V is changed. The next frame (a_i+1), however, is almost unchanged when compared with frame (b_i+1) at 1.5 times the speed, whereas at twice the speed no frame can be found around frame (c_i+1) that still shows the characteristics of frame (a_i+1). This is because the articulatory transition from the consonant "t" to the following vowel "a" becomes shorter as speech becomes faster, and the same holds for the other plosive consonants (k, p, b, d, g, r, etc.).

Therefore, in the second embodiment, when the speech rate V is changed to one faster than standard, the frame at the burst point is not thinned out, and the frames of the articulatory transition into the following vowel are thinned by a method that adapts to the magnitude of the speed command V, so that speech close to natural speech is synthesized. For speech slower than standard, it is known that lengthening a plosive consonant beyond a certain duration destroys the phonemic quality of the consonant; the second embodiment therefore also attaches to the multi-valued permission information e2 sign information that prohibits only duplication of the frame, which prevents duplicated frames from altering the phonemic quality of plosive consonants.

FIG. 9 shows the structure of the permission information and the feature parameter sets in the second embodiment. The frame numbers and the feature parameter sets are the same as in FIG. 4, but the speed control permission information e2 differs. As shown in the figure, the permission information e2 of the second embodiment is multi-valued and is expressed as a positive or negative integer, including "0".

The permission information e2 is used as follows: when its absolute value is less than or equal to a threshold t determined from the speed command V, the frame may be thinned out or duplicated; when its absolute value is greater than the threshold t, the feature parameter set of the frame is output as speech unchanged.

Furthermore, a frame whose permission information e2 carries a negative sign is always excluded from duplication.

That is, when the speed command V is slower than standard, the above processing is applied only to frames whose permission information e2 is not negative.

In FIG. 9, frame (i), at the burst point of the voiceless plosive consonant "t", is given, for example, the largest absolute value |8|, and the three following frames of the articulatory transition leading to the vowel "a" are given the absolute values |3|, |2|, and |1|, respectively. With this graded profile, when the speed command V is close to the standard speed (the threshold t is low), only frames near the steady part of the vowel are subject to thinning or duplication, and as the speed command V departs from the standard speed (the threshold t rises), frames progressively closer to the burst point also become subject to thinning or duplication. In addition, frames (i) and (i+1) of the consonant part are given a negative sign, which unconditionally prohibits their duplication and so prevents phonemic change.

FIG. 5(B) shows the relationship among the speech rate V, the threshold t, and the period m of thinning or duplication in the second embodiment. As before, the speed command V takes the value "0" for the standard speech rate, a positive integer "1" to "4" for rates faster than standard, and a negative integer "-1" to "-4" for rates slower than standard. The threshold t and the period m are determined from the speed command V by operations (1) and (2) below.

t = |V| - 1 ... (1)
m = 6 - |V| ... (2)

Accordingly, when the speed command V is the standard "0", operation (1) gives the threshold t = -1, and the absolute value of the permission information e2 can never be less than or equal to the threshold t. Frames are therefore never thinned out or duplicated, and the feature parameter sets of all frames are synthesized and output unchanged.

Thus, when an utterance command and a speed command V are input from the input terminal 1, the CPU 2 executes operations (1) and (2) to obtain the threshold t and the period m. If the speed command V is "0" or a positive integer, the CPU 2 examines the permission information e2 every m frames and, when its absolute value is less than or equal to the threshold t, thins out the feature parameter set of that frame. If the speed command V is a negative integer, the CPU 2 examines the permission information e2 every m frames and, when e2 is not negative and is less than or equal to the threshold t, duplicates the feature parameter set of that frame.

FIG. 10 is a flowchart showing the speed control procedure of the second embodiment. Processing identical to that of FIG. 6 is given the same step numbers, and its description is omitted.

<Initial processing> When an utterance command and a speed command V are input from the input terminal 1, the flow enters step S100. In step S100, the threshold t = |V| - 1 and the period m = 6 - |V| are obtained according to operations (1) and (2) above.

<Utterance at standard speed> When step S101 determines that the speed command V is "0", the utterance is at standard speed. The flow proceeds to step S105, which tests whether |e2_j| > t. At standard speed the threshold is t = |0| - 1 = -1, so |e2_j| > t is always satisfied. Step S106 is therefore executed for every frame, and the utterance is produced at standard speed.

<Utterance faster than standard speed> When step S101 determines that the speed command V is a positive integer, the utterance is faster than standard speed.

As before, the flow passes through step S3, and the timing at which the period counter n = 0 and the flag f = 0 (the start of a period) is the timing at which the frame is examined to see whether it may be thinned out. Step S6 fetches the permission information e2_j, and step S105 tests whether |e2_j| > t. If |e2_j| > t is satisfied, the frame is not thinned out and the flow proceeds to step S106. Because the flag f is not set to "1" in this case, the next frame is also tested in step S105. If |e2_j| > t is not satisfied, the flow proceeds to step S107, where the frame is thinned out and the flag f is set to "1" to indicate that thinning has been completed for this period.

<Utterance slower than standard speed> When step S101 determines that the sign of the speed command V is negative, the utterance is slower than standard speed, and the following duplication processing is performed. As before, the flow passes through step S3, and the timing at which the period counter n = 0 and the flag f = 0 is the timing at which the frame is examined to see whether it may be duplicated. Step S6 fetches the permission information e2_j, and step S102 tests whether e2_j < 0. If e2_j < 0, the frame is judged to be a duplication-prohibited frame and is unconditionally not duplicated. The flow proceeds to step S106, where the feature parameter set of the frame is transferred, and then to step S13, where the frame number is updated.

If e2_j < 0 does not hold, control follows the threshold. That is, step S103 tests whether e2_j > t. If e2_j > t, the frame is judged not to be duplicable and is not duplicated. If e2_j > t does not hold, the flow proceeds to step S104, where the flag f is set to "-1" so that the frame can be duplicated.

FIG. 7(B) shows the results of processing the permission information e2 of the second embodiment under four speed commands V, for the eight frames from frame (i-2) to frame (i+5) at the start of the utterance "ta". As before, an "x" marks a thinned-out frame and a "◎" marks a duplicated frame.

It is assumed that, at every speed, frame (i-2) lies exactly at a multiple of the period m.

First, when the speed command V = 2, V is positive, so thinning control applies. Operations (1) and (2) give the threshold t = 1 and the period m = 4. The leading frames of the periods are therefore frame (i-2) and frame (i+2), and the absolute value of the corresponding permission information e2 is compared with the threshold t. For frame (i-2), |e2(i-2)| = 0, which is less than or equal to the threshold t = 1, so the frame is thinned out.

For frame (i+2), however, |e2(i+2)| = 2, which is greater than the threshold t = 1, so the frame is not thinned out. For the next frame (i+3), |e2(i+3)| = 1, which is less than or equal to the threshold t = 1, so the frame is thinned out. Thus, when the speed command V = 2, the feature parameter sets of frame (i-2) and frame (i+3) are thinned out.

Next, when the speed command V = 3, V is positive, so thinning control applies. Operations (1) and (2) give the threshold t = 2 and the period m = 3. The leading frames of the periods are therefore frame (i-2), frame (i+1), and frame (i+4). For frame (i-2) and frame (i+4), |e2(i-2)| = |e2(i+4)| = 0, which is less than or equal to the threshold t = 2, so these frames are thinned out. For frame (i+1), however, |e2(i+1)| = 3, which is greater than the threshold t = 2, so the frame is not thinned out. For the next frame (i+2), |e2(i+2)| = 2, which is less than or equal to the threshold t = 2, so the frame is thinned out. Thus, when the speed command V = 3, the feature parameter sets of frame (i-2), frame (i+2), and frame (i+4) are thinned out.

Next, when the speed command V = 4, V is positive, so thinning control applies. Operations (1) and (2) give the threshold t = 3 and the period m = 2. The leading frames of the periods are therefore frame (i-2), frame (i), frame (i+2), and frame (i+4). For frame (i-2), frame (i+2), and frame (i+4), |e2(i-2)| = 0, |e2(i+2)| = 2, and |e2(i+4)| = 0, all of which are less than or equal to the threshold t = 3, so these frames are thinned out. For frame (i), however, |e2(i)| = 8, which is greater than the threshold t = 3, so the frame is not thinned out. For the next frame (i+1), |e2(i+1)| = 3, which is less than or equal to the threshold t = 3, so the frame is thinned out. Thus, when the speed command V = 4, the feature parameter sets of frame (i-2), frame (i+1), frame (i+2), and frame (i+4) are thinned out.

Finally, when the speed command V = -4, V is negative, so duplication control applies. Operations (1) and (2) give the threshold t = 3 and the period m = 2. The leading frames of the periods are therefore frame (i-2), frame (i), frame (i+2), and frame (i+4).

The value of the corresponding permission information e2 is examined for each of these frames and, if it is not negative, compared with the threshold t. For frame (i-2), frame (i+2), and frame (i+4), the value of the permission information e2 is not negative.

Moreover, |e2(i-2)| = 0, |e2(i+2)| = 2, and |e2(i+4)| = 0, all of which are less than or equal to the threshold t = 3, so these frames are duplicated. For frame (i), however, e2(i) = -8; the value of the permission information e2 is negative, so the frame is not duplicated. For the remaining frame (i+1) of the same period, e2(i+1) = -3, which is also negative, so that frame is not duplicated either. Thus, when the speed command V = -4, frame (i-2), frame (i+2), and frame (i+4) are duplicated.

FIG. 8(B) shows, together with the waveform of the original speech, the results of processing the permission information e2 of the second embodiment under four speed commands V, for the eight frames from frame (i-2) to frame (i+5) at the start of the utterance "ta". As in FIG. 7(B), an "x" marks a thinned-out frame and a "◎" marks a duplicated frame. As FIG. 8(B) makes clear, in speech faster than standard the frame at the burst point of the voiceless plosive consonant "t" is always preserved, and frames (i+1), (i+2), and (i+3) of the articulatory transition joining the burst point to the steady part of the vowel "a" are thinned out adaptively, starting from the frame nearest the steady part of the vowel, as the speech rate V increases. The synthesized speech therefore retains its clarity and naturalness regardless of the speech rate V. In speech slower than standard, frames (i) and (i+1), which carry the characteristics of the voiceless plosive consonant "t", are transferred unchanged without duplication, so the plosive consonant part of the synthesized speech is not stretched along the time axis and retains its phonemic quality.

In the embodiments described above, PARCOR coefficients and a PARCOR-type speech synthesizer were used as the feature parameters representing the speech and as the speech synthesizer, but it is evident that the invention can be practiced with any synthesis method in which speech of a fixed duration is represented by one set of parameters.

In the second embodiment, the threshold t for thinning and duplication was given as a first-order expression in the speed command V, but it is evident that the threshold can instead be specified independently for each value of the speed command V.

Furthermore, in the second embodiment the effect of the speed control permission information e2 was described for the articulatory transition from a plosive consonant to the steady part of the following vowel, but the effect of the invention is evidently not limited to that case, and the invention can be applied to any part of the speech to be synthesized.

[Effects of the Invention] As described above, the thinning out and duplication of feature parameters, which were conventionally performed in a purely mechanical manner, are performed in the present invention adaptively: speed control permission information is attached to the feature parameters, and they are thinned out or duplicated according to the magnitude of the speed command V. Clear and natural speech can therefore be synthesized without phonemic change or loss of phonemes.

[Brief Description of the Drawings]

FIG. 1 is a block diagram of the speech synthesis device of the first embodiment of the present invention.
FIGS. 2(A) to 2(C) show the speech waveforms of "tai", part of the word "mitai", uttered by the same male speaker.
FIGS. 3(A) to 3(C) show parts of the speech waveforms of FIGS. 2(A) to 2(C), enlarged along the time axis at the same magnification.
FIG. 4 shows the structure of the permission information and the feature parameter sets of the first embodiment.
FIG. 5(A) shows the relationship between the speed command V and the period m of frame thinning or duplication in the first embodiment.
FIG. 5(B) shows the relationship among the speech rate V, the threshold t, and the period m of thinning or duplication in the second embodiment.
FIG. 6 is a flowchart showing the speed control procedure of the first embodiment.
FIG. 7(A) shows the results of processing the permission information e of the first embodiment under four speed commands V.
FIG. 7(B) shows the results of processing the permission information e2 of the second embodiment under four speed commands V.
FIG. 8(A) shows the results of processing the permission information e of the first embodiment under four speed commands V, together with the waveform of the original speech.
FIG. 8(B) shows the results of processing the permission information e2 of the second embodiment under four speed commands V, together with the waveform of the original speech.
FIG. 9 shows the structure of the permission information and the feature parameter sets of the second embodiment.
FIG. 10 is a flowchart showing the speed control procedure of the second embodiment.
In the figures: 1, input terminal; 2, central processing unit (CPU); 3, first storage device; 4, auxiliary storage device; 5, PARCOR-type speech synthesizer; 6, D/A converter; 7, amplifier; 8, speaker.

Claims

[Claims]

(1) A speech synthesis device that changes the speech rate by thinning out or duplicating the feature parameters used for synthesis, characterized by comprising: storage means for storing feature parameters, each corresponding to a predetermined length of speech, together with speech rate control permission information associated with at least each of the feature parameters; and speed control means for, during speech synthesis, thinning out or duplicating only those feature parameters whose permission information indicates that speed control is permitted.

(2) A speech synthesis device that changes the speech rate by thinning out or duplicating the feature parameters used for synthesis, characterized by comprising: storage means for storing feature parameters, each corresponding to a predetermined length of speech, together with multi-valued speech rate control permission information associated with at least each of the feature parameters; threshold setting means for setting a threshold according to the speech rate; and speed control means for thinning out or duplicating only those feature parameters whose multi-valued information is smaller than the threshold.

(3) The speech synthesis device according to claim 2, wherein the storage means stores the largest multi-valued information for the feature parameter indicating the burst point of a plosive consonant, and stores progressively decreasing multi-valued information for the feature parameters that follow it.

(4) The speech synthesis device according to claim 2, wherein the speed control means unconditionally refrains from duplicating a feature parameter whose multi-valued information has a predetermined sign.

(5) The speech synthesis device according to claim 2, wherein the threshold setting means sets a higher threshold the further the speech rate departs from the standard speed, whether faster or slower.