JPH02195397A

JPH02195397A - Speech synthesizing device

Info

Publication number: JPH02195397A
Application number: JP1013097A
Authority: JP
Inventors: Atsushi Sakurai; 櫻井　穆
Original assignee: Canon Inc
Current assignee: Canon Inc
Priority date: 1989-01-24
Filing date: 1989-01-24
Publication date: 1990-08-01

Abstract

PURPOSE: To vary the speaking rate of synthesized speech uniformly by providing a standard speaking-rate value that can be set with a command written in the text or with an external switch. CONSTITUTION: The speech synthesizing device has a speed control means that controls the speaking rate relative to a preset standard rate. Specifically, the number of samples to be generated per set of voice parameters is determined relative to the standard speaking-rate value and sent to a synthesizer 8, which thereby controls the speaking rate. Text containing command information that specifies the speaking rate is input and analyzed using a rule file 4, and a voice pattern is generated from the analysis result by predetermined rules. The voice pattern is then output at a speaking rate expressed relative to the preset standard rate. As a result, the speaking rate of the synthesized voice can be varied uniformly.

Description

Detailed Description of the Invention

[Field of Industrial Application] The present invention relates to a speech synthesis device that analyzes input text and generates speech according to predetermined rules, and in particular to a speech synthesis device that changes the speaking rate of the generated speech according to command information written in the input text.

[Prior Art] Conventionally, in speech synthesis devices that analyze input text and generate speech according to predetermined rules, attempts have been made to control the speaking rate of the generated speech with commands written in the input text.

For example, (S3) in FIG. 2(a) is a command that specifies speaking rate "3" and causes the speech "namamugi" to be generated at that rate. Similarly, (S5) and (S7) specify speaking rates "5" and "7" and cause the speech "namagome" and "namatamago", respectively, to be generated at those rates.

[Problems to be Solved by the Invention] However, in the conventional example described above, the speaking rate that each command specifies is fixed, which leads to the following drawbacks.

(1) To increase or decrease the speaking rate of the entire text uniformly, that is, to make the whole text be spoken faster or slower, every speed control command in the text must be rewritten.

(2) If the correspondence between the speed control commands and the fixed speeds is not linear, the speaking rate of the entire text cannot be increased or decreased uniformly.

These drawbacks are explained with the example of FIG. 2. To make the text of FIG. 2(a) be spoken faster as a whole, command (S3) must be rewritten as (S4), (S5) as (S6), and (S7) as (S8), as shown in FIG. 2(b). Similarly, to make the text of FIG. 2(a) be spoken more slowly as a whole, each command must be rewritten as shown in FIG. 2(c). Moreover, if commands (S3), (S4), (S5), and (S6) correspond to speaking rates of 3.3, 4.0, 5.0, and 5.8 morae per second, respectively, then the rate of command (S5) relative to command (S3) is 5.0/3.3 = 1.52, whereas the rate of command (S6) relative to command (S4) is 5.8/4.0 = 1.45.
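The arithmetic above can be reproduced directly. In this minimal sketch, the dictionary `PRIOR_ART_RATES` only restates the mora-per-second figures quoted for (S3) to (S6); the name `relative_rate` is hypothetical, and nothing here models the actual synthesizer.

```python
# Speaking rates (morae per second) of the prior-art commands (S3)-(S6),
# as quoted in the example above.
PRIOR_ART_RATES = {3: 3.3, 4: 4.0, 5: 5.0, 6: 5.8}


def relative_rate(faster: int, slower: int) -> float:
    """Rate of command (S<faster>) relative to command (S<slower>)."""
    return PRIOR_ART_RATES[faster] / PRIOR_ART_RATES[slower]


# FIG. 2(a): "namamugi" at (S3), "namagome" at (S5).
ratio_a = relative_rate(5, 3)
# FIG. 2(b): every command rewritten one step faster, i.e. (S4) and (S6).
ratio_b = relative_rate(6, 4)

print(f"{ratio_a:.2f} {ratio_b:.2f}")
```

It prints `1.52 1.45`: shifting every command by one step changes the relative rate of the two phrases, which is drawback (2).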

Consequently, the rate of "namamugi" relative to "namagome" when the text of FIG. 2(a) is spoken differs from the corresponding relative rate when the text of FIG. 2(b) is spoken, so the entire text cannot be made uniformly faster.

The present invention has been made to solve the above problems, and its object is to provide a speech synthesis device that can uniformly increase or decrease the speaking rate of synthesized speech.

[Means for Solving the Problems] To achieve the above object, the speech synthesis device of the present invention has the following configuration. It comprises input means for inputting text that contains command information specifying a speaking rate; analysis means for analyzing the input text; generation means for generating a voice pattern from the result of the analysis by predetermined rules; and output means for outputting the voice pattern produced by the generation means according to the speaking rate. The device further comprises speed control means that controls the speaking rate as a value relative to a preset standard speed.

Preferably, the standard speed used by the speed control means is set by command information written in the input text.

More preferably, the standard speed used by the speed control means is set by a switch.

Still more preferably, the standard speed used by the speed control means can be set either by the command information or by the switch, and one of the two is used preferentially.

[Operation] In the above configuration, text containing command information that specifies the speaking rate is input and analyzed, and a voice pattern is generated from the analysis result by predetermined rules.

The voice pattern is then output at a speaking rate expressed as a value relative to the preset standard speed.

[Embodiments] A first preferred embodiment of the present invention is described in detail below with reference to the accompanying drawings.

<Description of the Configuration (FIG. 1)> FIG. 1 is a block diagram showing the configuration of the speech synthesis device of the first embodiment. In the figure, 1 is an input terminal for inputting text, and 2 is a central control unit that controls the entire device; it includes a ROM storing the CPU processing procedures described later and a RAM used as a work area. 3 is a unit voice file that stores the unit voices used for synthesis in the form of voice parameters. 4 is a rule file that stores rules for analyzing the input text and generating a time series of voice parameters. 5 is an auxiliary memory that stores system parameters and variables such as the standard speed. 6 is an input buffer that stores the input text. 7 is an output buffer that stores the generated voice parameter time series.

8 is a synthesizer that synthesizes speech using the voice parameters sent from the central control unit 2. 9 is a D/A converter that converts the speech synthesized by the synthesizer 8 into an analog signal. 10 is an output terminal that outputs the analog speech signal.

<Outline of Operation> In the above configuration, when text is input from the input terminal 1, the central control unit 2 stores it in the input buffer 6. The central control unit 2 then takes the text out of the input buffer 6 one character at a time from the beginning and, if the characters are part of a sentence, analyzes them using the analysis rules and analysis dictionary stored in the rule file 4.

Then, according to the synthesis rules also stored in the rule file 4, suitable unit voices are selected from the unit voice file 3 and joined by interpolation to generate a voice parameter time series corresponding to the input text. The generated time series is temporarily stored in the output buffer 7, and in response to each parameter transfer request from the synthesizer 8, one set of voice parameters is taken from the output buffer 7 and sent to the synthesizer 8. The synthesizer 8 generates a digital speech signal from the voice parameters sent by the central control unit 2 and outputs it to the D/A converter 9, which converts it into an analog signal and outputs it sequentially from the output terminal 10.

If, on the other hand, the character string taken from the input buffer 6 is a command, the command is analyzed and the result is stored in the auxiliary memory 5. In this embodiment, a command is a string of letters and digits enclosed in parentheses "( )", and two types are defined: control commands and system commands.

(1) Control commands: A control command is a string consisting of one uppercase letter and a number, and specifies the speaking rate, volume, accent strength, and so on. For example, (S5) specifies speaking rate "5", and (A4) specifies accent strength "4".

(2) System commands: A system command is a string consisting of four lowercase letters and a number, and sets the various standard values of the speech synthesis system. For example, (sspe80) sets the standard value of the speaking rate to 80 samples.
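The two command forms can be recognized with a regular expression. This is a sketch that assumes a command is exactly a parenthesized name followed by decimal digits, as in the examples above; `COMMAND_RE` and `tokenize` are hypothetical names, not part of the patent.

```python
import re
from typing import List, Tuple, Union

# Control command: one uppercase letter + digits, e.g. "(S5)", "(A4)".
# System command: four lowercase letters + digits, e.g. "(sspe80)".
COMMAND_RE = re.compile(r"\(([A-Z]|[a-z]{4})(\d+)\)")

Token = Union[Tuple[str, str, int], Tuple[str, str]]


def tokenize(text: str) -> List[Token]:
    """Split text into ("control"|"system", name, value) and ("text", chunk) tokens."""
    tokens: List[Token] = []
    pos = 0
    for m in COMMAND_RE.finditer(text):
        if m.start() > pos:                       # ordinary text before the command
            tokens.append(("text", text[pos:m.start()]))
        name, value = m.group(1), int(m.group(2))
        kind = "control" if len(name) == 1 else "system"
        tokens.append((kind, name, value))
        pos = m.end()
    if pos < len(text):                           # trailing ordinary text
        tokens.append(("text", text[pos:]))
    return tokens


print(tokenize("(sspe100)(S3)生麦(S5)生米"))
```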

<Description of the Speaking Rate (FIG. 3)> Of the commands described above, the ones relevant to this embodiment are the speaking rate designation command (S) and the speaking rate standard value setting command (sspe). Together, these two commands control the number of speech samples generated per frame. The relationship between the number of speech samples and the speaking rate is explained below.

The voice parameters stored in the unit voice file 3 of this embodiment were obtained by sampling unit voices uttered by a person at a sampling frequency of 10 kHz and analyzing them with a frame shift of 10 ms, as shown in FIG. 3(a). Each set of voice parameters output to the synthesizer 8 therefore represents the characteristics of a speech waveform 10 ms long, and this 10 ms section contains 10000 × (10/1000) = 100 speech samples. When the synthesizer 8 generates 100 speech samples from one set of voice parameters, the output speech has the same duration as the original speech and consequently the same speaking rate.

In contrast, as shown in FIG. 3(b), when the synthesizer 8 generates 80 speech samples from each set of voice parameters sent from the central control unit 2, the duration of the generated speech becomes 0.8 times that of the original and the speaking rate becomes 1.25 times.

Likewise, as shown in FIG. 3(c), when 120 speech samples are generated, the duration of the generated speech becomes 1.2 times that of the original and the speaking rate 0.83 times.
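The relationship between samples per frame and speaking rate is a simple ratio against the 100 samples that one 10 ms frame contains at 10 kHz. A minimal sketch with illustrative function names:

```python
SAMPLING_HZ = 10_000      # 10 kHz sampling frequency
FRAME_SHIFT_S = 0.010     # 10 ms frame shift
# Samples covered by one set of voice parameters in the original recording.
ORIGINAL_SAMPLES = round(SAMPLING_HZ * FRAME_SHIFT_S)   # 100


def duration_factor(samples_per_frame: int) -> float:
    """Duration of synthesized speech relative to the original."""
    return samples_per_frame / ORIGINAL_SAMPLES


def rate_factor(samples_per_frame: int) -> float:
    """Speaking rate relative to the original (inverse of duration)."""
    return ORIGINAL_SAMPLES / samples_per_frame


for s in (100, 80, 120):
    print(s, duration_factor(s), round(rate_factor(s), 2))
```

For 80 samples this gives a duration factor of 0.8 and a rate factor of 1.25; for 120 samples, 1.2 and 0.83, matching FIG. 3(b) and 3(c).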

That is, in this embodiment, the speaking rate is controlled by determining, in response to the speaking rate designation command, the number of samples per set of voice parameters relative to the speaking rate standard value and sending it to the synthesizer 8. The synthesizer 8 stores this number and generates that many speech samples from each set of voice parameters sent by the central control unit 2. When it has finished generating the specified number of samples, it requests the central control unit 2 to transfer the next set of voice parameters.

By repeating this process, synthesized speech with the speaking rate corresponding to the speaking rate designation command is generated.

The speaking rate designation command (S) of this embodiment takes a one-digit number n from "1" to "9" and determines the number of speech samples per frame, p(n), by equation (1).

$$p(n) = k \times a(n) \tag{1}$$

where a(n) is given by

$$a(n) = \begin{cases} 2.25 - 0.25n & (1 \le n \le 5) \\ 1.75 - 0.15n & (6 \le n \le 9) \end{cases}$$

Here k is the standard value of the speaking rate, expressed in samples, and is set by the system command (sspe).
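Equation (1) translates directly into code. This sketch assumes that p(n) is rounded to a whole number of samples, which the text does not specify; `a` and `samples_per_frame` are hypothetical names.

```python
def a(n: int) -> float:
    """Piecewise-linear coefficient a(n) of equation (1)."""
    if 1 <= n <= 5:
        return 2.25 - 0.25 * n
    if 6 <= n <= 9:
        return 1.75 - 0.15 * n
    raise ValueError("speaking rate n must be 1..9")


def samples_per_frame(n: int, k: int) -> int:
    """p(n) = k * a(n); rounding to an integer is an assumption."""
    return round(k * a(n))


# Samples per frame for every (S1)..(S9) with the standard value k = 100.
print([samples_per_frame(n, 100) for n in range(1, 10)])
```

The two branches meet at n = 5 (both give a(5) = 1.0), so (S5) always reproduces the standard value k itself.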

<Description of the Processing Procedure (FIGS. 4 to 6)> Next, the processing performed when the text shown in FIG. 4(a) is input from the input terminal 1 is described according to the flowchart of FIG. 5.

First, when text is input in step S1, it is stored in the input buffer 6, and the input text is analyzed in step S2. In step S3, it is determined whether the analyzed character string is one of the commands described above. If it is a command, the process proceeds to step S4, where it is determined whether it is a speaking rate standard value setting command.

In this example, the character string of the input text is "(sspe100)", which is a speaking rate standard value setting command, so the process proceeds to step S5, where the following digit string "100" is processed, and in step S6 the value 100 is stored in the auxiliary memory 5 as the speaking rate standard value.

Next, in step S11, the character string is checked for a delimiter; if none is found, the process returns to step S2. In step S2 the character string (S3) is processed, and since S is the speaking rate designation command, the process proceeds to step S7, where the number of speech samples per frame, p(n), is obtained according to equation (1).

In this case n = 3, so a(n) = 2.25 − 0.25 × 3 = 1.5, and since k = 100, p(n) = 150 (samples). In step S8, the value 150 is therefore set as the speaking rate and stored in the auxiliary memory 5. In step S11, since the character string continues as shown in FIG. 4(a), the process returns to step S2. The following character string "生麦" is processed in the same way, and in step S9 a voice parameter time series corresponding to the speech "namamugi" is generated. In step S10, speaking rate information is added to the beginning of this time series, which is then stored in the output buffer 7.

At this time, the speaking rate stored in the auxiliary memory 5 is "150", so the value 150 is added to the beginning of the voice parameter time series.

Thereafter, the synthesizer 8 issues voice parameter output requests to the central control unit 2 at intervals of the number of samples given by the initially set speaking rate standard value. In response to each output request, one set of voice parameters is taken from the output buffer 7 and output in step S12. If speaking rate information has been added, however, the speaking rate information is always sent to the synthesizer 8 first, followed by one set of voice parameters. If the information sent by the central control unit 2 is a speaking rate, the synthesizer 8 stores it, receives the voice parameters that follow, and uses them to generate as many speech samples as indicated by the stored speaking rate information.
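The transfer protocol between the output buffer 7 and the synthesizer 8 can be modeled as a stream in which a speaking-rate header may precede the parameter sets. In this hypothetical sketch, an `int` item stands for speaking-rate information and a tuple for one set of voice parameters; waveform generation itself is not modeled, only how many samples each set yields.

```python
from typing import List, Sequence, Tuple, Union

# Hypothetical item type for the output buffer 7:
#   int            -> speaking-rate information (samples per frame)
#   tuple of float -> one set of voice parameters
Item = Union[int, Tuple[float, ...]]


def run_synthesizer(buffer: Sequence[Item], initial_rate: int) -> List[int]:
    """Return how many samples the synthesizer generates for each parameter set."""
    rate = initial_rate          # samples per frame currently stored in the synthesizer
    counts: List[int] = []
    for item in buffer:
        if isinstance(item, int):
            rate = item          # speaking-rate information: store it
        else:
            counts.append(rate)  # generate `rate` samples, then request the next set
    return counts


buffer: List[Item] = [150, (0.1, 0.2), (0.3, 0.4), 100, (0.5, 0.6)]
print(run_synthesizer(buffer, initial_rate=100))   # [150, 150, 100]
```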

In this way, the speech "namamugi" is generated for the character string "生麦" at a rate of 150 samples per frame. Similarly, the character string "(S5)" is processed, the value 100 is set as the speaking rate, and the speech "namagome" is generated for the character string "生米" at 100 samples per frame. When the character string "(S7)生卵" is processed, the speech "namatamago" is generated at 70 samples per frame.
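Putting the command handling and equation (1) together, the processing of FIG. 5 for a text modeled on FIG. 4(a) can be sketched as follows. The tokenizer pattern, the default k = 100, and the starting rate before any (S) command are assumptions made for illustration; `first_embodiment` is a hypothetical name.

```python
import re
from typing import List, Tuple

# Hypothetical tokenizer: group 1/2 = command name/digits, group 3 = plain text.
TOKEN_RE = re.compile(r"\(([A-Z]|[a-z]{4})(\d+)\)|([^()]+)")


def a(n: int) -> float:
    """Coefficient a(n) of equation (1)."""
    if 1 <= n <= 5:
        return 2.25 - 0.25 * n
    if 6 <= n <= 9:
        return 1.75 - 0.15 * n
    raise ValueError("n must be 1..9")


def first_embodiment(text: str, k: int = 100) -> List[Tuple[int, str]]:
    """Return (samples per frame, segment) pairs following steps S2 to S10.

    Assumption: before any (S) command, one frame yields k samples.
    """
    rate = k
    segments: List[Tuple[int, str]] = []
    for name, digits, chunk in TOKEN_RE.findall(text):
        if name == "sspe":        # steps S4-S6: store the standard value k
            k = int(digits)
        elif name == "S":         # steps S7-S8: p(n) = k * a(n)
            rate = round(k * a(int(digits)))
        elif chunk:               # steps S9-S10: parameters tagged with the rate
            segments.append((rate, chunk))
    return segments


print(first_embodiment("(sspe100)(S3)生麦(S5)生米(S7)生卵"))
```

With (sspe100) this yields 150, 100, and 70 samples per frame for the three phrases, as in the walkthrough above.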

FIG. 6 shows the results of processing the input texts (a), (b), and (c) of FIG. 4 in this way. The speaking rate standard value set by command processing differs among them: 100 samples for (a), 80 for (b), and 120 for (c). In every case, however, the ratio of the number of samples generated after processing command (S3) to the number generated after processing command (S5) is 1.5.

Similarly, the ratio for command (S7) to command (S5) is always constant at 0.7.
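The constant ratios follow from equation (1): k cancels when two values of p(n) are divided, leaving a(3)/a(5) = 1.5 and a(7)/a(5) = 0.7. A short check, with hypothetical helper names:

```python
def a(n: int) -> float:
    """Coefficient a(n) of equation (1)."""
    if 1 <= n <= 5:
        return 2.25 - 0.25 * n
    if 6 <= n <= 9:
        return 1.75 - 0.15 * n
    raise ValueError("n must be 1..9")


rows = []
for k in (100, 80, 120):                # standard values of FIG. 4(a), (b), (c)
    p3, p5, p7 = (k * a(n) for n in (3, 5, 7))
    rows.append((k, p3 / p5, round(p7 / p5, 2)))
print(rows)
```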

Therefore, according to this embodiment, the speaking rate of the speech generated from the same text can be increased or decreased uniformly simply by changing the speaking rate standard value setting command in the text, and the result is unaffected by whether the correspondence between the speaking rate designation commands and the actual speaking rates is linear.

[Other Embodiments] Next, a second embodiment of the present invention is described in detail with reference to the drawings.

<Description of the Configuration (FIGS. 7 to 9)> As shown in the configuration diagram of FIG. 7, the second embodiment adds a switch circuit 11 to the configuration of the first embodiment so that the user of the speech synthesis device can control the speaking rate with a switch according to the purpose.

The switch portion of the switch circuit 11 has the appearance shown in FIG. 8. Scale positions "1" to "5" increase the speaking rate, and positions "−1" to "−5" decrease it.

As shown in FIG. 9, the switch circuit 11 consists of a rotary switch 111 and an encoder 112. The position of the rotary switch 111 is encoded by the encoder 112 and sent to the central control unit 2. If m is the scale position of the switch and i is the output of the encoder, the two are related by equation (2).

$$i = m + 5 \tag{2}$$

where m = −5, −4, …, −1, 0, 1, …, 5. The encoder output i is therefore an integer from "0" to "10".

<Description of Operation (FIGS. 10 and 11)> In the above configuration, when text is input from the input terminal 1, it is first stored in the input buffer 6. Next, the initial value "0" is stored in the auxiliary memory 5 as the speaking rate standard value, and the initial value "5" as the speaking rate control information. The input text stored in the input buffer 6 is then processed one character at a time from the beginning, and one of the following processes is performed depending on whether the content is part of a sentence or a command.

First, if the text taken from the input buffer 6 is a command, its numeric part is converted to a number and stored in the corresponding location of the auxiliary memory 5.

For example, if the character string is (S3), the value 3 is stored in variable area n of the auxiliary memory 5; if the character string is (sspe120), the value 120 is stored in variable area k of the auxiliary memory 5.

If, on the other hand, the text taken from the input buffer 6 is part of a sentence, the same processing as in the first embodiment is performed: a voice parameter time series is generated using the unit voice file 3 and the rule file 4 and stored in the output buffer 7. At this time, variable area k of the auxiliary memory 5 is checked. If its content is a value other than "0", the number of speech samples per frame, p(n), is obtained from this value and the value in variable area n using equation (1) of the first embodiment, and this value is added to the beginning of the generated voice parameter time series. If the content of k is "0", the value in variable area n with a negative sign attached is added instead. In other words, if a value other than "0" has already been set as the speaking rate standard value k when a sentence in the input text is analyzed and its voice parameter time series is generated, the number of speech samples per frame of that time series can be determined immediately from the speaking rate control information n. If k is still "0", only the speaking rate control information n is attached to the generated time series as rate information. For example, if the variable areas of the auxiliary memory 5 are in the state of FIG. 10(a), "180" is calculated as the number of speech samples per frame, p(n), and added to the beginning of the generated voice parameter time series. In the state of FIG. 10(b), the value "−3", which is the content of the speaking rate control information n with a negative sign attached, is added to the beginning.
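The analysis-time rule can be summarized as a single function. This is a sketch with hypothetical names; `k` and `n` stand for the variable areas of the auxiliary memory 5, and rounding p(n) to an integer is an assumption.

```python
def a(n: int) -> float:
    """Coefficient a(n) of equation (1)."""
    if 1 <= n <= 5:
        return 2.25 - 0.25 * n
    if 6 <= n <= 9:
        return 1.75 - 0.15 * n
    raise ValueError("n must be 1..9")


def rate_header(k: int, n: int) -> int:
    """Speaking-rate information prepended to a voice parameter time series.

    k != 0: the standard value is known, so store p(n) = k * a(n) samples per frame.
    k == 0: the standard value is still unset, so store -n and resolve it at transfer.
    """
    if k != 0:
        return round(k * a(n))
    return -n


print(rate_header(0, 3))     # -3
print(rate_header(100, 3))   # 150
```

The sign of the header is what lets the transfer step tell a finished sample count from a deferred one.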

Next, when the voice parameter time series stored in the output buffer 7 is transferred one set at a time in response to parameter transfer requests from the synthesizer 8, the following processing is performed according to the sign of the speaking rate information at its beginning.

If the speaking rate information is positive, it indicates the number of speech samples per frame as described above, so it is transferred to the synthesizer 8 immediately, and the subsequent speech is generated with that number of samples, that is, at that speaking rate.

If the speaking rate information is negative, this means that the speaking rate standard value was "0" at the time of analysis, so the standard value is calculated from the output i of the switch circuit 11, and the number of speech samples per frame, p(i, n), is determined by equation (3).

Here n is the speaking rate information at the beginning of the voice parameter time series with its negative sign removed, that is, the value of the numeric part of the speaking rate control command.

$$p(i, n) = k(i) \times a(n) \tag{3}$$

where

$$a(n) = \begin{cases} 2.25 - 0.25n & (1 \le n \le 5) \\ 1.75 - 0.15n & (6 \le n \le 9) \end{cases}
\qquad
k(i) = \begin{cases} 20 \times (10 - i) & (0 \le i \le 5) \\ 10 \times (15 - i) & (6 \le i \le 10) \end{cases}$$

For example, if the leading information of the voice parameter time series is "−3" and the scale position m of the speaking rate setting switch is "4", then i = 9 by equation (2) and n = 3, so equation (3) gives p(i, n) = k(9) × a(3) = 60 × 1.5 = 90. The value 90 is sent to the synthesizer 8 as the number of speech samples per frame, and the speech generated thereafter has this speaking rate.
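Equations (2) and (3) and the sign test at transfer time can be sketched as follows. The names `k_from_switch` and `resolve_header` are hypothetical, and rounding to whole samples is an assumption.

```python
def a(n: int) -> float:
    """Coefficient a(n) of equation (3)."""
    if 1 <= n <= 5:
        return 2.25 - 0.25 * n
    if 6 <= n <= 9:
        return 1.75 - 0.15 * n
    raise ValueError("n must be 1..9")


def k_from_switch(i: int) -> int:
    """Standard value k(i) of equation (3) for encoder output i (0..10)."""
    if 0 <= i <= 5:
        return 20 * (10 - i)
    if 6 <= i <= 10:
        return 10 * (15 - i)
    raise ValueError("encoder output must be 0..10")


def resolve_header(header: int, m: int) -> int:
    """Samples per frame sent to the synthesizer for a segment header.

    A positive header is already p(n). A negative header is -n and is resolved
    with the switch position m via i = m + 5 (eq. 2) and p(i, n) = k(i) * a(n) (eq. 3).
    """
    if header > 0:
        return header
    i = m + 5
    n = -header
    return round(k_from_switch(i) * a(n))


print(resolve_header(-3, m=4))   # 90
```

The two branches of k(i) agree at the neutral position (i = 5 gives 100), so m = 0 reproduces the first embodiment's default.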

That is, for input text that contains no speaking rate standard value setting command, the standard value can be set with the rotary switch 111; if the text does contain such a command, the sentences that follow it are spoken at the standard value set by the command regardless of the position of the rotary switch 111.

The above processing is explained using the example input texts shown in FIG. 11.

When the text shown in FIG. 11(a) is input from the input terminal 1, it is stored in the input buffer 6, and the central control unit 2 first processes the character string "(S3)" as a speaking rate control command and stores the value 3 in variable area n of the auxiliary memory 5. Next, the character string "生麦" is analyzed, and a voice parameter time series is generated and stored in the output buffer 7 with speaking rate information added to its beginning. Because variable area k of the auxiliary memory 5 still holds its initial value "0", the speaking rate information at this point is "−3", the speaking rate control information in variable area n with a negative sign attached. In response to parameter transfer requests from the synthesizer 8, the central control unit 2 takes the voice parameters from the output buffer 7 one set at a time and transfers them to the synthesizer 8. At this time the speaking rate information at the beginning is processed; since it is negative, the output i of the switch circuit 11 is read.

Suppose, for example, that rotary switch 111 is in the state shown in FIG. 8(b). The position information is then m = 5, and n = 3, so equations (2) and (3) yield 75 as the number of speech samples per frame. This value is transferred to synthesizer 8, and the speech generated thereafter has this speaking rate.

That is, the speech "Namamugi" is generated at a rate of 75 samples per frame. Similarly, the character string "(S5)生米" is processed, and the speech "Namagome" is generated at 50 samples per frame. The character string "(S7)生卵" is then processed, and the speech "Namatamago" is generated at 25 samples per frame.

That is, the content of the text is converted to speech and output at 1.5 times the normal speed.

Next, suppose the text shown in FIG. 11(b) is input from input terminal 1. It is stored in input buffer 6, and central control unit 2 first processes the character string "(sspe100)" as a command and stores the value "100" in variable area k of auxiliary storage device 5.

Next, the character string "(S3)" is processed as a command, and the value "3" is stored in variable area n of auxiliary storage device 5. The character string "生麦" is then analyzed as part of the sentence, and a speech-parameter time series is generated and stored in output buffer 7. Speaking-rate information is again prepended. In this case, however, variable area k of auxiliary storage device 5 holds "100" rather than "0". The number of speech samples per frame, P(n), is therefore computed immediately from equation (1) using the values in variable areas k and n, and the result is prepended as the speaking-rate information.

That is, in equation (1), k = 100 and n = 3, so P(n) = 150, and the value 150 is prepended as the speaking-rate information.
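Equation (1) itself is not reproduced in this excerpt. However, the three values worked through for FIG. 11(b) (150, 100, and 50 samples per frame for n = 3, 5, and 7 at k = 100) fit a simple linear form. The sketch below assumes P(n) = k(9 - n)/4 only because it reproduces those numbers. It is a hypothetical fit, not the patent's equation (1), and `samples_per_frame` is an illustrative name.

```python
def samples_per_frame(k: float, n: int) -> float:
    """Hypothetical fit to equation (1); NOT the patent's formula.

    ASSUMPTION: P(n) = k * (9 - n) / 4, chosen only because it reproduces
    the worked values for k = 100 (n = 3, 5, 7 -> 150, 100, 50).
    A larger P(n) means more output samples per parameter frame, so each
    frame lasts longer and the speech is slower.
    """
    if k <= 0:
        raise ValueError("standard value k must be positive")
    if n >= 9:
        raise ValueError("this hypothetical fit is only positive for n < 9")
    return k * (9 - n) / 4


if __name__ == "__main__":
    for n in (3, 5, 7):
        print(f"(S{n}) -> {samples_per_frame(100, n):g} samples per frame")
```

Under the same assumed form, k = 50 gives 75, 50, and 25, which matches the switch-controlled FIG. 11(a) values. That would be consistent with switch position m = 5 corresponding to a standard of 50. Because equations (2) and (3) are not shown here, this correspondence is unconfirmed.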

Next, on receiving a parameter-transfer request from synthesizer 8, central control unit 2 takes the speech parameters out of output buffer 7 one set at a time and outputs them to synthesizer 8. The prepended speaking-rate information is processed at this point. Here it has the positive value "150", so it is output to synthesizer 8 unchanged as the number of speech samples per frame. The speech generated thereafter follows this sample count, so the speech "Namamugi" is generated at 150 samples per frame.

Similarly, the character string "(S5)生米" is processed, and the speech "Namagome" is generated at 100 samples per frame. The character string "(S7)生卵" is then processed, and the speech "Namatamago" is generated at 50 samples per frame.

That is, regardless of the output of switch circuit 11, the speech is generated and output at exactly the rates written in the text.
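The way commands update variable areas k and n before each text segment is analyzed can be illustrated with a small tokenizer. The command spellings come from the examples as rendered here: "(sspe100)" sets the standard value and "(Sn)" sets the rate. The exact command grammar, the regular expression, and the function name `parse_commands` are assumptions for illustration only.

```python
import re
from typing import List, Tuple

# ASSUMPTION: command syntax inferred from the worked examples; the patent's
# actual grammar is not given in this excerpt. Text outside parentheses is a
# segment to be spoken; characters that fit neither pattern are skipped.
TOKEN = re.compile(
    r"\((?:sspe(?P<k>\d+)|S(?P<n>\d+))\)|(?P<text>[^()]+)",
    re.IGNORECASE,
)


def parse_commands(text: str) -> List[Tuple[int, int, str]]:
    """Return (k, n, segment) for each text segment.

    k and n mirror variable areas k and n; k starts at 0, meaning
    "no standard set by a command", so the switch would decide.
    """
    k, n = 0, 0
    segments: List[Tuple[int, int, str]] = []
    for match in TOKEN.finditer(text):
        if match.group("k") is not None:
            k = int(match.group("k"))  # standard-speaking-rate setting command
        elif match.group("n") is not None:
            n = int(match.group("n"))  # speaking-rate control command
        else:
            segment = match.group("text").strip()
            if segment:
                segments.append((k, n, segment))
    return segments


if __name__ == "__main__":
    print(parse_commands("(sspe100)(S3)生麦(S5)生米(S7)生卵"))
```

Dropping "(sspe100)" from the input leaves k = 0 for every segment, which corresponds to the FIG. 11(a) case where rotary switch 111 determines the rate.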

As explained above, the embodiments described provide the following effects.

(1) The speaking-rate setting switch lets the user of the synthesizer listen to synthesized speech at his or her preferred speed.

(2) The speaking-rate setting switch raises or lowers the speed uniformly over the entire text. The speed control specified by commands within the input text is therefore faithfully reproduced.

(3) A standard-speaking-rate setting command can be inserted in the input text with a value other than "0" as the standard speaking rate. Speech is then always generated and output at exactly the rates written in the text, regardless of the position of the speaking-rate setting switch, so important messages can be conveyed reliably.

[Effects of the Invention] As explained above, the present invention provides a standard value for the speaking rate that can be set either by a command written in the text or by an external switch. This makes it possible to raise or lower the speaking rate of the speech uniformly.

[Brief Description of the Drawings]

FIG. 1 is a block diagram showing a first embodiment of the present invention.
FIG. 2 shows input text illustrating a prior-art example.
FIGS. 3(a) to 3(c) show samples of speech patterns.
FIG. 4 shows input text in the first embodiment.
FIG. 5 is a flowchart of the processing in the first embodiment.
FIG. 6 shows the processing results of the first embodiment.
FIG. 7 is a block diagram showing a second embodiment of the present invention.
FIGS. 8(a) and 8(b) are external views of the rotary switch.
FIG. 9 is a block diagram of the switch circuit.
FIGS. 10(a) and 10(b) show internal states of the auxiliary storage.
FIG. 11 shows a processing example of the second embodiment.

In the figures: 1, input terminal; 2, central control unit; 3, unit speech file; 4, rule file; 5, auxiliary storage; 6, input buffer; 7, output buffer; 8, synthesizer; 9, D/A converter; 10, output terminal; 11, switch circuit.

Claims


(1) A speech synthesis device comprising: input means for inputting text that includes command information specifying a speaking rate; analysis means for analyzing the input text; generation means for generating a speech pattern using predetermined rules based on the analysis result of the analysis means; and output means for outputting the speech pattern generated by the generation means at the specified speaking rate. The speech synthesis device is characterized by further comprising speed control means for controlling the speaking rate as a value relative to a preset standard speed.

(2) The speech synthesis device according to claim 1, wherein the standard speed of the speed control means is set by command information written in the input text.

(3) The speech synthesis device according to claim 1, wherein the standard speed of the speed control means is set by a switch.

(4) The speech synthesis device according to claim 1, wherein the standard speed of the speed control means is set either by the command information or by a switch, one of the two being used preferentially.