JPH05281984A

JPH05281984A - Method and device for synthesizing speech

Info

Publication number: JPH05281984A
Application number: JP4076950A
Authority: JP
Inventors: Yoshiyuki Hara; 義幸原
Original assignee: Toshiba Corp
Current assignee: Toshiba Corp
Priority date: 1992-03-31
Filing date: 1992-03-31
Publication date: 1993-10-29

Abstract

PURPOSE: To make it possible to arbitrarily vary the time required for speech synthesis and the quality of the synthesized speech. CONSTITUTION: A mixed kanji-kana character code string to be synthesized and order information indicating an order N are input through an input unit 1. A language processing unit 3 generates a phoneme sequence and prosody information corresponding to the input character code string on the basis of a word dictionary 2. A synthesis parameter generation unit 5 extracts from a speech unit file 4 the cepstrum parameters of the phonemes corresponding to the phoneme sequence, up to the order indicated by the order information, to generate phoneme parameters, and also generates prosody parameters corresponding to the prosody information. These phoneme parameters and prosody parameters are input to a speech synthesis unit 6, which performs synthesis filtering of the order indicated by the order information from the input unit 1, according to the phoneme parameters and prosody parameters, to generate the synthesized speech.

Description

Detailed Description of the Invention

[0001]

[Field of Industrial Application] The present invention relates to a speech synthesis method and apparatus that generate synthesized speech from a character code string, or from prosody information and a phoneme sequence.

[0002]

[Prior Art] In recent years, various speech synthesizers have been developed that analyze a sentence written in mixed kanji and kana and output the speech it represents, synthesized by a rule-based synthesis method. Speech synthesizers of this kind have come into wide use in telephone information services in banking, newspaper proofreading systems, document reading devices, and the like.

[0003] A speech synthesizer using this kind of rule-based synthesis basically analyzes human speech in advance, in predetermined units such as CV (consonant-vowel), CVC (consonant-vowel-consonant), VCV (vowel-consonant-vowel), and VC (vowel-consonant), using techniques such as LSP (line spectrum pair) analysis or cepstrum analysis, and registers the resulting phoneme information in a speech unit file. It then refers to this speech unit file to generate speech parameters (phoneme parameters and prosody parameters), and generates synthesized speech by performing sound source generation and synthesis filtering based on those parameters.

[0004] Conventionally, such speech synthesizers have required dedicated hardware in order to operate in real time. Their system configurations fall broadly into the following two types.

[0005] In the first configuration, a host computer such as a personal computer (PC) converts the mixed kanji-kana sentence into prosody information and a phoneme sequence (language processing), and dedicated hardware performs synthesis parameter generation, sound source generation, synthesis filtering, and D/A (digital-to-analog) conversion. In the second configuration, dedicated hardware performs all processing from the mixed kanji-kana sentence through to speech generation. In either configuration, the dedicated hardware in most cases consists of an LSI called a DSP (digital signal processor), which performs multiply-accumulate operations at high speed, and a general-purpose MPU (microprocessor unit).

[0006] Meanwhile, as the processing power of personal computers (PCs) and engineering workstations (EWSs) has increased, and as these machines have come to include a D/A converter, an analog output section, and a speaker as standard equipment, it is becoming possible to perform the above processing in real time in software.

[0007] In such a system there is no problem when few tasks are running, but when many tasks are running, speech is often not synthesized in real time. As a result, silent intervals are inserted in the middle of spoken words, making the speech very hard to listen to. This happens because the time required for speech synthesis is fixed: even if the system runs in real time with few tasks, the more tasks there are, the more CPU execution time is taken by those other tasks.

[0008] Current speech synthesizers based on rule-based synthesis let the user change the voice characteristics of the generated speech through settings such as male/female/child/elderly male/elderly female, speaking rate, voice pitch (base pitch, average pitch), and stress level, so that a voice matching the user's preference can be selected. However, although these settings can change the voice characteristics, they cannot change the quality itself.

[0009] At present, most systems generate crisp synthesized speech with high intelligibility. Such synthesized speech is easy for a first-time listener to get used to, but it has the drawback that listeners who are accustomed to synthesized speech tire easily when listening to it for long periods.

[0010]

[Problems to Be Solved by the Invention] As described above, in conventional speech synthesis techniques the time required for speech synthesis is fixed, so speech that can be synthesized in real time when few tasks are running cannot be synthesized in real time when many tasks are running. In addition, the quality of the synthesized speech is fixed, which makes it unsuitable for prolonged use.

[0011] The present invention has been made in view of these circumstances, and its object is to provide a speech synthesis method and apparatus that can arbitrarily change the time required for speech synthesis and the quality of the synthesized speech by changing the order of the synthesis filtering.

[0012]

[Means for Solving the Problems] The speech synthesis method and apparatus according to the present invention are characterized in that the order of the phoneme parameters, or information representing the quality of the synthesized speech, is input, and synthesis filtering of an order corresponding to this input information is executed, based on the phoneme parameters and prosody parameters generated according to a phoneme sequence and prosody information, to generate synthesized speech.

[0013]

[Operation] In the above configuration, the order of the phoneme parameters input to the synthesis filter is changed according to the order of the phoneme parameters, or the information representing the quality of the synthesized speech, and synthesis filtering of that order is executed. Thus, according to the present invention, the amount of computation in the synthesis filter can be increased or decreased by changing the order of the phoneme parameters input to the filter.

[0014] Therefore, in a system in which speech synthesis processing including synthesis filtering is performed as a specific task on a CPU that executes multiple tasks, synthesized speech can be generated in real time by selecting a high order when few other tasks are running (or when the CPU is powerful) and, conversely, a low order when many other tasks are running (or when the CPU is less powerful). The quality of the synthesized speech can also be changed arbitrarily by changing the order of the phoneme parameters input to the synthesis filter.

[0015]

[Embodiment] An embodiment of the present invention is described below with reference to the drawings. FIG. 1 is a block diagram showing the schematic configuration of a speech synthesizer according to the embodiment.

[0016] The speech synthesizer shown in FIG. 1 has an input unit 1 that handles input of the mixed kanji-kana character code string to be synthesized and of control information for the synthesized speech. This control information consists of, for example, information (order information) for selecting and specifying the order N of the synthesis parameters to be input to the synthesis filter in the speech synthesis unit 6 described later.

[0017] The speech synthesizer shown in FIG. 1 also has a word dictionary 2, in which the accent type, reading, part-of-speech information, and the like of words and phrases subject to speech synthesis are registered in advance, and a language processing unit 3 that analyzes the character code string input through the input unit 1 using the word dictionary 2 and generates the corresponding phoneme sequence and prosody information.

[0018] The speech synthesizer shown in FIG. 1 also has a speech unit file 4, which stores groups of cepstrum parameters obtained in advance by analyzing input speech for each arbitrary speech unit, together with information representing the order of the cepstrum parameters, and a synthesis parameter generation unit 5 that generates phoneme parameters (here, the cepstrum parameters of the phonemes) according to the phoneme sequence generated by the language processing unit 3 and the order information from the input unit 1. The synthesis parameter generation unit 5 also generates prosody parameters according to the prosody information generated by the language processing unit 3.

[0019] The speech synthesizer shown in FIG. 1 further has a speech synthesis unit 6, which generates synthesized speech by performing sound source generation and synthesis filtering of order N based on the phoneme parameters generated by the synthesis parameter generation unit 5, their order information, and the prosody parameters, and a speaker 7 for speech output. The D/A converter and other components used in the speech synthesis unit 6 to convert the synthesized speech into an analog signal are omitted from the figure.

[0020] The speech synthesizer configured as above is implemented on a multitasking personal computer (PC) or engineering workstation (EWS). The input unit 1, the language processing unit 3, the synthesis parameter generation unit 5, and the speech synthesis unit 6 (specifically its sound source generation and filtering portions) are functional blocks realized by program processing on the CPU (execution of the speech synthesis task).

[0021] Next, the overall operation of the speech synthesizer shown in FIG. 1 is described. First, the mixed kanji-kana character code string to be synthesized and order information indicating the order N are input through the input unit 1. The language processing unit 3 matches the input character code string against the word dictionary 2 to obtain the accent type, reading, and part-of-speech information for the words and phrases to be synthesized, determines accent types and boundaries according to the part-of-speech information, converts the mixed kanji-kana sentence into its reading form, and generates a phoneme sequence and prosody information.

[0022] The phoneme sequence and prosody information generated by the language processing unit 3 are passed to the synthesis parameter generation unit 5. The order information input through the input unit 1 is also passed to the synthesis parameter generation unit 5.

[0023] The synthesis parameter generation unit 5 extracts from the speech unit file 4 the cepstrum parameters of the phonemes corresponding to the phoneme sequence, up to the order N indicated by the order information from the input unit 1, and generates the phoneme parameters. At the same time, it generates prosody parameters according to the prosody information.
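The extraction in [0023] can be pictured with a minimal sketch, assuming each speech unit is stored as a list of frames that each hold the full coefficient vector C0 to C20. Under that assumption, taking the parameters up to order N amounts to keeping the first N + 1 coefficients of every frame. The names `extract_phoneme_parameters`, `stored_frames`, and `MAX_ORDER` are hypothetical and do not come from the patent.

```python
from typing import List, Sequence

MAX_ORDER = 20  # assumption: the file stores C0..C20, as in the worked example of [0035]


def extract_phoneme_parameters(
    stored_frames: Sequence[Sequence[float]], order: int
) -> List[List[float]]:
    """Keep C0..C_order from every stored frame (hypothetical helper)."""
    if not 0 < order <= MAX_ORDER:
        raise ValueError(f"order must be in 1..{MAX_ORDER}, got {order}")
    # C0 is always kept because sound source generation (step S4) uses it.
    return [list(frame[: order + 1]) for frame in stored_frames]


# Two dummy frames whose coefficient values are simply their indices.
frames = [[float(k) for k in range(MAX_ORDER + 1)] for _ in range(2)]
params = extract_phoneme_parameters(frames, order=6)
print(len(params), len(params[0]))  # number of frames, coefficients per frame
```

With order 6 each frame keeps seven values (C0 to C6); the stored file itself is left untouched, so a different order can be chosen on the next request.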

[0024] The speech synthesis unit 6 receives the phoneme parameters and prosody parameters generated by the synthesis parameter generation unit 5, together with the order information from the input unit 1, and holds them temporarily. It then performs sound source generation and digital filtering according to the phoneme parameters, their order information, and the prosody parameters, thereby generating the synthesized speech represented by the input character code string, converts it to an analog signal with a D/A converter (not shown), and outputs it to the speaker 7. In this way, speech is generated from the mixed kanji-kana sentence input through the input unit 1 and output from the speaker 7.

[0025] Next, the detailed processing of the speech synthesis unit 6 of FIG. 1 is described with reference to the flowchart of FIG. 2. First, the speech synthesis unit 6 initializes the counter variable j, which indicates the frame number, to 1, and the counter variable i, which indicates the number of samples remaining to be processed in the frame, to P = [frame period]/[sampling period] (steps S1, S2). Here, [sampling period] equals the clock period of the D/A converter (not shown).

[0026] Next, in accordance with the order information from the input unit 1, the speech synthesis unit 6 selectively reads, from the phoneme parameters and prosody parameters that it received from the synthesis parameter generation unit 5 and is holding, the synthesis parameters Rj for one frame (frame number j), consisting of the phoneme parameters C0 to CN corresponding to the order N indicated by that information and the prosody parameters (step S3).

[0027] Next, the speech synthesis unit 6 generates one sample of sound source data (sound source generation) using the phoneme parameter C0 and the prosody parameters (step S4). It then takes the generated sample of sound source data as input and executes filtering (digital filtering) using the phoneme parameters C1 to C6 (step S5).

[0028] On completing the filtering of step S5, the speech synthesis unit 6 determines whether the order N indicated by the order information from the input unit 1 is 6 (step S6). If it is 6, the unit outputs the one sample of data (speech data) generated in step S5 (step S10).

[0029] If the order N is not 6, the speech synthesis unit 6 takes the data generated in step S5 as input and executes filtering using the phoneme parameters C7 to C10 (step S7). It then determines whether the order N indicated by the order information is 10 (step S8).

[0030] If the determination in step S8 finds that the order N is 10, the speech synthesis unit 6 jumps to the one-sample output processing of step S10. If the order N is not 10, it takes the data generated in step S7 as input, executes filtering with the phoneme parameters C11 to C20 (step S9), and then proceeds to the one-sample output processing of step S10.

[0031] Thus, in this embodiment, filtering with C1 to C6 is executed when the order N indicated by the order information is 6, filtering with C1 to C10 when it is 10, and filtering with C1 to C20 otherwise.
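The branching of steps S5 to S9 can be sketched as follows. This is only an illustration of the control flow: the patent does not give the filter equations, so the sketch records which coefficient ranges would be applied rather than filtering any signal. `STAGES` and `stages_for_order` are hypothetical names.

```python
from typing import List, Tuple

# Coefficient ranges applied at steps S5, S7 and S9 of FIG. 2.
STAGES: List[Tuple[int, int]] = [(1, 6), (7, 10), (11, 20)]


def stages_for_order(order: int) -> List[Tuple[int, int]]:
    """Return the (first, last) coefficient ranges filtered for order N."""
    applied = [STAGES[0]]          # S5 always runs
    if order == 6:                 # S6: N == 6, go straight to output (S10)
        return applied
    applied.append(STAGES[1])      # S7
    if order == 10:                # S8: N == 10, go to output (S10)
        return applied
    applied.append(STAGES[2])      # S9: any other N, as in the flowchart
    return applied


for n in (6, 10, 20):
    print(n, stages_for_order(n))
```

Because each later stage takes the previous stage's output as its input, a lower order simply stops the cascade earlier, which is where the saving in computation comes from.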

[0032] On completing the one-sample output processing of step S10, the speech synthesis unit 6 decrements the counter variable i by 1 (step S11) and determines whether i is greater than 0 (step S12). If i is greater than 0, it returns to step S4 and the subsequent steps to perform sound source generation and order-N filtering for the next sample. Otherwise, that is, once steps S4 to S12 have been executed for P samples (P = [frame period]/[sampling period]), it increments the counter variable j indicating the frame number by 1 (step S13).

[0033] In this way, the speech synthesis unit 6 executes steps S4 to S12 P times to generate speech data for one frame (P samples). When one frame of speech data has been generated, that is, when the counter variable i is no longer greater than 0, the speech synthesis unit 6 determines whether the counter variable j is less than or equal to the number of frames F to be synthesized (step S14). If j is less than or equal to F, it returns to step S2 and the subsequent steps to generate speech data for the next frame; if j exceeds F, processing ends.

[0034] In this way, the speech synthesis unit 6 executes steps S2 to S14 F times to generate F frames of speech data. Note that in the flowchart of FIG. 2, filtering with C1 to C20 is performed for every value of N other than 6 and 10; in this embodiment, however, the order N that can be specified by the order information input through the input unit 1 is limited to the three values 6, 10, and 20, and no other order is specified.
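The counter handling of FIG. 2 (steps S1, S2, and S11 to S14) can be sketched as a pair of nested loops. This is a simplified sketch that assumes F and P are both at least 1; `run_synthesis_loop` and `make_sample` are hypothetical names, and `make_sample` stands in for the per-frame parameter read (S3) and the per-sample source generation and filtering (S4 to S9), which the sketch does not implement.

```python
from typing import Callable, List


def run_synthesis_loop(
    num_frames: int,
    samples_per_frame: int,
    make_sample: Callable[[int], float],
) -> List[float]:
    """Mirror the j and i counters of FIG. 2 for F frames of P samples."""
    output: List[float] = []
    j = 1                                  # S1: frame counter
    while j <= num_frames:                 # S14: continue while j <= F
        i = samples_per_frame              # S2: samples left in this frame
        while i > 0:                       # S12: continue while i > 0
            output.append(make_sample(j))  # S4 to S10: produce and output one sample
            i -= 1                         # S11
        j += 1                             # S13
    return output


# Use the frame number itself as a dummy sample value.
samples = run_synthesis_loop(num_frames=2, samples_per_frame=3, make_sample=float)
print(samples)
```

The inner loop body runs F × P times in total, which is the count the timing discussion in [0039] relies on.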

[0035] Suppose that order information indicating order 20 (N = 20) is given to the input unit 1 of the speech synthesizer configured as above. If the sampling period is 125 μs and the frame period is 10 ms, P in FIG. 2 is 80. Suppose also that the speech unit file 4 stores cepstrum parameters C0 to C20 for each syllable.
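The value of P follows directly from the two periods, as this short check shows. The constant names are illustrative; the sampling rate is not stated in the text and is derived here only as the reciprocal of the 125 μs sampling period.

```python
SAMPLING_PERIOD_US = 125     # sampling period from [0035], in microseconds
FRAME_PERIOD_US = 10_000     # frame period from [0035]: 10 ms

samples_per_frame = FRAME_PERIOD_US // SAMPLING_PERIOD_US   # P in FIG. 2
sampling_rate_hz = 1_000_000 // SAMPLING_PERIOD_US          # derived, not stated in the text

print(samples_per_frame)   # 80
print(sampling_rate_hz)    # 8000
```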

[0036] The synthesis parameter generation unit 5 extracts from the speech unit file 4 the cepstrum parameters C0 to C20, for the specified order, corresponding to each phoneme of the phoneme sequence generated by the language processing unit 3, and generates prosody parameters according to the prosody information. If the total number of frames F of the parameters obtained is 500, there are 21 × 500 = 10,500 phoneme parameters and 500 prosody parameters.
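The parameter counts generalize to any order N as (N + 1) × F phoneme parameters, since C0 is included, plus one prosody parameter per frame as counted in [0036]. The helper name `parameter_counts` below is hypothetical.

```python
from typing import Tuple


def parameter_counts(order: int, num_frames: int) -> Tuple[int, int]:
    """Return (phoneme parameter count, prosody parameter count)."""
    phoneme = (order + 1) * num_frames   # C0..C_order for every frame
    prosody = num_frames                 # one per frame, as counted in [0036]
    return phoneme, prosody


for n in (6, 10, 20):
    print(n, parameter_counts(n, 500))
```

For 500 frames this gives 10,500 phoneme parameters at order 20 and 3,500 at order 6, matching the figures in the text.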

[0037] From the 10,500 phoneme parameters and 500 prosody parameters generated by the synthesis parameter generation unit 5, the speech synthesis unit 6 reads the synthesis parameters R1 for the first frame, consisting of the phoneme parameters C0 to C20 and the prosody parameters (step S3), and generates the sound source based on the phoneme parameter C0 and the prosody parameters (step S4). Next, it feeds the sound source data into the synthesis filter and executes filtering using the phoneme parameters C1 to C20 (steps S5 to S12). The speech synthesis unit 6 performs steps S4 to S12 80 times (for 80 samples).

[0038] The speech synthesis unit 6 then reads the synthesis parameters R2 for the next frame (step S3) and performs steps S4 to S12 80 times. It performs this series of processing (steps S3 to S12) 500 times (for 500 frames). The speech data is output in step S10 during this processing.

[0039] That is, in the above example, synthesis filtering using C1 to C20 is executed F × P = 500 × 80 = 40,000 times. Let T1 be the time required for one pass of filtering with C1 to C6, T2 the time required for one pass of filtering with C7 to C10 (steps S7, S8), T3 the time required for one pass of filtering with C11 to C20 (step S9), and T4 the time required for the remaining processing in the series shown in the flowchart of FIG. 2. The total processing time in the speech synthesis unit 6 needed to generate speech data with an utterance time of 5 seconds (500 frames at a frame period of 10 ms) is then 40,000 × (T1 + T2 + T3) + T4.

[0040] Next, if the order N indicated by the order information is set to 6 under the same conditions, the phoneme parameters extracted from the speech unit file 4 are C0 to C6, for a total of 7 × 500 = 3,500. Because N = 6, steps S7 to S9 are not performed in the speech synthesis unit 6. The total processing time in this case is 40,000 × T1 + T4, which is 40,000 × (T2 + T3) shorter than for order 20.
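The timing comparison can be sketched as a small cost model in which the per-sample filtering cost is multiplied by the F × P samples and T4 is added once. The function name `total_processing_time` and the numeric values of T1 to T4 below are purely illustrative assumptions; the patent gives no measured times.

```python
def total_processing_time(order, num_frames, samples_per_frame, t1, t2, t3, t4):
    """Total time in the speech synthesis unit, following [0039] and [0040]."""
    per_sample = t1                # C1..C6 filtering always runs (S5)
    if order != 6:
        per_sample += t2           # C7..C10 filtering (S7, S8)
        if order != 10:
            per_sample += t3       # C11..C20 filtering (S9)
    return num_frames * samples_per_frame * per_sample + t4


# Hypothetical costs in arbitrary time units.
T1, T2, T3, T4 = 6, 4, 10, 1000
F, P = 500, 80

full = total_processing_time(20, F, P, T1, T2, T3, T4)
low = total_processing_time(6, F, P, T1, T2, T3, T4)
print(full, low, full - low)
```

Whatever values are chosen, the difference between order 20 and order 6 is F × P × (T2 + T3), so the saving grows with both the length of the utterance and the cost of the skipped stages.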

[0041] In general, cepstrum parameters have the property that the higher the order, the better the spectral envelope is represented, while the lower the order, the more the spectral envelope tends to be smoothed. That is, a higher order produces higher-quality synthesized speech and a lower order produces lower-quality synthesized speech, so synthesized speech of different quality can be generated by selecting the order. For example, a low order may be selected when listening to synthesized speech for a long time.

[0042] As described above, the apparatus of this embodiment, which has the processing functions described, can increase or decrease the amount of computation in synthesis filtering by executing filtering according to the order of the phoneme parameters. It can also change the quality of the synthesized speech by changing the order.

[0043] The present invention is not limited to the embodiment described above. In the embodiment, one of three predetermined orders of the cepstrum parameters is specified directly by the order information input through the input unit 1. Instead, information representing the quality of the synthesized speech, such as "1, 2, 3" or "A, B, C", may be input and mapped to the order of the phoneme parameters inside the apparatus. The number of selectable orders need not be limited to three. Furthermore, the order may be set automatically according to the capability of the CPU or the CPU utilization of other tasks (tasks other than the speech synthesis task); for example, a lower order may be set the lower the CPU capability or the higher the CPU utilization of other tasks.
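Both variations in [0043] can be sketched in a few lines. The text names the quality labels but not which label maps to which order, and it gives no utilization thresholds, so the mapping and the 0.3 and 0.7 thresholds below are assumptions made for illustration, as are the names `QUALITY_TO_ORDER`, `order_from_quality`, and `order_from_cpu_load`.

```python
# Assumed mapping: the patent lists the labels but not their assignment to orders.
QUALITY_TO_ORDER = {"A": 20, "B": 10, "C": 6}


def order_from_quality(label: str) -> int:
    """Translate a quality label into a filter order inside the apparatus."""
    return QUALITY_TO_ORDER[label]


def order_from_cpu_load(other_tasks_utilization: float) -> int:
    """Pick a lower order as other tasks use more CPU (thresholds are assumptions)."""
    if not 0.0 <= other_tasks_utilization <= 1.0:
        raise ValueError("utilization must be a fraction between 0 and 1")
    if other_tasks_utilization < 0.3:
        return 20
    if other_tasks_utilization < 0.7:
        return 10
    return 6


print(order_from_quality("B"))
print([order_from_cpu_load(u) for u in (0.1, 0.5, 0.9)])
```

In a real system the utilization would come from the operating system and the thresholds would be tuned so that the chosen order still allows real-time output.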

[0044] Although cepstrum parameters are used as the phoneme parameters in the embodiment, other phoneme parameters, such as LSP parameters, may be used. In that case, multiple speech unit files, one for each analysis order, are required.

[0045] Furthermore, although the embodiment describes a system in which synthesis parameter generation, sound source generation, synthesis filtering, and so on are performed in software, the invention may also be applied to a system in which this processing is performed by dedicated hardware, where the quality of the synthesized speech can likewise be changed by changing the order. In short, the present invention can be implemented with various modifications without departing from its gist.

[0046]

[Effects of the Invention] As described above, according to the present invention, the amount of computation in synthesis filtering can be increased or decreased by changing the order of the phoneme parameters. Therefore, in a system in which speech synthesis processing including synthesis filtering is performed as a specific task on the CPU, speech can be synthesized in real time by setting the order according to the capability of the CPU or the CPU utilization of other tasks. In addition, the quality of the synthesized speech can be changed arbitrarily by changing the order of the phoneme parameters, which is of great practical benefit.

[Brief description of drawings]

FIG. 1 is a block diagram showing the schematic configuration of a speech synthesizer according to an embodiment of the present invention.

FIG. 2 is a flowchart illustrating the flow of processing in the speech synthesis unit 6 of the embodiment.

[Explanation of symbols]

1: input unit, 2: word dictionary, 3: language processing unit, 4: speech unit file, 5: synthesis parameter generation unit, 6: speech synthesis unit, 7: speaker.

Claims

1. A speech synthesis method in which phoneme parameters are generated according to a phoneme sequence and prosody parameters are generated according to prosody information, and speech is synthesized according to these phoneme parameters and prosody parameters, characterized in that information indicating quality is input and, based on the generated phoneme parameters and prosody parameters, synthesis filtering of an order corresponding to the input information is executed to generate synthesized speech.

2. A speech synthesizer that generates phoneme parameters according to a phoneme sequence and prosody parameters according to prosody information, and synthesizes speech according to these phoneme parameters and prosody parameters, characterized by comprising: input means for inputting information indicating quality; and speech synthesis means for generating synthesized speech by executing, based on the generated phoneme parameters and prosody parameters, synthesis filtering of an order corresponding to the information input by the input means.