JPH09325788A

JPH09325788A - Device and method for voice synthesis

Info

Publication number: JPH09325788A
Application number: JP8142832A
Authority: JP
Inventors: Yoshinori Shiga; 芳則志賀; Yoshiyuki Hara; 義幸原
Original assignee: Toshiba Corp
Current assignee: Toshiba Corp
Priority date: 1996-06-05
Filing date: 1996-06-05
Publication date: 1997-12-16

Abstract

PROBLEM TO BE SOLVED: To increase the variety of voice qualities of synthesized speech simply, without recording an announcer's utterances or re-segmenting speech units. SOLUTION: Speech parameters obtained by analyzing a discrete speech signal sampled at a first sampling period are stored in a memory 11, and the fundamental frequency pattern of the speech is stored in a memory 12. A synthesis filter processing section 13 synthesizes a discrete speech signal from the parameters read from the memory 11 and the pattern read from the memory 12. A D/A converter 14 then converts the discrete speech signal into an analog speech signal at a second sampling period, which a voice quality control section 18 determines according to the voice quality switched and specified by a voice quality switching section 17. The analog speech signal is amplified by an amplifier 15 and output as speech from a speaker 16.

Description

Detailed Description of the Invention

[0001]

[Technical Field of the Invention] The present invention relates to a speech synthesis apparatus and method suited to parameterizing and storing speech and synthesizing speech from the stored parameters, or to segmenting speech into small units, storing them, and synthesizing arbitrary speech by combining them.

[0002]

[Prior Art] Known speech synthesis apparatuses of the kind to which the present invention relates include analysis-synthesis apparatuses, which parameterize and store speech and synthesize speech from the stored parameters, and rule-based synthesis apparatuses, which segment speech into small units, store them, and can synthesize arbitrary speech by combining them. Conventional examples of both kinds of apparatus are described below with reference to the drawings.

[0003] FIG. 23 shows the configuration of a conventional analysis-synthesis apparatus.

[0004] In FIG. 23, a memory 311 stores a time series of parameter frames obtained by analyzing speech, together with voiced/unvoiced information for the speech corresponding to each frame. Low-order cepstrum coefficients are used as the parameters here.

[0005] The low-order cepstrum coefficients can be obtained as follows. First, a window function (here, a Hanning window) of fixed width is applied at a fixed period to speech data uttered by an announcer or the like, and the speech waveform in each window is Fourier-transformed to compute the short-time spectrum of the speech. Next, the logarithm of the power of the short-time spectrum is taken to obtain a log power spectrum, which is then inverse-Fourier-transformed. The values computed in this way are the cepstrum coefficients.
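The analysis steps in paragraph [0005] can be sketched as follows. This is a minimal illustration, not the analysis code of the apparatus: the function names, the plain O(N^2) DFT, and the small `floor` that guards the logarithm are all assumptions introduced here.

```python
import cmath
import math


def hanning(n):
    """Hanning window of length n."""
    return [0.5 - 0.5 * math.cos(2 * math.pi * i / (n - 1)) for i in range(n)]


def dft(x, inverse=False):
    """Plain O(N^2) discrete Fourier transform (illustrative, not optimized)."""
    n = len(x)
    sign = 1 if inverse else -1
    out = []
    for k in range(n):
        s = sum(x[t] * cmath.exp(sign * 2j * math.pi * k * t / n) for t in range(n))
        out.append(s / n if inverse else s)
    return out


def low_order_cepstrum(frame, order, floor=1e-300):
    """Window -> Fourier transform -> log power -> inverse transform.

    Returns the first `order` cepstrum coefficients of one frame.
    `floor` (an assumption) avoids log(0) for empty spectral bins.
    """
    windowed = [s * w for s, w in zip(frame, hanning(len(frame)))]
    spectrum = dft(windowed)
    log_power = [math.log(max(abs(c) ** 2, floor)) for c in spectrum]
    cepstrum = dft(log_power, inverse=True)
    return [c.real for c in cepstrum[:order]]
```

Calling `low_order_cepstrum` once per window position yields a time series of parameter frames of the kind that memory 311 is described as holding.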

[0006] It is generally known that the high-order cepstrum coefficients carry the fundamental frequency information of the speech, while the low-order cepstrum coefficients carry its spectral envelope information. Of these, only the low-order cepstrum, which represents the spectral envelope of the speech, is stored in the memory 311.

[0007] A memory 312, meanwhile, stores a time-series pattern of the fundamental frequency of the speech, obtained from the same speech by a predetermined fundamental frequency extraction method. One example of such a method is that described in the paper by Bruce G et al., "An Integrated Pitch Tracking Algorithm for Speech Systems", ICASSP 1993.

[0008] A synthesis filter processing unit 313 reads the data described above from the two memories 311 and 312. As the sound source it uses periodic pulses based on the time-series pattern of the fundamental frequency in voiced sections and random noise in unvoiced sections. It calculates filter coefficients from the time series of speech parameter frames (cepstra) and supplies them to a synthesis filter to synthesize the desired speech. An LMA (Log Magnitude Approximation) filter, which uses the cepstrum coefficients directly as its filter coefficients, is normally used as this synthesis filter. The processing up to this point is generally performed by a program.
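The sound source described in paragraph [0008] can be illustrated with a short sketch. It covers only the excitation (pulse train or noise), not the LMA filter itself. The function name, the unit pulse amplitude, the uniform noise distribution, and the rule that a pulse is emitted on the first sample after an unvoiced section are assumptions made for illustration.

```python
import random


def excitation(f0_per_frame, voiced_flags, frame_len, fs, seed=0):
    """Build a source signal: periodic pulses when voiced, noise when unvoiced.

    f0_per_frame : fundamental frequency in Hz for each frame
    voiced_flags : True for voiced frames, False for unvoiced frames
    frame_len    : number of samples per frame
    fs           : sampling frequency in Hz
    """
    rng = random.Random(seed)
    out = []
    next_pulse = 0.0  # (possibly fractional) sample index of the next pulse
    for f0, voiced in zip(f0_per_frame, voiced_flags):
        for _ in range(frame_len):
            n = len(out)
            if voiced:
                if n >= next_pulse:
                    out.append(1.0)
                    next_pulse += fs / f0  # one pitch period later
                else:
                    out.append(0.0)
            else:
                out.append(rng.uniform(-1.0, 1.0))
                next_pulse = n + 1  # restart pulses right after the noise
    return out
```

With `fs = 8000` and a 100 Hz voiced frame, pulses fall 80 samples apart; the pulse spacing in samples is what later determines the pitch heard after D/A conversion.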

[0009] Because the speech synthesized in this way is a discrete signal, the synthesis filter processing unit 313 finally supplies this discrete signal (discrete waveform) to a D/A (digital/analog) converter 314. The D/A converter 314 converts the discrete signal (discrete speech signal) into an electrical analog signal (analog speech signal). Driving a speaker 316 or the like with this analog signal via an amplifier 315 produces audible speech.

[0010] FIG. 24 shows the configuration of a conventional rule-based synthesis apparatus.

[0011] The rule-based synthesis apparatus of FIG. 24 performs text-to-speech conversion (hereinafter, TTS) processing, which converts text into a symbol string consisting of phonemes and prosody and generates speech from that symbol string.

[0012] The TTS processing in the rule-based synthesis apparatus of FIG. 24 is broadly divided between two processing units, a language processing unit 411 and a speech synthesis unit 412. Taking Japanese rule-based synthesis as an example, it is generally performed as follows.

[0013] First, the language processing unit 411 applies language processing such as morphological analysis and syntactic analysis to the text (sentences in mixed kanji and kana) input from a text file 413. It decomposes the text into morphemes, estimates dependency relations, and at the same time assigns a reading and an accent type to each morpheme. The language processing unit 411 then uses accent shift rules, such as those for compound words, to determine the accent type of each phrase that forms a break when the text is read aloud (hereinafter, accent phrase). The language processing unit of a TTS system can normally output the reading and accent type obtained in this way for each accent phrase as a symbol string (hereinafter, phonetic symbol string).

[0014] Next, within the speech synthesis unit 412, a phoneme duration calculation processing unit 414 determines the duration of each phoneme contained in the obtained reading according to predetermined rules, based on the phonetic environment of the phoneme and similar information. Then, following the reading and the phoneme durations obtained as described above, a phoneme parameter generation processing unit 415 sequentially reads the required speech units from a speech unit memory 416 and concatenates them to generate the feature parameter sequence of the speech to be synthesized.

[0015] The speech unit memory 416 holds a large number of speech units created in advance. The speech units are created by analyzing speech uttered by an announcer or the like to obtain predetermined speech feature parameters, and then cutting out of those feature parameters, in a predetermined synthesis unit such as the Japanese syllable (consonant + vowel; hereinafter, CV), every syllable that occurs in Japanese speech.

[0016] In the speech synthesis unit 412, a pitch generation processing unit 417 further sets point pitches, based on the accent type, at the instants where the pitch rises or falls, and linearly interpolates between the point pitches to generate the accent component of the pitch. It superimposes an intonation component (normally a monotonically decreasing straight line on the frequency axis) on the accent component to generate a pitch pattern. Then, using periodic pulses based on the pitch pattern in voiced sections and random noise in unvoiced sections as the sound source, filter coefficients are calculated from the feature parameter sequence of the speech and supplied to a synthesis filter processing unit 418 to synthesize the desired speech.
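The pitch pattern generation in paragraph [0016] amounts to piecewise-linear interpolation plus a declining line. The sketch below assumes that both components are expressed in Hz and simply added, that time is counted in frames, and that the point pitches have distinct, ascending frame indices. The text fixes none of these details, and the function and parameter names are hypothetical.

```python
def pitch_pattern(points, n_frames, start_hz, slope_hz_per_frame):
    """Accent component (interpolated point pitches) + intonation component.

    points : list of (frame_index, accent_hz), sorted by frame_index
    """
    accent = []
    for t in range(n_frames):
        if t <= points[0][0]:
            accent.append(points[0][1])        # hold before the first point
        elif t >= points[-1][0]:
            accent.append(points[-1][1])       # hold after the last point
        else:
            for (t0, v0), (t1, v1) in zip(points, points[1:]):
                if t0 <= t <= t1:
                    # straight line between neighbouring point pitches
                    accent.append(v0 + (v1 - v0) * (t - t0) / (t1 - t0))
                    break
    # intonation: monotonically decreasing straight line
    intonation = [start_hz - slope_hz_per_frame * t for t in range(n_frames)]
    return [a + b for a, b in zip(accent, intonation)]
```

For example, `pitch_pattern([(0, 0.0), (10, 40.0), (20, 0.0)], 21, 120.0, 1.0)` rises from 120 Hz to 150 Hz at frame 10 and falls to 100 Hz at frame 20.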

[0017] The processing up to this point is generally performed by a program, so the synthesized speech is a discrete signal. The speech synthesis unit 412 therefore finally supplies this discrete signal to a D/A converter 419. The D/A converter 419 converts the discrete signal (discrete speech signal) into an electrical analog signal (analog speech signal). Driving a speaker 421 or the like with this analog signal via an amplifier 420 produces audible speech.

[0018]

[Problems to be Solved by the Invention] Speech synthesis apparatuses based on the conventional techniques described above exist today, but the speech they synthesize has the following problem. In a conventional speech synthesis apparatus, the kind of voice of the synthesized speech (hereinafter, voice quality) is restricted: speech can be synthesized only in the voice quality of the announcer used when the speech unit file was created, or in a somewhat degraded version of that voice quality resulting from rule-based synthesis. Consequently, to increase the number of voice qualities when synthesizing conversational text or the like, the developer of the apparatus must hire a different announcer, record the utterances, and redo the creation of speech units from the beginning. This requires payment for the announcer, and recording the announcer's utterances and cutting out the speech units demand a great deal of effort from the developer. All of this increases the development cost of the apparatus.

[0019] The present invention has been made in view of these circumstances. Its object is to provide a speech synthesis apparatus and method that can increase the number of voice qualities of synthesized speech by very simple means, without recording an announcer's utterances or re-cutting speech units.

[0020]

[Means for Solving the Problems] A speech synthesis apparatus according to a first aspect of the present invention comprises: storage means for storing speech feature parameters obtained by analyzing a discrete speech signal sampled at a first sampling period; synthesis means for synthesizing a discrete speech signal from the speech feature parameters read from the storage means; and digital/analog conversion means for converting the discrete speech signal synthesized by the synthesis means into an analog speech signal at a second sampling period different from the first sampling period.

[0021] In addition, voice quality selection means for selecting and specifying the voice quality of the speech to be synthesized may further be provided, with the digital/analog conversion means converting the discrete speech signal synthesized by the synthesis means into an analog speech signal at a second sampling period determined according to the voice quality selected and specified by the voice quality selection means.

[0022] In this invention, the first sampling period, which is the sampling period used at speech analysis, is made to differ from the second sampling period used in digital/analog (D/A) conversion. This shifts the speech spectrum along the logarithmic frequency axis, so speech of different voice qualities can be synthesized from the same feature parameters.
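The shift described in paragraph [0022] follows from elementary sampling arithmetic: a component that lay at f Hz when the samples were spaced T1 apart is reproduced at f × T1 / T2 when the same samples are output T2 apart, which is a constant offset of log(T1 / T2) on a logarithmic frequency axis. The sketch below only evaluates these relations; the function names and the 8 kHz / 10 kHz figures in the usage note are illustrative assumptions, not values from the specification.

```python
import math


def played_back_frequency(f_hz, t1, t2):
    """Frequency heard when samples analyzed at period t1 are output at period t2."""
    return f_hz * t1 / t2


def log_shift_octaves(t1, t2):
    """Uniform shift of the whole spectrum on a log2 frequency axis."""
    return math.log2(t1 / t2)


def played_back_duration(n_samples, t2):
    """Duration in seconds of n_samples output at period t2."""
    return n_samples * t2
```

With `t1 = 1 / 8000` and `t2 = 1 / 10000`, a 1000 Hz component is reproduced at about 1250 Hz, every component moves up by the same `log2(1.25)` octaves, and 8000 samples that originally spanned one second now last about 0.8 s. The last point is why the speaking rate and the pitch also change unless they are compensated.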

[0023] Next, a speech synthesis apparatus according to a second aspect of the present invention comprises: synthesis means for synthesizing a discrete speech signal, at a second frame period different from a first frame period, from a time series of speech feature parameter frames obtained by applying a time window at the first frame period to a discrete speech signal sampled at a first sampling period and analyzing it; and digital/analog conversion means for converting the discrete speech signal synthesized by the synthesis means into an analog speech signal at a second sampling period different from the first sampling period. Here, the second frame period is preferably determined based on the first frame period, the first sampling period, and the second sampling period different from the first sampling period.

[0024] In addition, voice quality selection means for selecting and specifying the voice quality of the speech to be synthesized may further be provided. In that configuration, the synthesis means synthesizes the discrete speech signal at a second frame period determined according to the voice quality selected and specified by the voice quality selection means, and the digital/analog conversion means converts the discrete speech signal synthesized by the synthesis means into an analog speech signal at a second sampling period determined according to the selected and specified voice quality.

[0025] In this invention, making the first sampling period used at speech analysis differ from the second sampling period used in D/A conversion shifts the speech spectrum along the logarithmic frequency axis, so speech of different voice qualities can be synthesized from the same feature parameters. Furthermore, the second frame period used at synthesis is made to differ from the window period used at analysis (the first frame period), either by determining it from the first frame period, the first sampling period, and the second sampling period, or by determining both the second frame period at synthesis and the second sampling period at D/A conversion according to the voice quality selected and specified by the voice quality selection means. The speaking rate of the synthesized speech can thereby be controlled appropriately, so speech with a natural speaking rate can be synthesized while the voice quality is varied.
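Paragraph [0025] states only that the second frame period is derived from the first frame period and the two sampling periods. One plausible rule, assumed here purely for illustration, is to keep the real-time length of a frame unchanged: if frame periods are counted in samples, N1 samples at period T1 last N1 × T1 seconds, so the synthesis side would use N2 = N1 × T1 / T2 samples per frame. The function name and the rounding to a whole number of samples are likewise assumptions.

```python
def second_frame_period(n1_samples, t1, t2):
    """Frame period in samples at synthesis that preserves frame duration.

    n1_samples : first frame period, in samples at sampling period t1
    t1, t2     : first and second sampling periods in seconds
    Assumes the goal is an unchanged speaking rate after D/A conversion.
    """
    return round(n1_samples * t1 / t2)


def frame_duration(n_samples, t):
    """Real-time length of one frame in seconds."""
    return n_samples * t
```

For instance, an 80-sample frame analyzed at 8 kHz (10 ms) would be synthesized as a 100-sample frame when the D/A converter runs at 10 kHz, which again lasts 10 ms.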

[0026] Next, a speech synthesis apparatus according to a third aspect of the present invention comprises: feature parameter storage means for storing speech feature parameters obtained by analyzing a discrete speech signal sampled at a first sampling period; pitch pattern storage means for storing a fundamental frequency pattern of speech; pitch pattern modulation means for modulating the fundamental frequency pattern read from the pitch pattern storage means based on the first sampling period and a second sampling period different from the first sampling period; synthesis means for synthesizing a discrete speech signal from the feature parameters read from the feature parameter storage means and the fundamental frequency pattern modulated by the pitch pattern modulation means; and digital/analog conversion means for converting the discrete speech signal synthesized by the synthesis means into an analog speech signal at the second sampling period.

[0027] In addition, voice quality selection means for selecting and specifying the voice quality of the speech to be synthesized may further be provided. In that configuration, the pitch pattern modulation means modulates the fundamental frequency pattern according to the voice quality selected and specified by the voice quality selection means, and the digital/analog conversion means converts the discrete speech signal synthesized by the synthesis means into an analog speech signal at a second sampling period determined according to the selected and specified voice quality.

[0028] In this invention, making the first sampling period, which is the sampling period used at speech analysis, differ from the second sampling period of D/A conversion shifts the speech spectrum along the logarithmic frequency axis, so speech of different voice qualities can be synthesized from the same feature parameters. Furthermore, the fundamental frequency pattern used for synthesis is determined either from the first and second sampling periods, or, where the second sampling period is determined according to the voice quality selected and specified by the voice quality selection means, according to that same voice quality. The pitch (the height of the voice) of the synthesized speech can thereby be controlled appropriately, so speech with a natural pitch can be synthesized while the voice quality is varied.
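The modulation in paragraph [0028] can be illustrated under one stated assumption: that the synthesis means spaces its pulses as if the output period were still T1, so that playback at T2 multiplies every pitch value by T1 / T2. Pre-scaling the stored pattern by T2 / T1 then cancels that change. The optional `extra` factor, for deliberately raising or lowering the pitch of a given voice quality, and the function name are hypothetical.

```python
def modulate_pitch_pattern(pattern_hz, t1, t2, extra=1.0):
    """Pre-scale a fundamental frequency pattern for playback at period t2.

    pattern_hz : stored fundamental frequency values in Hz
    t1, t2     : first (analysis) and second (D/A) sampling periods
    extra      : additional pitch ratio for the chosen voice quality (assumed)
    """
    ratio = t2 / t1 * extra
    return [f * ratio for f in pattern_hz]
```

With `t1 = 1 / 8000` and `t2 = 1 / 10000`, a stored 100 Hz value is modulated to about 80 Hz, which playback then raises back to about 100 Hz, while the spectral envelope is still shifted upward by the factor 1.25.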

[0029] Next, a speech synthesis apparatus according to a fourth aspect of the present invention comprises: speech synthesis means for synthesizing a discrete speech signal by selecting, based on given phoneme information, speech units created from a discrete speech signal sampled at a first sampling period, and concatenating the selected speech units; and digital/analog conversion means for converting the discrete speech signal synthesized by the speech synthesis means into an analog speech signal at a second sampling period different from the first sampling period.

[0030] In addition, voice quality selection means for selecting and specifying the voice quality of the speech to be synthesized may further be provided, with the digital/analog conversion means converting the discrete speech signal synthesized by the synthesis means into an analog speech signal at a second sampling period determined according to the voice quality selected and specified by the voice quality selection means.

[0031] In this invention, making the first sampling period used when the speech units were created differ from the second sampling period of D/A conversion shifts the speech spectrum along the logarithmic frequency axis, so speech of different voice qualities can be synthesized from the same speech units.

[0032] Next, a speech synthesis apparatus according to a fifth aspect of the present invention comprises: speech synthesis means for synthesizing a discrete speech signal by selecting, based on given phoneme information, speech units created from a discrete speech signal sampled at a first sampling period, and concatenating the selected speech units according to a speaking rate parameter related to the speaking rate or utterance duration of the speech to be synthesized, the speech synthesis means determining the speaking rate parameter to be used based on the first sampling period and a second sampling period different from the first sampling period; and digital/analog conversion means for converting the discrete speech signal synthesized by the speech synthesis means into an analog speech signal at the second sampling period.

[0033] In addition, voice quality selection means for selecting and specifying the voice quality of the speech to be synthesized may further be provided. In that configuration, the speech synthesis means determines the speaking rate parameter to be used according to the voice quality selected and specified by the voice quality selection means, and the digital/analog conversion means converts the discrete speech signal synthesized by the speech synthesis means into an analog speech signal at a second sampling period determined according to the selected and specified voice quality.

[0034] In this invention, making the first sampling period used when the speech units were created differ from the second sampling period of D/A conversion shifts the speech spectrum along the logarithmic frequency axis, so speech of different voice qualities can be synthesized from the same feature parameters. Furthermore, the speaking rate parameter used at synthesis is determined either according to the first and second sampling periods, or, where the second sampling period is determined according to the voice quality selected and specified by the voice quality selection means, according to that same voice quality. The speaking rate of the synthesized speech can thereby be controlled appropriately, so speech can be synthesized at a natural speaking rate while the voice quality is varied.

[0035] Next, a speech synthesis apparatus according to a sixth aspect of the present invention comprises: speech synthesis means for synthesizing a discrete speech signal by determining the duration of each phoneme contained in given phoneme information, selecting, based on the phoneme information, speech units created from a discrete speech signal sampled at a first sampling period, and concatenating the selected speech units based on the determined phoneme durations, the speech synthesis means determining the phoneme durations to be used based on the first sampling period and a second sampling period different from the first sampling period; and digital/analog conversion means for converting the discrete speech signal synthesized by the speech synthesis means into an analog speech signal.

[0036] In addition, voice quality selection means for selecting and specifying the voice quality of the speech to be synthesized may further be provided. In that configuration, the speech synthesis means determines the phoneme durations to be used according to the voice quality selected and specified by the voice quality selection means, and the digital/analog conversion means converts the discrete speech signal synthesized by the speech synthesis means into an analog speech signal at a second sampling period determined according to the selected and specified voice quality.

[0037] In this invention, making the first sampling period used when the speech units were created differ from the second sampling period of D/A conversion shifts the speech spectrum along the logarithmic frequency axis, so speech of different voice qualities can be synthesized from the same feature parameters. Furthermore, the phoneme durations used at synthesis are determined either according to the first and second sampling periods, or, where the second sampling period is determined according to the voice quality selected and specified by the voice quality selection means, according to that same voice quality. The speaking rate of the synthesized speech can thereby be controlled appropriately, so speech can be synthesized at a natural speaking rate while the voice quality is varied.

[0038] Next, a speech synthesis apparatus according to a seventh aspect of the present invention comprises: speech synthesis means for synthesizing a discrete speech signal by selecting, based on given phoneme information, speech units created from a discrete speech signal sampled at a first sampling period, and concatenating the selected speech units while stretching or compressing them along the time axis; and digital/analog conversion means for converting the discrete speech signal synthesized by the speech synthesis means into an analog speech signal at a second sampling period different from the first sampling period. Here, the degree to which the speech synthesis means stretches or compresses the speech units along the time axis when concatenating them is preferably determined based on the first sampling period and the second sampling period.

[0039] In addition, voice quality selection means for selecting and specifying the voice quality of the speech to be synthesized may further be provided. In that configuration, the speech synthesis means concatenates the selected speech units while stretching or compressing them along the time axis to a degree determined according to the voice quality selected and specified by the voice quality selection means, and the digital/analog conversion means converts the discrete speech signal synthesized by the speech synthesis means into an analog speech signal at a second sampling period determined according to the selected and specified voice quality.

【００４０】この発明においては、音声素片作成時の第
１の標本周期とＤ／Ａ変換の第２の標本周期を異ならせ
ることにより、音声のスペクトルを周波数対数軸上でシ
フトさせ、同じ特徴パラメータを用いて異なる声質の音
声を合成することができるが、さらにここで、合成時に
音声素片を第１の標本周期及び第２の標本周期に基づい
て定められる度合いで時間軸方向に伸縮させながら接続
するとか、声質選択手段により選択指定された声質に応
じて第２の標本周期を定め、選択指定された声質に応じ
て定められる度合いで音声素片を時間軸方向に伸縮させ
ながら接続することで、合成する音声の発話速度を適切
に制御でき、なおかつ、合成する音声のスペクトル過渡
部分の時間変化も適切に制御できるから、声質を変化さ
せながら自然な発話速度で明瞭な音声が合成できる。In this invention, making the first sampling period used when creating the speech units differ from the second sampling period used for D/A conversion shifts the speech spectrum along the logarithmic frequency axis, so that speech of different voice qualities can be synthesized from the same feature parameters. Furthermore, either the speech units are connected at synthesis time while being expanded or contracted in the time-axis direction to a degree determined based on the first and second sampling periods, or the second sampling period is determined according to the voice quality designated by the voice quality selection means and the speech units are connected while being expanded or contracted to a degree determined according to that voice quality. This allows the speaking rate of the synthesized speech to be controlled appropriately, and also allows the temporal change of its spectral transition portions to be controlled appropriately, so that clear speech can be synthesized at a natural speaking rate while the voice quality is varied.
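The relationship between the two sampling periods and the required time-axis expansion can be illustrated with a minimal sketch. The text only states that the degree of expansion is "based on" the first and second sampling periods; the ratio used below is an assumption that follows from keeping the playback duration unchanged, and the function names are hypothetical.

```python
def playback_scaling(t1: float, t2: float) -> tuple[float, float]:
    """Effect of playing samples made for period t1 at period t2.

    Returns (frequency_scale, duration_scale). Each sample occupies t2
    instead of t1, so durations scale by t2/t1 and frequencies by t1/t2.
    """
    if t1 <= 0 or t2 <= 0:
        raise ValueError("sampling periods must be positive")
    return t1 / t2, t2 / t1


def compensating_stretch(t1: float, t2: float) -> float:
    """Assumed stretch factor for unit lengths (in samples).

    Stretching a unit by t1/t2 before playback at t2 restores its
    original duration in seconds.
    """
    if t1 <= 0 or t2 <= 0:
        raise ValueError("sampling periods must be positive")
    return t1 / t2


# Units recorded at 8 kHz, D/A conversion at 10 kHz.
t1, t2 = 1 / 8000, 1 / 10000
freq_scale, dur_scale = playback_scaling(t1, t2)
stretched_len = round(800 * compensating_stretch(t1, t2))  # 800-sample unit
```

With these values the spectrum is scaled up in frequency while an 800-sample unit is lengthened so that it still lasts 0.1 s at playback.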

【００４１】次に、本発明の第８の観点に係る音声合成
装置は、第１の標本周期で標本化した離散音声信号に第
１のフレーム周期の時間窓をかけて分析して得られる音
声の特徴パラメータフレームの時系列から所定の合成単
位で切り出した音声素片を複数蓄積する音声素片蓄積手
段と、この音声素片蓄積手段から入力音韻情報に基づい
て上記音声素片を選択し接続して合成パラメータフレー
ムの時系列を生成する合成パラメータフレーム時系列生
成手段と、この合成パラメータフレーム時系列生成手段
により生成された合成パラメータフレームの時系列から
上記第１のフレーム周期とは異なる第２のフレーム周期
で離散音声信号を合成する合成手段と、この合成手段に
よって合成された離散音声信号を上記第１の標本周期と
は異なる第２の標本周期でアナログ音声信号に変換する
ディジタル／アナログ変換手段とを備えたことを特徴と
する。ここで、第２のフレーム周期を、第１のフレーム
周期、第１の標本周期及び当該第１の標本周期とは異な
る第２の標本周期に基づいて定めるとよい。Next, a speech synthesizer according to an eighth aspect of the present invention comprises: speech unit storage means for storing a plurality of speech units cut out, in predetermined synthesis units, from a time series of speech feature parameter frames obtained by analyzing a discrete speech signal sampled at a first sampling period with a time window applied at a first frame period; synthesis parameter frame time series generation means for selecting speech units from the speech unit storage means based on input phoneme information and connecting them to generate a time series of synthesis parameter frames; synthesis means for synthesizing a discrete speech signal from the generated time series of synthesis parameter frames at a second frame period different from the first frame period; and digital/analog conversion means for converting the discrete speech signal synthesized by the synthesis means into an analog speech signal at a second sampling period different from the first sampling period. Here, the second frame period is preferably determined based on the first frame period, the first sampling period, and the second sampling period.

【００４２】この他、合成する音声の声質を選択指定す
るための声質選択手段をさらに設け、上記合成手段で
は、声質選択手段により選択指定された声質に応じて定
められる第２のフレーム周期で離散音声信号が合成さ
れ、上記ディジタル／アナログ変換手段では、合成手段
によって合成された離散音声信号が声質選択手段により
選択指定された声質に応じて定められる第２の標本周期
でアナログ音声信号に変換される構成とすることも可能
である。In addition, voice quality selection means for selecting and designating the voice quality of the speech to be synthesized may be further provided, in which case the synthesis means synthesizes the discrete speech signal at a second frame period determined according to the voice quality designated by the voice quality selection means, and the digital/analog conversion means converts the discrete speech signal synthesized by the synthesis means into an analog speech signal at a second sampling period determined according to the designated voice quality.

【００４３】この発明においては、音声素片作成時の第
１の標本周期とＤ／Ａ変換の第２の標本周期を異ならせ
ることにより、音声のスペクトルを周波数対数軸上でシ
フトさせ、同じ特徴パラメータを用いて異なる声質の音
声を合成することができるが、さらにここで、合成時の
第２のフレーム周期を、第１のフレーム周期、第１の標
本周期及び第２の標本周期に基づいて定めるとか、合成
時の第２のフレーム周期とＤ／Ａ変換時の第２の標本周
期を声質選択手段で選択指定された声質に応じて定める
などして、上記第２のフレーム周期を分析時の窓の周期
（第１のフレーム周期）と異ならせることにより、合成
する音声の発話速度を適切に制御できるから、声質を変
化させながら自然な発話速度の音声が合成できる。In this invention, making the first sampling period used when creating the speech units differ from the second sampling period used for D/A conversion shifts the speech spectrum along the logarithmic frequency axis, so that speech of different voice qualities can be synthesized from the same feature parameters. Furthermore, the second frame period used at synthesis time is made to differ from the window period used at analysis time (the first frame period), for example by determining it based on the first frame period, the first sampling period, and the second sampling period, or by determining both the second frame period used at synthesis time and the second sampling period used for D/A conversion according to the voice quality designated by the voice quality selection means. This allows the speaking rate of the synthesized speech to be controlled appropriately, so that speech with a natural speaking rate can be synthesized while the voice quality is varied.
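As a minimal sketch of how the second frame period might be chosen, assume the frame period is counted in samples and that the goal is for each synthesized frame to last as long at playback as the analysis frame did. Neither assumption is stated in the text, which only says the value is determined from the first frame period and the two sampling periods; the function name is hypothetical.

```python
def second_frame_length(n1: int, t1: float, t2: float) -> int:
    """Assumed synthesis frame length in samples.

    n1 is the analysis frame period in samples at sampling period t1.
    Choosing n2 = n1 * t1 / t2 makes n2 * t2 (frame duration at
    playback) equal to n1 * t1 (frame duration at analysis).
    """
    if n1 <= 0 or t1 <= 0 or t2 <= 0:
        raise ValueError("frame length and sampling periods must be positive")
    return round(n1 * t1 / t2)


# 10 ms analysis frames at 8 kHz (80 samples), D/A conversion at 10 kHz.
t1, t2 = 1 / 8000, 1 / 10000
n2 = second_frame_length(80, t1, t2)
```

Under these assumptions each frame is synthesized with more samples than it was analyzed with, so the faster D/A clock does not speed up the utterance.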

【００４４】次に、本発明の第９の観点に係る音声合成
装置は、韻律情報と音韻情報を入力として第１の標本周
期で標本化した離散音声信号から作成した音声素片を複
数蓄積する音声素片蓄積手段と、上記第１の標本周期及
び第１の標本周期とは異なる第２の標本周期に基づいて
上記韻律情報から音声の基本周波数パターンを生成する
ピッチパターン生成手段と、上記音韻情報に基づいて上
記音声素片蓄積手段から音声素片を選択的に読み出し接
続することによって音声の音韻パラメータを生成する音
韻パラメータ生成手段と、この音韻パラメータ生成手段
によって生成された音韻パラメータと上記ピッチパター
ン生成手段によって生成された基本周波数パターンから
離散音声信号を合成する合成手段と、この合成手段によ
って合成された離散音声信号を上記第２の標本周期でア
ナログ音声信号に変換するディジタル／アナログ変換手
段とを備えたことを特徴とする。Next, a speech synthesizer according to a ninth aspect of the present invention takes prosody information and phoneme information as input and comprises: speech unit storage means for storing a plurality of speech units created from a discrete speech signal sampled at a first sampling period; pitch pattern generation means for generating a fundamental frequency pattern of the speech from the prosody information based on the first sampling period and a second sampling period different from the first sampling period; phoneme parameter generation means for generating phoneme parameters of the speech by selectively reading speech units from the speech unit storage means based on the phoneme information and connecting them; synthesis means for synthesizing a discrete speech signal from the phoneme parameters generated by the phoneme parameter generation means and the fundamental frequency pattern generated by the pitch pattern generation means; and digital/analog conversion means for converting the discrete speech signal synthesized by the synthesis means into an analog speech signal at the second sampling period.

【００４５】この他、合成する音声の声質を選択指定す
るための声質選択手段をさらに設け、上記ピッチパター
ン生成手段では、声質選択手段により選択指定された声
質に応じて韻律情報から音声の基本周波数パターンが生
成され、上記ディジタル／アナログ変換手段では、合成
手段によって合成された離散音声信号が声質選択手段に
より選択指定された声質に応じて定められる第２の標本
周期でアナログ音声信号に変換される構成とすることも
可能である。In addition, voice quality selection means for selecting and designating the voice quality of the speech to be synthesized may be further provided, in which case the pitch pattern generation means generates the fundamental frequency pattern of the speech from the prosody information according to the voice quality designated by the voice quality selection means, and the digital/analog conversion means converts the discrete speech signal synthesized by the synthesis means into an analog speech signal at a second sampling period determined according to the designated voice quality.

【００４６】この発明においては、音声素片作成時の第
１の標本周期とＤ／Ａ変換の第２の標本周期を異ならせ
ることにより、音声のスペクトルを周波数対数軸上でシ
フトさせ、同じ特徴パラメータを用いて異なる声質の音
声を合成することができるが、さらにここで、合成の基
本周波数パターンを第１の標本周期及び第２の標本周期
に基づいて定めるとか、基本周波数パターンとＤ／Ａ変
換時の第２の標本周期を声質選択手段で選択指定された
声質に応じて定めることにより、合成する音声のピッチ
（声の高さ）を適切に制御できるから、声質を変化させ
ながら自然なピッチの音声が合成できる。In this invention, making the first sampling period used when creating the speech units differ from the second sampling period used for D/A conversion shifts the speech spectrum along the logarithmic frequency axis, so that speech of different voice qualities can be synthesized from the same feature parameters. Furthermore, the fundamental frequency pattern used for synthesis is determined based on the first and second sampling periods, or the fundamental frequency pattern and the second sampling period used for D/A conversion are both determined according to the voice quality designated by the voice quality selection means. This allows the pitch (the height of the voice) of the synthesized speech to be controlled appropriately, so that speech with a natural pitch can be synthesized while the voice quality is varied.
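The pitch compensation described here can be sketched as follows. The scaling rule is an assumption derived from the fact that playback at the second sampling period multiplies every frequency by T1/T2; the text itself only says the pattern is determined from the two sampling periods. Function names are hypothetical.

```python
def compensate_f0(f0_hz: list[float], t1: float, t2: float) -> list[float]:
    """Pre-scale a fundamental frequency pattern (assumed rule).

    The synthesizer works as if the sampling period were t1, but D/A
    conversion at t2 multiplies all frequencies by t1/t2. Multiplying
    the target pattern by t2/t1 beforehand cancels that factor.
    """
    if t1 <= 0 or t2 <= 0:
        raise ValueError("sampling periods must be positive")
    scale = t2 / t1
    return [f * scale for f in f0_hz]


def heard_f0(f0_hz: list[float], t1: float, t2: float) -> list[float]:
    """Pitch actually heard when a pattern built for t1 is played at t2."""
    return [f * t1 / t2 for f in f0_hz]


# Target pitch contour of 100 Hz and 120 Hz, units at 8 kHz, D/A at 10 kHz.
t1, t2 = 1 / 8000, 1 / 10000
synthesis_pattern = compensate_f0([100.0, 120.0], t1, t2)
result = heard_f0(synthesis_pattern, t1, t2)
```

The synthesizer is driven with a lowered contour, and the listener hears the intended one while the spectral envelope stays shifted.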

【００４７】次に、本発明の第１０の観点に係る音声合
成装置は、前記第１の観点に係る音声合成装置に対応す
るもので、ディジタル／アナログ変換手段でのＤ／Ａ変
換の対象となる離散音声信号の標本周期を第１の標本周
期とは異なる第２の標本周期に変換する標本周期変換手
段を備えると共に、当該ディジタル／アナログ変換手段
での離散音声信号からのアナログ音声信号への変換が上
記第２の標本周期とは異なる第３の周期で行われる構成
としたことを特徴とする。Next, a speech synthesizer according to a tenth aspect of the present invention corresponds to the speech synthesizer according to the first aspect. It comprises sampling period conversion means for converting the sampling period of the discrete speech signal to be D/A-converted by the digital/analog conversion means into a second sampling period different from the first sampling period, and the digital/analog conversion means converts the discrete speech signal into an analog speech signal at a third sampling period different from the second sampling period.

【００４８】この他、合成する音声の声質を選択指定す
るための声質選択手段をさらに設け、上記標本周期変換
手段では、合成手段によって合成された離散音声信号の
標本周期が声質選択手段により選択指定された声質に応
じて定められる第２の標本周期に変換される構成とする
ことも可能である。In addition, voice quality selection means for selecting and designating the voice quality of the speech to be synthesized may be further provided, in which case the sampling period conversion means converts the sampling period of the discrete speech signal synthesized by the synthesis means into a second sampling period determined according to the voice quality designated by the voice quality selection means.

【００４９】この発明においては、音声分析時の第１の
標本周期とは異なる第２の標本周期に合成音声（離散音
声信号）を変換した後、第２の標本周期とは異なる第３
の標本周期でＤ／Ａ変換を行うことにより、音声のスペ
クトルが周波数対数軸上でシフトするから、同じ特徴パ
ラメータを用いて異なる声質の音声を合成することがで
きる。なお、元の声質の音声を合成するには、第１の標
本周期＝第２の標本周期＝第３の標本周期とすればよ
い。In this invention, the synthesized speech (discrete speech signal) is converted to a second sampling period different from the first sampling period used at speech analysis, and D/A conversion is then performed at a third sampling period different from the second sampling period. This shifts the speech spectrum along the logarithmic frequency axis, so that speech of different voice qualities can be synthesized from the same feature parameters. To synthesize speech with the original voice quality, it suffices to set the first sampling period = the second sampling period = the third sampling period.
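The size of the spectral shift in the three-period arrangement can be estimated with a short sketch. It assumes an ideal sampling period conversion that preserves the waveform in time, so that only the mismatch between the second and third sampling periods moves the spectrum; this idealization and the function name are assumptions, not part of the text.

```python
import math


def spectral_shift_octaves(t2: float, t3: float) -> float:
    """Shift along the log-frequency axis, in octaves (assumed model).

    A signal resampled to period t2 and played at period t3 has every
    frequency multiplied by t2/t3, i.e. shifted by log2(t2/t3) octaves.
    """
    if t2 <= 0 or t3 <= 0:
        raise ValueError("sampling periods must be positive")
    return math.log2(t2 / t3)


# Resampled to 8 kHz, D/A conversion at 16 kHz: one octave upward.
up_one_octave = spectral_shift_octaves(1 / 8000, 1 / 16000)
# Equal periods reproduce the original voice quality (no shift).
no_shift = spectral_shift_octaves(1 / 8000, 1 / 8000)
```

A constant offset in octaves is what "shifting on the logarithmic frequency axis" amounts to: all formants move by the same ratio.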

【００５０】次に、本発明の第１１の観点に係る音声合
成装置は、前記第２の観点に係る音声合成装置に対応す
るもので、ディジタル／アナログ変換手段でのＤ／Ａ変
換の対象となる離散音声信号の標本周期を第１の標本周
期とは異なる第２の標本周期に変換する標本周期変換手
段を備えると共に、当該ディジタル／アナログ変換手段
での離散音声信号からのアナログ音声信号への変換が上
記第２の標本周期とは異なる第３の周期で行われる構成
としたことを特徴とする。ここで、離散音声信号を合成
する第２のフレーム周期を、第１のフレーム周期、第２
の標本周期及び第３の標本周期に基づいて定めるとよ
い。Next, a speech synthesizer according to an eleventh aspect of the present invention corresponds to the speech synthesizer according to the second aspect. It comprises sampling period conversion means for converting the sampling period of the discrete speech signal to be D/A-converted by the digital/analog conversion means into a second sampling period different from the first sampling period, and the digital/analog conversion means converts the discrete speech signal into an analog speech signal at a third sampling period different from the second sampling period. Here, the second frame period at which the discrete speech signal is synthesized is preferably determined based on the first frame period, the second sampling period, and the third sampling period.

【００５１】この他、合成する音声の声質を選択指定す
るための声質選択手段をさらに設け、合成手段では、声
質選択手段により選択指定された声質に応じて定められ
る第２のフレーム周期で離散音声信号が合成され、標本
変換手段では、合成手段によって合成された離散音声信
号が声質選択手段により選択指定された声質に応じて定
められる第２の標本周期に変換される構成とすることも
可能である。In addition, voice quality selection means for selecting and designating the voice quality of the speech to be synthesized may be further provided, in which case the synthesis means synthesizes the discrete speech signal at a second frame period determined according to the voice quality designated by the voice quality selection means, and the sampling period conversion means converts the discrete speech signal synthesized by the synthesis means to a second sampling period determined according to the designated voice quality.

【００５２】この発明においては、音声分析時の第１の
標本周期とは異なる第２の標本周期に合成音声を変換し
た後、第２の標本周期とは異なる第３の標本周期でＤ／
Ａ変換を行うことにより、音声のスペクトルを周波数対
数軸上でシフトさせた効果がでるから、同じ特徴パラメ
ータを用いて異なる声質の音声を合成することができる
が、さらにここで、合成時の第２のフレーム周期を、第
１のフレーム周期、第２の標本周期及び第３の標本周期
に基づいて定めるとか、第２の標本周期と合成時の第２
のフレーム周期を声質選択手段で選択指定された声質に
応じて定めるなどして、上記第２のフレーム周期を分析
時の窓の周期（第１のフレーム周期）と異ならせること
により、合成する音声の発話速度を適切に制御できるか
ら、声質を変化させながら自然な発話速度の音声が合成
できる。In this invention, the synthesized speech is converted to a second sampling period different from the first sampling period used at speech analysis, and D/A conversion is then performed at a third sampling period different from the second sampling period. This has the effect of shifting the speech spectrum along the logarithmic frequency axis, so that speech of different voice qualities can be synthesized from the same feature parameters. Furthermore, the second frame period used at synthesis time is made to differ from the window period used at analysis time (the first frame period), for example by determining it based on the first frame period, the second sampling period, and the third sampling period, or by determining both the second sampling period and the second frame period used at synthesis time according to the voice quality designated by the voice quality selection means. This allows the speaking rate of the synthesized speech to be controlled appropriately, so that speech with a natural speaking rate can be synthesized while the voice quality is varied.

【００５３】次に、本発明の第１２の観点に係る音声合
成装置は、前記第３の観点に係る音声合成装置に対応す
るもので、ピッチパターン蓄積手段から読み出した基本
周波数パターンを第１の標本周期、第１の標本周期とは
異なる第２の標本周期及び第２の標本周期とは異なる第
３の標本周期に基づいて変調するピッチパターン変調手
段と、ディジタル／アナログ変換手段でのＤ／Ａ変換の
対象となる離散音声信号の標本周期を上記第２の標本周
期に変換する標本周期変換手段とを備えると共に、当該
ディジタル／アナログ変換手段での離散音声信号からの
アナログ音声信号への変換が上記第３の周期で行われる
構成としたことを特徴とする。Next, a speech synthesizer according to a twelfth aspect of the present invention corresponds to the speech synthesizer according to the third aspect. It comprises: pitch pattern modulation means for modulating the fundamental frequency pattern read from the pitch pattern storage means based on the first sampling period, a second sampling period different from the first sampling period, and a third sampling period different from the second sampling period; and sampling period conversion means for converting the sampling period of the discrete speech signal to be D/A-converted by the digital/analog conversion means into the second sampling period. The digital/analog conversion means converts the discrete speech signal into an analog speech signal at the third sampling period.

【００５４】この他、合成する音声の声質を選択指定す
るための声質選択手段をさらに設け、ピッチパターン変
調手段では、ピッチパターン蓄積手段から読み出した基
本周波数パターンを対象に、声質選択手段により選択指
定された声質に応じた変調が行われ、標本周期変換手段
では、ディジタル／アナログ変換手段でのＤ／Ａ変換の
対象となる離散音声信号の標本周期が声質選択手段によ
り選択指定された声質に応じて定められる第２の標本周
期に変換される構成とすることも可能である。In addition, voice quality selection means for selecting and designating the voice quality of the speech to be synthesized may be further provided, in which case the pitch pattern modulation means modulates the fundamental frequency pattern read from the pitch pattern storage means according to the voice quality designated by the voice quality selection means, and the sampling period conversion means converts the sampling period of the discrete speech signal to be D/A-converted by the digital/analog conversion means into a second sampling period determined according to the designated voice quality.

【００５５】この発明においては、音声分析時の第１の
標本周期とは異なる第２の標本周期に合成音声を変換し
た後、第２の標本周期とは異なる第３の標本周期でＤ／
Ａ変換を行うことにより、音声のスペクトルが周波数対
数軸上でシフトするため、同じ特徴パラメータを用いて
異なる声質の音声を合成することができるが、さらにこ
こで、合成の基本周波数パターンを第１の標本周期とは
異なる第２の標本周期及び第２の標本周期とは異なる第
３の標本周期に基づいて定めるとか、第２の標本周期と
合成の基本周波数パターンを声質選択手段で選択指定さ
れた声質に応じて定めることにより、合成する音声のピ
ッチ（声の高さ）を適切に制御できるから、声質を変化
させながら自然なピッチの音声が合成できる。In this invention, the synthesized speech is converted to a second sampling period different from the first sampling period used at speech analysis, and D/A conversion is then performed at a third sampling period different from the second sampling period. This shifts the speech spectrum along the logarithmic frequency axis, so that speech of different voice qualities can be synthesized from the same feature parameters. Furthermore, the fundamental frequency pattern used for synthesis is determined based on the second sampling period, which differs from the first sampling period, and the third sampling period, which differs from the second sampling period, or the second sampling period and the fundamental frequency pattern are both determined according to the voice quality designated by the voice quality selection means. This allows the pitch (the height of the voice) of the synthesized speech to be controlled appropriately, so that speech with a natural pitch can be synthesized while the voice quality is varied.

【００５６】次に、本発明の第１３の観点に係る音声合
成装置は、前記第４の観点に係る音声合成装置に対応す
るもので、音声合成手段にて合成した離散音声信号の標
本周期を第１の標本周期とは異なる第２の標本周期に当
該音声合成手段にて変換する構成とすると共に、この標
本周期が変換された離散音声信号をディジタル／アナロ
グ変換手段により上記第２の標本周期とは異なる第３の
周期でアナログ音声信号に変換する構成としたことを特
徴とする。Next, a speech synthesizer according to a thirteenth aspect of the present invention corresponds to the speech synthesizer according to the fourth aspect. The speech synthesis means converts the sampling period of the discrete speech signal it has synthesized into a second sampling period different from the first sampling period, and the digital/analog conversion means converts the discrete speech signal whose sampling period has been converted into an analog speech signal at a third sampling period different from the second sampling period.

【００５７】この他、合成する音声の声質を選択指定す
るための声質選択手段をさらに設け、上記音声合成手段
では、合成した離散音声信号の標本周期が声質選択手段
により選択指定された声質に応じて定められる第２の標
本周期に変換される構成とすることも可能である。In addition, voice quality selection means for selecting and designating the voice quality of the speech to be synthesized may be further provided, in which case the speech synthesis means converts the sampling period of the synthesized discrete speech signal into a second sampling period determined according to the voice quality designated by the voice quality selection means.

【００５８】この発明においては、音声素片作成時の第
１の標本周期とは異なる第２の標本周期に合成音声を変
換した後、第２の標本周期とは異なる第３の標本周期で
Ｄ／Ａ変換を行うことにより、音声のスペクトルが周波
数対数軸上でシフトするから、同じ音声素片を用いて異
なる声質の音声を合成することができる。なお、元の声
質の音声を合成するには、第１の標本周期＝第２の標本
周期＝第３の標本周期とすればよい。In this invention, the synthesized speech is converted to a second sampling period different from the first sampling period used when creating the speech units, and D/A conversion is then performed at a third sampling period different from the second sampling period. This shifts the speech spectrum along the logarithmic frequency axis, so that speech of different voice qualities can be synthesized from the same speech units. To synthesize speech with the original voice quality, it suffices to set the first sampling period = the second sampling period = the third sampling period.

【００５９】次に、本発明の第１４の観点に係る音声合
成装置は、前記第５の観点に係る音声合成装置に対応す
るもので、音声合成手段にて合成した離散音声信号の標
本周期を第１の標本周期とは異なる第２の標本周期に当
該音声合成手段にて変換することが可能で、上記第２の
標本周期への標本周期変換を行う際の合成処理には、標
本周期の変換を行わないときとは異なる値の発話速度パ
ラメータを用いる構成とすると共に、上記音声合成手段
から出力される離散音声信号をディジタル／アナログ変
換手段により上記第２の標本周期とは異なる第３の周期
でアナログ音声信号に変換する構成としたことを特徴と
する。ここで、合成処理で使用する発話速度パラメータ
を第２の標本周期及び第３の標本周期に基づいて定める
とよい。Next, a speech synthesizer according to a fourteenth aspect of the present invention corresponds to the speech synthesizer according to the fifth aspect. The speech synthesis means is able to convert the sampling period of the discrete speech signal it has synthesized into a second sampling period different from the first sampling period; when this conversion is performed, the synthesis processing uses a speaking rate parameter whose value differs from that used when no sampling period conversion is performed; and the discrete speech signal output from the speech synthesis means is converted by the digital/analog conversion means into an analog speech signal at a third sampling period different from the second sampling period. Here, the speaking rate parameter used in the synthesis processing is preferably determined based on the second sampling period and the third sampling period.

【００６０】この他、合成する音声の声質を選択指定す
るための声質選択手段をさらに設け、上記音声合成手段
では、使用する発話速度パラメータと第２の標本周期が
声質選択手段により選択指定された声質に応じて定めら
れる構成とすることも可能である。In addition, voice quality selection means for selecting and designating the voice quality of the speech to be synthesized may be further provided, in which case the speaking rate parameter used by the speech synthesis means and the second sampling period are determined according to the voice quality designated by the voice quality selection means.

【００６１】この発明においては、音声素片作成時の第
１の標本周期とは異なる第２の標本周期に合成音声を変
換した後、第２の標本周期とは異なる第３の標本周期で
Ｄ／Ａ変換を行うことにより、音声のスペクトルが周波
数対数軸上でシフトするため、同じ特徴パラメータを用
いて異なる声質の音声を合成することができるが、さら
にここで、合成時に使用される発話速度パラメータを、
標本周期の変換を行わないとき（第１の標本周期に変換
することと等価）に用いる値と異ならせるとか、第２の
標本周期及び第３の標本周期に応じて定めるとか、声質
選択手段により選択指定された声質に応じて第２の標本
周期を定め、選択指定された声質に応じて発話速度パラ
メータを定めることで、合成する音声の発話速度を適切
に制御できるから、声質を変化させながら自然な発話速
度で音声が合成できる。In this invention, the synthesized speech is converted to a second sampling period different from the first sampling period used when creating the speech units, and D/A conversion is then performed at a third sampling period different from the second sampling period. This shifts the speech spectrum along the logarithmic frequency axis, so that speech of different voice qualities can be synthesized from the same feature parameters. Furthermore, the speaking rate parameter used at synthesis time is made to differ from the value used when no sampling period conversion is performed (which is equivalent to converting to the first sampling period), or is determined according to the second and third sampling periods, or the second sampling period is determined according to the voice quality designated by the voice quality selection means and the speaking rate parameter is determined according to that voice quality. This allows the speaking rate of the synthesized speech to be controlled appropriately, so that speech can be synthesized at a natural speaking rate while the voice quality is varied.

【００６２】次に、本発明の第１５の観点に係る音声合
成装置は、前記第６の観点に係る音声合成装置に対応す
るもので、音声合成手段にて合成した離散音声信号の標
本周期を第１の標本周期とは異なる第２の標本周期に当
該音声合成手段にて変換することが可能で、上記第２の
標本周期への標本周期変換を行う際の合成処理では、標
本周期の変換を行わないときとは異なる音韻継続時間と
なるような音韻継続時間決定を行う構成とすると共に、
上記音声合成手段から出力される離散音声信号をディジ
タル／アナログ変換手段により上記第２の標本周期とは
異なる第３の周期でアナログ音声信号に変換する構成と
したことを特徴とする。ここで、合成処理で使用する音
韻継続時間を第２の標本周期及び第３の標本周期に基づ
いて定めるとよい。Next, a speech synthesizer according to a fifteenth aspect of the present invention corresponds to the speech synthesizer according to the sixth aspect. The speech synthesis means is able to convert the sampling period of the discrete speech signal it has synthesized into a second sampling period different from the first sampling period; when this conversion is performed, the synthesis processing determines phoneme durations that differ from those used when no sampling period conversion is performed; and the discrete speech signal output from the speech synthesis means is converted by the digital/analog conversion means into an analog speech signal at a third sampling period different from the second sampling period. Here, the phoneme durations used in the synthesis processing are preferably determined based on the second sampling period and the third sampling period.

【００６３】この他、合成する音声の声質を選択指定す
るための声質選択手段をさらに設け、上記音声合成手段
では、使用する音韻継続時間と第２の標本周期が声質選
択手段により選択指定された声質に応じて定められる構
成とすることも可能である。In addition, voice quality selection means for selecting and designating the voice quality of the speech to be synthesized may be further provided, in which case the phoneme durations used by the speech synthesis means and the second sampling period are determined according to the voice quality designated by the voice quality selection means.

【００６４】この発明においては、音声素片作成時の第
１の標本周期とは異なる第２の標本周期に合成音声を変
換した後、第２の標本周期とは異なる第３の標本周期で
Ｄ／Ａ変換を行うことにより、音声のスペクトルが周波
数対数軸上でシフトするため、同じ特徴パラメータを用
いて異なる声質の音声を合成することができるが、さら
にここで、合成時に使用される音韻継続時間を、標本周
期の変換を行わないとき（第１の標本周期に変換するこ
とと等価）に用いる値と異ならせるとか、第２の標本周
期及び第３の標本周期に応じて定めるとか、声質選択手
段により選択指定された声質に応じて第２の標本周期を
定め、選択指定された声質に応じて音韻継続時間を定め
ることで、合成する音声の発話速度を適切に制御できる
から、声質を変化させながら自然な発話速度で音声が合
成できる。In this invention, the synthesized speech is converted to a second sampling period different from the first sampling period used when creating the speech units, and D/A conversion is then performed at a third sampling period different from the second sampling period. This shifts the speech spectrum along the logarithmic frequency axis, so that speech of different voice qualities can be synthesized from the same feature parameters. Furthermore, the phoneme durations used at synthesis time are made to differ from the values used when no sampling period conversion is performed (which is equivalent to converting to the first sampling period), or are determined according to the second and third sampling periods, or the second sampling period is determined according to the voice quality designated by the voice quality selection means and the phoneme durations are determined according to that voice quality. This allows the speaking rate of the synthesized speech to be controlled appropriately, so that speech can be synthesized at a natural speaking rate while the voice quality is varied.
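One way the phoneme durations might be derived from the second and third sampling periods is sketched below. It assumes an ideal sampling period conversion that leaves durations unchanged, so that only D/A conversion at the third sampling period rescales time; the ratio and the function name are assumptions for illustration, since the text does not give a formula.

```python
def compensated_durations(durations_ms: list[float],
                          t2: float, t3: float) -> list[float]:
    """Pre-scale phoneme durations (assumed rule).

    Playing a signal with sampling period t2 at period t3 multiplies
    every duration by t3/t2. Planning durations multiplied by t2/t3
    cancels that factor, so the heard durations match the targets.
    """
    if t2 <= 0 or t3 <= 0:
        raise ValueError("sampling periods must be positive")
    scale = t2 / t3
    return [d * scale for d in durations_ms]


# Target durations of 80 ms and 120 ms; resampled to 8 kHz, D/A at 10 kHz.
t2, t3 = 1 / 8000, 1 / 10000
planned = compensated_durations([80.0, 120.0], t2, t3)
heard = [d * t3 / t2 for d in planned]
```

When the second and third sampling periods are equal the scale is 1, which matches the case in which no sampling period conversion is needed.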

【００６５】次に、本発明の第１６の観点に係る音声合
成装置は、前記第７の観点に係る音声合成装置に対応す
るもので、第１の標本周期で標本化した離散音声信号か
ら作成した音声素片を、与えられた音韻情報に基づいて
選択し、この選択した音声素片を時間軸方向に伸縮させ
ながら接続することによって離散音声信号を合成する音
声合成手段であって、上記合成した離散音声信号の標本
周期を上記第１の標本周期とは異なる第２の標本周期に
変換することが可能な音声合成手段と、この音声合成手
段から出力される離散音声信号を上記第２の標本周期と
は異なる第３の標本周期でアナログ音声信号に変換する
ディジタル／アナログ変換手段とを備えたことを特徴と
する。ここで、音声合成手段における声素片接続時の音
声素片に対する時間軸方向への伸縮の度合いを、上記第
２の標本周期及び第３の標本周期に基づいて定めるとよ
い。Next, a speech synthesizer according to a sixteenth aspect of the present invention corresponds to the speech synthesizer according to the seventh aspect. It comprises: speech synthesis means for selecting, based on given phoneme information, speech units created from a discrete speech signal sampled at a first sampling period and synthesizing a discrete speech signal by connecting the selected speech units while expanding or contracting them in the time-axis direction, the speech synthesis means being able to convert the sampling period of the synthesized discrete speech signal into a second sampling period different from the first sampling period; and digital/analog conversion means for converting the discrete speech signal output from the speech synthesis means into an analog speech signal at a third sampling period different from the second sampling period. Here, the degree of time-axis expansion or contraction applied to the speech units when they are connected in the speech synthesis means is preferably determined based on the second sampling period and the third sampling period.

【００６６】この発明においては、音声素片作成時の第
１の標本周期とは異なる第２の標本周期に合成音声を変
換した後、第２の標本周期とは異なる第３の標本周期で
Ｄ／Ａ変換を行うことにより、音声のスペクトルが周波
数対数軸上でシフトするため、同じ特徴パラメータを用
いて異なる声質の音声を合成することができるが、さら
にここで、音声素片を時間軸方向に伸縮させながら接続
することによって、例えば、第２の標本周期及び第３の
標本周期に基づいて定められる度合いで時間軸方向に伸
縮させながら接続することによって、合成する音声の発
話速度を適切に制御でき、なおかつ、合成する音声のス
ペクトル過渡部分の時間変化も適切に制御できるから、
声質を変化させながら自然な発話速度で明瞭な音声が合
成できる。In this invention, the synthesized speech is converted to a second sampling period different from the first sampling period used when creating the speech units, and D/A conversion is then performed at a third sampling period different from the second sampling period. This shifts the speech spectrum along the logarithmic frequency axis, so that speech of different voice qualities can be synthesized from the same feature parameters. Furthermore, the speech units are connected while being expanded or contracted in the time-axis direction, for example to a degree determined based on the second and third sampling periods. This allows the speaking rate of the synthesized speech to be controlled appropriately, and also allows the temporal change of its spectral transition portions to be controlled appropriately, so that clear speech can be synthesized at a natural speaking rate while the voice quality is varied.

【００６７】この他、合成する音声の声質を選択指定す
るための声質選択手段をさらに設け、上記音声合成手段
では、選択した音声素片を声質選択手段により選択指定
された声質に応じて定められる度合いで時間軸方向に伸
縮させながら接続することによって離散音声信号を合成
し、この合成した離散音声信号の標本周期を選択指定さ
れた声質に応じて定められる第２の標本周期に変換する
構成とすることも可能である。この場合、選択された声
質に応じて定められる第２の標本周期に合成音声を変換
した後、第３の標本周期でＤ／Ａ変換を行うことによ
り、音声のスペクトルが周波数対数軸上でシフトするた
め、同じ特徴パラメータを用いて異なる声質の音声を合
成することができるが、さらにここで、合成時に音声素
片を選択された声質に応じた度合で時間軸方向に伸縮さ
せながら接続することによって、合成する音声の発話速
度を適切に制御でき、なおかつ、合成する音声のスペク
トル過渡部分の時間変化も適切に制御できるから、声質
を変化させながら自然な発話速度で明瞭な音声が合成で
きる。In addition, voice quality selection means for selecting and designating the voice quality of the speech to be synthesized may be further provided, in which case the speech synthesis means synthesizes a discrete speech signal by connecting the selected speech units while expanding or contracting them in the time-axis direction to a degree determined according to the voice quality designated by the voice quality selection means, and converts the sampling period of the synthesized discrete speech signal into a second sampling period determined according to that voice quality. In this case, the synthesized speech is converted to the second sampling period determined according to the selected voice quality and D/A conversion is then performed at the third sampling period, which shifts the speech spectrum along the logarithmic frequency axis, so that speech of different voice qualities can be synthesized from the same feature parameters. Furthermore, connecting the speech units at synthesis time while expanding or contracting them in the time-axis direction to a degree that depends on the selected voice quality allows the speaking rate of the synthesized speech to be controlled appropriately, and also allows the temporal change of its spectral transition portions to be controlled appropriately, so that clear speech can be synthesized at a natural speaking rate while the voice quality is varied.

【００６８】次に、本発明の第１７の観点に係る音声合
成装置は、前記第８の観点に係る音声合成装置に対応す
るもので、第１の標本周期で標本化した離散音声信号に
第１のフレーム周期の時間窓をかけて分析して得られる
音声の特徴パラメータフレームの時系列から所定の合成
単位で切り出した音声素片を複数蓄積する音声素片蓄積
手段と、この音声素片蓄積手段から入力音韻情報に基づ
いて音声素片を選択し接続して合成パラメータフレーム
の時系列を生成する合成パラメータフレーム時系列生成
手段と、この合成パラメータフレーム時系列生成手段に
より生成された合成パラメータフレームの時系列から上
記第１のフレーム周期とは異なる第２のフレーム周期で
離散音声信号を合成する合成手段と、この合成手段によ
って合成された離散音声信号の標本周期を上記第１の標
本周期とは異なる第２の標本周期に変換する標本周期変
換手段と、この標本周期変換手段によって標本周期が変
換された離散音声信号を上記第２の標本周期とは異なる
第３の標本周期でアナログ音声信号に変換するディジタ
ル／アナログ変換手段とを備えたことを特徴とする。こ
こで、上記第２のフレーム周期を、上記第１のフレーム
周期、第２の標本周期及び第３の標本周期に基づいて定
めるとよい。Next, a speech synthesizer according to a seventeenth aspect of the present invention corresponds to the speech synthesizer according to the eighth aspect. It comprises: speech unit storage means for storing a plurality of speech units cut out, in predetermined synthesis units, from a time series of speech feature parameter frames obtained by analyzing a discrete speech signal sampled at a first sampling period through a time window applied at a first frame period; synthesis parameter frame time series generating means for selecting speech units from the speech unit storage means based on input phoneme information and connecting them to generate a time series of synthesis parameter frames; synthesizing means for synthesizing a discrete speech signal from that time series at a second frame period different from the first frame period; sampling period conversion means for converting the sampling period of the synthesized discrete speech signal into a second sampling period different from the first sampling period; and digital/analog conversion means for converting the discrete speech signal whose sampling period has been converted into an analog speech signal at a third sampling period different from the second sampling period. Here, the second frame period is preferably determined based on the first frame period, the second sampling period, and the third sampling period.

【００６９】この発明においては、音声素片作成時の第
１の標本周期とは異なる第２の標本周期に合成音声を変
換した後、第２の標本周期とは異なる第３の標本周期で
Ｄ／Ａ変換を行うことにより、音声のスペクトルが周波
数対数軸上でシフトするため、同じ特徴パラメータを用
いて異なる声質の音声を合成することができるが、さら
にここで、合成時の第２のフレーム周期を第１のフレー
ム周期、第２の標本周期及び第３の標本周期に基づいて
定めるなどして、分析時の窓の周期（第１のフレーム周
期）と異ならせることにより、合成する音声の発話速度
を適切に制御できるから、声質を変化させながら自然な
発話速度の音声が合成できる。In this invention, the synthesized speech is converted to a second sampling period different from the first sampling period used when the speech units were created, and D/A conversion is then performed at a third sampling period different from the second sampling period. This shifts the speech spectrum along the logarithmic frequency axis, so speech of different voice qualities can be synthesized from the same feature parameters. In addition, the second frame period used at synthesis is made different from the window period used at analysis (the first frame period), for example by determining it from the first frame period, the second sampling period, and the third sampling period. The speaking rate of the synthesized speech can thereby be controlled appropriately, so speech with a natural speaking rate can be synthesized while the voice quality is changed.

【００７０】この他、合成する音声の声質を選択指定す
るための声質選択手段をさらに設け、上記音声合成手段
では、上記第２のフレーム周期と第２の標本周期が声質
選択手段により選択指定された声質に応じて定められる
構成とすることも可能である。この場合、選択された声
質に応じて定められる第２の標本周期に合成音声を変換
した後、第３の標本周期でＤ／Ａ変換を行うことによ
り、音声のスペクトルが周波数対数軸上でシフトするた
め、同じ特徴パラメータを用いて異なる声質の音声を合
成することができるが、さらにここで、合成時の第２の
フレーム周期を選択された声質に応じて定め、分析時の
窓の周期（第１のフレーム周期）と異ならせることによ
り、合成する音声の発話速度を適切に制御できるから、
声質を変化させながら自然な発話速度の音声が合成でき
る。In addition, voice quality selecting means for selecting and designating the voice quality of the speech to be synthesized may be further provided, with the second frame period and the second sampling period in the speech synthesizing means determined according to the voice quality so designated. In this case, the synthesized speech is converted to the second sampling period determined according to the selected voice quality, and D/A conversion is then performed at the third sampling period. This shifts the speech spectrum along the logarithmic frequency axis, so speech of different voice qualities can be synthesized from the same feature parameters. In addition, the second frame period used at synthesis is determined according to the selected voice quality and made different from the window period used at analysis (the first frame period). The speaking rate of the synthesized speech can thereby be controlled appropriately, so speech with a natural speaking rate can be synthesized while the voice quality is changed.

【００７１】次に、本発明の第１８の観点に係る音声合
成装置は、前記第９の観点に係る音声合成装置に対応す
るもので、第１の標本周期とは異なる第２の標本周期及
び第２の標本周期とは異なる第３の標本周期に基づいて
入力韻律情報から音声の基本周波数パターンを生成する
ピッチパターン生成手段と、（入力音韻情報に基づいて
生成された音韻パラメータと上記ピッチパターン生成手
段によって生成された基本周波数パターンとから）合成
された離散音声信号の標本周期を上記第２の標本周期に
変換する標本周期変換手段と、この標本周期変換手段に
よって標本周期が変換された離散音声信号を上記第３の
標本周期でアナログ音声信号に変換するディジタル／ア
ナログ変換手段とを備えたことを特徴とする。Next, a speech synthesizer according to an eighteenth aspect of the present invention corresponds to the speech synthesizer according to the ninth aspect, wherein a second sampling period different from the first sampling period and Pitch pattern generation means for generating a fundamental frequency pattern of a voice from input prosody information based on a third sampling period different from the second sampling period; and (a phonological parameter generated based on the input phonological information and the pitch pattern). Sampling period conversion means for converting the sampling period of the synthesized discrete audio signal (from the fundamental frequency pattern generated by the generation means) into the second sampling period, and the sampling period conversion means for converting the sampling period. Digital / analog conversion means for converting the audio signal into an analog audio signal at the third sampling period.

【００７２】この発明においては、音声素片作成時の第
１の標本周期とは異なる第２の標本周期に合成音声を変
換した後、第２の標本周期とは異なる第３の標本周期で
Ｄ／Ａ変換を行うことにより、音声のスペクトルが周波
数対数軸上でシフトするため、同じ特徴パラメータを用
いて異なる声質の音声を合成することができるが、さら
にここで、第２の標本周期及び第３の標本周期に基づい
て合成の基本周波数パターンを定めることにより、合成
する音声のピッチ（声の高さ）を適切に制御できるか
ら、声質を変化させながら自然なピッチの音声が合成で
きる。In this invention, the synthesized speech is converted to a second sampling period different from the first sampling period used when the speech units were created, and D/A conversion is then performed at a third sampling period different from the second sampling period. This shifts the speech spectrum along the logarithmic frequency axis, so speech of different voice qualities can be synthesized from the same feature parameters. In addition, the fundamental frequency pattern used for synthesis is determined based on the second and third sampling periods. The pitch (voice height) of the synthesized speech can thereby be controlled appropriately, so speech with a natural pitch can be synthesized while the voice quality is changed.

【００７３】この他、合成する音声の声質を選択指定す
るための声質選択手段をさらに設け、上記ピッチパター
ン生成手段では、声質選択手段により選択指定された声
質に応じて上記韻律情報から音声の基本周波数パターン
が生成され、上記標本周期変換手段では、合成された離
散音声信号の標本周期が声質選択手段により選択指定さ
れた声質に応じて定められる第２の標本周期に変換され
る構成とすることも可能である。この場合、選択された
声質に応じて定められる第２の標本周期に合成音声を変
換した後、第３の標本周期でＤ／Ａ変換を行うことによ
り、音声のスペクトルが周波数対数軸上でシフトするた
め、同じ特徴パラメータを用いて異なる声質の音声を合
成することができるが、さらにここで、選択された声質
に応じて合成の基本周波数パターンを決定することによ
り、合成する音声のピッチ（声の高さ）を適切に制御で
きるから、声質を変化させながら自然なピッチの音声が
合成できる。In addition, voice quality selecting means for selecting and designating the voice quality of the speech to be synthesized may be further provided. In that configuration, the pitch pattern generating means generates the fundamental frequency pattern of the speech from the prosody information according to the designated voice quality, and the sampling period conversion means converts the sampling period of the synthesized discrete speech signal into a second sampling period determined according to the designated voice quality. In this case, the synthesized speech is converted to the second sampling period determined according to the selected voice quality, and D/A conversion is then performed at the third sampling period. This shifts the speech spectrum along the logarithmic frequency axis, so speech of different voice qualities can be synthesized from the same feature parameters. In addition, the fundamental frequency pattern used for synthesis is determined according to the selected voice quality. The pitch (voice height) of the synthesized speech can thereby be controlled appropriately, so speech with a natural pitch can be synthesized while the voice quality is changed.

【００７４】[0074]

【発明の実施の形態】以下、本発明の実施の形態につき
図面を参照して説明する。Embodiments of the present invention will be described below with reference to the drawings.

【００７５】［第１の実施形態］図１は本発明の第１の
実施形態に係る音声の分析合成装置の概略構成を示すブ
ロック図である。[First Embodiment] FIG. 1 is a block diagram showing the schematic arrangement of a speech analysis / synthesis apparatus according to the first embodiment of the present invention.

【００７６】図１において、メモリ１１には、１１０２
５Ｈｚ（第１の標本周期）でサンプリング（標本化）し
た音声（離散音声信号）に対して、フレーム周期（第１
のフレーム周期）１０msecで窓幅２０msecのハニング窓
をかけ、従来技術にて説明した手順で得られる０次〜２
５次までの低次ケプストラム係数のパラメータフレーム
の時系列（特徴パラメータ）と、各フレームに対応した
音声の有声・無声情報が記憶されている。低次ケプスト
ラム係数については従来技術において説明済みであるの
で、ここでは省略する。In FIG. 1, the memory 11 stores a time series of parameter frames (feature parameters) of low-order cepstrum coefficients from the 0th to the 25th order, together with voiced/unvoiced information for each frame. These are obtained from speech (a discrete speech signal) sampled at 11025 Hz (the first sampling period) by applying a Hanning window with a width of 20 msec at a frame period (the first frame period) of 10 msec and following the procedure described in the prior art section. The low-order cepstrum coefficients have already been explained in the prior art section, so the explanation is omitted here.

【００７７】一方、メモリ１２には、所定の基本周波数
抽出方法により同じ音声から得られる、音声の基本周波
数の時系列パターンが記憶されている。On the other hand, the memory 12 stores a time-series pattern of the fundamental frequency of the voice, which is obtained from the same voice by the predetermined fundamental frequency extraction method.

【００７８】合成フィルタ処理部１３は、これら２つの
メモリ１１，１２より各データを読み出し、メモリ１１
より読み出された音声のパラメータフレーム（ケプスト
ラム係数）をフィルタ係数とするＬＭＡフィルタ（図示
せず）を、有声区間ではメモリ１２より読み出された音
声の基本周波数の時系列パターン（ピッチパターン）に
基づいた周期パルスで、無声区間ではランダムノイズで
駆動することにより所望の音声（離散音声信号）を合成
する。The synthesis filter processing unit 13 reads data from the two memories 11 and 12. It synthesizes the desired speech (a discrete speech signal) by driving an LMA filter (not shown), whose filter coefficients are the parameter frames (cepstrum coefficients) read from the memory 11, with periodic pulses based on the time-series pattern of the fundamental frequency (pitch pattern) read from the memory 12 in voiced sections and with random noise in unvoiced sections.
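The excitation step described above can be sketched as follows. This is only an illustration under stated assumptions: the function name `make_excitation`, the unit pulse amplitude, the Gaussian noise source, and the rounding of the frame length to whole samples are all hypothetical choices, and the LMA filter that the excitation would drive is not modeled.

```python
import random


def make_excitation(f0_per_frame, voiced_flags,
                    frame_period_s=0.010, fs=11025, seed=0):
    """Build an excitation signal frame by frame.

    Voiced frames get a pulse train whose spacing follows the frame's
    fundamental frequency; unvoiced frames get random noise.
    All names and amplitudes here are illustrative assumptions.
    """
    rng = random.Random(seed)
    samples_per_frame = round(frame_period_s * fs)  # 110 at 11025 Hz
    excitation = []
    next_pulse = 0.0  # sample position of the next pitch pulse
    for f0, voiced in zip(f0_per_frame, voiced_flags):
        for _ in range(samples_per_frame):
            n = len(excitation)
            if voiced and f0 > 0:
                if n >= next_pulse:
                    excitation.append(1.0)
                    next_pulse += fs / f0  # one pitch period later
                else:
                    excitation.append(0.0)
            else:
                excitation.append(rng.gauss(0.0, 1.0))
                next_pulse = n + 1  # restart pulse phase after noise
    return excitation


if __name__ == "__main__":
    signal = make_excitation([220.5, 0.0], [True, False])
    print(len(signal), sum(signal[:110]))
```

In the sketch, one voiced frame at 220.5 Hz yields pulses 50 samples apart, and the following unvoiced frame is filled with noise.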

【００７９】ここまでの処理はプログラムによって行わ
れるため、合成フィルタ処理部１３（内のＬＭＡフィル
タ）から出力される音声は離散音声信号である。そこ
で、この離散音声信号をＤ／Ａ変換器１４に供給し、電
気的なアナログ信号に変換する。こうして得られた音声
のアナログ信号をアンプ１５にて増幅し、スピーカ１６
を駆動することにより聴覚で知覚できる音声を得ること
ができる。Since the processing up to this point is performed by a program, the speech output from the synthesis filter processing unit 13 (the LMA filter in it) is a discrete speech signal. This discrete speech signal is therefore supplied to the D/A converter 14 and converted into an electrical analog signal. The analog speech signal thus obtained is amplified by the amplifier 15 and drives the speaker 16, producing audible speech.

【００８０】ここまでは従来技術で挙げた（図２３の分
析合成装置の）例とほぼ同じである。本実施形態のポイ
ントは、声質切替部１７及び声質制御部１８が加えられ
たことにある。なお、Ｄ／Ａ変換器１４はハードウェア
により構成されているが、その変換のサンプリング周波
数はソフトウェアから制御可能なようになっている。The process up to this point is almost the same as the example (of the analyzing and synthesizing apparatus of FIG. 23) described in the prior art. The point of this embodiment is that a voice quality switching unit 17 and a voice quality control unit 18 are added. Although the D / A converter 14 is composed of hardware, the sampling frequency for the conversion can be controlled by software.

【００８１】声質切替部１７は、ユーザによる指定もし
くはアプリケーションプログラム等によって合成する際
の声質を切り替え指定することができるようになってい
る。本実施形態では、この声質切替部１７にて３種類の
声質が指定可能であるものとする。The voice quality switching section 17 is capable of switching and designating the voice quality at the time of synthesis by a user or an application program. In the present embodiment, it is assumed that the voice quality switching unit 17 can specify three types of voice qualities.

【００８２】声質制御部１８は、声質切替部１７で指定
された声質に応じて１１０２５Ｈｚ，１２０００ＨＺ，
１００００Ｈｚのいずれかのサンプリング周波数（第２
の標本周期）でディジタル／アナログ変換（Ｄ／Ａ変
換）を行うようにＤ／Ａ変換器１４を制御する。The voice quality control unit 18 controls the D/A converter 14 so that digital/analog conversion (D/A conversion) is performed at a sampling frequency (the second sampling period) of 11025 Hz, 12000 Hz, or 10000 Hz, according to the voice quality designated by the voice quality switching unit 17.

【００８３】メモリ１１に蓄えられたケプストラムを作
成した際の音声のサンプリング周波数（第１の標本周
期）と同じサンプリング周波数、即ち１１０２５Ｈｚ
で、Ｄ／Ａ変換器１４がＤ／Ａ変換が行えば、元の音声
の声質で音声合成することができる。If the D/A converter 14 performs D/A conversion at the same sampling frequency as that of the speech from which the cepstrum stored in the memory 11 was created (the first sampling period), that is, at 11025 Hz, speech is synthesized with the voice quality of the original speech.

【００８４】一方、Ｄ／Ａ変換器１４が他のサンプリン
グ周波数（第１の標本周期とは異なる第２の標本周期）
でＤ／Ａ変換すれば、図２に示すように、音声スペクト
ル（図はスペクトル包絡）を周波数軸方向にシフトした
効果が得られるため、音声の個人性が変化し、こうして
得られるアナログ音声信号の声質は、元となる音声の声
質とは異なったものとなる。On the other hand, if the D/A converter 14 performs D/A conversion at another sampling frequency (a second sampling period different from the first sampling period), the effect is that of shifting the speech spectrum (the figure shows the spectral envelope) along the frequency axis, as shown in FIG. 2. The individuality of the speech therefore changes, and the voice quality of the resulting analog speech signal differs from that of the original speech.

【００８５】［第２の実施形態］前記第１の実施形態に
おいては、声質切替部１７及び声質制御部１８を設けた
ことで、合成音声の声質を簡単に増やすことができるも
のの、合成される音声のスピードが声質により異なる。[Second Embodiment] In the first embodiment, by providing the voice quality switching unit 17 and the voice quality control unit 18, the voice quality of the synthesized voice can be easily increased, but it is synthesized. Voice speed varies depending on voice quality.

【００８６】即ち、（ケプストラム作成時のサンプリン
グ周波数）＞（Ｄ／Ａ変換のサンプリング周波数）のと
きには、合成される音声のスピードは遅くなる。逆に、
（ケプストラム作成時のサンプリング周波数）＜（Ｄ／
Ａ変換のサンプリング周波数）のときには、合成される
音声のスピードは早くなる。That is, when (sampling frequency at cepstrum creation) > (sampling frequency of D/A conversion), the synthesized speech becomes slower. Conversely, when (sampling frequency at cepstrum creation) < (sampling frequency of D/A conversion), the synthesized speech becomes faster.

【００８７】このような合成される音声のスピードの違
いは、Ｄ／Ａ変換時のサンプリング周波数が前記第１の
実施形態程度の違い（９１％，１０９％）ではあまり問
題とはならない。しかし以下に述べるように、Ｄ／Ａ変
換時のサンプリング周波数（第２の標本周期）が、メモ
リ１１に蓄えられたケプストラムを作成した際の音声の
サンプリング周波数（第１の標本周期）と大きく異なる
場合には、問題となる。Such a difference in the speed of the synthesized speech is not much of a problem when the D/A sampling frequency differs only as much as in the first embodiment (91%, 109%). However, as described below, it becomes a problem when the sampling frequency at D/A conversion (the second sampling period) differs greatly from the sampling frequency of the speech from which the cepstrum stored in the memory 11 was created (the first sampling period).
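The 91% and 109% figures follow from the sampling frequencies given for the first embodiment. The short check below is a sketch; the names `ANALYSIS_RATE_HZ` and `playback_ratio` are introduced here for illustration only.

```python
ANALYSIS_RATE_HZ = 11025  # sampling frequency at cepstrum creation


def playback_ratio(da_rate_hz, analysis_rate_hz=ANALYSIS_RATE_HZ):
    """Ratio of the D/A sampling frequency to the analysis sampling frequency.

    Samples produced for the analysis rate and played at da_rate_hz are
    heard with speed (and spectrum) scaled by this ratio.
    """
    return da_rate_hz / analysis_rate_hz


if __name__ == "__main__":
    for rate in (11025, 12000, 10000):
        print(rate, f"{playback_ratio(rate):.0%}")
```

A ratio below 100% corresponds to slower speech and a ratio above 100% to faster speech.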

【００８８】まず、ケプストラム作成時のサンプリング
周波数と合成時のＤ／Ａ変換のサンプリング周波数の比
が１に近ければ声質は変化も小さく、逆にこの比が１か
ら離れれば声質は大きく変化する。したがって、声質を
大きく変えようとすれば、これら両サンプリング周波数
の比を例えば５０％，２００％程度に設定すればよい
が、これでは合成音声のスピードもそれぞれ元の音声の
５０％，２００％、即ち半分と倍になり、かなり聞きづ
らくなる。First, if the ratio between the sampling frequency at cepstrum creation and the sampling frequency of D/A conversion at synthesis is close to 1, the change in voice quality is small; the further the ratio is from 1, the more the voice quality changes. To change the voice quality greatly, the ratio of the two sampling frequencies could therefore be set to, for example, about 50% or 200%. In that case, however, the speed of the synthesized speech also becomes 50% or 200% of the original, that is, half or double, which makes it quite hard to listen to.

【００８９】そこで、合成時のＤ／Ａ変換のサンプリン
グ周波数がケプストラム作成時のサンプリング周波数と
大きく異なった場合でも、合成音声のスピードを一定に
できるようにした第２の実施形態につき説明する。Therefore, a second embodiment will be described in which the speed of synthesized speech can be made constant even when the sampling frequency for D / A conversion during synthesis is significantly different from the sampling frequency for creating the cepstrum.

【００９０】図３は本発明の第２の実施形態に係る音声
の分析合成装置の概略構成を示すブロック図であり、図
１と同一部分には同一符号を付してある。FIG. 3 is a block diagram showing a schematic configuration of a voice analysis / synthesis apparatus according to the second embodiment of the present invention. The same parts as those in FIG. 1 are designated by the same reference numerals.

【００９１】本実施形態のポイントは、図１中の合成フ
ィルタ処理部１３に代えて、合成時のフレーム周期が制
御可能な合成フィルタ処理部２３を設けると共に、図１
中の声質制御部１８に代えて、Ｄ／Ａ変換のサンプリン
グ周波数だけでなく合成時のフレーム周期を制御する声
質制御部２８を設け、当該声質制御部２８により、声質
切替部１７の指定に応じて、Ｄ／Ａ変換器１４でのＤ／
Ａ変換のサンプリング周波数と同時に、合成フィルタ処
理部２３での合成時のフレーム周期を制御するところに
ある。The point of this embodiment is twofold. First, the synthesis filter processing unit 13 in FIG. 1 is replaced by a synthesis filter processing unit 23 whose frame period at synthesis is controllable. Second, the voice quality control unit 18 in FIG. 1 is replaced by a voice quality control unit 28 that controls not only the sampling frequency of D/A conversion but also the frame period at synthesis. According to the designation from the voice quality switching unit 17, the voice quality control unit 28 controls the sampling frequency of D/A conversion in the D/A converter 14 and, at the same time, the frame period at synthesis in the synthesis filter processing unit 23.

【００９２】本実施形態では、前記第１の実施形態と同
様に声質切替部１７にて３種類の声質が指定可能であ
る。声質制御部２８は声質切替部１７で指定された声質
に応じて、例えば１１０２５Ｈｚ，８０００Ｈｚ，１６
０００Ｈｚのいずれかのサンプリング周波数でＤ／Ａ変
換を行うようにＤ／Ａ変換器１４を制御する。In this embodiment, as in the first embodiment, three voice qualities can be designated with the voice quality switching unit 17. The voice quality control unit 28 controls the D/A converter 14 so that D/A conversion is performed at a sampling frequency of, for example, 11025 Hz, 8000 Hz, or 16000 Hz, according to the voice quality designated by the voice quality switching unit 17.

【００９３】同時に声質制御部２８は、声質切替部１７
によって指定された声質に応じて、合成フィルタ処理部
２３で行われる合成のフレーム周期を設定する。これに
より合成フィルタ処理部２３では、メモリ１１より読み
出された音声のパラメータフレーム（ケプストラム係
数）を設定されたフレーム周期で（ＬＭＡフィルタに）
入力し、当該フレーム周期で音声（離散音声信号）を合
成する。At the same time, the voice quality control unit 28 sets the frame period of the synthesis performed by the synthesis filter processing unit 23 according to the voice quality designated by the voice quality switching unit 17. The synthesis filter processing unit 23 then inputs the parameter frames (cepstrum coefficients) read from the memory 11 (to the LMA filter) at the set frame period and synthesizes speech (a discrete speech signal) at that frame period.

【００９４】合成のフレーム周期は次式により与えられ
る。The frame period of synthesis is given by the following equation.

【００９５】（フレーム周期）＝（分析フレーム周期）×（分析サンプリング周期）／（Ｄ／Ａ変換のサンプリング周期）＝（分析フレーム周期）×（Ｄ／Ａ変換のサンプリング周波数）／（分析サンプリング周波数）したがって、ケプストラム作成時（分析時）の音声のサ
ンプリング周波数（第１の標本周期）と同じサンプリン
グ周波数、即ち１１０２５ＨｚでＤ／Ａ変換を行う際に
は、声質制御部２８は上式に基づき、合成時のフレーム
周期をケプストラム作成時のフレーム周期である分析フ
レーム周期（第１のフレーム周期）と同じ１０msecで合
成するよう合成フィルタ処理部１３を制御する。但し、
メモリ１１に蓄えられたケプストラムは前記第１の実施
形態と同じ条件で作成されているものとする。(Frame period) = (analysis frame period) × (analysis sampling period) / (sampling period of D/A conversion) = (analysis frame period) × (sampling frequency of D/A conversion) / (analysis sampling frequency). Therefore, when D/A conversion is performed at the same sampling frequency as that of the speech at cepstrum creation (at analysis) (the first sampling period), that is, at 11025 Hz, the voice quality control unit 28 follows the above equation and controls the synthesis filter processing unit 23 so that synthesis is performed with a frame period of 10 msec, the same as the analysis frame period (the first frame period) used at cepstrum creation. It is assumed here that the cepstrum stored in the memory 11 was created under the same conditions as in the first embodiment.

【００９６】また声質制御部２８は、ケプストラム作成
時（分析時）の音声のサンプリング周波数とは異なるサ
ンプリング周波数（第１の標本周期とは異なる第２の標
本周期）、例えば８０００ＨｚにてＤ／Ａ変換を行う場
合には、１０[msec]×８０００[Hz]／１１０２５[Hz]＝７．３[m
sec] のフレーム周期（第１のフレーム周期とは異なる第２の
フレーム周期）で合成を行うよう制御し、１６０００Ｈ
ｚにてＤ／Ａ変換を行う場合には、１０[msec]×１６０００[Hz]／１１０２５[Hz]＝１４．
５[msec] のフレーム周期（第１のフレーム周期とは異なる第２の
フレーム周期）で合成を行うよう制御する。When D/A conversion is performed at a sampling frequency different from that of the speech at cepstrum creation (at analysis) (a second sampling period different from the first sampling period), the voice quality control unit 28 changes the frame period accordingly. For example, at 8000 Hz it controls synthesis to use a frame period of 10 [msec] × 8000 [Hz] / 11025 [Hz] = 7.3 [msec], and at 16000 Hz a frame period of 10 [msec] × 16000 [Hz] / 11025 [Hz] = 14.5 [msec] (in each case a second frame period different from the first frame period).
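The frame period arithmetic above can be reproduced with a few lines of code. This is a sketch of the stated formula only; the function name `synthesis_frame_period_ms` is an illustrative choice and does not come from the specification.

```python
def synthesis_frame_period_ms(analysis_frame_ms, analysis_rate_hz, da_rate_hz):
    """Synthesis frame period from the formula in the text:

    (frame period) = (analysis frame period)
                     x (D/A sampling frequency) / (analysis sampling frequency)
    """
    return analysis_frame_ms * da_rate_hz / analysis_rate_hz


if __name__ == "__main__":
    for da_rate in (11025, 8000, 16000):
        period = synthesis_frame_period_ms(10.0, 11025, da_rate)
        print(da_rate, round(period, 1))
```

With the 10 msec analysis frame and 11025 Hz analysis rate of the embodiment, the rounded values for 8000 Hz and 16000 Hz agree with the 7.3 msec and 14.5 msec quoted above.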

【００９７】このように本実施形態においては、分析時
と異なるサンプリング周波数（第１の標本周期とは異な
る第２の標本周期）でＤ／Ａ変換したときの合成音声の
スピードの変化を、合成フィルタ処理部２３での合成の
フレーム周期（第２のフレーム周期）をケプストラム作
成時（分析時）のフレーム周期（第１のフレーム周期）
とは異ならせることで相殺することができる。As described above, in this embodiment the change in the speed of the synthesized speech that arises when D/A conversion is performed at a sampling frequency different from that at analysis (a second sampling period different from the first sampling period) can be cancelled by making the frame period of synthesis in the synthesis filter processing unit 23 (the second frame period) different from the frame period at cepstrum creation (at analysis) (the first frame period).

【００９８】したがって、分析時のサンプリング周波数
と異なる８０００Ｈｚあるいは１６０００ＨｚでＤ／Ａ
変換を行っても、同じスピードの音声のアナログ信号を
得ることができる。Therefore, even when D/A conversion is performed at 8000 Hz or 16000 Hz, which differ from the sampling frequency at analysis, an analog speech signal of the same speed can be obtained.
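Why the adjusted frame period keeps the speed constant can be checked numerically. The sketch below rests on an assumption that the specification does not spell out: the synthesis frame period is counted in the time base of the analysis sampling frequency, so it fixes the number of samples generated per frame, and those samples are then played at the D/A sampling frequency. The function name is hypothetical.

```python
def played_frame_duration_ms(analysis_frame_ms, analysis_rate_hz, da_rate_hz):
    """Real-time duration of one synthesized frame after D/A conversion.

    Assumption: the synthesis frame period is measured against the analysis
    sampling frequency, which determines the samples generated per frame.
    """
    synth_frame_ms = analysis_frame_ms * da_rate_hz / analysis_rate_hz
    samples_per_frame = synth_frame_ms / 1000.0 * analysis_rate_hz
    return samples_per_frame / da_rate_hz * 1000.0


if __name__ == "__main__":
    for da_rate in (11025, 8000, 16000):
        print(da_rate, played_frame_duration_ms(10.0, 11025, da_rate))
```

Under that assumption each frame still occupies about 10 msec of real time at every D/A rate, which is the cancellation the text describes.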

【００９９】［第３の実施形態］ところで、前記第１の
実施形態においては、Ｄ／Ａ変換のサンプリング周波数
を分析時のものと変えると、声の高さ、即ち音声のピッ
チの変化を招く。[Third Embodiment] In the first embodiment, changing the sampling frequency of D/A conversion from the one used at analysis also changes the height of the voice, that is, the pitch of the speech.

【０１００】即ち、（ケプストラム作成時のサンプリン
グ周波数）＞（Ｄ／Ａ変換のサンプリング周波数）のと
きには、合成される音声のピッチは低くなる。逆に、
（ケプストラム作成時のサンプリング周波数）＜（Ｄ／
Ａ変換のサンプリング周波数）のときには、合成される
音声のピッチは高くなる。That is, when (sampling frequency at cepstrum creation) > (sampling frequency of D/A conversion), the pitch of the synthesized speech becomes lower. Conversely, when (sampling frequency at cepstrum creation) < (sampling frequency of D/A conversion), the pitch of the synthesized speech becomes higher.

【０１０１】このような合成される音声のピッチの違い
は、Ｄ／Ａ変換時のサンプリング周波数が前記第１の実
施形態程度の違い（９１％，１０９％）ではあまり問題
とはならない。しかし、声質を大きく変えようとして、
両サンプリング周波数の比を例えば５０％，２００％程
度に設定すれば、合成音声のピッチもそれぞれ５０％，
２００％と変化するため、ケプストラム作成時と同じ１
１０２５ＨｚでＤ／Ａ変換したときの音声（あるいは原
音声）と比較して、前者はピッチが１[oct] （オクター
ブ）低い音声が合成され、後者は１[oct] 高い音声が合
成されるので聞きづらくなるという問題が発生する。Such a difference in the pitch of the synthesized speech is not much of a problem when the D/A sampling frequency differs only as much as in the first embodiment (91%, 109%). However, if the ratio of the two sampling frequencies is set to, for example, about 50% or 200% in order to change the voice quality greatly, the pitch of the synthesized speech also changes to 50% or 200%. Compared with the speech obtained by D/A conversion at 11025 Hz, the same frequency as at cepstrum creation (or with the original speech), the former is synthesized 1 [oct] (octave) lower and the latter 1 [oct] higher, which makes the speech hard to listen to.
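The octave figures follow from taking the base-2 logarithm of the sampling frequency ratio. A minimal sketch, with the function name `pitch_shift_octaves` introduced here for illustration:

```python
import math


def pitch_shift_octaves(da_rate_hz, analysis_rate_hz):
    """Pitch shift, in octaves, heard when speech synthesized for
    analysis_rate_hz is D/A converted at da_rate_hz without compensation."""
    return math.log2(da_rate_hz / analysis_rate_hz)


if __name__ == "__main__":
    print(pitch_shift_octaves(11025 * 0.5, 11025))  # ratio of 50%
    print(pitch_shift_octaves(11025 * 2.0, 11025))  # ratio of 200%
```

A negative result means a lower voice and a positive result a higher voice, so ratios of 50% and 200% correspond to one octave down and one octave up.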

【０１０２】そこで、合成時のＤ／Ａ変換のサンプリン
グ周波数がケプストラム作成時のサンプリング周波数と
大きく異なった場合でも、合成音声のピッチを一定にで
きるようにした第３の実施形態につき説明する。Therefore, a third embodiment will be described in which the pitch of the synthesized voice can be made constant even when the sampling frequency of the D / A conversion at the time of synthesis is significantly different from the sampling frequency at the time of creating the cepstrum.

【０１０３】図４は本発明の第３の実施形態に係る音声
の分析合成装置の概略構成を示すブロック図であり、図
１と同一部分には同一符号を付してある。FIG. 4 is a block diagram showing the schematic arrangement of a speech analysis / synthesis apparatus according to the third embodiment of the present invention. The same parts as those in FIG. 1 are designated by the same reference numerals.

【０１０４】本実施形態のポイントは、図１中の合成フ
ィルタ処理部１３に代えて、合成時のフレーム周期が制
御可能な合成フィルタ処理部３３を設けると共に、メモ
リ１２と合成フィルタ処理部３３との間にメモリ１２よ
り読み出された基本周波数（の時系列）パターン（ピッ
チパターン）を周波数の異なる別の基本周波数パターン
に変換（ピッチ変調）して合成フィルタ処理部３３に与
えるピッチ変調処理部３１を設け、さらに図１中の声質
制御部１８に代えて、Ｄ／Ａ変換のサンプリング周波数
だけでなく合成時のフレーム周期及びピッチの変調を制
御する声質制御部３８を設け、当該声質制御部３８によ
り、声質切替部１７の指定に応じて、Ｄ／Ａ変換器１４
でのサンプリング周波数と合成フィルタ処理部３３での
合成時のフレーム周期を制御すると同時に、ピッチ変調
処理部３１でのピッチの変調を制御するところにある。The point of this embodiment is threefold. First, the synthesis filter processing unit 13 in FIG. 1 is replaced by a synthesis filter processing unit 33 whose frame period at synthesis is controllable. Second, a pitch modulation processing unit 31 is provided between the memory 12 and the synthesis filter processing unit 33; it converts the fundamental frequency (time-series) pattern (pitch pattern) read from the memory 12 into another fundamental frequency pattern of different frequency (pitch modulation) and supplies it to the synthesis filter processing unit 33. Third, the voice quality control unit 18 in FIG. 1 is replaced by a voice quality control unit 38 that controls not only the sampling frequency of D/A conversion but also the frame period at synthesis and the pitch modulation. According to the designation from the voice quality switching unit 17, the voice quality control unit 38 controls the sampling frequency of the D/A converter 14 and the frame period at synthesis in the synthesis filter processing unit 33 and, at the same time, the pitch modulation in the pitch modulation processing unit 31.

【０１０５】本実施形態では、前記第１の実施形態と同
様に声質切替部１７にて３種類の声質を指定可能であ
る。声質制御部３８は声質切替部１７で指定された声質
に応じて、例えば１１０２５Ｈｚ，８０００Ｈｚ，１６
０００Ｈｚのいずれかのサンプリング周波数でＤ／Ａ変
換を行うようにＤ／Ａ変換器１４を制御する。In this embodiment, as in the first embodiment, three voice qualities can be designated with the voice quality switching unit 17. The voice quality control unit 38 controls the D/A converter 14 so that D/A conversion is performed at a sampling frequency of, for example, 11025 Hz, 8000 Hz, or 16000 Hz, according to the voice quality designated by the voice quality switching unit 17.

【０１０６】声質制御部３８は、声質切替部１７によっ
て指定された声質に応じて、Ｄ／Ａ変換器１４のＤ／Ａ
変換のサンプリング周波数を設定すると同時に、合成フ
ィルタ処理部３３で行われる合成のフレーム周期を設定
する。合成のフレーム周期は次式により与えられる。According to the voice quality designated by the voice quality switching unit 17, the voice quality control unit 38 sets the sampling frequency of D/A conversion in the D/A converter 14 and, at the same time, the frame period of the synthesis performed by the synthesis filter processing unit 33. The frame period of synthesis is given by the following equation.

【０１０７】（フレーム周期）＝（分析フレーム周期）×（分析サンプリング周期）／（Ｄ／Ａ変換のサンプリング周期）＝（分析フレーム周期）×（Ｄ／Ａ変換のサンプリング周波数）／（分析サンプリング周波数）なお、メモリ１１に蓄えられたケプストラムは前記第１
の実施形態と同じ条件で作成されているものとする。(Frame period) = (analysis frame period) × (analysis sampling period) / (sampling period of D/A conversion) = (analysis frame period) × (sampling frequency of D/A conversion) / (analysis sampling frequency). It is assumed that the cepstrum stored in the memory 11 was created under the same conditions as in the first embodiment.

【０１０８】声質制御部３８はさらに、合成フィルタ処
理部３３に与えるピッチ（ピッチパターン）が、（合成フィルタ処理部３３に与えるピッチ）＝（メモリ１２より読み出したピッチ）×（Ｄ／Ａ変換のサンプリング周期）／（分析時のサンプリング周期）＝（メモリ１２より読み出したピッチ）×（分析時のサンプリング周波数）／（Ｄ／Ａ変換のサンプリング周波数）となるように、ピッチ変調処理部３１を制御する。The voice quality control unit 38 further controls the pitch modulation processing unit 31 so that the pitch (pitch pattern) given to the synthesis filter processing unit 33 satisfies: (pitch given to the synthesis filter processing unit 33) = (pitch read from the memory 12) × (sampling period of D/A conversion) / (sampling period at analysis) = (pitch read from the memory 12) × (sampling frequency at analysis) / (sampling frequency of D/A conversion).

【０１０９】したがって、ケプストラム作成時の音声の
サンプリング周波数（第１の標本周期）と同じサンプリ
ング周波数、即ち１１０２５ＨｚでＤ／Ａ変換を行う際
には、声質制御部３８は上式に基づき、合成フィルタ処
理部３３に与えるピッチを分析時と同じピッチとなるよ
うピッチ変調処理部３１を制御する。Therefore, when D/A conversion is performed at the same sampling frequency as that of the speech at cepstrum creation (the first sampling period), that is, at 11025 Hz, the voice quality control unit 38 follows the above equation and controls the pitch modulation processing unit 31 so that the pitch given to the synthesis filter processing unit 33 is the same as the pitch at analysis.

【０１１０】また声質制御部３８は、ケプストラム作成
時（分析時）の音声のサンプリング周波数とは異なるサ
ンプリング周波数（第１の標本周期とは異なる第２の標
本周期）、例えば８０００ＨｚにてＤ／Ａ変換を行う場
合には、分析時のピッチを（１１０２５[Hz]／８０００
[Hz]）倍して合成フィルタ処理部３３に与えるように、
１６０００ＨｚでＤ／Ａ変換を行う場合には、同じく分
析時のピッチを（１１０２５[Hz]／１６０００[Hz]）倍
して合成フィルタ処理部３３に与えるように、ピッチ変
調処理部３１を制御する。When D/A conversion is performed at a sampling frequency different from that of the speech at cepstrum creation (at analysis) (a second sampling period different from the first sampling period), the voice quality control unit 38 controls the pitch modulation processing unit 31 as follows. For D/A conversion at, for example, 8000 Hz, the pitch at analysis is multiplied by (11025 [Hz] / 8000 [Hz]) before being given to the synthesis filter processing unit 33. For D/A conversion at 16000 Hz, the pitch at analysis is likewise multiplied by (11025 [Hz] / 16000 [Hz]).
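The pitch pre-compensation can be sketched as below. The sketch treats "pitch" as a fundamental frequency in Hz, which matches the direction of the scaling factors in the text; the function names `compensated_f0` and `heard_f0` and the 120 Hz example value are assumptions made for illustration.

```python
def compensated_f0(f0_hz, analysis_rate_hz, da_rate_hz):
    """Pitch handed to the synthesis filter, per the formula in the text:

    (pitch given) = (pitch read) x (analysis frequency) / (D/A frequency)
    """
    return f0_hz * analysis_rate_hz / da_rate_hz


def heard_f0(f0_given_hz, analysis_rate_hz, da_rate_hz):
    """Pitch actually heard once the samples are played at da_rate_hz."""
    return f0_given_hz * da_rate_hz / analysis_rate_hz


if __name__ == "__main__":
    original = 120.0  # hypothetical analysis-time fundamental frequency
    for da_rate in (11025, 8000, 16000):
        given = compensated_f0(original, 11025, da_rate)
        print(da_rate, given, heard_f0(given, 11025, da_rate))
```

The pre-scaling and the playback scaling are reciprocal, so the heard pitch returns to the analysis-time value for every D/A rate.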

【０１１１】このように本実施形態においては、合成フ
ィルタ処理部３３に与えるピッチを声質制御部３８の制
御のもとでピッチ変調処理部３１にて予め変調しておく
ことにより、分析時と異なるサンプリング周波数でＤ／
Ａ変換したときに生じる合成音声のピッチの変化を相殺
することができる。As described above, in this embodiment the pitch given to the synthesis filter processing unit 33 is modulated in advance by the pitch modulation processing unit 31 under the control of the voice quality control unit 38. This cancels the change in the pitch of the synthesized speech that arises when D/A conversion is performed at a sampling frequency different from that at analysis.

【０１１２】したがって、分析時のサンプリング周波数
と異なる８０００Ｈｚあるいは１６０００ＨｚでＤ／Ａ
変換を行っても、同じ声の高さの音声のアナログ信号を
得ることができる。Therefore, even when D/A conversion is performed at 8000 Hz or 16000 Hz, which differ from the sampling frequency at analysis, an analog speech signal with the same voice pitch can be obtained.

【０１１３】［第４の実施形態］図５は本発明の第４の
実施形態に係る音声の規則合成装置の概略構成を示すブ
ロック図である。[Fourth Embodiment] FIG. 5 is a block diagram showing the schematic arrangement of a speech rule synthesizing apparatus according to the fourth embodiment of the present invention.

【０１１４】この音声規則合成装置は、例えばパーソナ
ルコンピュータ等の情報処理装置上で専用のソフトウェ
ア（文音声変換ソフトウェア）を実行することにより実
現されるもので、文音声変換（ＴＴＳ）処理機能、即ち
テキストから音声を生成する文音声変換処理（文音声合
成処理）機能を有しており、その機能構成は、大別して
言語処理部４１と、音声合成部４２とに分けられる。This speech rule synthesizing device is realized by executing dedicated software (sentence / speech conversion software) on an information processing device such as a personal computer, and has a sentence / speech conversion (TTS) processing function, that is, It has a sentence-speech conversion processing (sentence-speech synthesis processing) function for generating speech from a text, and the functional configuration thereof is roughly divided into a language processing unit 41 and a speech synthesis unit 42.

【０１１５】The language processing unit 41 analyzes an input sentence, for example a sentence written in mixed kanji and kana, to generate reading information and accent information. Based on this information, it then generates a phonetic symbol string that describes the phoneme symbol sequence and the accent information.

【０１１６】The speech synthesis unit 42 generates speech from the phonetic symbol string output by the language processing unit 41.

【０１１７】In the speech rule synthesizing apparatus of FIG. 5, the document to be converted to speech (read aloud), here a Japanese document, is stored as a text file 43. Following the text-to-speech conversion software, the apparatus reads mixed kanji-kana sentences from the file 43 one sentence at a time, and the language processing unit 41 and the speech synthesis unit 42 perform the text-to-speech conversion processing described below to synthesize speech.

【０１１８】First, the mixed kanji-kana sentence read from the text file 43 is input to the language analysis processing unit 44 in the language processing unit 41.

【０１１９】The language analysis processing unit 44 performs morphological analysis on the input mixed kanji-kana sentence to generate reading information and accent information. Morphological analysis is the task of determining which character strings in a given sentence constitute words and what the structure of each word is.

【０１２０】For this purpose, the language analysis processing unit 44 uses a morpheme dictionary 45, whose entries are morphemes (the smallest constituent elements of a sentence), and a connection rule file 46 in which the rules for connecting morphemes are registered. Specifically, the language analysis processing unit 44 finds all morpheme sequence candidates obtainable by matching the input sentence against the morpheme dictionary 45 (a brute-force method) and, referring to the connection rule file 46, outputs from among them the combinations whose morphemes can be grammatically connected to their neighbors. In addition to the grammatical information used during analysis, the morpheme dictionary 45 registers the reading and the accent type of each morpheme. Consequently, once the morphemes are determined by morphological analysis, their readings and accent types are obtained at the same time.
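The dictionary lookup followed by connection-rule filtering can be sketched as below. This is a toy illustration only: the dictionary entries, part-of-speech tags, and connection rules are all invented for the example and are far simpler than a real morpheme dictionary 45 or connection rule file 46.

```python
# Toy morpheme dictionary: surface form -> (part of speech, reading).
# Every entry and tag here is hypothetical.
MORPHEMES = {
    "公園": ("noun", "コウエン"),
    "公": ("noun", "コウ"),
    "園": ("noun", "エン"),
    "へ": ("particle", "エ"),
    "行って": ("verb", "イッテ"),
    "本": ("noun", "ホ＾ン"),
    "を": ("particle", "ヲ"),
    "読み": ("verb_stem", "ヨミ"),
    "ます": ("auxiliary", "マ＾ス"),
}

# Toy connection rules: pairs of parts of speech that may be adjacent.
ALLOWED = {
    ("noun", "particle"), ("particle", "verb"), ("verb", "noun"),
    ("particle", "verb_stem"), ("verb_stem", "auxiliary"),
}


def all_segmentations(text):
    """Enumerate every split of text into dictionary morphemes (brute force)."""
    if not text:
        yield []
        return
    for end in range(1, len(text) + 1):
        head = text[:end]
        if head in MORPHEMES:
            for rest in all_segmentations(text[end:]):
                yield [head] + rest


def is_connectable(seq):
    """True if every adjacent pair of morphemes satisfies the connection rules."""
    tags = [MORPHEMES[m][0] for m in seq]
    return all(pair in ALLOWED for pair in zip(tags, tags[1:]))


def analyze(text):
    """Keep only the candidate sequences that pass the connection rules."""
    return [seq for seq in all_segmentations(text) if is_connectable(seq)]


for seq in analyze("公園へ行って本を読みます"):
    print("／".join(seq), "->", "／".join(MORPHEMES[m][1] for m in seq))
```

With this toy data the lookup yields two candidate sequences, and the one that splits 公園 into two adjacent nouns is rejected because the invented rules do not allow a noun to follow a noun.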

【０１２１】For example, morphological analysis of the sentence 「公園へ行って本を読みます」 ("I go to the park and read a book") divides it into morphemes as follows:
／公園／へ／行って／本／を／読み／ます／

【０１２２】Each morpheme is then given a reading and an accent type, yielding:
／コウエン／エ／イッテ／ホ＾ン／ヲ／ヨミ／マ＾ス／
Here, a morpheme containing "＾" has an accent in which the pitch is high on the syllable immediately before the mark and falls on the syllable immediately after it. A morpheme without "＾" has a flat-type accent.

【０１２３】When people read a sentence aloud, however, they do not place an accent on each morpheme in this way. Instead, they group several morphemes together and place an accent on the group.

【０１２４】Taking this into account, the language analysis processing unit 44 further groups the morphemes into accent phrases (the units to which an accent is assigned) and at the same time estimates how the accent moves as a result of the grouping. In addition, the language analysis processing unit 44 adds information such as vowel devoicing and pauses (breaths) during reading. For the example above, it finally generates the following reading information.

【０１２５】／コーエンエ／イッテ．／ホ＾ンオ／ヨミマ＾（ス）／
Here, the period "．" represents a pause for breath, and "（）" encloses a syllable whose vowel is devoiced.

【０１２６】When the language analysis processing unit 44 in the language processing unit 41 has generated the reading information as described above, the phoneme duration calculation processing unit 47 in the speech synthesis unit 42 is activated. Following the reading information generated by the language analysis processing unit 44, the phoneme duration calculation processing unit 47 determines the duration (in ms) of the consonant part and the vowel part of each syllable in the input sentence.

【０１２７】The duration determination processing in the phoneme duration calculation processing unit 47 is realized by an extremely simple algorithm: the boundaries between consonant (C) and vowel (V), called CV transitions, are placed at equal intervals.

【０１２８】The CV transition interval (which serves as the speech rate parameter) is supplied by the speech rate control unit 48 in the speech synthesis unit 42. Although not shown, the software used in this embodiment allows the user to specify the speed of the synthesized speech. The speed specified by the user is passed to the speech rate control unit 48, which adjusts the CV transition interval used in the duration determination processing of the phoneme duration calculation processing unit 47 and thereby actually changes the speed of the synthesized speech. However, analysis has shown that in Japanese speech the duration of consonants stays almost constant even when the speaking rate changes, so the consonant durations are kept constant and the vowel durations are adjusted to change the CV transition interval.
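One way to read the equal-spacing rule is sketched below. It rests on assumptions that the text does not spell out: between two successive consonant-vowel (CV) boundaries lie one vowel and the following syllable's consonant, so each vowel is given whatever remains of the interval, and the final vowel is treated as if followed by a zero-length consonant. The function name and the millisecond values are invented.

```python
def vowel_durations(consonant_ms, cv_interval_ms):
    """Vowel durations (ms) that place successive CV boundaries cv_interval_ms apart.

    consonant_ms[i] is the fixed consonant duration of syllable i. The vowel of
    syllable i fills the gap up to the consonant of syllable i + 1. The last
    vowel is treated as if followed by a zero-length consonant (an assumption).
    """
    following = list(consonant_ms[1:]) + [0.0]
    vowels = [cv_interval_ms - c for c in following]
    if min(vowels) <= 0:
        raise ValueError("CV interval too short for the given consonants")
    return vowels


consonants = [60.0, 0.0, 40.0, 50.0]            # invented values for four syllables
slow = vowel_durations(consonants, 200.0)       # slower speech: wider CV spacing
fast = vowel_durations(consonants, 150.0)       # faster speech: narrower CV spacing
print(slow)
print(fast)
```

Changing the interval changes only the vowel durations; the consonant durations are passed through untouched, which mirrors the observation that consonant length is nearly independent of speaking rate.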

【０１２９】Once the phoneme duration calculation processing unit 47 has determined the durations (of the consonant and vowel parts) of each syllable in the input sentence, the pitch generation processing unit 49 in the same speech synthesis unit 42 is activated. Based on the durations determined by the phoneme duration calculation processing unit 47 and the accent information determined by the language analysis processing unit 44 (in the language processing unit 41), the pitch generation processing unit 49 first sets the positions of the point pitches. It then interpolates linearly between the point pitches it has set to obtain a pitch pattern (fundamental frequency pattern) sampled, for example, every 10 msec.
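The interpolation step can be sketched as follows. This is a minimal illustration assuming the point pitches are available as (time in ms, pitch in Hz) pairs; the function name and the sample values are invented, and how the point pitches themselves are placed from the accent information is not modeled.

```python
def pitch_pattern(points, frame_ms=10):
    """Linearly interpolate (time_ms, pitch_hz) point pitches onto a frame grid.

    Needs at least two points with distinct times.
    """
    points = sorted(points)
    pattern = []
    seg = 0                       # index of the segment containing time t
    t = points[0][0]
    while t <= points[-1][0]:
        while seg < len(points) - 2 and t > points[seg + 1][0]:
            seg += 1
        (t0, f0), (t1, f1) = points[seg], points[seg + 1]
        pattern.append(f0 + (f1 - f0) * (t - t0) / (t1 - t0))
        t += frame_ms
    return pattern


# Invented point pitches: rise to an accent peak, then fall.
contour = pitch_pattern([(0, 100.0), (40, 140.0), (100, 110.0)])
print(contour)
```

The result holds one pitch value per 10 msec frame, which is the form the synthesis filter stage consumes.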

【０１３０】Meanwhile, the phoneme parameter generation processing unit 50 in the speech synthesis unit 42 generates phoneme parameters from the phoneme information in the phonetic symbol string passed from the language analysis processing unit 44 (in the language processing unit 41). This processing is performed as follows, for example in parallel with the pitch pattern generation by the pitch generation processing unit 49.

【０１３１】In this embodiment, real speech sampled at a sampling frequency of 11025 Hz (first sampling period) is analyzed by the improved cepstrum method with a window length of 20 msec and a frame period (first frame period) of 10 msec to obtain cepstrum coefficients of order 0 through 25. From these, all the syllables needed to synthesize Japanese speech are cut out in consonant + vowel (CV) units, giving a total of 137 speech units, which are stored in a speech unit file (not shown). The contents of this speech unit file are assumed to have been read, at the start of text-to-speech conversion processing by the software, into a speech unit area 51 (hereinafter called the speech unit memory) allocated in, for example, main memory (not shown).

【０１３２】Following the phoneme information in the phonetic symbol string passed from the language analysis processing unit 44 (in the language processing unit 41), the phoneme parameter generation processing unit 50 sequentially reads the CV-unit speech units described above from the speech unit memory 51 and connects them to generate the phoneme parameters (feature parameters) of the speech to be synthesized.

【０１３３】When the pitch generation processing unit 49 has generated the pitch pattern and the phoneme parameter generation processing unit 50 has generated the phoneme parameters, the synthesis filter processing unit 52 in the speech synthesis unit 42 is activated. As shown in FIG. 6, the synthesis filter processing unit 52 consists of a white noise generation unit 521, an impulse generation unit 522, a driving sound source switching unit 523, and an LMA filter 524, and synthesizes speech from the generated pitch pattern and phoneme parameters as follows.

【０１３４】First, in the voiced parts (V) of the speech, the driving sound source switching unit 523 switches to the impulse generation unit 522. The impulse generation unit 522 generates impulses at intervals determined by the pitch pattern produced by the pitch generation processing unit 49, and these impulses drive the LMA filter 524 as its sound source.

【０１３５】In the unvoiced parts (U) of the speech, on the other hand, the driving sound source switching unit 523 switches to the white noise generation unit 521. The white noise generation unit 521 generates white noise, which drives the LMA filter 524 as its sound source. The LMA filter 524 uses the cepstrum of the speech directly as its filter coefficients.

【０１３６】Since the phoneme parameters generated by the phoneme parameter generation processing unit 50 in this embodiment are cepstra, as described above, they serve as the filter coefficients of the LMA filter 524. Driven by the sound source selected by the driving sound source switching unit 523, the filter outputs the synthesized speech.
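The source switching can be sketched as below. Only the excitation side is illustrated: the LMA filter itself is not implemented, and the function name, the unit impulse amplitude, and the Gaussian noise model are assumptions made for the example.

```python
import random


def excitation(n_samples, pitch_hz, fs_hz=11025, rng=None):
    """Return n_samples of excitation for one stretch of speech.

    A non-zero pitch_hz selects the impulse source (voiced); a pitch of 0 or
    None selects the white noise source (unvoiced).
    """
    rng = rng or random.Random(0)
    if pitch_hz:
        # Voiced: one unit impulse per pitch period, zeros in between.
        period = max(1, round(fs_hz / pitch_hz))
        return [1.0 if i % period == 0 else 0.0 for i in range(n_samples)]
    # Unvoiced: white noise (Gaussian here, an arbitrary choice).
    return [rng.gauss(0.0, 1.0) for _ in range(n_samples)]


voiced = excitation(441, 110.25)   # pitch chosen so the period is 100 samples
unvoiced = excitation(441, None)
print(sum(voiced), len(unvoiced))
```

In a complete synthesizer this sequence would then be passed through the filter whose coefficients come from the cepstral phoneme parameters.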

【０１３７】The speech synthesized by the synthesis filter processing unit 52 (by the LMA filter 524 within it) is a discrete speech signal. It can be heard as sound only after it is converted into an analog signal by the D/A converter 53 and output to the speaker 55 through the amplifier 54.

【０１３８】The processing up to this point is almost the same as the example described in the [Prior Art] section with reference to FIG. 24.

【０１３９】The point of this embodiment is that a voice quality switching unit 56 and a voice quality control unit 57 (corresponding to the voice quality switching unit 17 and the voice quality control unit 18 in FIG. 1) are added to the speech synthesis unit 42. The D/A converter 53 is implemented in hardware, but its conversion sampling frequency can be controlled from software.

【０１４０】The voice quality switching unit 56 allows the voice quality used for synthesis to be switched, either by user designation or by an application program or the like. In this embodiment, three voice qualities can be designated through the voice quality switching unit 56.

【０１４１】The voice quality control unit 57 controls the D/A converter 53 so that D/A conversion is performed at a sampling frequency of 11025 Hz, 12000 Hz, or 10000 Hz, according to the voice quality designated by the voice quality switching unit 56.

【０１４２】If the D/A converter 53 performs D/A conversion at the same sampling frequency as that used when the speech units were created (the first sampling period), that is, at 11025 Hz, speech is synthesized with the voice quality of the original speech. If, on the other hand, the D/A converter 53 performs D/A conversion at another sampling frequency (a second sampling period different from the first), then, as already described for the first embodiment, the effect is that of shifting the speech spectrum along the frequency axis as shown in FIG. 2. The individuality of the voice therefore changes, and the voice quality of the resulting analog speech signal differs from that of the speech from which the speech units were derived.

【０１４３】[Fifth Embodiment] In the fourth embodiment, providing the voice quality switching unit 56 and the voice quality control unit 57 makes it easy to increase the number of voice qualities of the synthesized speech, but the speed of the synthesized speech varies with the voice quality.

【０１４４】That is, when (sampling frequency at unit creation) > (D/A conversion sampling frequency), the synthesized speech becomes slower. Conversely, when (sampling frequency at unit creation) < (D/A conversion sampling frequency), the synthesized speech becomes faster.

【０１４５】This difference in speed is not much of a problem when the D/A conversion sampling frequency differs only by about as much as in the fourth embodiment (91%, 109%). As described below, however, it does become a problem when the D/A conversion sampling frequency differs greatly from the sampling frequency of the speech from which the speech units read into the speech unit memory 51 were created.
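The 91% and 109% figures are simply the ratios of the fourth embodiment's D/A rates to the 11025 Hz unit rate. The sketch below recomputes them; the function name is an assumption, and the remark that frequencies scale by the same factor is a general property of playing samples back at a different rate rather than a statement taken from the text.

```python
FS_UNIT_HZ = 11025                      # sampling frequency of the stored speech units
DAC_RATES_HZ = (11025, 12000, 10000)    # D/A rates of the fourth embodiment


def playback_ratio(fs_dac_hz, fs_unit_hz=FS_UNIT_HZ):
    """Factor by which speaking speed (and every frequency) is scaled on playback."""
    return fs_dac_hz / fs_unit_hz


for fs in DAC_RATES_HZ:
    print(fs, f"{playback_ratio(fs):.0%}")
```

A ratio below 100% means slower, lower-sounding speech, and a ratio above 100% means faster, higher-sounding speech.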

【０１４６】First, if the ratio between the sampling frequency at unit creation and the D/A conversion sampling frequency at synthesis is close to 1, the change in voice quality is small; conversely, the further the ratio is from 1, the more the voice quality changes. To change the voice quality greatly, the ratio could therefore be set to, for example, about 50% or 200%. The speed of the synthesized speech would then also be 50% or 200% of the original, that is, half or double, making the speech hard to listen to.

【０１４７】A fifth embodiment is therefore described in which the speed of the synthesized speech can be kept constant even when the D/A conversion sampling frequency at synthesis differs greatly from the sampling frequency at unit creation.

【０１４８】FIG. 7 is a block diagram showing the schematic configuration of a speech rule synthesizing apparatus according to the fifth embodiment of the present invention. Parts identical to those in FIG. 5 are given the same reference numerals.

【０１４９】The point of this embodiment is that a speech rate control unit 68 capable of controlling the speed (speech rate) of the synthesized speech is provided in place of the speech rate control unit 48 in FIG. 5, and a voice quality control unit 67, which controls not only the D/A conversion sampling frequency but also the speed of the synthesized speech, is provided in place of the voice quality control unit 57 in FIG. 5. In accordance with the designation from the voice quality switching unit 56, the voice quality control unit 67 controls the speed of the synthesized speech in the speech rate control unit 68 at the same time as the D/A conversion sampling frequency of the D/A converter 53.

【０１５０】In this embodiment, as in the fourth embodiment, three voice qualities can be designated through the voice quality switching unit 56. According to the designated voice quality, the voice quality control unit 67 controls the D/A converter 53 so that D/A conversion is performed at a sampling frequency of, for example, 11025 Hz, 8000 Hz, or 16000 Hz.

【０１５１】At the same time, the voice quality control unit 67 controls the speech rate control unit 68 as follows, according to the voice quality designated by the voice quality switching unit 56. That is, the voice quality control unit 67 controls the speech rate control unit 68 so that the CV transition interval described earlier becomes
(CV transition interval) = (CV transition interval for D/A conversion at 11025 Hz) × (D/A conversion sampling frequency) / (sampling frequency at unit creation).
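The formula above can be checked numerically. This is a minimal sketch with invented names and an invented 150 ms base interval; it shows only that scaling the interval by the rate ratio restores the interval actually heard after playback at the D/A rate.

```python
FS_UNIT_HZ = 11025   # sampling frequency at unit creation


def scaled_cv_interval(base_interval_ms, fs_dac_hz):
    """CV interval to synthesize with, per the formula above."""
    return base_interval_ms * fs_dac_hz / FS_UNIT_HZ


def heard_interval(interval_ms, fs_dac_hz):
    """Interval heard when audio timed for FS_UNIT_HZ is output at fs_dac_hz."""
    return interval_ms * FS_UNIT_HZ / fs_dac_hz


for fs_dac in (8000, 11025, 16000):
    synth = scaled_cv_interval(150.0, fs_dac)      # 150 ms is an invented base value
    print(fs_dac, round(synth, 1), round(heard_interval(synth, fs_dac), 1))
```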

【０１５２】Therefore, when D/A conversion is performed at 8000 Hz, the voice quality control unit 67 controls the speech rate control unit 68, based on the above formula, so that the CV transition interval is (8000 [Hz] / 11025 [Hz]) times the interval used for D/A conversion at 11025 Hz. Likewise, when D/A conversion is performed at 16000 Hz, the voice quality control unit 67 controls the speech rate control unit 68 so that the CV transition interval is (16000 [Hz] / 11025 [Hz]) times the interval used for D/A conversion at 11025 Hz.

【０１５３】As described above, in this embodiment the change in the speed of the synthesized speech that arises when D/A conversion is performed at a sampling frequency different from that at unit creation (a second sampling period different from the first) can be canceled by changing the CV transition interval (the speech rate parameter). Therefore, even when D/A conversion is performed at 8000 Hz or 16000 Hz, which differ from the sampling frequency at unit creation, an analog speech signal with almost the same speed can be obtained.

【０１５４】[Sixth Embodiment] Rule-based synthesis according to the fifth embodiment does make it easy to keep the speed of the synthesized speech almost constant while changing the voice quality. As already explained, however, the processing in the phoneme duration calculation processing unit 47 (phoneme duration calculation processing) keeps the consonant duration of each syllable constant even when the CV transition interval is changed. Consequently, when D/A conversion is performed at a sampling frequency different from that at unit creation, the consonant durations are shortened or drawn out, which can affect the clarity and naturalness of the synthesized speech.

【０１５５】A sixth embodiment is therefore described in which the duration of the consonants in the synthesized speech can be kept constant for each syllable.

【０１５６】FIG. 8 is a block diagram showing the schematic configuration of a speech rule synthesizing apparatus according to the sixth embodiment of the present invention. Parts identical to those in FIG. 5 or FIG. 7 are given the same reference numerals.

【０１５７】The point of this embodiment is that a phoneme duration calculation processing unit 77 whose phoneme durations can be controlled is provided in place of the phoneme duration calculation processing unit 47 in FIG. 7, and a voice quality control unit 87, which controls not only the D/A conversion sampling frequency but also the speed of the synthesized speech by way of the phoneme durations, is provided in place of the voice quality control unit 67 in FIG. 7. In accordance with the designation from the voice quality switching unit 56, the voice quality control unit 87 controls the phoneme durations in the phoneme duration calculation processing unit 77 at the same time as the D/A conversion sampling frequency of the D/A converter 53.

【０１５８】In this embodiment, as in the fifth embodiment, three voice qualities can be designated through the voice quality switching unit 56. According to the designated voice quality, the voice quality control unit 87 controls the D/A converter 53 so that D/A conversion is performed at a sampling frequency of, for example, 11025 Hz, 8000 Hz, or 16000 Hz.

【０１５９】At the same time, the voice quality control unit 87 controls the phoneme duration calculation processing unit 77 as follows, according to the voice quality designated by the voice quality switching unit 56. That is, the voice quality control unit 87 controls the phoneme duration calculation processing unit 77 so that the durations of all phonemes, both consonants and vowels, become
(phoneme duration) = (phoneme duration for D/A conversion at 11025 Hz) × (D/A conversion sampling frequency) / (sampling frequency at unit creation).

【０１６０】Therefore, when D/A conversion is performed at 8000 Hz, the voice quality control unit 87 controls the phoneme duration calculation processing unit 77, based on the above formula, so that each phoneme duration is (8000 [Hz] / 11025 [Hz]) times the duration used for D/A conversion at 11025 Hz. When D/A conversion is performed at 16000 Hz, the voice quality control unit 87 controls the phoneme duration calculation processing unit 77 so that each phoneme duration is (16000 [Hz] / 11025 [Hz]) times the duration used for D/A conversion at 11025 Hz.

【０１６１】As described above, in this embodiment the change in the speed of the synthesized speech that arises when D/A conversion is performed at a sampling frequency different from that at unit creation (a second sampling period different from the first) can be canceled by changing each phoneme duration, and the duration of the consonants in the synthesized speech can be kept constant for each syllable.
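The difference between the fifth and sixth embodiments can be made concrete with a small comparison. The sketch below is an illustration under simplifying assumptions: a single consonant plus vowel pair stands in for one CV interval, and the function names and the 60 ms and 140 ms durations are invented.

```python
FS_UNIT_HZ = 11025   # sampling frequency at unit creation


def heard_ms(synth_ms, fs_dac_hz):
    """Duration heard when audio timed for FS_UNIT_HZ is output at fs_dac_hz."""
    return synth_ms * FS_UNIT_HZ / fs_dac_hz


def fifth_embodiment(consonant_ms, vowel_ms, fs_dac_hz):
    """Scale only the overall interval; the consonant keeps its synthesis length."""
    interval = (consonant_ms + vowel_ms) * fs_dac_hz / FS_UNIT_HZ
    return consonant_ms, interval - consonant_ms


def sixth_embodiment(consonant_ms, vowel_ms, fs_dac_hz):
    """Scale every phoneme duration by the same ratio."""
    ratio = fs_dac_hz / FS_UNIT_HZ
    return consonant_ms * ratio, vowel_ms * ratio


for name, method in (("fifth", fifth_embodiment), ("sixth", sixth_embodiment)):
    c, v = method(60.0, 140.0, 8000)
    print(name, round(heard_ms(c, 8000), 1), round(heard_ms(v, 8000), 1))
```

Both methods restore the total heard duration, but only the second also restores the heard consonant duration; in the first, the consonant is drawn out at 8000 Hz and the vowel absorbs the difference.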

【０１６２】[Seventh Embodiment] Synthesis according to the fifth or sixth embodiment does keep the speed of the synthesized speech almost constant while changing the voice quality. However, making the D/A conversion sampling frequency differ from the sampling frequency at unit creation is much the same as playing a record fast or slow, so the transient portions of the speech are inevitably compressed or drawn out in time.

【０１６３】For example, in uttering /わ/ (/wa/), the vocal organs start from a mouth shape close to /う/ (/u/) and rapidly open the lips toward /あ/ (/a/). If the D/A conversion sampling frequency is lowered, as when a record is played slowly, this change becomes gradual, and the sound is heard not as /わ/ but as something like /うあー/ (/ua-/).

【０１６４】A seventh embodiment is therefore described in which the compression and drawing-out in time of the transient portions of the synthesized speech, which arise when D/A conversion is performed at a sampling frequency different from that at unit creation, can be suppressed.

【０１６５】FIG. 9 is a block diagram showing the schematic configuration of a speech rule synthesizing apparatus according to the seventh embodiment of the present invention. Parts identical to those in FIG. 8 are given the same reference numerals.

【０１６６】The point of this embodiment is that a phoneme parameter generation processing unit 90, to which a function for stretching and compressing the phoneme parameters (made up of speech units) along the time axis has been added, is provided in place of the phoneme parameter generation processing unit 50 in FIG. 8, and a voice quality control unit 97, which controls the time-axis stretching and compression of the phoneme parameters in addition to the D/A conversion sampling frequency and the phoneme durations, is provided in place of the voice quality control unit 87 in FIG. 8. In accordance with the designation from the voice quality switching unit 56, the voice quality control unit 97 controls the D/A conversion sampling frequency of the D/A converter 53 and the phoneme durations in the phoneme duration calculation processing unit 77, and additionally controls the phoneme parameter generation processing unit 90. Where the transient portions of the synthesized speech would be shortened, it has the phoneme parameters created after stretching them in the time direction in advance; where the transient portions would be drawn out, it has them created after compressing them in the time direction in advance.

【０１６７】In this embodiment, as in the sixth embodiment, three voice qualities can be designated through the voice quality switching unit 56. According to the designated voice quality, the voice quality control unit 97 controls the D/A converter 53 so that D/A conversion is performed at a sampling frequency of 11025 Hz, 8000 Hz, or 16000 Hz.

【０１６８】同時に声質制御部９７は、声質切替部５６
によって指定された声質に応じて、全ての音韻の継続時
間、即ち子音の継続時間と母音の継続時間を（音韻継続時間）＝（１１０２５ＨｚでＤ／Ａ変換時の
音韻継続時間）×（Ｄ／Ａ変換のサンプリング周波数）
／（素片作成時のサンプリング周波数）となるよう音韻継続時間計算処理部７７を制御する。At the same time, according to the voice quality designated by the voice quality switching unit 56, the voice quality control unit 97 controls the phoneme duration calculation processing unit 77 so that the duration of every phoneme, that is, the duration of both consonants and vowels, becomes (phoneme duration) = (phoneme duration for D/A conversion at 11025 Hz) × (sampling frequency of D/A conversion) / (sampling frequency at segment creation).
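The duration rule above can be illustrated with a minimal Python sketch. The function name, the constant, and the 100 msec example duration are assumptions made for illustration only; they do not appear in the specification.

```python
# Hypothetical helper: scale a phoneme duration for a given D/A sampling frequency.
SEGMENT_RATE_HZ = 11025  # sampling frequency at segment creation (from the text)


def scaled_duration(base_duration_ms, da_rate_hz, segment_rate_hz=SEGMENT_RATE_HZ):
    """Return the duration to request from the duration calculator.

    base_duration_ms is the duration used when D/A conversion runs at the
    segment-creation rate; the result follows
    duration = base * (D/A rate) / (segment rate).
    """
    return base_duration_ms * da_rate_hz / segment_rate_hz


if __name__ == "__main__":
    for rate in (11025, 8000, 16000):
        print(rate, round(scaled_duration(100.0, rate), 1))
```

A duration scaled this way, once played back at the changed D/A rate, lasts as long in real time as the unscaled duration played back at 11025 Hz.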

【０１６９】さらに、声質制御部９７は音韻パラメータ
生成処理部９０を制御し、Ｄ／Ａ変換のサンプリング周
波数（第２の標本周期）を素片作成時のサンプリング周
波数（第１の標本周期）とは異ならせることによって合
成音声の過渡部が縮んでしまうような場合には、予め音
韻パラメータを時間方向に引き伸ばして音韻パラメータ
を作成させ、合成音声の過渡部が間延びするような場合
には、予め音韻パラメータを時間方向に圧縮して音韻パ
ラメータを作成させる。Further, the voice quality control unit 97 controls the phoneme parameter generation processing unit 90. When making the sampling frequency of D/A conversion (second sampling period) different from the sampling frequency at segment creation (first sampling period) would shorten the transient part of the synthesized speech, the phoneme parameters are stretched in the time direction in advance; when it would draw out the transient part, the phoneme parameters are compressed in the time direction in advance.

【０１７０】もっと正確には、声質制御部９７は、素片
自身の長さを、（Ｄ／Ａ変換のサンプリング周波数／素
片作成時のサンプリング周波数）倍となる伸縮を行って
から接続補間し、音韻パラメータを生成させる。More precisely, the voice quality control unit 97 has each segment itself expanded or contracted to (sampling frequency of D/A conversion / sampling frequency at segment creation) times its length, and only then has the segments connected and interpolated to generate the phoneme parameters.

【０１７１】即ち、本実施形態における声質制御部９７
は、１１０２５ＨｚにてＤ／Ａ変換を行う場合には、音
声素片の伸縮は行わずに音韻パラメータを生成し、８０
００ＨｚでＤ／Ａ変換を行う場合には、音声素片を（８
０００[Hz]／１１０２５[Hz]）倍の長さに縮めてから接
続補間して音韻パラメータを生成し、１６０００Ｈｚで
Ｄ／Ａ変換を行う場合には、音声素片を（１６０００[H
z]／１１０２５[Hz]）倍の長さに延ばしてから接続補間
して音韻パラメータを生成するよう音韻パラメータ生成
処理部９０を制御する。That is, the voice quality control unit 97 in this embodiment controls the phoneme parameter generation processing unit 90 as follows: when D/A conversion is performed at 11025 Hz, the phoneme parameters are generated without expanding or contracting the speech units; when D/A conversion is performed at 8000 Hz, each speech unit is shortened to (8000 [Hz] / 11025 [Hz]) times its length before connection and interpolation; and when D/A conversion is performed at 16000 Hz, each speech unit is lengthened to (16000 [Hz] / 11025 [Hz]) times its length before connection and interpolation.
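As a rough illustration of the expansion and contraction described above, the following Python sketch resamples a speech unit, represented here as a list of parameter frames, along the time axis. The specification does not state how the stretching is carried out; linear interpolation between neighbouring frames, the function name, and the list-of-lists representation are all assumptions of this sketch.

```python
def stretch_segment(frames, da_rate_hz, segment_rate_hz=11025):
    """Stretch or shrink a speech unit to (da_rate / segment_rate) times its length.

    frames: list of parameter frames, each a list of floats (assumed layout).
    Linear interpolation between neighbouring frames is an assumption.
    """
    factor = da_rate_hz / segment_rate_hz
    n_in = len(frames)
    n_out = max(1, round(n_in * factor))
    if n_in == 1 or n_out == 1:
        return [list(frames[0]) for _ in range(n_out)]
    out = []
    for i in range(n_out):
        pos = i * (n_in - 1) / (n_out - 1)   # fractional position in the input
        lo = int(pos)
        hi = min(lo + 1, n_in - 1)
        w = pos - lo
        out.append([(1 - w) * a + w * b for a, b in zip(frames[lo], frames[hi])])
    return out


if __name__ == "__main__":
    unit = [[float(i)] for i in range(11)]       # 11 toy frames, one coefficient each
    print(len(stretch_segment(unit, 8000)))      # shortened unit
    print(len(stretch_segment(unit, 16000)))     # lengthened unit
```

The stretched units would then be connected and interpolated as usual to form the phoneme parameters.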

【０１７２】このように本実施形態においては、素片作
成時と異なるサンプリング周波数（第１の標本周期とは
異なる第２の標本周期）でＤ／Ａ変換したときに生じる
合成音声過渡部分の時間方向の縮みや間延びを、予め音
韻パラメータ生成時に音声素片を伸縮させておくことで
相殺することができる。As described above, in the present embodiment, the shrinking or drawing out in the time direction of the transient parts of the synthesized speech, which occurs when D/A conversion is performed at a sampling frequency different from that at segment creation (a second sampling period different from the first sampling period), can be canceled by expanding or contracting the speech units in advance when the phoneme parameters are generated.

【０１７３】［第８の実施形態］前述の第５乃至第７の
実施形態は、ケプストラムやＬＰＣなどを利用した音声
規則合成、即ち音声波形を分析して得られるパラメータ
を用いた音声規則合成だけではなく、波形合成（による
規則合成）にも応用は可能である。しかし、パラメータ
を用いた音声規則合成では、前述の第５乃至第７の実施
形態を用いずとも、声質を変えながら、合成音声のスピ
ードを一定にし、かつ音声過渡部の縮み間延びを起こさ
せない簡便な方法が適用可能である。[Eighth Embodiment] The fifth to seventh embodiments described above can be applied not only to speech rule synthesis using cepstrum, LPC, or the like, that is, rule synthesis using parameters obtained by analyzing a speech waveform, but also to waveform synthesis (rule synthesis by waveform). In rule synthesis using parameters, however, a simpler method is available that, without using the fifth to seventh embodiments, keeps the speed of the synthesized speech constant while the voice quality is changed and does not cause shrinking or drawing out of speech transients.

【０１７４】そこで、この簡便な方法を、パラメータを
用いた音声規則合成に適用した第８の実施形態につき説
明する。Therefore, an explanation will be given of an eighth embodiment in which this simple method is applied to voice rule synthesis using parameters.

【０１７５】図１０は本発明の第８の実施形態に係る音
声の規則合成装置の概略構成を示すブロック図であり、
図５と同一部分には同一符号を付してある。FIG. 10 is a block diagram showing the schematic arrangement of a speech rule synthesizing apparatus according to the eighth embodiment of the present invention.
The same parts as those in FIG. 5 are designated by the same reference numerals.

【０１７６】本実施形態のポイントは、図５中の合成フ
ィルタ処理部５２に代えて、合成時のフレーム周期が制
御可能な合成フィルタ処理部１１２を設けると共に、図
５中の声質制御部５７に代えて、Ｄ／Ａ変換のサンプリ
ング周波数だけでなく合成時のフレーム周期を制御する
声質制御部１１７を設け、当該声質制御部１１７によ
り、声質切替部５６の指定に応じて、Ｄ／Ａ変換器５３
でのＤ／Ａ変換のサンプリング周波数と同時に、合成フ
ィルタ処理部１１２での合成時のフレーム周期を制御す
るところにある。The point of this embodiment is that, instead of the synthesis filter processing unit 52 in FIG. 5, a synthesis filter processing unit 112 whose frame period at synthesis is controllable is provided, and that, instead of the voice quality control unit 57 in FIG. 5, a voice quality control unit 117 is provided that controls not only the sampling frequency of D/A conversion but also the frame period at synthesis. According to the designation of the voice quality switching unit 56, the voice quality control unit 117 controls the sampling frequency of D/A conversion in the D/A converter 53 and, at the same time, the frame period at synthesis in the synthesis filter processing unit 112.

【０１７７】本実施形態では、前記第４の実施形態と同
様に声質切替部５６にて３種類の声質が指定可能であ
る。声質制御部１１７は声質切替部５６で指定された声
質に応じて、１１０２５Ｈｚ，８０００Ｈｚ，１６００
０Ｈｚのいずれかのサンプリング周波数でＤ／Ａ変換を
行うようにＤ／Ａ変換器５３を制御する。In the present embodiment, as in the fourth embodiment, three types of voice quality can be designated by the voice quality switching unit 56. According to the voice quality designated by the voice quality switching unit 56, the voice quality control unit 117 controls the D/A converter 53 so that D/A conversion is performed at a sampling frequency of 11025 Hz, 8000 Hz, or 16000 Hz.

【０１７８】同時に声質制御部１１７は、声質切替部５
６によって指定された声質に応じて、合成フィルタ処理
部１１２で行われる合成のフレーム周期を設定する。こ
れにより合成フィルタ処理部１１２では、音韻パラメー
タ生成処理部５０により生成された音韻パラメータ（ケ
プストラム）を設定されたフレーム周期で（ＬＭＡフィ
ルタに）入力し、当該フレーム周期で音声（離散音声信
号）を合成する。At the same time, the voice quality control unit 117 sets the frame period of the synthesis performed by the synthesis filter processing unit 112 according to the voice quality designated by the voice quality switching unit 56. The synthesis filter processing unit 112 then feeds the phoneme parameters (cepstrum) generated by the phoneme parameter generation processing unit 50 (to the LMA filter) at the set frame period and synthesizes speech (a discrete speech signal) at that frame period.

【０１７９】合成のフレーム周期は次式により与えられ
る。The frame period of synthesis is given by the following equation.

【０１８０】（フレーム周期）＝（素片作成時のフレーム周期）×（素片作成時のサンプリング周期）／（Ｄ／Ａ変換のサンプリング周期）＝（素片作成時のフレーム周期）×（Ｄ／Ａ変換のサンプリング周波数）／（素片作成時のサンプリング周波数）したがって、音声素片作成時の音声のサンプリング周波
数（第１の標本周期）と同じサンプリング周波数、即ち
１１０２５ＨｚでＤ／Ａ変換を行う際には、声質制御部
１１７は上式に基づき、合成時のフレーム周期を音声素
片作成時のフレーム周期（第１のフレーム周期）と同じ
１０msecで合成するよう合成フィルタ処理部１１２を制
御する。(frame period) = (frame period at segment creation) × (sampling period at segment creation) / (sampling period of D/A conversion) = (frame period at segment creation) × (sampling frequency of D/A conversion) / (sampling frequency at segment creation). Therefore, when D/A conversion is performed at the same sampling frequency as that of the speech used to create the speech units (first sampling period), that is, at 11025 Hz, the voice quality control unit 117 controls the synthesis filter processing unit 112, based on the above equation, so that synthesis uses a frame period of 10 msec, the same as the frame period at speech unit creation (first frame period).

【０１８１】また声質制御部１１７は、音声素片作成時
の音声のサンプリング周波数とは異なるサンプリング周
波数（第１の標本周期とは異なる第２の標本周期）、例
えば８０００ＨｚにてＤ／Ａ変換を行う場合には、１０[msec]×８０００[Hz]／１１０２５[Hz]＝７．３[m
sec] のフレーム周期（第１のフレーム周期とは異なる第２の
フレーム周期）で合成を行うよう制御し、１６０００Ｈ
ｚにてＤ／Ａ変換を行う場合には、１０[msec]×１６０００[Hz]／１１０２５[Hz]＝１４．
５[msec] のフレーム周期（第１のフレーム周期とは異なる第２の
フレーム周期）で合成を行うよう制御する。Further, when D/A conversion is performed at a sampling frequency different from that of the speech used to create the speech units (a second sampling period different from the first sampling period), for example at 8000 Hz, the voice quality control unit 117 performs control so that synthesis uses a frame period of 10 [msec] × 8000 [Hz] / 11025 [Hz] = 7.3 [msec] (a second frame period different from the first frame period); when D/A conversion is performed at 16000 Hz, it performs control so that synthesis uses a frame period of 10 [msec] × 16000 [Hz] / 11025 [Hz] = 14.5 [msec] (likewise a second frame period different from the first frame period).
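The frame-period arithmetic above can be checked with a short Python sketch. The function name and default arguments are illustrative assumptions; only the formula and the numeric values (10 msec, 11025 Hz, 8000 Hz, 16000 Hz) come from the text.

```python
def synthesis_frame_period_ms(da_rate_hz, segment_frame_ms=10.0, segment_rate_hz=11025):
    """Frame period at synthesis:
    (frame period at segment creation) * (D/A rate) / (segment-creation rate)."""
    return segment_frame_ms * da_rate_hz / segment_rate_hz


if __name__ == "__main__":
    for rate in (11025, 8000, 16000):
        # Rounded to one decimal place, as in the worked figures in the text.
        print(rate, round(synthesis_frame_period_ms(rate), 1))
```

Rounded to one decimal place, the three rates give 10.0, 7.3, and 14.5 msec, matching the values stated in the description.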

【０１８２】このように本実施形態においては、音声素
片作成時と異なるサンプリング周波数（第１の標本周期
とは異なる第２の標本周期）でＤ／Ａ変換したときの合
成音声のスピードの変化を、合成フィルタ処理部１１２
での合成のフレーム周期（第２のフレーム周期）を音声
素片作成時のフレーム周期（第１のフレーム周期）とは
異ならせることで相殺することができる。As described above, in the present embodiment, the change in the speed of the synthesized speech that occurs when D/A conversion is performed at a sampling frequency different from that at speech unit creation (a second sampling period different from the first sampling period) can be canceled by making the frame period of synthesis in the synthesis filter processing unit 112 (second frame period) different from the frame period at speech unit creation (first frame period).

【０１８３】したがって、音声素片作成時のサンプリン
グ周波数とは異なる８０００Ｈｚあるいは１６０００Ｈ
ｚでＤ／Ａ変換を行っても、同じスピードの音声のアナ
ログ信号を得ることができる。また同時に、音声素片作
成時と異なるサンプリング周波数でＤ／Ａ変換したとき
に生じる音声過渡部の縮みや間延びも防ぐことができ
る。Therefore, even if D/A conversion is performed at 8000 Hz or 16000 Hz, which differ from the sampling frequency at speech unit creation, an analog speech signal of the same speed can be obtained. At the same time, the shrinking or drawing out of speech transients that occurs when D/A conversion is performed at a sampling frequency different from that at speech unit creation can also be prevented.

【０１８４】［第９の実施形態］前記第４乃至第８の実
施形態にはもう１つの問題点が存在する。それは、Ｄ／
Ａ変換のサンプリング周波数を素片作成時のものに変え
ると、声の高さ即ち音声のピッチが変化してしまうとい
うことである。例えば、（素片作成時のサンプリング周
波数）＞（Ｄ／Ａ変換のサンプリング周波数）のときに
は、合成される音声のピッチは低くなる。逆に、（素片
作成時のサンプリング周波数）＜（Ｄ／Ａ変換のサンプ
リング周波数）のときには、合成される音声のピッチは
高くなる。[Ninth Embodiment] The fourth to eighth embodiments have one more problem: when the sampling frequency of D/A conversion is changed from the one used at segment creation, the height of the voice, that is, the pitch of the speech, changes. For example, when (sampling frequency at segment creation) > (sampling frequency of D/A conversion), the pitch of the synthesized speech becomes lower. Conversely, when (sampling frequency at segment creation) < (sampling frequency of D/A conversion), the pitch of the synthesized speech becomes higher.

【０１８５】このような合成される音声のピッチの違い
は、Ｄ／Ａ変換時のサンプリング周波数が前記第４の実
施形態程度の違い（９１％，１０９％）ではあまり問題
とはならない。Such a difference in the pitch of the synthesized speech is not much of a problem when the sampling frequency of D/A conversion differs only to the extent used in the fourth embodiment (91%, 109%).

【０１８６】しかし、声質を大きく変えようとして、両
サンプリング周波数の比を例えば５０％，２００％程度
に設定すれば、合成音声のピッチもそれぞれ５０％，２
００％になる。この場合、１１０２５ＨｚでＤ／Ａ変換
したときの音声と比較して、前者はピッチが１[oct] 低
い音声が合成され、後者は１[oct] 高い音声が合成され
るので聞きづらくなるという問題が発生する。However, if the ratio of the two sampling frequencies is set to, for example, about 50% or 200% in order to change the voice quality greatly, the pitch of the synthesized speech also becomes 50% or 200%, respectively. In this case, compared with the speech obtained by D/A conversion at 11025 Hz, the former yields speech whose pitch is 1 [oct] lower and the latter yields speech whose pitch is 1 [oct] higher, which makes the speech hard to listen to.
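The relation between the sampling-frequency ratio and the pitch shift in octaves can be sketched in Python as follows. The function name is hypothetical, and the base-2 logarithm is simply the standard definition of an octave, not something stated in the specification.

```python
import math


def pitch_shift_octaves(da_rate_hz, segment_rate_hz=11025.0):
    """Uncompensated pitch shift, in octaves, caused by playing samples
    created at segment_rate_hz through a D/A converter running at da_rate_hz."""
    return math.log2(da_rate_hz / segment_rate_hz)


if __name__ == "__main__":
    print(pitch_shift_octaves(0.5 * 11025.0))  # ratio of 50%
    print(pitch_shift_octaves(2.0 * 11025.0))  # ratio of 200%
```

A ratio of 50% corresponds to one octave down and a ratio of 200% to one octave up, which is the case the description calls hard to listen to.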

【０１８７】そこで、合成時のＤ／Ａ変換のサンプリン
グ周波数が音声素片作成時のサンプリング周波数と大き
く異なった場合でも、合成音声のピッチを一定にできる
ようにした第９の実施形態につき説明する。Therefore, a ninth embodiment will be described in which the pitch of the synthesized speech can be kept constant even when the sampling frequency of D/A conversion at synthesis differs greatly from the sampling frequency at speech unit creation.

【０１８８】図１１は本発明の第９の実施形態に係る音
声の規則合成装置の概略構成を示すブロック図であり、
図５と同一部分には同一符号を付してある。FIG. 11 is a block diagram showing the schematic arrangement of a speech rule synthesizing apparatus according to the ninth embodiment of the present invention.
The same parts as those in FIG. 5 are designated by the same reference numerals.

【０１８９】本実施形態のポイントは、図５中の合成フ
ィルタ処理部５２に代えて、合成時のフレーム周期及び
合成音声のピッチが制御可能な合成フィルタ処理部１３
２を設けると共に、ピッチ生成処理部４９と合成フィル
タ処理部１３２との間にピッチ生成処理部４９で生成さ
れたピッチパターン（基本周波数パターン）を周波数の
異なる別のピッチパターンに変換（ピッチ変調）して合
成フィルタ処理部１３２に与えるピッチ変調処理部１３
８を設け、さらに図５中の声質制御部５７に代えて、Ｄ
／Ａ変換のサンプリング周波数だけでなく合成時のフレ
ーム周期及びピッチの変調を制御する声質制御部１３７
を設け、当該声質制御部１３７により、声質切替部５６
の指定に応じて、Ｄ／Ａ変換器５３でのサンプリング周
波数と合成フィルタ処理部１３２での合成のフレーム周
期を制御すると同時に、ピッチ変調処理部１３８でのピ
ッチの変調を制御するところにある。The point of this embodiment is as follows. Instead of the synthesis filter processing unit 52 in FIG. 5, a synthesis filter processing unit 132 is provided in which the frame period at synthesis and the pitch of the synthesized speech are controllable. A pitch modulation processing unit 138 is provided between the pitch generation processing unit 49 and the synthesis filter processing unit 132; it converts the pitch pattern (fundamental frequency pattern) generated by the pitch generation processing unit 49 into another pitch pattern of different frequency (pitch modulation) and supplies it to the synthesis filter processing unit 132. Further, instead of the voice quality control unit 57 in FIG. 5, a voice quality control unit 137 is provided that controls not only the sampling frequency of D/A conversion but also the frame period at synthesis and the pitch modulation. According to the designation of the voice quality switching unit 56, the voice quality control unit 137 controls the sampling frequency in the D/A converter 53 and the frame period of synthesis in the synthesis filter processing unit 132 and, at the same time, the pitch modulation in the pitch modulation processing unit 138.

【０１９０】本実施形態では、前記第４の実施形態と同
様に声質切替部５６にて３種類の声質が指定可能であ
る。声質制御部１３７は声質切替部５６で指定された声
質に応じて、１１０２５Ｈｚ，８０００Ｈｚ，１６００
０Ｈｚのいずれかのサンプリング周波数でＤ／Ａ変換を
行うようにＤ／Ａ変換器５３を制御する。In the present embodiment, as in the fourth embodiment, three types of voice quality can be designated by the voice quality switching unit 56. According to the voice quality designated by the voice quality switching unit 56, the voice quality control unit 137 controls the D/A converter 53 so that D/A conversion is performed at a sampling frequency of 11025 Hz, 8000 Hz, or 16000 Hz.

【０１９１】同時に声質制御部１３７は、声質切替部５
６によって指定された声質に応じて、合成フィルタ処理
部１３２で行われる合成のフレーム周期を設定する。合
成のフレーム周期は次式により与えられる。At the same time, the voice quality control unit 137 sets the frame period of the synthesis performed by the synthesis filter processing unit 132 according to the voice quality designated by the voice quality switching unit 56. The frame period of synthesis is given by the following equation.

【０１９２】（フレーム周期）＝（素片作成時のフレーム周期）×（素片作成時のサンプリング周期）／（Ｄ／Ａ変換のサンプリング周期）＝（素片作成時のフレーム周期）×（Ｄ／Ａ変換のサンプリング周波数）／（素片作成時のサンプリング周波数）声質制御部１３７はさらに、合成フィルタ処理部１３２
に与えるピッチ（ピッチパターン）が、（合成フィルタ処理部１３２に与えるピッチ）＝（ピッチ生成処理部４９で生成されたピッチ）×（Ｄ／Ａ変換のサンプリング周期）／（素片作成時のサンプリング周期）＝（ピッチ生成処理部４９で生成されたピッチ）×（素片作成時のサンプリング周波数）／（Ｄ／Ａ変換のサンプリング周波数）となるように、ピッチ変調処理部１３８を制御する。(frame period) = (frame period at segment creation) × (sampling period at segment creation) / (sampling period of D/A conversion) = (frame period at segment creation) × (sampling frequency of D/A conversion) / (sampling frequency at segment creation). The voice quality control unit 137 further controls the pitch modulation processing unit 138 so that the pitch (pitch pattern) supplied to the synthesis filter processing unit 132 becomes (pitch supplied to the synthesis filter processing unit 132) = (pitch generated by the pitch generation processing unit 49) × (sampling period of D/A conversion) / (sampling period at segment creation) = (pitch generated by the pitch generation processing unit 49) × (sampling frequency at segment creation) / (sampling frequency of D/A conversion).

【０１９３】したがって、音声素片作成時の音声のサン
プリング周波数と同じサンプリング周波数、即ち１１０
２５ＨｚでＤ／Ａ変換を行う際には、声質制御部１３７
は、音声素片作成時と同じピッチをそのまま合成フィル
タ処理部１３２に与えるようピッチ変調処理部１３８を
制御する。Therefore, when D/A conversion is performed at the same sampling frequency as that of the speech used to create the speech units, that is, at 11025 Hz, the voice quality control unit 137 controls the pitch modulation processing unit 138 so that the same pitch as at speech unit creation is supplied unchanged to the synthesis filter processing unit 132.

【０１９４】また声質制御部１３７は、８０００Ｈｚに
てＤ／Ａ変換を行う場合には、音声素片作成時のピッチ
を（１１０２５[Hz]／８０００[Hz]）倍して合成フィル
タ処理部１３２に与えるよう制御し、１６０００Ｈｚに
てＤ／Ａ変換を行う場合には（１１０２５[Hz]／１６０
００[Hz]）倍して合成フィルタ処理部１３２に与えるよ
うに制御する。Further, when D/A conversion is performed at 8000 Hz, the voice quality control unit 137 performs control so that the pitch at speech unit creation is multiplied by (11025 [Hz] / 8000 [Hz]) before being supplied to the synthesis filter processing unit 132; when D/A conversion is performed at 16000 Hz, the pitch is multiplied by (11025 [Hz] / 16000 [Hz]) before being supplied to the synthesis filter processing unit 132.
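The pitch modulation rule above can be illustrated with a small Python sketch. The function name and the 120 Hz example pitch are assumptions for illustration; the scaling factor follows the formula given for the pitch modulation processing unit 138.

```python
def compensated_pitch_hz(generated_pitch_hz, da_rate_hz, segment_rate_hz=11025):
    """Pitch to supply to the synthesis filter:
    (generated pitch) * (segment-creation rate) / (D/A rate)."""
    return generated_pitch_hz * segment_rate_hz / da_rate_hz


if __name__ == "__main__":
    target = 120.0  # hypothetical pitch from the pitch generation stage, in Hz
    for rate in (11025, 8000, 16000):
        fed = compensated_pitch_hz(target, rate)
        heard = fed * rate / 11025  # pitch after playback at the changed D/A rate
        print(rate, fed, heard)
```

Because playback at the changed D/A rate multiplies every frequency by (D/A rate) / (segment-creation rate), the pre-scaled pitch returns to the originally generated value on output.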

【０１９５】このように本実施形態においては、合成フ
ィルタ処理部１３２に与えるピッチを声質制御部１３７
の制御のもとでピッチ変調処理部１３８にて予め変調し
ておくことにより、音声素片作成時と異なるサンプリン
グ周波数（第１の標本周期とは異なる第２の標本周期）
でＤ／Ａ変換したときに生じる合成音声のピッチの変化
を相殺することができる。As described above, in the present embodiment, the pitch supplied to the synthesis filter processing unit 132 is modulated in advance by the pitch modulation processing unit 138 under the control of the voice quality control unit 137, which cancels the change in the pitch of the synthesized speech that occurs when D/A conversion is performed at a sampling frequency different from that at speech unit creation (a second sampling period different from the first sampling period).

【０１９６】したがって、音声素片作成時のサンプリン
グ周波数と異なる８０００Ｈｚあるいは１６０００Ｈｚ
でＤ／Ａ変換を行っても、同じ声の高さの音声のアナロ
グ信号を得ることができる。Therefore, even if D/A conversion is performed at 8000 Hz or 16000 Hz, which differ from the sampling frequency at speech unit creation, an analog speech signal with the same voice pitch can be obtained.

【０１９７】以上に述べた第１乃至第９の実施形態で
は、いずれも、合成フィルタ処理部から出力される合成
音声（離散音声信号）をＤ／Ａ変換器で電気的なアナロ
グ信号に変換する際のＤ／Ａ変換のサンプリング周波数
が音質制御部からの指示により可変される場合について
説明したが、これに限るものではない。例えば合成フィ
ルタ処理部から出力される合成音声（離散音声信号）の
サンプリング周波数自体を可変するようにしても構わな
い。In any of the above-described first to ninth embodiments, the D / A converter converts the synthesized voice (discrete voice signal) output from the synthesis filter processing unit into an electrical analog signal. The case where the sampling frequency of the D / A conversion at this time is changed according to an instruction from the sound quality control unit has been described, but the present invention is not limited to this. For example, the sampling frequency itself of the synthesized voice (discrete voice signal) output from the synthesis filter processing unit may be variable.

【０１９８】以下、合成音声（離散音声信号）のサンプ
リング周波数自体を可変するようにした第１０乃至第１
８の実施形態について、第１０の実施形態から順に説明
する。The tenth to eighteenth embodiments, in which the sampling frequency of the synthesized speech (discrete speech signal) itself is varied, are described below in order, starting with the tenth embodiment.

【０１９９】［第１０の実施形態］図１２は本発明の第
１０の実施形態に係る音声の分析合成装置の概略構成を
示すブロック図である。[Tenth Embodiment] FIG. 12 is a block diagram showing the schematic arrangement of a speech analysis / synthesis apparatus according to the tenth embodiment of the present invention.

【０２００】図１２において、メモリ１４１には、１１
０２５Ｈｚ（第１の周期）でサンプリング（標本化）し
た音声（離散音声信号）に対して、フレーム周期（第１
のフレーム周期）１０msecで窓幅２０msecのハニング窓
をかけ、従来技術にて説明した手順で得られる０次〜２
５次までの低次ケプストラム係数のパラメータフレーム
の時系列（特徴パラメータ）と、各フレームに対応した
音声の有声・無声情報が記憶されている。低次ケプスト
ラム係数については従来技術において説明済みであるの
で、ここでは省略する。In FIG. 12, the memory 141 stores a time series of parameter frames (feature parameters) of low-order cepstrum coefficients from the 0th to the 25th order, obtained by the procedure described in the related art by applying a Hanning window with a window width of 20 msec at a frame period (first frame period) of 10 msec to speech (a discrete speech signal) sampled at 11025 Hz (first period), together with the voiced/unvoiced information of the speech corresponding to each frame. The low-order cepstrum coefficients have already been described in the related art, so their description is omitted here.

【０２０１】一方、メモリ１４２には、所定の基本周波
数抽出方法により同じ音声から得られる、音声の基本周
波数の時系列パターンが記憶されている。On the other hand, the memory 142 stores a time-series pattern of the fundamental frequency of the voice, which is obtained from the same voice by a predetermined fundamental frequency extraction method.

【０２０２】合成フィルタ処理部１４３は、これら２つ
のメモリ１４１，１４２より各データを読み出し、メモ
リ１４２より読み出された音声のパラメータフレーム
（ケプストラム係数）をフィルタ係数とするＬＭＡフィ
ルタ（図示せず）を、有声区間では上記基本周波数の時
系列パターン（ピッチパターン）に基づいた周期パルス
で、無声区間ではランダムノイズで駆動することにより
所望の音声（離散音声信号）を合成する。The synthesis filter processing unit 143 reads out each data from these two memories 141 and 142, and uses an LMA filter (not shown) in which the parameter frame (cepstral coefficient) of the voice read out from the memory 142 is used as a filter coefficient. The desired voice (discrete voice signal) is synthesized by driving with a periodic pulse based on the time-series pattern (pitch pattern) of the fundamental frequency in the voiced section and with random noise in the unvoiced section.

【０２０３】ここまでの処理はプログラムによって行わ
れるため、合成フィルタ処理部１４３（内のＬＭＡフィ
ルタ）から出力される音声は離散音声信号である。そこ
で、この離散音声信号をＤ／Ａ変換器１４４に供給し、
電気的なアナログ信号に変換する。こうして得られた音
声のアナログ信号をアンプ１４５にて増幅し、スピーカ
１４６を駆動することにより聴覚で知覚できる音声を得
ることができる。Since the processing up to this point is performed by a program, the speech output from the synthesis filter processing unit 143 (from the LMA filter within it) is a discrete speech signal. This discrete speech signal is therefore supplied to the D/A converter 144 and converted into an electrical analog signal. The analog speech signal thus obtained is amplified by the amplifier 145 and drives the speaker 146, producing speech that can be perceived by ear.

【０２０４】ここまでは従来技術で挙げた（図２３の分
析合成装置の）例とほぼ同じである。The process up to this point is almost the same as the example (of the analysis / synthesis apparatus in FIG. 23) given in the prior art.

【０２０５】この図１２の構成は、前記第１の実施形態
に係る図１の分析合成装置の構成に対応するもので、図
１の構成においてＤ／Ａ変換器でのＤ／Ａ変換のサンプ
リング周波数を可変する代わりに、合成音声（離散音声
信号）のサンプリング周波数自体を可変する点に特徴が
ある。The configuration of FIG. 12 corresponds to the configuration of the analysis / synthesis apparatus of FIG. 1 according to the first embodiment. In the configuration of FIG. 1, sampling of D / A conversion by the D / A converter is performed. It is characterized in that the sampling frequency itself of the synthetic speech (discrete speech signal) is varied instead of varying the frequency.

【０２０６】即ち本実施形態のポイントは、合成フィル
タ処理部１４３とＤ／Ａ変換器１４４との間に、当該合
成フィルタ処理部１４３の出力である合成音声のサンプ
リング周波数を変換するサンプリング周波数変換処理部
１４９を設けると共に、（図１中の声質切替部１７及び
声質制御部１８に対応する）声質切替部１４７及び声質
制御部１４８を設け、当該声質制御部１４８が、合成フ
ィルタ処理部１４３から出力される合成音声のサンプリ
ング周波数を声質切替部１４７で指定された声質で決ま
る周波数（第２の標本周期）に変換するようにサンプリ
ング周波数変換処理部１４９を制御するところにある。
ここで、Ｄ／Ａ変換器１４４でのＤ／Ａ変換のサンプリ
ング周波数（第３の標本周期）は固定であり、ケプスト
ラム作成時の音声のサンプリング周波数（第１の標本周
期）に一致するものとする。That is, the point of the present embodiment is that a sampling frequency conversion processing unit 149, which converts the sampling frequency of the synthesized speech output from the synthesis filter processing unit 143, is provided between the synthesis filter processing unit 143 and the D/A converter 144, and that a voice quality switching unit 147 and a voice quality control unit 148 (corresponding to the voice quality switching unit 17 and the voice quality control unit 18 in FIG. 1) are provided. The voice quality control unit 148 controls the sampling frequency conversion processing unit 149 so that the sampling frequency of the synthesized speech output from the synthesis filter processing unit 143 is converted to the frequency (second sampling period) determined by the voice quality designated by the voice quality switching unit 147. Here, the sampling frequency of D/A conversion in the D/A converter 144 (third sampling period) is fixed and is assumed to equal the sampling frequency of the speech used to create the cepstrum (first sampling period).

【０２０７】本実施形態において、声質切替部１４７
は、図１中の声質切替部１７と同様に（ユーザによる指
定もしくはアプリケーションプログラム等によって）３
種類の声質が指定可能であり、声質制御部１４８は、声
質切替部１４７で指定された声質に応じて、合成フィル
タ処理部１４３から出力される合成音声のサンプリング
周波数を、１１０２５Ｈｚ（＝未変換），１２０００Ｈ
Ｚ，１００００Ｈｚのいずれかに変換するように、サン
プリング周波数変換処理部１４９を制御する。In the present embodiment, the voice quality switching unit 147, like the voice quality switching unit 17 in FIG. 1, allows three types of voice quality to be designated (by the user, by an application program, or the like). According to the voice quality designated by the voice quality switching unit 147, the voice quality control unit 148 controls the sampling frequency conversion processing unit 149 so that the sampling frequency of the synthesized speech output from the synthesis filter processing unit 143 is converted to 11025 Hz (= unconverted), 12000 Hz, or 10000 Hz.

【０２０８】したがって、合成音声のサンプリング周波
数を、メモリ１４１に蓄えられたケプストラムを作成し
た際の音声のサンプリング周波数と同じサンプリング周
波数、即ち１１０２５Ｈｚに変換すれば、元の音声の声
質で音声合成することができる。なお、合成音声のサン
プリング周波数を、ケプストラム作成時の音声のサンプ
リング周波数と同じサンプリング周波数に変換すること
は、サンプリング周波数の変換を行わないことと等価で
あり、サンプリング周波数変換処理部１４９における変
換処理を行わなくても構わない。Therefore, if the sampling frequency of the synthesized speech is converted to the same sampling frequency as that of the speech used to create the cepstrum stored in the memory 141, that is, 11025 Hz, speech is synthesized with the voice quality of the original speech. Converting the sampling frequency of the synthesized speech to the same sampling frequency as that of the speech used to create the cepstrum is equivalent to performing no sampling frequency conversion, so the conversion processing in the sampling frequency conversion processing unit 149 may simply be skipped.

【０２０９】一方、他のサンプリング周波数（第１の標
本周期とは異なる第２の標本周期）に変換（１１０２５
Ｈｚから１２０００Ｈｚ、あるいは１１０２５Ｈｚから
１００００Ｈｚに変換）すれば、図１３に示すように、
音声スペクトルを周波数軸方向にシフトした効果が得ら
れるため、音声の個人性が変化し、こうして得られるア
ナログ音声信号の声質は、元となる音声の声質とは異な
ったものとなる。On the other hand, if the sampling frequency is converted to another value (a second sampling period different from the first sampling period), that is, from 11025 Hz to 12000 Hz or from 11025 Hz to 10000 Hz, the effect of shifting the speech spectrum along the frequency axis is obtained, as shown in FIG. 13. The individuality of the voice therefore changes, and the voice quality of the resulting analog speech signal differs from that of the original speech.

【０２１０】ここで、サンプリング周波数変換処理部１
４９によるサンプリング周波数変換処理の詳細を説明す
る。このサンプリング周波数変換には種々の方法が適用
可能であるが、本実施形態では、図１４に示す構成によ
る簡便な方法を用いているものとする。Here, the details of the sampling frequency conversion performed by the sampling frequency conversion processing unit 149 are described. Various methods can be applied to this sampling frequency conversion; in this embodiment, the simple method with the configuration shown in FIG. 14 is assumed.

【０２１１】サンプリング周波数変換処理部１４９は、
図１４（ａ）に示すように、サンプリング周波数拡大器
１４９ａ、ローパスフィルタ（ＬＰＦ）１４９ｂ及びサ
ンプリング周波数圧縮器１４９ｃから構成されている。As shown in FIG. 14(a), the sampling frequency conversion processing unit 149 consists of a sampling frequency expander 149a, a low-pass filter (LPF) 149b, and a sampling frequency compressor 149c.

【０２１２】サンプリング周波数変換処理部１４９内の
サンプリング周波数拡大器１４９ａには、合成フィルタ
処理部１４３の出力である合成音声（音声データ）が供
給される。この合成音声のサンプリング周波数がｆ1 で
あるものとする。The sampling frequency expander 149a in the sampling frequency conversion processing unit 149 is supplied with the synthesized voice (voice data) output from the synthesis filter processing unit 143. It is assumed that the sampling frequency of this synthesized voice is f1.

【０２１３】図１４（ａ）のサンプリング周波数変換処
理部１４９で、サンプリング周波数ｆ1 からｆ2 ＝（Ｌ
／Ｍ）ｆ1 に周波数変換するには、図１４（ｂ）に示す
ように、まずサンプリング周波数拡大器１４９ａにて、
サンプリング周波数ｆ1 の音声データのサンプルｓ1 間
に（Ｌ−１）個の零サンプルｓ0 を挿入する。To convert the sampling frequency from f1 to f2 = (L/M) f1 in the sampling frequency conversion processing unit 149 of FIG. 14(a), the sampling frequency expander 149a first inserts (L-1) zero samples s0 between the samples s1 of the speech data at sampling frequency f1, as shown in FIG. 14(b).

【０２１４】次に、サンプリング周波数拡大器１４９ａ
から出力される音声データ、即ちサンプルｓ1 間に（Ｌ
−１）個の零サンプルｓ0 が挿入された音声データを、
エイリアシング防止のために、ｆ1 またはｆ2 の小さい
方（ｍｉｎ（ｆ1 ，ｆ2 ）を遮断周波数とするローパス
フィルタ（ローパス型のディジタルフィルタ）１４９ｂ
に通す。ここで、サンプリング周波数拡大器１４９ａで
の零サンプル挿入によるゲイン低下（１／Ｌ倍）を防ぐ
ために、ローパスフィルタ１４９ｂは、通過帯域でＬ倍
のゲインを持つ。Next, the speech data output from the sampling frequency expander 149a, that is, the speech data with (L-1) zero samples s0 inserted between the samples s1, is passed through a low-pass filter (a low-pass digital filter) 149b whose cutoff frequency is the smaller of f1 and f2 (min(f1, f2)), in order to prevent aliasing. To compensate for the gain reduction (to 1/L) caused by the zero-sample insertion in the sampling frequency expander 149a, the low-pass filter 149b has a gain of L in the passband.

【０２１５】最後に、ローパスフィルタ１４９ｂを通過
した音声データに対して、周波数圧縮器１４９ｃにおい
て、図１４（ｂ）に示すように、Ｍサンプル毎に１サン
プルのみを取り出す間引き処理を行うことにより、サン
プリング周波数ｆ2 ＝（Ｌ／Ｍ）ｆ1 の音声データが得
られる。Finally, for the audio data that has passed through the low-pass filter 149b, the frequency compressor 149c performs a decimation process for extracting only one sample for every M samples, as shown in FIG. 14B. Audio data having a sampling frequency f2 = (L / M) f1 can be obtained.
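The three stages described above (zero insertion, low-pass filtering with gain L, and decimation by M) can be sketched in pure Python as follows. This is a minimal sketch under stated assumptions, not the filter design of the embodiment: the Hamming-windowed sinc FIR, the default tap count, and all function names are assumptions. The sketch also places the cutoff at half of min(f1, f2), the usual Nyquist choice, whereas the text states the cutoff as min(f1, f2).

```python
import math


def lowpass_taps(cutoff, num_taps):
    """Hamming-windowed sinc low-pass FIR with unity DC gain.
    cutoff is in cycles per sample at the upsampled rate (0 < cutoff < 0.5)."""
    mid = (num_taps - 1) / 2
    taps = []
    for n in range(num_taps):
        x = n - mid
        ideal = 2 * cutoff if x == 0 else math.sin(2 * math.pi * cutoff * x) / (math.pi * x)
        window = 0.54 - 0.46 * math.cos(2 * math.pi * n / (num_taps - 1))
        taps.append(ideal * window)
    total = sum(taps)
    return [t / total for t in taps]


def resample(samples, L, M, num_taps=None):
    """Convert the sampling frequency by the ratio L/M."""
    if num_taps is None:
        num_taps = 20 * max(L, M) + 1          # assumed rule of thumb; always odd
    # 1) Expander: insert L-1 zero samples after each input sample.
    up = []
    for s in samples:
        up.append(s)
        up.extend([0.0] * (L - 1))
    # 2) Low-pass filter at the upsampled rate, passband gain L.
    cutoff = 0.5 / max(L, M)                   # half of min(f1, f2), relative to L*f1
    taps = [L * t for t in lowpass_taps(cutoff, num_taps)]
    delay = (num_taps - 1) // 2                # compensate the FIR group delay
    filtered = []
    for n in range(len(up)):
        acc = 0.0
        for k, h in enumerate(taps):
            idx = n + delay - k
            if 0 <= idx < len(up):
                acc += h * up[idx]
        filtered.append(acc)
    # 3) Compressor: keep one sample out of every M.
    return filtered[::M]


if __name__ == "__main__":
    out = resample([1.0] * 40, L=3, M=2)       # small ratio keeps the demo fast
    print(len(out), out[len(out) // 2])
```

For the ratios in this embodiment (L = 160 or 400) a direct implementation like this is slow; a polyphase structure, which skips the multiplications by inserted zeros, would be the usual practical choice.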

【０２１６】したがって、前記した例のように、サンプ
リング周波数を１１０２５Ｈｚから１２０００Ｈｚに変
換する場合であれば、ｆ1 ＝１１０２５[Hz] ｆ2 ＝１２０００[Hz] ｆ2 ＝（１２０００／１１０２５）ｆ1 ＝（１６０／１４７）ｆ1 であるので、サンプリング周波数変換処理部１４９で
は、Ｌ＝１６０Ｍ＝１４７（ＬＰＦの遮断周波数）＝ｍｉｎ（ｆ1 ，ｆ2 ）＝ｆ1
＝１１０２５[Hz] として、上述した処理を行えばよい。Therefore, when the sampling frequency is converted from 11025 Hz to 12000 Hz as in the above example, f1 = 11025 [Hz], f2 = 12000 [Hz], and f2 = (12000/11025) f1 = (160/147) f1, so the sampling frequency conversion processing unit 149 performs the processing described above with L = 160, M = 147, and (LPF cutoff frequency) = min(f1, f2) = f1 = 11025 [Hz].

【０２１７】同様に、サンプリング周波数を１１０２５
Ｈｚから１００００Ｈｚに変換する場合であれば、ｆ1 ＝１１０２５[Hz] ｆ2 ＝１００００[Hz] ｆ2 ＝（１００００／１１０２５）ｆ1 ＝（４００／４４１）ｆ1 であるので、サンプリング周波数変換処理部１４９で
は、Ｌ＝４００Ｍ＝４４１（ＬＰＦの遮断周波数）＝ｍｉｎ（ｆ1 ，ｆ2 ）＝ｆ2
＝１００００[Hz] として、上述した処理を行えばよい。Similarly, when the sampling frequency is converted from 11025 Hz to 10000 Hz, f1 = 11025 [Hz], f2 = 10000 [Hz], and f2 = (10000/11025) f1 = (400/441) f1, so the sampling frequency conversion processing unit 149 performs the processing described above with L = 400, M = 441, and (LPF cutoff frequency) = min(f1, f2) = f2 = 10000 [Hz].
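The values of L and M in the two worked examples are the ratio f2/f1 reduced to lowest terms. A short Python sketch, with a hypothetical function name, reproduces them using the standard library `fractions.Fraction`:

```python
from fractions import Fraction


def expansion_and_compression(f1_hz, f2_hz):
    """Return (L, M) such that f2 = (L / M) * f1 with L/M in lowest terms."""
    ratio = Fraction(f2_hz, f1_hz)
    return ratio.numerator, ratio.denominator


if __name__ == "__main__":
    for f2 in (12000, 10000):
        L, M = expansion_and_compression(11025, f2)
        print(f2, L, M, min(11025, f2))  # last value: min(f1, f2) as used for the LPF
```

Reducing the ratio keeps L and M as small as possible, which limits the number of zero samples inserted and the length of low-pass filter required.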

【０２１８】［第１１の実施形態］前記第１０の実施形
態においては、声質切替部１４７、声質制御部１４８及
びサンプリング周波数変換処理部１４９を設けたこと
で、合成音声の声質を簡単に増やすことができるもの
の、合成される音声のスピードが声質により異なる。[Eleventh Embodiment] In the tenth embodiment, providing the voice quality switching unit 147, the voice quality control unit 148, and the sampling frequency conversion processing unit 149 makes it easy to increase the variety of voice qualities of the synthesized speech, but the speed of the synthesized speech differs depending on the voice quality.

【０２１９】即ち、（サンプリング周波数変換処理後の
サンプリング周波数）＞（Ｄ／Ａ変換のサンプリング周
波数）のときには、合成される音声のスピードは遅くな
る。逆に、（サンプリング周波数変換処理後のサンプリ
ング周波数）＜（Ｄ／Ａ変換のサンプリング周波数）の
ときには、合成される音声のスピードは早くなる。That is, when (sampling frequency after sampling frequency conversion processing)> (sampling frequency for D / A conversion), the speed of the synthesized voice becomes slow. On the other hand, when (sampling frequency after sampling frequency conversion processing) <(sampling frequency for D / A conversion), the speed of the synthesized voice becomes faster.

【０２２０】このような合成される音声のスピードの違
いは、サンプリング周波数変換処理後のサンプリング周
波数が前記第１０の実施形態程度の違い（９１％，１０
９％）ではあまり問題とはならない。しかし以下に述べ
るように、サンプリング周波数変換処理後のサンプリン
グ周波数がＤ／Ａ変換時のサンプリング周波数と大きく
異なる場合には、問題となる。Such a difference in the speed of the synthesized voice is not much of a problem when the sampling frequency after the sampling frequency conversion processing differs only as much as in the tenth embodiment (91%, 109%). However, as described below, it becomes a problem when the sampling frequency after the sampling frequency conversion processing differs greatly from the sampling frequency at the time of D/A conversion.

【０２２１】まず、サンプリング周波数変換処理後のサ
ンプリング周波数と合成時のＤ／Ａ変換のサンプリング
周波数の比が１に近ければ声質は変化も小さく、逆にこ
の比が１から離れれば声質は大きく変化する。したがっ
て、声質を大きく変えようとすれば、これら両サンプリ
ング周波数の比を例えば５０％，２００％程度に設定す
ればよいが、これでは合成音声のスピードもそれぞれ元
の音声の２００％，５０％、即ち倍と半分になり、かな
り聞きづらくなる。First, if the ratio of the sampling frequency after the sampling frequency conversion processing to the sampling frequency of the D/A conversion at synthesis is close to 1, the voice quality changes little; conversely, the further the ratio departs from 1, the more the voice quality changes. Therefore, to change the voice quality greatly, the ratio of the two sampling frequencies could be set to, for example, about 50% or 200%. But then the speed of the synthesized voice becomes 200% or 50% of the original, respectively, that is, double or half, which makes it considerably harder to listen to.

【０２２２】そこで、サンプリング周波数変換処理後の
サンプリング周波数がＤ／Ａ変換のサンプリング周波数
と大きく異なった場合でも、合成音声のスピードを一定
にできるようにした第１１の実施形態につき説明する。Therefore, an eleventh embodiment will be described in which the speed of synthesized speech can be made constant even when the sampling frequency after the sampling frequency conversion processing is significantly different from the sampling frequency for D / A conversion.

【０２２３】図１５は本発明の第１１の実施形態に係る
音声の分析合成装置の概略構成を示すブロック図であ
り、図１２と同一部分には同一符号を付してある。FIG. 15 is a block diagram showing the schematic arrangement of the speech analysis / synthesis apparatus according to the eleventh embodiment of the present invention. The same parts as those in FIG. 12 are designated by the same reference numerals.

【０２２４】この図１５の構成は、前記第２の実施形態
に係る図３の分析合成装置の構成に対応するもので、図
３の構成においてＤ／Ａ変換器でのＤ／Ａ変換のサンプ
リング周波数を可変する代わりに、合成音声（離散音声
信号）のサンプリング周波数自体を可変するものであ
る。The configuration of FIG. 15 corresponds to the configuration of the analysis/synthesis apparatus of FIG. 3 according to the second embodiment. Instead of varying the sampling frequency of the D/A conversion in the D/A converter as in the configuration of FIG. 3, it varies the sampling frequency of the synthesized voice (discrete voice signal) itself.

【０２２５】本実施形態のポイントは、図１２中の合成
フィルタ処理部１４３に代えて、合成時のフレーム周期
が制御可能な合成フィルタ処理部１５３を設けると共
に、図１２中の声質制御部１４８に代えて、サンプリン
グ周波数変換処理部１４９の変換するサンプリング周波
数だけでなく合成時のフレーム周期を制御する声質制御
部１５８を設け、当該声質制御部１５８により、声質切
替部１４７の指定に応じて、サンプリング周波数変換処
理部１４９でのサンプリング周波数変換処理と同時に、
合成フィルタ処理部１５３での合成時のフレーム周期を
制御するところにある。ここで、Ｄ／Ａ変換器１４４で
のＤ／Ａ変換のサンプリング周波数（第３の標本周期）
は前記第１０の実施形態と同様に固定であり、ケプスト
ラム作成時の音声のサンプリング周波数（第１の標本周
期）に一致するものとする。The point of this embodiment is that, in place of the synthesis filter processing unit 143 in FIG. 12, a synthesis filter processing unit 153 whose frame period at synthesis is controllable is provided, and, in place of the voice quality control unit 148 in FIG. 12, a voice quality control unit 158 is provided that controls not only the sampling frequency to which the sampling frequency conversion processing unit 149 converts but also the frame period at synthesis. According to the designation from the voice quality switching unit 147, the voice quality control unit 158 controls the frame period at synthesis in the synthesis filter processing unit 153 at the same time as the sampling frequency conversion processing in the sampling frequency conversion processing unit 149. Here, the sampling frequency of the D/A conversion in the D/A converter 144 (third sampling period) is fixed, as in the tenth embodiment, and matches the sampling frequency (first sampling period) of the voice at the time the cepstrum was created.

【０２２６】本実施形態では、前記第１０の実施形態と
同様に声質切替部１４７にて３種類の声質が指定可能で
ある。声質制御部１５８は声質切替部１４７で指定され
た声質に応じて、例えば１１０２５Ｈｚ（＝未変換），
８０００Ｈｚ，１６０００Ｈｚのいずれかのサンプリン
グ周波数への変換を行うようにサンプリング周波数変換
処理部１４９を制御する。In this embodiment, as in the tenth embodiment, three types of voice quality can be designated with the voice quality switching unit 147. According to the voice quality designated by the voice quality switching unit 147, the voice quality control unit 158 controls the sampling frequency conversion processing unit 149 so that it converts to one of the sampling frequencies of, for example, 11025 Hz (= unconverted), 8000 Hz, and 16000 Hz.

【０２２７】同時に声質制御部１５８は、声質切替部１
４７によって指定された声質に応じて、合成フィルタ処
理部１５３で行われる合成のフレーム周期を設定する。
合成のフレーム周期は次式により与えられる。At the same time, the voice quality control unit 158 sets the frame period of the synthesis performed by the synthesis filter processing unit 153 according to the voice quality designated by the voice quality switching unit 147. The frame period of synthesis is given by the following equation.

【０２２８】（フレーム周期）＝（分析フレーム周期）×（サンプリング周波数変換後のサンプリング周期）／（Ｄ／Ａ変換のサンプリング周期）＝（分析フレーム周期）×（Ｄ／Ａ変換のサンプリング周波数）／（サンプリング周波数変換後のサンプリング周波数）したがって上式に基づき、Ｄ／Ａ変換のサンプリング周
波数と同じサンプリング周波数、即ち１１０２５Ｈｚへ
のサンプリング周波数変換処理を行う（＝サンプリング
周波数変換処理を行わない）際には、声質制御部１５８
は上式に基づき、合成時のフレーム周期を分析時（ケプ
ストラム作成時）のフレーム周期（第１のフレーム周
期）と同じ１０msecで合成するよう合成フィルタ処理部
１５３を制御する。但し、メモリ１４１に蓄えられたケ
プストラムは前記第１０の実施形態と同じ条件で作成さ
れているものとする。(Frame period) = (analysis frame period) × (sampling period after sampling frequency conversion) / (sampling period of D/A conversion) = (analysis frame period) × (sampling frequency of D/A conversion) / (sampling frequency after sampling frequency conversion). Therefore, when converting to the same sampling frequency as that of the D/A conversion, that is, 11025 Hz (= performing no sampling frequency conversion), the voice quality control unit 158 controls the synthesis filter processing unit 153, based on the above equation, so that synthesis uses a frame period of 10 msec, the same as the frame period at analysis (when the cepstrum was created; the first frame period). It is assumed that the cepstrum stored in the memory 141 was created under the same conditions as in the tenth embodiment.

【０２２９】また声質制御部１５８は、ケプストラム作
成時（分析時）の音声のサンプリング周波数とは異なる
サンプリング周波数（第１の標本周期とは異なる第２の
標本周期）、例えば８０００Ｈｚへのサンプリング周波
数変換処理を行う場合には、１０[msec]×１１０２５[Hz]／８０００[Hz]＝１３．８
[msec] のフレーム周期（第１のフレーム周期とは異なる第２の
フレーム周期）で合成を行うよう制御し、１６０００Ｈ
ｚへのサンプリング周波数変換処理を行う場合には、１０[msec]×１１０２５[Hz]／１６０００[Hz]＝６．９
[msec] のフレーム周期（第１のフレーム周期とは異なる第２の
フレーム周期）で合成を行うよう制御する。Further, when converting to a sampling frequency different from that of the voice at cepstrum creation (analysis) (a second sampling period different from the first sampling period), for example 8000 Hz, the voice quality control unit 158 performs control so that synthesis uses a frame period of 10 [msec] × 11025 [Hz] / 8000 [Hz] = 13.8 [msec] (a second frame period different from the first frame period). When converting to 16000 Hz, it performs control so that synthesis uses a frame period of 10 [msec] × 11025 [Hz] / 16000 [Hz] = 6.9 [msec] (likewise a second frame period different from the first frame period).
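The frame-period rule above can be checked numerically. The sketch below is illustrative only; the function name `synthesis_frame_period_ms` and its parameter names are assumptions made for this example, while the 10 msec analysis frame period and the 11025 Hz D/A sampling frequency are the values stated in the text.

```python
def synthesis_frame_period_ms(analysis_period_ms, da_rate_hz, converted_rate_hz):
    """(frame period) = (analysis frame period) * (D/A rate) / (converted rate)."""
    return analysis_period_ms * da_rate_hz / converted_rate_hz

# D/A conversion is fixed at 11025 Hz; the analysis frame period is 10 msec.
for converted_rate in (11025, 8000, 16000):
    period = synthesis_frame_period_ms(10.0, 11025, converted_rate)
    print(converted_rate, round(period, 1))
# 11025 10.0
# 8000 13.8
# 16000 6.9
```

A longer synthesis frame at 8000 Hz produces more samples per analysis frame in wall-clock terms, which offsets the speed-up that would otherwise occur when those samples are played back at 11025 Hz.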

【０２３０】このように本実施形態においては、分析時
と異なるサンプリング周波数（第１の標本周期とは異な
る第２の標本周期）にサンプリング周波数変換処理した
ときの合成音声のスピードの変化を、合成フィルタ処理
部１５３における合成のフレーム周期を（第２のフレー
ム周期）をケプストラム作成時（分析時）のフレーム周
期（第１のフレーム周期）とは異ならせることで相殺す
ることができる。As described above, in this embodiment, the change in the speed of the synthesized voice that arises when the sampling frequency is converted to one different from that at analysis (a second sampling period different from the first sampling period) can be cancelled by making the frame period of synthesis in the synthesis filter processing unit 153 (the second frame period) different from the frame period at cepstrum creation (analysis) (the first frame period).

【０２３１】したがって、分析時のサンプリング周波数
と異なる８０００Ｈｚあるいは１６０００Ｈｚへのサン
プリング周波数変換処理を行っても、同じスピードの音
声のアナログ信号を得ることができる。Therefore, even if the sampling frequency conversion processing to 8000 Hz or 16000 Hz, which is different from the sampling frequency at the time of analysis, is performed, it is possible to obtain an analog signal of voice having the same speed.

【０２３２】［第１２の実施形態］ところで、前記第１
０の実施形態においては、サンプリング周波数変換処理
を行ってからＤ／Ａ変換すると、声の高さ、即ち音声の
ピッチの変化を招く。[Twelfth Embodiment] In the tenth embodiment, performing the sampling frequency conversion processing and then the D/A conversion changes the height of the voice, that is, the pitch of the voice.

【０２３３】即ち、（サンプリング周波数変換処理後の
サンプリング周波数）＞（Ｄ／Ａ変換のサンプリング周
波数）のときには、合成される音声のピッチは低くな
る。逆に、（サンプリング周波数変換処理後のサンプリ
ング周波数）＜（Ｄ／Ａ変換のサンプリング周波数）の
ときには、合成される音声のピッチは高くなる。That is, when (sampling frequency after sampling frequency conversion processing)> (sampling frequency for D / A conversion), the pitch of synthesized speech becomes low. On the contrary, when (sampling frequency after sampling frequency conversion processing) <(sampling frequency for D / A conversion), the pitch of the synthesized voice becomes high.

【０２３４】このような合成される音声のピッチの違い
は、サンプリング周波数変換処理後のサンプリング周波
数がＤ／Ａ変換のサンプリング周波数に比べ、前記第１
０の実施形態程度の違い（９１％，１０９％）ではあま
り問題とはならない。Such a difference in the pitch of the synthesized voice is not much of a problem when the sampling frequency after the sampling frequency conversion processing differs from the sampling frequency of the D/A conversion only as much as in the tenth embodiment (91%, 109%).

【０２３５】しかし、声質を大きく変えようとして、両
サンプリング周波数の比を例えば５０％，２００％程度
に設定すれば、合成音声のピッチもそれぞれ５０％，２
００％と変化するため、ケプストラム作成時と同じ１１
０２５Ｈｚにサンプリング周波数変換処理した（あるい
は変換処理しない）ときの音声（あるいは原音声）と比
較して、前者はピッチが１[oct] （オクターブ）高い音
声が合成され、後者は１[oct] 低い音声が合成されるの
で聞きづらくなるという問題が発生する。However, if the ratio of the two sampling frequencies is set to, for example, about 50% or 200% in order to change the voice quality greatly, the pitch of the synthesized voice also changes to 50% or 200%, respectively. Compared with the voice obtained when the sampling frequency is converted to 11025 Hz, the same as at cepstrum creation (or is not converted), or with the original voice, the former yields a voice whose pitch is 1 [oct] (one octave) higher and the latter a voice 1 [oct] lower, which makes the result hard to listen to.

【０２３６】そこで、サンプリング周波数変換処理後の
サンプリング周波数がケプストラム作成時のサンプリン
グ周波数と大きく異なった場合でも、合成音声のピッチ
を一定にできるようにした第１２の実施形態につき説明
する。Therefore, a twelfth embodiment will be described in which the pitch of synthesized speech can be made constant even when the sampling frequency after the sampling frequency conversion processing is significantly different from the sampling frequency when the cepstrum was created.

【０２３７】図１６は本発明の第１２の実施形態に係る
音声の分析合成装置の概略構成を示すブロック図であ
り、図１２と同一部分には同一符号を付してある。FIG. 16 is a block diagram showing the schematic arrangement of a speech analysis / synthesis apparatus according to the twelfth embodiment of the present invention. The same parts as those in FIG. 12 are designated by the same reference numerals.

【０２３８】この図１６の構成は、前記第３の実施形態
に係る図４の分析合成装置の構成に対応するもので、図
４の構成においてＤ／Ａ変換器でのＤ／Ａ変換のサンプ
リング周波数を可変する代わりに、合成音声（離散音声
信号）のサンプリング周波数自体を可変するものであ
る。The configuration of FIG. 16 corresponds to the configuration of the analysis/synthesis apparatus of FIG. 4 according to the third embodiment. Instead of varying the sampling frequency of the D/A conversion in the D/A converter as in the configuration of FIG. 4, it varies the sampling frequency of the synthesized voice (discrete voice signal) itself.

【０２３９】本実施形態のポイントは、図１２中の合成
フィルタ処理部１４３に代えて、合成時のフレーム周期
が制御可能な合成フィルタ処理部１６３を設けると共
に、メモリ１４２と合成フィルタ処理部１６３との間に
メモリ１４２より読み出された基本周波数（の時系列）
パターン（ピッチパターン）を周波数の異なる別のピッ
チパターンに変換（ピッチ変調）して合成フィルタ処理
部１６３に与えるピッチ変調処理部１６１を設け、さら
に図１２中の声質制御部１４８に代えて、サンプリング
周波数変換処理部１４９の変換するサンプリング周波数
だけでなく合成時のフレーム周期及びピッチの変調を制
御する声質制御部１６８を設け、当該声質制御部１６８
により、声質切替部１４７の指定に応じて、サンプリン
グ周波数変換処理部１４９でのサンプリング周波数変換
処理と合成フィルタ処理部１６３での合成時のフレーム
周期を制御すると同時に、ピッチ変調処理部１６１での
ピッチの変調を制御するところにある。ここで、Ｄ／Ａ
変換器１４４でのＤ／Ａ変換のサンプリング周波数（第
３の標本周期）は前記第１０の実施形態と同様に固定で
あり、ケプストラム作成時の音声のサンプリング周波数
（第１の標本周期）に一致するものとする。The point of this embodiment is as follows. In place of the synthesis filter processing unit 143 in FIG. 12, a synthesis filter processing unit 163 whose frame period at synthesis is controllable is provided. Between the memory 142 and the synthesis filter processing unit 163, a pitch modulation processing unit 161 is provided that converts (pitch-modulates) the fundamental frequency (time-series) pattern (pitch pattern) read from the memory 142 into another pitch pattern of different frequency and supplies it to the synthesis filter processing unit 163. Further, in place of the voice quality control unit 148 in FIG. 12, a voice quality control unit 168 is provided that controls not only the sampling frequency to which the sampling frequency conversion processing unit 149 converts but also the frame period at synthesis and the pitch modulation. According to the designation from the voice quality switching unit 147, the voice quality control unit 168 controls the sampling frequency conversion processing in the sampling frequency conversion processing unit 149 and the frame period at synthesis in the synthesis filter processing unit 163 and, at the same time, the pitch modulation in the pitch modulation processing unit 161. Here, the sampling frequency of the D/A conversion in the D/A converter 144 (third sampling period) is fixed, as in the tenth embodiment, and matches the sampling frequency (first sampling period) of the voice at the time the cepstrum was created.

【０２４０】本実施形態では、前記第１０の実施形態と
同様に声質切替部１４７にて３種類の声質が指定可能で
ある。声質制御部１６８は声質切替部１４７で指定され
た声質に応じて、例えば１１０２５Ｈｚ（＝未変換），
８０００Ｈｚ，１６０００Ｈｚのいずれかのサンプリン
グ周波数への変換を行うようにサンプリング周波数変換
処理部１４９を制御する。In this embodiment, as in the tenth embodiment, three types of voice quality can be designated with the voice quality switching unit 147. According to the voice quality designated by the voice quality switching unit 147, the voice quality control unit 168 controls the sampling frequency conversion processing unit 149 so that it converts to one of the sampling frequencies of, for example, 11025 Hz (= unconverted), 8000 Hz, and 16000 Hz.

【０２４１】声質制御部１６８は、声質切替部１４７に
よって指定された声質に応じて、サンプリング周波数変
換処理部１４９の変換するサンプリング周波数を設定す
ると同時に、合成フィルタ処理部１６３で行われる合成
のフレーム周期を設定する。合成のフレーム周期は次式
により与えられる。According to the voice quality designated by the voice quality switching unit 147, the voice quality control unit 168 sets the sampling frequency to which the sampling frequency conversion processing unit 149 converts and, at the same time, sets the frame period of the synthesis performed by the synthesis filter processing unit 163. The frame period of synthesis is given by the following equation.

【０２４２】（フレーム周期）＝（分析フレーム周期）×（サンプリング周波数変換後のサンプリング周期）／（Ｄ／Ａ変換のサンプリング周期）＝（分析フレーム周期）×（Ｄ／Ａ変換のサンプリング周波数）／（サンプリング周波数変換後のサンプリング周波数）なお、メモリ１４１に蓄えられたケプストラムは前記第
１０の実施形態と同じ条件で作成されているものとす
る。(Frame period) = (Analysis frame period) × (Sampling period after sampling frequency conversion) / (D / A conversion sampling period) = (Analysis frame period) × (D / A conversion sampling frequency) / (Sampling frequency after sampling frequency conversion) It is assumed that the cepstrum stored in the memory 141 is created under the same conditions as in the tenth embodiment.

【０２４３】声質制御部１６８はさらに、合成フィルタ
処理部１６３に与えるピッチ（ピッチパターン）が、（合成フィルタ処理部１６３に与えるピッチ）＝（メモリ１４２より読み出したピッチ）×（Ｄ／Ａ変換のサンプリング周期）／（サンプリング周波数変換処理後のサンプリング周期）＝（メモリ１４２より読み出したピッチ）×（サンプリング周波数変換処理後のサンプリング周波数）／（Ｄ／Ａ変換のサンプリング周波数）となるように、ピッチ変調処理部１６１を制御する。The voice quality control unit 168 further controls the pitch modulation processing unit 161 so that the pitch (pitch pattern) supplied to the synthesis filter processing unit 163 satisfies (pitch supplied to the synthesis filter processing unit 163) = (pitch read from the memory 142) × (sampling period of D/A conversion) / (sampling period after sampling frequency conversion processing) = (pitch read from the memory 142) × (sampling frequency after sampling frequency conversion processing) / (sampling frequency of D/A conversion).

【０２４４】したがって、Ｄ／Ａ変換のサンプリング周
波数と同じサンプリング周波数、即ち１１０２５Ｈｚへ
のサンプリング周波数変換処理を行う（＝サンプリング
周波数変換処理を行わない）際には、声質制御部１６８
は上式に基づき、合成フィルタ処理部１６３に与えるピ
ッチを分析時（ケプストラム作成時）と同じピッチとな
るようピッチ変調処理部１６１を制御する。Therefore, when converting to the same sampling frequency as that of the D/A conversion, that is, 11025 Hz (= performing no sampling frequency conversion), the voice quality control unit 168 controls the pitch modulation processing unit 161, based on the above equation, so that the pitch supplied to the synthesis filter processing unit 163 is the same as the pitch at analysis (cepstrum creation).

【０２４５】また声質制御部１６８は、ケプストラム作
成時（分析時）の音声のサンプリング周波数とは異なる
サンプリング周波数（第１の標本周期とは異なる第２の
標本周期）、例えば８０００Ｈｚへのサンプリング周波
数変換処理を行う場合には、メモリ１４２から読み出し
たピッチを（８０００[Hz]／１１０２５[Hz]）倍して合
成フィルタ処理部１６３に与えるように、１６０００Ｈ
ｚへのサンプリング周波数変換処理を行う場合には、メ
モリ１４２から読み出したピッチを（１６０００[Hz]／
１１０２５[Hz]）倍して合成フィルタ処理部１６３に与
えるように、ピッチ変調処理部１６１を制御する。Further, when converting to a sampling frequency different from that of the voice at cepstrum creation (analysis) (a second sampling period different from the first sampling period), for example 8000 Hz, the voice quality control unit 168 controls the pitch modulation processing unit 161 so that the pitch read from the memory 142 is multiplied by (8000 [Hz] / 11025 [Hz]) before being supplied to the synthesis filter processing unit 163. When converting to 16000 Hz, it controls the pitch modulation processing unit 161 so that the pitch read from the memory 142 is multiplied by (16000 [Hz] / 11025 [Hz]) before being supplied to the synthesis filter processing unit 163.
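The pitch pre-scaling described above can be sketched as follows. The function name `modulated_pitch_hz` and the example stored pitch of 110.25 Hz are assumptions chosen for illustration; only the scaling factor (converted sampling frequency divided by D/A sampling frequency) comes from the text.

```python
def modulated_pitch_hz(stored_pitch_hz, da_rate_hz, converted_rate_hz):
    """Pitch handed to the synthesis filter so that playback pitch is unchanged.

    (pitch given to filter) = (stored pitch) * (converted rate) / (D/A rate)
    """
    return stored_pitch_hz * converted_rate_hz / da_rate_hz

stored = 110.25  # hypothetical pitch value read from the pitch pattern memory
for converted_rate in (11025, 8000, 16000):
    given = modulated_pitch_hz(stored, 11025, converted_rate)
    # Playing samples generated for `converted_rate` at the fixed D/A rate
    # scales every frequency by (D/A rate) / (converted rate).
    heard = given * 11025 / converted_rate
    print(converted_rate, given, heard)
```

In each case the value in the last column returns to the stored pitch, which is the cancellation the embodiment relies on.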

【０２４６】このように本実施形態においては、合成フ
ィルタ処理部１６３に与えるピッチを声質制御部１６８
の制御のもとでピッチ変調処理部１６１にて予め変調し
ておくことにより、Ｄ／Ａ変換時のサンプリング周波数
（第３の標本周期）とは異なるサンプリング周波数（第
２の標本周期）へサンプリング周波数変換処理したとき
に生じる合成音声のピッチの変化を相殺することができ
る。As described above, in this embodiment, the pitch supplied to the synthesis filter processing unit 163 is modulated in advance by the pitch modulation processing unit 161 under the control of the voice quality control unit 168. This cancels the change in the pitch of the synthesized voice that arises when the sampling frequency is converted to one (second sampling period) different from the sampling frequency at D/A conversion (third sampling period).

【０２４７】したがって、Ｄ／Ａ変換時のサンプリング
周波数と異なる８０００Ｈｚあるいは１６０００Ｈｚへ
のサンプリング周波数変換処理を行っても、同じ声の高
さの音声のアナログ信号を得ることができる。Therefore, even if the sampling frequency conversion processing to 8000 Hz or 16000 Hz different from the sampling frequency at the time of D / A conversion is performed, it is possible to obtain the analog signal of the voice of the same pitch.

【０２４８】［第１３の実施形態］図１７は本発明の第
１３の実施形態に係る音声の規則合成装置の概略構成を
示すブロック図である。[Thirteenth Embodiment] FIG. 17 is a block diagram showing the schematic arrangement of a speech rule synthesizing apparatus according to the thirteenth embodiment of the present invention.

【０２４９】この音声規則合成装置は、例えばパーソナ
ルコンピュータ等の情報処理装置上で専用のソフトウェ
ア（文音声変換ソフトウェア）を実行することにより実
現されるもので、文音声変換（ＴＴＳ）処理機能、即ち
テキストから音声を生成する文音声変換処理（文音声合
成処理）機能を有しており、その機能構成は、大別して
言語処理部１７１、音声合成部１７２とに分けられる。This speech rule synthesizing apparatus is realized by executing dedicated software (sentence-to-speech conversion software) on an information processing apparatus such as a personal computer. It has a sentence-to-speech conversion (TTS) processing function, that is, a function that generates speech from text (sentence-to-speech synthesis processing), and its functional configuration is broadly divided into a language processing unit 171 and a speech synthesis unit 172.

【０２５０】言語処理部１７１は、入力文、例えば漢字
かな混じり文を解析して読み情報とアクセント情報を生
成する処理と、これら情報に基づき音韻記号系列・アク
セント情報が記述された音声記号列を生成する処理を司
る。The language processing unit 171 handles the processing of analyzing an input sentence, for example a sentence mixing kanji and kana, to generate reading information and accent information, and the processing of generating, from this information, a phonetic symbol string in which the phoneme symbol sequence and accent information are described.

【０２５１】音声合成部１７２は、言語処理部１７１の
出力である音声記号列をもとに音声を生成する処理を司
る。The voice synthesizing unit 172 controls the process of generating a voice based on the voice symbol string output from the language processing unit 171.

【０２５２】さて、図１７の音声規則合成装置におい
て、文音声変換（読み上げ）の対象となる文書（ここで
は日本語文書）はテキストファイル１７３として保存さ
れている。本装置では、文音声変換ソフトウェアに従
い、当該ファイル１７３から漢字かな混じり文を１文ず
つ読み出して、言語処理部１７１及び音声合成部１７２
により以下に述べる文音声変換処理を行い、音声を合成
する。In the speech rule synthesizing apparatus of FIG. 17, the document to be converted to speech (read aloud), here a Japanese document, is stored as a text file 173. Following the sentence-to-speech conversion software, the apparatus reads kanji-kana mixed sentences from the file 173 one sentence at a time, and the language processing unit 171 and the speech synthesis unit 172 perform the sentence-to-speech conversion processing described below to synthesize speech.

【０２５３】まず、テキストファイル１７３から読み出
された漢字かな混じり文は、言語処理部１７１内の言語
解析処理部１７４に入力される。First, the kanji / kana mixed sentence read from the text file 173 is input to the language analysis processing unit 174 in the language processing unit 171.

【０２５４】言語解析処理部１７４は、入力される漢字
かな混じり文に対して形態素解析を行い、読み情報とア
クセント情報を生成する。The language analysis processing unit 174 performs morphological analysis on the input Kanji / Kana mixed sentence and generates reading information and accent information.

【０２５５】そのために言語解析処理部１７４は、文の
最小構成要素である「形態素」を見出し語にもつ形態素
辞書１７５と形態素間の接続規則が登録されている接続
規則ファイル１７６を利用する。即ち言語解析処理部１
７４は、入力文と形態素辞書１７５とを照合することで
得られる全ての形態素系列候補を求め（総当たり法）、
その中から、接続規則ファイル１７６を参照して文法的
に前後に接続できる組み合わせを出力する。形態素辞書
１７５には、解析時に用いられる文法情報と共に、形態
素の読み並びにアクセントの型が登録されている。この
ため、形態素解析により形態素が定まれば、同時に読み
とアクセント型も与えることができる。For this purpose, the language analysis processing unit 174 uses a morpheme dictionary 175, whose entries are "morphemes", the minimum constituent elements of a sentence, and a connection rule file 176 in which the connection rules between morphemes are registered. That is, the language analysis processing unit 174 finds all morpheme sequence candidates obtained by matching the input sentence against the morpheme dictionary 175 (exhaustive search), and from among them outputs the combinations that can be grammatically connected to their neighbors, with reference to the connection rule file 176. The morpheme dictionary 175 registers the reading and accent type of each morpheme along with the grammatical information used in analysis. Therefore, once morphological analysis determines the morphemes, their readings and accent types are obtained at the same time.

【０２５６】例えば、「公園へ行って本を読みます」と
いう文に対して形態素解析を行うと、／公園／ヘ／行って／本／を／読み／ます／。For example, morphological analysis of the sentence 「公園へ行って本を読みます」 ("I go to the park and read a book") yields / 公園 / へ / 行って / 本 / を / 読み / ます /.

【０２５７】と形態素に分割される。各形態素に読みと
アクセント型が与えられ、／コウエン／エ／イッテ／ホ＾ン／ヲ／ヨミ／マ＾ス／となる。ここで「＾」の入っている形態素は、その直前
の音節でピッチが高く、その直後の音節ではピッチが落
ちるアクセントであることを意味する。また「＾」がな
い場合は、平板型のアクセントであることを意味する。The sentence is thus divided into morphemes. A reading and an accent type are given to each morpheme, yielding / kouen / e / itte / ho^n / wo / yomi / ma^su /. Here, a morpheme containing "^" has an accent in which the pitch is high on the syllable immediately before the mark and falls on the syllable immediately after it. A morpheme without "^" has a flat accent.

【０２５８】ところで、人間が文章を読むときには、こ
のような形態素単位でアクセントを付けて読むことはせ
ず、幾つかの形態素をひとまとめにして、そのまとまり
にアクセントを付けて読んでいる。When a person reads a sentence aloud, however, he or she does not place an accent on each such morpheme but groups several morphemes together and places an accent on the group.

【０２５９】そこで、このようなことを考慮して、言語
解析処理部１７４ではさらに、一つのアクセント句（ア
クセントを与える単位）で形態素をまとめると同時に、
まとめたことによるアクセントの移動も推定する。これ
に加えて言語解析処理部４４は、母音の無声化や読み上
げの際のポーズ（息継ぎ）等の情報も付加し、上記の例
では、最終的に次のような読み情報を生成する。Taking this into account, the language analysis processing unit 174 further groups the morphemes into accent phrases (the units to which an accent is given) and, at the same time, estimates the accent shifts caused by the grouping. In addition, the language analysis processing unit 174 adds information such as the devoicing of vowels and pauses (breaths) in reading, and in the above example finally generates the following reading information.

【０２６０】／コーエンエ／イッテ．／ホ＾ンオ／ヨミ
マ＾（ス）／ここで、ピリオド「．」は息継ぎを、「（）」は母音
が無声化した音節を表わす。/ kōen'e / itte. / ho^n'o / yomima^(su) / Here, the period "." represents a pause for breath, and "( )" represents a syllable whose vowel is devoiced.

【０２６１】さて、上記のようにして言語処理部１７１
内の言語解析処理部１７４により読み情報が生成される
と、音声合成部１７２内の音韻継続時間計算処理部１７
７が起動される。音韻継続時間計算処理部１７７は、言
語解析処理部１７４で生成した読み情報に従って、入力
文に含まれる各音節の子音部ならびに母音部の継続時間
（単位はms）を決定する。When the language analysis processing unit 174 in the language processing unit 171 has generated the reading information as described above, the phoneme duration calculation processing unit 177 in the speech synthesis unit 172 is activated. According to the reading information generated by the language analysis processing unit 174, the phoneme duration calculation processing unit 177 determines the duration (in ms) of the consonant part and the vowel part of each syllable contained in the input sentence.

【０２６２】この音韻継続時間計算処理部１７７での継
続時間の決定処理は、子音（Ｃ）と母音（Ｖ）の境界
（ＣＶわたり）の位置が等間隔に並ぶようにするとい
う、極めて簡単なアルゴリズムにより実現されている。The duration determination processing in the phoneme duration calculation processing unit 177 is realized by an extremely simple algorithm: the boundaries between consonant (C) and vowel (V) (CV transitions) are placed at equal intervals.

【０２６３】ＣＶわたりの間隔は、音声合成部１７２内
の発話速度制御部１７８より与えられる。図示しない
が、本実施形態で用いられるソフトウェアではユーザが
合成音声のスピードを指定することが可能となってい
る。そして、ユーザが指定した音声のスピードがこの発
話速度制御部１７８に与えられることにより、当該発話
速度制御部１７８が（音韻継続時間計算処理部１７７で
の継続時間の決定処理にて決定される）先程のＣＶわた
りの間隔を調整して合成音声の速度を実際に変化させて
いる。但し、日本語の音声は、発声の速度を変えても子
音の継続時間はほぼ一定であることが分析結果から分か
っているので、子音の継続時間は一定に保ち、母音の継
続時間を調節してＣＶわたりの間隔を変える。The interval between CV transitions is given by the speech rate control unit 178 in the speech synthesis unit 172. Although not shown, the software used in this embodiment lets the user specify the speed of the synthesized speech. The speed specified by the user is given to the speech rate control unit 178, which adjusts the CV transition interval (used in the duration determination processing of the phoneme duration calculation processing unit 177) and thereby actually changes the speed of the synthesized speech. However, analysis shows that in Japanese speech the duration of consonants stays almost constant even when the speaking rate changes, so the consonant durations are kept constant and the vowel durations are adjusted to change the CV transition interval.
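One way to read the equal-interval rule is sketched below. This is a hedged illustration, not the disclosed algorithm: the function name `vowel_durations_ms`, the minimum vowel length, the handling of the final syllable, and the example consonant durations are all assumptions. The only points taken from the text are that CV boundaries are equally spaced and that consonant durations stay fixed while vowel durations absorb rate changes.

```python
def vowel_durations_ms(consonant_ms, cv_interval_ms, min_vowel_ms=10.0):
    """Choose vowel durations so that successive CV boundaries are equally spaced.

    consonant_ms:   fixed consonant duration of each syllable (0.0 if none)
    cv_interval_ms: target spacing between consecutive CV boundaries
    The span between two CV boundaries is (this vowel) + (next consonant),
    so the vowel takes whatever remains of the interval.
    """
    vowels = []
    for i in range(len(consonant_ms)):
        if i + 1 < len(consonant_ms):
            next_consonant = consonant_ms[i + 1]
        else:
            # Assumption: treat the final syllable as if followed by a copy of itself.
            next_consonant = consonant_ms[i]
        vowels.append(max(min_vowel_ms, cv_interval_ms - next_consonant))
    return vowels

consonants = [40.0, 0.0, 60.0]                 # hypothetical consonant durations
print(vowel_durations_ms(consonants, 150.0))   # [150.0, 90.0, 90.0]
print(vowel_durations_ms(consonants, 120.0))   # [120.0, 60.0, 60.0]
```

Shrinking the interval from 150 ms to 120 ms shortens only the vowels, which mirrors the statement that consonant durations are held constant as the speaking rate changes.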

【０２６４】音韻継続時間計算処理部１７７により入力
文に含まれる各音節の（子音部ならびに母音部の）継続
時間が決定されると、同じ音声合成部１７２内のピッチ
生成処理部１７９が起動される。ピッチ生成処理部１７
９は音韻継続時間計算処理部１７７により決定された継
続時間と、（言語処理部１７１内の）言語解析処理部１
７４により決定されたアクセント情報に基づいて、まず
点ピッチ位置を設定する。次にピッチ生成処理部１７９
は、設定した複数の点ピッチを直線で補間して例えば１
０msec毎のピッチパターン（基本周波数パターン）を得
る。When the phoneme duration calculation processing unit 177 has determined the duration of each syllable (consonant part and vowel part) in the input sentence, the pitch generation processing unit 179 in the same speech synthesis unit 172 is activated. Based on the durations determined by the phoneme duration calculation processing unit 177 and the accent information determined by the language analysis processing unit 174 (in the language processing unit 171), the pitch generation processing unit 179 first sets the positions of point pitches. It then linearly interpolates between the set point pitches to obtain a pitch pattern (fundamental frequency pattern) at intervals of, for example, 10 msec.
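The linear interpolation of point pitches into a 10 msec pitch pattern can be sketched as below. The function name `interpolate_pitch` and the example point pitches are hypothetical; how the point pitch positions are actually derived from durations and accent information is not reproduced here.

```python
def interpolate_pitch(points, frame_ms=10.0):
    """Linearly interpolate (time_ms, f0_hz) point pitches at a fixed frame step.

    points must contain at least two entries sorted by strictly increasing time.
    Returns one F0 value per frame from the first to the last point inclusive.
    """
    start, end = points[0][0], points[-1][0]
    pattern, seg, k = [], 0, 0
    while start + k * frame_ms <= end:
        t = start + k * frame_ms
        # Advance to the segment [points[seg], points[seg + 1]] that contains t.
        while seg < len(points) - 2 and t > points[seg + 1][0]:
            seg += 1
        (t0, f0), (t1, f1) = points[seg], points[seg + 1]
        pattern.append(f0 + (f1 - f0) * (t - t0) / (t1 - t0))
        k += 1
    return pattern

# Hypothetical point pitches: rise to an accent peak, then fall.
print(interpolate_pitch([(0, 100), (20, 140), (50, 110)]))
# [100.0, 120.0, 140.0, 130.0, 120.0, 110.0]
```

Computing each frame time as `start + k * frame_ms` rather than accumulating increments avoids drift when the frame step is not exactly representable in floating point.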

【０２６５】一方、音声合成部１７２内の音韻パラメー
タ生成処理部１８０は、（言語処理部１７１内の）言語
解析処理部１７４から渡される音声記号列の音韻情報を
もとに音韻パラメータを生成する処理を、例えばピッチ
生成処理部１７９によるピッチパターン生成処理と並行
して次のように行う。On the other hand, the phoneme parameter generation processing unit 180 in the speech synthesis unit 172 generates a phoneme parameter based on the phoneme information of the speech symbol string passed from the language analysis processing unit 174 (in the language processing unit 171). For example, the processing is performed as follows in parallel with the pitch pattern generation processing by the pitch generation processing unit 179.

【０２６６】まず本実施形態では、サンプリング周波数
１１０２５Ｈｚで標本化した実音声を改良ケプストラム
法により窓長２０msec、フレーム周期１０msecで分析し
て得た０次から２５次のケプストラム係数を子音＋母音
（ＣＶ）の単位で日本語音声の合成に必要な全音節を切
り出した計１３７個の音声素片が蓄積された音声素片フ
ァイル（図示せず）が用意されている。この音声素片フ
ァイルの内容は、文音声変換ソフトウェアに従う文音声
変換処理の開始時に、例えばメインメモリ（図示せず）
に確保された音声素片領域（以下音声素片メモリと称す
る）１８１に読み込まれているものとする。First, in this embodiment, a speech unit file (not shown) is prepared. Real speech sampled at a sampling frequency of 11025 Hz is analyzed by the improved cepstrum method with a window length of 20 msec and a frame period of 10 msec, and the resulting 0th- to 25th-order cepstrum coefficients are cut out in consonant + vowel (CV) units for all syllables needed to synthesize Japanese speech, giving a total of 137 speech units stored in the file. The contents of this speech unit file are assumed to have been read, at the start of the sentence-to-speech conversion processing by the sentence-to-speech conversion software, into a speech unit area 181 (hereinafter called the speech unit memory) secured in, for example, a main memory (not shown).

【０２６７】音韻パラメータ生成処理部１８０は、（言
語処理部１７１内の）言語解析処理部１７４から渡され
る音声記号列中の音韻情報に従って、上記したＣＶ単位
の音声素片を音声素片メモリ１８１から順次読み出し、
読み出した音声素片を接続することにより合成すべき音
声の音韻パラメータ（特徴パラメータ）を生成する。According to the phoneme information in the phonetic symbol string passed from the language analysis processing unit 174 (in the language processing unit 171), the phoneme parameter generation processing unit 180 sequentially reads the above CV-unit speech units from the speech unit memory 181 and concatenates them to generate the phoneme parameters (feature parameters) of the speech to be synthesized.

【０２６８】ピッチ生成処理部１７９によりピッチパタ
ーンが生成され、音韻パラメータ生成処理部１８０によ
り音韻パラメータが生成されると、音声合成部１７２内
の合成フィルタ処理部１８２が起動される。この合成フ
ィルタ処理部１８２は、図５中の合成フィルタ処理部５
２と同様に、図６に示したような構成となっている。し
たがって合成フィルタ処理部１８２は、上記生成された
ピッチパターンと音韻パラメータから、次のようにして
音声を合成する。When the pitch generation processing unit 179 has generated the pitch pattern and the phoneme parameter generation processing unit 180 has generated the phoneme parameters, the synthesis filter processing unit 182 in the speech synthesis unit 172 is activated. Like the synthesis filter processing unit 52 in FIG. 5, the synthesis filter processing unit 182 has the configuration shown in FIG. 6. The synthesis filter processing unit 182 therefore synthesizes speech from the generated pitch pattern and phoneme parameters as follows.

【０２６９】まず、音声の有声部（Ｖ）では、駆動音源
切り替え部５２３によりインパルス発生部５２２側に切
り替えられる。インパルス発生部５２２は、ピッチ生成
処理部１７９により生成されたピッチパターンに応じた
間隔のインパルスを発生し、このインパルスを音源とし
てＬＭＡフィルタ５２４を駆動する。First, in the voiced part (V) of the voice, the driving sound source switching part 523 switches it to the impulse generating part 522 side. The impulse generation unit 522 generates impulses at intervals according to the pitch pattern generated by the pitch generation processing unit 179, and drives the LMA filter 524 using this impulse as a sound source.

【０２７０】一方、音声の無声部（Ｕ）では、駆動音源
切り替え部５２３によりホワイトノイズ発生部５２１側
に切り替えられる。ホワイトノイズ発生部５２１はホワ
イトノイズを発生し、このホワイトノイズを音源として
ＬＭＡフィルタ５２４を駆動する。ＬＭＡフィルタ５２
４は音声のケプストラムを直接フィルタ係数とするもの
である。On the other hand, in the unvoiced part (U) of the speech, the driving sound source switching unit 523 selects the white noise generation unit 521. The white noise generation unit 521 generates white noise, and this white noise drives the LMA filter 524 as the sound source. The LMA filter 524 uses the cepstrum of the speech directly as its filter coefficients.
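
The switching between the two sound sources can be illustrated with a minimal Python sketch. It is only an illustration under stated assumptions: the function name `excitation`, the per-frame `(voiced, pitch_period)` representation, and the unit-variance Gaussian noise are hypothetical and are not taken from the patent. The LMA filter 524 itself is not modelled; the sketch only builds the drive signal that would be fed to it.

```python
import random


def excitation(frames, frame_len, seed=0):
    """Build a drive signal frame by frame (hypothetical helper).

    frames: list of (voiced, pitch_period) tuples. pitch_period is given
    in samples and is ignored for unvoiced frames.
    """
    rng = random.Random(seed)
    out = []
    next_pulse = 0  # samples remaining until the next impulse
    for voiced, pitch_period in frames:
        for _ in range(frame_len):
            if voiced:
                # Voiced part (V): impulses spaced by the pitch period.
                if next_pulse <= 0:
                    out.append(1.0)
                    next_pulse = pitch_period
                else:
                    out.append(0.0)
                next_pulse -= 1
            else:
                # Unvoiced part (U): white noise.
                out.append(rng.gauss(0.0, 1.0))
                next_pulse = 0  # restart the impulse train after noise
    return out


# One voiced frame of 10 samples with a pitch period of 5 samples,
# followed by one unvoiced frame of 10 samples.
signal = excitation([(True, 5), (False, 0)], frame_len=10)
```

In the voiced frame the impulses fall on samples 0 and 5, so a shorter pitch period (higher fundamental frequency) simply places the impulses closer together.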

【０２７１】本実施形態において音韻パラメータ生成処
理部１８０により生成された音韻パラメータは前記した
ようにケプストラムであることから、この音韻パラメー
タがＬＭＡフィルタ５２４のフィルタ係数となり、駆動
音源切り替え部５２３により切り替えられる音源によっ
て駆動されることで、合成音声を出力する。In this embodiment, the phoneme parameters generated by the phoneme parameter generation processing unit 180 are cepstra, as described above. They therefore serve as the filter coefficients of the LMA filter 524, which outputs synthesized speech when driven by the sound source selected by the driving sound source switching unit 523.

【０２７２】合成フィルタ処理部１８２（内のＬＭＡフ
ィルタ５２４）により合成された音声は離散音声信号で
あり、Ｄ／Ａ変換器１８３によりアナログ信号に変換
し、アンプ１８４を通してスピーカ１８５に出力するこ
とで、初めて音として聞くことができる。The speech synthesized by the synthesis filter processing unit 182 (by the LMA filter 524 within it) is a discrete speech signal. It becomes audible only after it is converted into an analog signal by the D/A converter 183 and output to the speaker 185 through the amplifier 184.

【０２７３】ここまでの処理は、図３１を参照しながら
［従来技術］の欄にて説明した例とほぼ同じである。The processing up to this point is almost the same as the example described in the [Prior Art] section with reference to FIG. 31.

【０２７４】この図１７の構成は、前記第４の実施形態
に係る図５の音声規則合成装置の構成に対応するもの
で、図５の構成においてＤ／Ａ変換器でのＤ／Ａ変換の
サンプリング周波数を可変する代わりに、合成音声（離
散音声信号）のサンプリング周波数自体を可変する点に
特徴がある。The configuration of FIG. 17 corresponds to that of the speech rule synthesizing device of FIG. 5 according to the fourth embodiment. Its distinguishing feature is that, instead of varying the sampling frequency of the D/A conversion in the D/A converter as in FIG. 5, it varies the sampling frequency of the synthesized speech (discrete speech signal) itself.

【０２７５】即ち本実施形態のポイントは、合成フィル
タ処理部１８２とＤ／Ａ変換器１８３との間に、当該合
成フィルタ処理部１８２の出力である合成音声のサンプ
リング周波数を変換するサンプリング周波数変換処理部
１８８を設けると共に、（図５中の声質切替部５６及び
声質制御部５７に対応する）声質切替部１８６及び声質
制御部１８７を設け、当該声質制御部１８７が、合成フ
ィルタ処理部１８２から出力される合成音声のサンプリ
ング周波数を声質切替部１８６で指定された声質で決ま
る周波数に変換するようにサンプリング周波数変換処理
部１８８を制御するところにある。ここで、Ｄ／Ａ変換
器１８３でのＤ／Ａ変換のサンプリング周波数（第３の
標本周期）は固定であり、音声素片作成時の音声のサン
プリング周波数（第１の標本周期）に一致するものとす
る。That is, the point of this embodiment is that a sampling frequency conversion processing unit 188, which converts the sampling frequency of the synthesized speech output from the synthesis filter processing unit 182, is provided between the synthesis filter processing unit 182 and the D/A converter 183, together with a voice quality switching unit 186 and a voice quality control unit 187 (corresponding to the voice quality switching unit 56 and the voice quality control unit 57 in FIG. 5). The voice quality control unit 187 controls the sampling frequency conversion processing unit 188 so that the sampling frequency of the synthesized speech output from the synthesis filter processing unit 182 is converted to the frequency determined by the voice quality designated by the voice quality switching unit 186. Here, the sampling frequency of the D/A conversion in the D/A converter 183 (third sampling period) is fixed and is assumed to match the sampling frequency of the speech at the time the speech units were created (first sampling period).

【０２７６】本実施形態において、声質切替部１８６
は、図５中の声質切替部５６と同様に（ユーザによる指
定もしくはアプリケーションプログラム等によって）３
種類の声質が指定可能であり、声質制御部１８７は、声
質切替部１８６で指定された声質に応じて、合成フィル
タ処理部１８２から出力される合成音声のサンプリング
周波数を、１１０２５Ｈｚ（＝未変換），１２０００Ｈ
Ｚ，１００００Ｈｚのいずれかに変換するように、サン
プリング周波数変換処理部１８８を制御する。In this embodiment, as with the voice quality switching unit 56 in FIG. 5, three voice qualities can be designated through the voice quality switching unit 186 (by the user or by an application program, for example). In accordance with the designated voice quality, the voice quality control unit 187 controls the sampling frequency conversion processing unit 188 so that the sampling frequency of the synthesized speech output from the synthesis filter processing unit 182 is converted to 11025 Hz (= unconverted), 12000 Hz, or 10000 Hz.
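
As a rough illustration of this selection, the following Python sketch maps three voice-quality choices to the three target sampling frequencies and converts a sample sequence accordingly. The voice-quality indices 0 to 2, the names `VOICE_RATES` and `resample_linear`, and the use of plain linear interpolation are all assumptions made for the example. The patent does not state how the sampling frequency conversion processing unit 188 is implemented, and a practical converter would normally also apply a low-pass filter.

```python
# Hypothetical mapping from a voice-quality index to the target rate (Hz).
VOICE_RATES = {0: 11025, 1: 12000, 2: 10000}


def resample_linear(samples, src_rate, dst_rate):
    """Convert samples from src_rate to dst_rate by linear interpolation."""
    if not samples:
        return []
    n_out = max(1, round(len(samples) * dst_rate / src_rate))
    last = len(samples) - 1
    out = []
    for i in range(n_out):
        pos = i * src_rate / dst_rate  # position in the source sequence
        k = int(pos)
        frac = pos - k
        a = samples[min(k, last)]
        b = samples[min(k + 1, last)]
        out.append(a + (b - a) * frac)
    return out


# One second of synthesized speech at 11025 Hz (silence, for brevity).
one_second = [0.0] * 11025
converted = resample_linear(one_second, 11025, VOICE_RATES[1])
# len(converted) is 12000: more samples now describe the same second.
```

Because the D/A converter 183 still consumes 11025 samples per second, the 12000 samples take longer than one second to play, which is where the change in voice quality (and in speed) comes from.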

【０２７７】したがって、合成音声のサンプリング周波
数を、メモリ１８１に蓄えられた音声素片を作成した際
のサンプリング周波数（第１の標本周期）と同じサンプ
リング周波数、即ち１１０２５Ｈｚに変換するならば
（あるいはサンプリング周波数変換処理を行わないなら
ば）、元の音声の声質で音声合成することができる。Therefore, if the sampling frequency of the synthesized speech is converted to the same sampling frequency as that used when the speech units stored in the memory 181 were created (first sampling period), namely 11025 Hz (or if no sampling frequency conversion is performed), speech can be synthesized with the voice quality of the original speech.

【０２７８】一方、音声素片作成時とは異なるサンプリ
ング周波数（第１の標本周期とは異なる第２の標本周
期）、例えば１１０２５Ｈｚから１２０００Ｈｚ、ある
いは１１０２５Ｈｚから１００００Ｈｚへのサンプリン
グ周波数変換処理を行えば、既に述べたが、音声スペク
トルを図１３に示したように周波数軸方向にシフトした
効果が得られるため、音声の個人性が変化し、こうして
得られるアナログ音声信号の声質は、音声素片の元とな
っている音声の声質とは異なったものとなる。On the other hand, suppose the sampling frequency is converted to one different from that used when the speech units were created (a second sampling period different from the first), for example from 11025 Hz to 12000 Hz or from 11025 Hz to 10000 Hz. As already described, this has the effect of shifting the speech spectrum along the frequency axis as shown in FIG. 13, so the individuality of the voice changes, and the voice quality of the resulting analog speech signal differs from that of the speech from which the speech units were derived.

【０２７９】［第１４の実施形態］前記第１３の実施形
態においては、声質切替部１８６、声質制御部１８７及
びサンプリング周波数変換処理部１８８を設けたこと
で、合成音声の声質を簡単に増やすことができるもの
の、合成される音声のスピードが声質により異なる。[Fourteenth Embodiment] In the thirteenth embodiment, providing the voice quality switching unit 186, the voice quality control unit 187, and the sampling frequency conversion processing unit 188 makes it easy to increase the variety of voice qualities of the synthesized speech, but the speed of the synthesized speech differs depending on the voice quality.

【０２８０】即ち、（サンプリング周波数変換処理後の
サンプリング周波数）＞（Ｄ／Ａ変換のサンプリング周
波数）のときには、合成される音声のスピードは遅くな
る。逆に、（サンプリング周波数変換処理後のサンプリ
ング周波数）＜（Ｄ／Ａ変換のサンプリング周波数）の
ときには、合成される音声のスピードは早くなる。That is, when (sampling frequency after the sampling frequency conversion processing) > (sampling frequency of the D/A conversion), the synthesized speech becomes slower. Conversely, when (sampling frequency after the sampling frequency conversion processing) < (sampling frequency of the D/A conversion), the synthesized speech becomes faster.

【０２８１】このような合成される音声のスピードの違
いは、サンプリング周波数変換処理後のサンプリング周
波数が前記第１３の実施形態程度の違い（９１％，１０
９％）ではあまり問題とはならない。しかし以下に述べ
るように、サンプリング周波数変換処理後のサンプリン
グ周波数がＤ／Ａ変換時のサンプリング周波数と大きく
異なる場合には、問題となる。This difference in the speed of the synthesized speech is not much of a problem when the sampling frequency after the sampling frequency conversion processing differs only to the extent seen in the thirteenth embodiment (91%, 109%). However, as described below, it becomes a problem when the sampling frequency after the conversion differs greatly from the sampling frequency of the D/A conversion.

【０２８２】まず、サンプリング周波数変換処理後のサ
ンプリング周波数と合成時のＤ／Ａ変換のサンプリング
周波数の比が１に近ければ声質は変化も小さく、逆にこ
の比が１から離れれば声質は大きく変化する。したがっ
て、声質を大きく変えようとすれば、これら両サンプリ
ング周波数の比を例えば５０％，２００％程度に設定す
ればよいが、これでは合成音声のスピードもそれぞれ元
の音声の２００％，５０％、即ち倍と半分になり、かな
り聞きづらくなる。First, if the ratio between the sampling frequency after the sampling frequency conversion processing and the sampling frequency of the D/A conversion at synthesis time is close to 1, the voice quality changes only slightly; conversely, the further the ratio is from 1, the more the voice quality changes. Therefore, to change the voice quality greatly, the ratio of the two sampling frequencies could be set to, for example, about 50% or 200%. In that case, however, the speed of the synthesized speech also becomes 200% or 50% of the original, that is, double or half, which makes the speech quite hard to listen to.

【０２８３】そこで、サンプリング周波数変換処理後の
サンプリング周波数がＤ／Ａ変換のサンプリング周波数
と大きく異なった場合でも、合成音声のスピードを一定
にできるようにした第１４の実施形態につき説明する。Therefore, a fourteenth embodiment will be described in which the speed of synthesized speech can be made constant even when the sampling frequency after the sampling frequency conversion processing is significantly different from the sampling frequency for D / A conversion.

【０２８４】図１８は本発明の第１４の実施形態に係る
音声の規則合成装置の概略構成を示すブロック図であ
り、図１７と同一部分には同一符号を付してある。FIG. 18 is a block diagram showing the schematic arrangement of a speech rule synthesizing apparatus according to the fourteenth embodiment of the present invention. The same parts as those in FIG. 17 are designated by the same reference numerals.

【０２８５】この図１８の構成は、前記第５の実施形態
に係る図７の音声規則合成装置の構成に対応するもの
で、図７の構成においてＤ／Ａ変換器でのＤ／Ａ変換の
サンプリング周波数を可変する代わりに、合成音声（離
散音声信号）のサンプリング周波数自体を可変するもの
である。The configuration of FIG. 18 corresponds to that of the speech rule synthesizing device of FIG. 7 according to the fifth embodiment. Instead of varying the sampling frequency of the D/A conversion in the D/A converter as in FIG. 7, it varies the sampling frequency of the synthesized speech (discrete speech signal) itself.

【０２８６】本実施形態のポイントは、図１７中の発話
速度制御部１７８に代えて、合成音声のスピード（発話
速度）が制御可能な発話速度制御部１９８を設けると共
に、図１７中の声質制御部１８７に代えて、サンプリン
グ周波数変換処理部１８８の変換するサンプリング周波
数だけでなく合成音声のスピードを制御する声質制御部
１９７を設け、当該声質制御部１９７により、声質切替
部１８６の指定に応じて、サンプリング周波数変換処理
部１８８でのサンプリング周波数変換処理と同時に、発
話速度制御部１９８での合成音声のスピードを制御する
ところにある。ここで、Ｄ／Ａ変換器１８３でのＤ／Ａ
変換のサンプリング周波数（第３の標本周期）は固定で
あり、前記第１３の実施形態と同様に音声素片作成時の
音声のサンプリング周波数（第１の標本周期）に一致す
るものとする。The point of this embodiment is as follows. In place of the speech rate control unit 178 in FIG. 17, a speech rate control unit 198 whose synthesized-speech speed (speech rate) is controllable is provided. In place of the voice quality control unit 187 in FIG. 17, a voice quality control unit 197 is provided that controls not only the sampling frequency to which the sampling frequency conversion processing unit 188 converts but also the speed of the synthesized speech. In accordance with the designation from the voice quality switching unit 186, the voice quality control unit 197 controls the sampling frequency conversion in the sampling frequency conversion processing unit 188 and, at the same time, the speed of the synthesized speech in the speech rate control unit 198. Here, the sampling frequency of the D/A conversion in the D/A converter 183 (third sampling period) is fixed and, as in the thirteenth embodiment, is assumed to match the sampling frequency of the speech at the time the speech units were created (first sampling period).

【０２８７】本実施形態では、前記第１３の実施形態と
同様に声質切替部１８６にて３種類の声質が指定可能で
ある。声質制御部１９７は声質切替部１８６で指定され
た声質に応じて、例えば１１０２５Ｈｚ，８０００Ｈ
ｚ，１６０００Ｈｚのいずれかのサンプリング周波数へ
の変換を行うようにサンプリング周波数変換処理部１８
８を制御する。In this embodiment, as in the thirteenth embodiment, three voice qualities can be designated through the voice quality switching unit 186. In accordance with the designated voice quality, the voice quality control unit 197 controls the sampling frequency conversion processing unit 188 so that it converts to a sampling frequency of, for example, 11025 Hz, 8000 Hz, or 16000 Hz.

【０２８８】同時に声質制御部１９７は、声質切替部１
８６によって指定された声質に応じて、発話速度制御部
１９８を次のように制御する。即ち声質制御部１９７
は、先に説明したＣＶわたりの間隔が、（ＣＶわたりの間隔）＝（１１０２５Ｈｚへのサンプリ
ング周波数変換時のＣＶわたりの間隔）×（Ｄ／Ａ変換
のサンプリング周波数）／（サンプリング周波数変換後の
サンプリング周波数）となるよう発話速度制御部１９８を制御する。At the same time, the voice quality control unit 197 controls the speech rate control unit 198 as follows, in accordance with the voice quality designated by the voice quality switching unit 186. That is, the voice quality control unit 197 controls the speech rate control unit 198 so that the CV crossing interval described earlier satisfies (CV crossing interval) = (CV crossing interval when converting to a sampling frequency of 11025 Hz) × (sampling frequency of the D/A conversion) / (sampling frequency after the sampling frequency conversion).

【０２８９】したがって、８０００Ｈｚへのサンプリン
グ周波数変換を行う際には、声質制御部１９７は上式に
基づき、１１０２５Ｈｚへサンプリング周波数変換する
とき（あるいはサンプリング周波数変換をしないとき）
のＣＶわたり間隔の（１１０２５[Hz]／８０００[Hz]）
倍のＣＶわたり間隔となるよう発話速度制御部１９８を
制御する。また声質制御部１９７は、１６０００Ｈｚへ
のサンプリング周波数変換を行う際には、１１０２５Ｈ
ｚでサンプリング周波数変換するとき（あるいはサンプ
リング周波数変換をしないとき）のＣＶわたり間隔の
（１１０２５[Hz]／１６０００[Hz]）倍のＣＶわたり間
隔となるよう発話速度制御部１９８を制御する。Therefore, when converting to a sampling frequency of 8000 Hz, the voice quality control unit 197 controls the speech rate control unit 198, based on the above formula, so that the CV crossing interval becomes (11025 [Hz] / 8000 [Hz]) times the CV crossing interval used when converting to 11025 Hz (or when no sampling frequency conversion is performed). Likewise, when converting to a sampling frequency of 16000 Hz, the voice quality control unit 197 controls the speech rate control unit 198 so that the CV crossing interval becomes (11025 [Hz] / 16000 [Hz]) times the CV crossing interval used when converting to 11025 Hz (or when no sampling frequency conversion is performed).

【０２９０】このように本実施形態においては、発話速
度制御部１９８の制御でＣＶわたりの間隔（発話速度パ
ラメータ）を変えることにより、素片作成時と異なるサ
ンプリング周波数（第１の標本周期とは異なる第２の標
本周期）へのサンプリング周波数変換処理をした後、素
片作成時と同じサンプリング周波数（第２の標本周期と
は異なる第３の標本周期）でＤ／Ａ変換したときに生じ
る合成音声のスピードの変化を相殺することができる。As described above, in this embodiment, changing the CV crossing interval (speech rate parameter) under the control of the speech rate control unit 198 offsets the change in speed of the synthesized speech that arises when the speech is converted to a sampling frequency different from that used when the speech units were created (a second sampling period different from the first) and is then D/A-converted at the same sampling frequency as that used when the speech units were created (a third sampling period different from the second).

【０２９１】したがって、素片作成時のサンプリング周
波数と異なる８０００Ｈｚあるいは１６０００Ｈｚへの
サンプリング周波数変換処理を行った後、素片作成時と
同じサンプリング周波数でＤ／Ａ変換を行っても、ほぼ
同じスピードの音声のアナログ信号を得ることができ
る。Therefore, even when the speech is converted to 8000 Hz or 16000 Hz, which differ from the sampling frequency used when the speech units were created, and is then D/A-converted at the same sampling frequency as that used when the speech units were created, an analog speech signal of almost the same speed can be obtained.

【０２９２】［第１５の実施形態］前記第１４の実施形
態に基づいて規則合成を行えば、確かに声質を変えなが
らも合成される音声のスピードをほぼ一定に保つことが
簡単に実現できる。しかし既に説明したように、音韻継
続時間計算処理部１７７での処理（音韻継続時間計算処
理）では、ＣＶわたりの間隔を変えても子音の音韻継続
時間を音節毎に一定にするため、素片作成時と異なるサ
ンプリング周波数へ合成音声を変換した後、素片作成時
と同じサンプリング周波数でＤ／Ａ変換すると、子音の
継続時間が縮んだり、間延びしたりし、この結果、合成
される音声の明瞭性・自然性に影響を及ぼすことがあ
る。[Fifteenth Embodiment] Rule synthesis based on the fourteenth embodiment does make it easy to keep the speed of the synthesized speech almost constant while changing the voice quality. However, as already explained, the processing in the phoneme duration calculation processing unit 177 (phoneme duration calculation processing) keeps the consonant duration constant for each syllable even when the CV crossing interval is changed. Consequently, when the synthesized speech is converted to a sampling frequency different from that used when the speech units were created and is then D/A-converted at the same sampling frequency as that used when the speech units were created, the consonant durations shrink or stretch, which can affect the clarity and naturalness of the synthesized speech.

【０２９３】そこで、合成される音声の子音の継続時間
を音節毎に一定に保つことができるようにした第１５の
実施形態につき説明する。Now, a fifteenth embodiment will be described in which the duration of the consonant of the synthesized voice can be kept constant for each syllable.

【０２９４】図１９は本発明の第１５の実施形態に係る
音声の規則合成装置の概略構成を示すブロック図であ
り、図１７または図１８と同一部分には同一符号を付し
てある。FIG. 19 is a block diagram showing the schematic arrangement of a speech rule synthesizing apparatus according to the fifteenth embodiment of the present invention. The same parts as those in FIG. 17 or 18 are designated by the same reference numerals.

【０２９５】この図１９の構成は、前記第６の実施形態
に係る図８の音声規則合成装置の構成に対応するもの
で、図８の構成においてＤ／Ａ変換器でのＤ／Ａ変換の
サンプリング周波数を可変する代わりに、合成音声（離
散音声信号）のサンプリング周波数自体を可変するもの
である。The configuration of FIG. 19 corresponds to that of the speech rule synthesizing device of FIG. 8 according to the sixth embodiment. Instead of varying the sampling frequency of the D/A conversion in the D/A converter as in FIG. 8, it varies the sampling frequency of the synthesized speech (discrete speech signal) itself.

【０２９６】本実施形態のポイントは、図１８中の音韻
継続時間計算処理部１７７に代えて、音韻継続時間が制
御可能な音韻継続時間計算処理部２０７を設けると共
に、図１８中の声質制御部１９７に代えて、サンプリン
グ周波数変換処理部１８８の変換するサンプリング周波
数だけでなく合成音声のスピードを制御する声質制御部
２１７を設け、当該声質制御部２１７により、声質切替
部１８６の指定に応じて、サンプリング周波数変換処理
部１８８でのサンプリング周波数変換処理と同時に、音
韻継続時間計算処理部２０７での音韻継続時間を制御す
るところにある。ここで、Ｄ／Ａ変換器１８３でのＤ／
Ａ変換のサンプリング周波数（第３の標本周期）は固定
であり、前記第１３の実施形態と同様に音声素片作成時
の音声のサンプリング周波数（第１の標本周期）に一致
するものとする。The point of this embodiment is as follows. In place of the phoneme duration calculation processing unit 177 in FIG. 18, a phoneme duration calculation processing unit 207 whose phoneme durations are controllable is provided. In place of the voice quality control unit 197 in FIG. 18, a voice quality control unit 217 is provided that controls not only the sampling frequency to which the sampling frequency conversion processing unit 188 converts but also the speed of the synthesized speech. In accordance with the designation from the voice quality switching unit 186, the voice quality control unit 217 controls the sampling frequency conversion in the sampling frequency conversion processing unit 188 and, at the same time, the phoneme durations in the phoneme duration calculation processing unit 207. Here, the sampling frequency of the D/A conversion in the D/A converter 183 (third sampling period) is fixed and, as in the thirteenth embodiment, is assumed to match the sampling frequency of the speech at the time the speech units were created (first sampling period).

【０２９７】本実施形態では、前記第１４の実施形態と
同様に声質切替部１８６にて３種類の声質が指定可能で
ある。声質制御部２１７は声質切替部１８６で指定され
た声質に応じて、例えば１１０２５Ｈｚ（＝未変換），
８０００Ｈｚ，１６０００Ｈｚのいずれかのサンプリン
グ周波数へのサンプリング周波数変換処理を行うように
サンプリング周波数変換処理部１８８を制御する。In this embodiment, as in the fourteenth embodiment, three voice qualities can be designated through the voice quality switching unit 186. In accordance with the designated voice quality, the voice quality control unit 217 controls the sampling frequency conversion processing unit 188 so that it converts to a sampling frequency of, for example, 11025 Hz (= unconverted), 8000 Hz, or 16000 Hz.

【０２９８】同時に声質制御部２１７は、声質切替部１
８６によって指定された声質に応じて、音韻継続時間計
算処理部２０７を次のように制御する。即ち声質制御部
２１７は、全ての音韻の継続時間、つまり子音の継続時
間と母音の継続時間が（音韻継続時間）＝（１１０２５Ｈｚへのサンプリング
周波数変換時の音韻継続時間）×（Ｄ／Ａ変換のサンプ
リング周波数）／（サンプリング周波数変換後のサンプ
リング周波数）となるよう音韻継続時間計算処理部２０７を制御する。At the same time, the voice quality control unit 217 controls the phoneme duration calculation processing unit 207 as follows, in accordance with the voice quality designated by the voice quality switching unit 186. That is, the voice quality control unit 217 controls the phoneme duration calculation processing unit 207 so that the durations of all phonemes, consonants and vowels alike, satisfy (phoneme duration) = (phoneme duration when converting to a sampling frequency of 11025 Hz) × (sampling frequency of the D/A conversion) / (sampling frequency after the sampling frequency conversion).

【０２９９】したがって、８０００Ｈｚへのサンプリン
グ周波数変換処理を行う際には、声質制御部２１７は上
式に基づき、１１０２５Ｈｚへサンプリング周波数変換
するとき（あるいはサンプリング周波数変換をしないと
き）の音韻継続時間の（１１０２５[Hz]／８０００[H
z]）倍の音韻継続時間となるよう音韻継続時間計算処理
部２０７を制御する。また声質制御部２１７は、１６０
００Ｈｚへのサンプリング周波数変換処理を行う際に
は、１１０２５Ｈｚへサンプリング周波数変換するとき
（あるいはサンプリング周波数変換をしないとき）の音
韻継続時間の（１１０２５[Hz]／１６０００[Hz]）倍の
音韻継続時間となるよう音韻継続時間計算処理部２０７
を制御する。Therefore, when converting to a sampling frequency of 8000 Hz, the voice quality control unit 217 controls the phoneme duration calculation processing unit 207, based on the above formula, so that each phoneme duration becomes (11025 [Hz] / 8000 [Hz]) times the phoneme duration used when converting to 11025 Hz (or when no sampling frequency conversion is performed). Likewise, when converting to a sampling frequency of 16000 Hz, the voice quality control unit 217 controls the phoneme duration calculation processing unit 207 so that each phoneme duration becomes (11025 [Hz] / 16000 [Hz]) times the phoneme duration used when converting to 11025 Hz (or when no sampling frequency conversion is performed).

【０３００】このよう本実施形態においては、素片作成
時と異なるサンプリング周波数（第１の標本周期とは異
なる第２の標本周期）へサンプリング周波数変換処理
し、素片作成時と同じサンプリング周波数（第２の標本
周期とは異なる第３の標本周期）でＤ／Ａ変換したとき
に生じる合成音声のスピードの変化を、各音韻継続時間
を一定の割合で変えることで相殺することができ、かつ
合成される音声の子音の継続時間を音節毎に一定に保つ
ことができる。As described above, in this embodiment, the change in speed of the synthesized speech that arises when the speech is converted to a sampling frequency different from that used when the speech units were created (a second sampling period different from the first) and is then D/A-converted at the same sampling frequency as that used when the speech units were created (a third sampling period different from the second) can be offset by changing every phoneme duration by the same ratio. In addition, the consonant duration of the synthesized speech can be kept constant for each syllable.

【０３０１】［第１６の実施形態］前記第１４または第
１５の実施形態に基づいて合成を行えば、確かに声質を
変えながらも合成される音声のスピードをほぼ一定に保
つことができる。しかし、Ｄ／Ａ変換器１８３へ入力す
る音声のサンプリング周波数（サンプリング周波数変換
処理後のサンプリング周波数）とＤ／Ａ変換のサンプリ
ング周波数とを違えるということは、レコードの早回し
や遅回しとほぼ同じであるから、音声の過渡部分の時間
的に縮んだり間延びすることは避けられない。[Sixteenth Embodiment] Synthesis based on the fourteenth or fifteenth embodiment does keep the speed of the synthesized speech almost constant while changing the voice quality. However, making the sampling frequency of the speech input to the D/A converter 183 (the sampling frequency after the sampling frequency conversion processing) differ from the sampling frequency of the D/A conversion is much the same as playing a record too fast or too slow, so the transient parts of the speech inevitably shrink or stretch in time.

【０３０２】例をあげれば、/ わ/ という発声は、/ う
/ に近い口の形から急激に唇を開いて/ あ/ へ移る運動
を発声器管が行う。したがって、Ｄ／Ａ変換時のサンプ
リング周波数を落して、レコードの遅回しのようなこと
をすれば、この変化が緩やかになり、/ わ/ ではなく、
/ うあー/ のように聞こえてくる。For example, to utter /wa/, the vocal organs start from a mouth shape close to /u/, open the lips rapidly, and move to /a/. Therefore, if the sampling frequency of the D/A conversion is lowered, as when a record is played too slowly, this change becomes gradual, and the utterance sounds like /uaa/ rather than /wa/.

【０３０３】そこで、Ｄ／Ａ変換器１８３へ入力する音
声のサンプリング周波数（サンプリング周波数変換処理
後のサンプリング周波数）と異なるサンプリング周波数
でＤ／Ａ変換したときに生じる合成音声過渡部分の時間
的方向の縮みや間延びを抑えることができるようにした
第１６の実施形態につき説明する。Therefore, a sixteenth embodiment will now be described that can suppress the temporal shrinking or stretching of the transient parts of the synthesized speech that arises when D/A conversion is performed at a sampling frequency different from that of the speech input to the D/A converter 183 (the sampling frequency after the sampling frequency conversion processing).

【０３０４】図２０は本発明の第１６の実施形態に係る
音声の規則合成装置の概略構成を示すブロック図であ
り、図１９と同一部分には同一符号を付してある。FIG. 20 is a block diagram showing the schematic arrangement of a speech rule synthesizing apparatus according to the sixteenth embodiment of the present invention. The same parts as those in FIG. 19 are designated by the same reference numerals.

【０３０５】この図２０の構成は、前記第７の実施形態
に係る図９の音声規則合成装置の構成に対応するもの
で、図９の構成においてＤ／Ａ変換器でのＤ／Ａ変換の
サンプリング周波数を可変する代わりに、合成音声（離
散音声信号）のサンプリング周波数自体を可変するもの
である。The configuration of FIG. 20 corresponds to that of the speech rule synthesizing device of FIG. 9 according to the seventh embodiment. Instead of varying the sampling frequency of the D/A conversion in the D/A converter as in FIG. 9, it varies the sampling frequency of the synthesized speech (discrete speech signal) itself.

【０３０６】本実施形態のポイントは、図１９中の音韻
パラメータ生成処理部１８０に代えて、（音声素片から
なる）音韻パラメータを時間軸方向へ伸縮する機能が付
加された音韻パラメータ生成処理部２３０を設けると共
に、図１９中の声質制御部２１７に代えて、音韻継続時
間及びサンプリング周波数変換処理部１８８の変換する
サンプリング周波数だけでなく音韻パラメータの時間軸
方向への伸縮を制御する声質制御部２３７を設け、当該
声質制御部２３７により、声質切替部１８６の指定に応
じて、サンプリング周波数変換処理部１８８でのサンプ
リング周波数変換処理及び音韻継続時間計算処理部２０
７での音韻継続時間を制御する他に、音韻パラメータ生
成処理部２３０を制御し、合成音声の過渡部が縮んでし
まうような場合には、予め音韻パラメータを時間方向に
引き伸ばして音韻パラメータを作成させ、合成音声の過
渡部が間延びするような場合には、予め音韻パラメータ
を時間方向に圧縮して音韻パラメータを作成させるとこ
ろにある。ここで、Ｄ／Ａ変換器１８３でのＤ／Ａ変換
のサンプリング周波数（第３の標本周期）は固定であ
り、前記第１３の実施形態と同様に音声素片作成時の音
声のサンプリング周波数（第１の標本周期）に一致する
ものとする。The point of this embodiment is as follows. In place of the phoneme parameter generation processing unit 180 in FIG. 19, a phoneme parameter generation processing unit 230 is provided that has the added function of stretching and compressing the phoneme parameters (made up of speech units) along the time axis. In place of the voice quality control unit 217 in FIG. 19, a voice quality control unit 237 is provided that controls not only the phoneme durations and the sampling frequency to which the sampling frequency conversion processing unit 188 converts but also the stretching and compression of the phoneme parameters along the time axis. In accordance with the designation from the voice quality switching unit 186, the voice quality control unit 237 controls the sampling frequency conversion in the sampling frequency conversion processing unit 188 and the phoneme durations in the phoneme duration calculation processing unit 207, and also controls the phoneme parameter generation processing unit 230: where the transient parts of the synthesized speech would shrink, the phoneme parameters are created stretched in the time direction in advance, and where the transient parts would stretch, the phoneme parameters are created compressed in the time direction in advance. Here, the sampling frequency of the D/A conversion in the D/A converter 183 (third sampling period) is fixed and, as in the thirteenth embodiment, is assumed to match the sampling frequency of the speech at the time the speech units were created (first sampling period).

【０３０７】本実施形態では、前記第１５の実施形態と
同様に声質切替部１８６にて３種類の声質が指定可能で
ある。声質制御部２３７は声質切替部１８６で指定され
た声質に応じて、１１０２５Ｈｚ（＝未変換），８００
０Ｈｚ，１６０００Ｈｚのいずれかのサンプリング周波
数へのサンプリング周波数変換処理を行うようにサンプ
リング周波数変換処理部１８８を制御する。In this embodiment, as in the fifteenth embodiment, three voice qualities can be designated through the voice quality switching unit 186. In accordance with the designated voice quality, the voice quality control unit 237 controls the sampling frequency conversion processing unit 188 so that it converts to a sampling frequency of 11025 Hz (= unconverted), 8000 Hz, or 16000 Hz.

【０３０８】同時に声質制御部２３７は、声質切替部１
８６によって指定された声質に応じて、全ての音韻の継
続時間、即ち子音の継続時間と母音の継続時間を（音韻継続時間）＝（１１０２５Ｈｚへのサンプリング
変換時の音韻継続時間）×（Ｄ／Ａ変換のサンプリング
周波数）／（サンプリング周波数変換処理後のサンプリ
ング周波数）となるよう音韻継続時間計算処理部２０７を制御する。At the same time, in accordance with the voice quality designated by the voice quality switching unit 186, the voice quality control unit 237 controls the phoneme duration calculation processing unit 207 so that the durations of all phonemes, consonants and vowels alike, satisfy (phoneme duration) = (phoneme duration when converting to a sampling frequency of 11025 Hz) × (sampling frequency of the D/A conversion) / (sampling frequency after the sampling frequency conversion processing).

【０３０９】さらに、声質制御部２３７は音韻パラメー
タ生成処理部２３０を制御し、Ｄ／Ａ変換器１８３への
入力となる合成音声のサンプリング周波数（サンプリン
グ周波数変換処理後のサンプリング周波数）とＤ／Ａ変
換器１８３でのＤ／Ａ変換のサンプリング周波数が異な
るために、合成音声の過渡部が縮んでしまうような場合
には、予め音韻パラメータを時間方向に引き伸ばして音
韻パラメータを作成させ、合成音声の過渡部が間延びす
るような場合には、予め音韻パラメータを時間方向に圧
縮して音韻パラメータを作成させる。Further, the voice quality control unit 237 controls the phoneme parameter generation processing unit 230. Where the transient parts of the synthesized speech would shrink because the sampling frequency of the synthesized speech input to the D/A converter 183 (the sampling frequency after the sampling frequency conversion processing) differs from the sampling frequency of the D/A conversion in the D/A converter 183, it has the phoneme parameters created stretched in the time direction in advance. Where the transient parts would stretch, it has the phoneme parameters created compressed in the time direction in advance.

【０３１０】もっと正確には、声質制御部２３７は、素
片自身の長さを、（Ｄ／Ａ変換のサンプリング周波数／
サンプリング周波数変換処理後のサンプリング周波数）
倍となるように伸縮を行ってから接続補間し、音韻パラ
メータを生成させる。More precisely, the voice quality control unit 237 has the speech units themselves stretched or compressed to (sampling frequency of the D/A conversion / sampling frequency after the sampling frequency conversion processing) times their length before they are connected and interpolated to generate the phoneme parameters.

【０３１１】即ち、本実施形態における声質制御部２３
７は、サンプリング周波数変換処理部１８８が１１０２
５Ｈｚ（メモリ１８１に蓄えられた音声素片を作成した
際のサンプリング周波数と同じサンプリング周波数）へ
のサンプリング周波数変換処理を行う（言い換えれば、
サンプリング周波数変換処理を行わない）場合には、音
声素片の伸縮は行わずに音韻パラメータを生成し、８０
００へのサンプリング周波数変換処理を行う場合には、
音声素片を（１１０２５[Hz]／８０００[Hz]）倍の長さ
に延ばしてから接続補間して音韻パラメータを生成し、
１６０００Ｈｚへのサンプリング周波数変換処理を行う
場合には、音声素片を（１１０２５[Hz]／１６０００[H
z]）倍の長さに縮めてから接続補間して音韻パラメータ
を生成するよう音韻パラメータ生成処理部２３０を制御
する。That is, the voice quality control unit 237 in this embodiment controls the phoneme parameter generation processing unit 230 as follows. When the sampling frequency conversion processing unit 188 converts to 11025 Hz (the same sampling frequency as that used when the speech units stored in the memory 181 were created; in other words, when no sampling frequency conversion is performed), the phoneme parameters are generated without stretching or compressing the speech units. When converting to 8000 Hz, the speech units are stretched to (11025 [Hz] / 8000 [Hz]) times their length and then connected and interpolated to generate the phoneme parameters. When converting to 16000 Hz, the speech units are compressed to (11025 [Hz] / 16000 [Hz]) times their length and then connected and interpolated to generate the phoneme parameters.
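
One possible way to stretch or compress a speech unit is sketched below in Python. It treats a speech unit as a list of parameter frames (for example, cepstral coefficient vectors) and resizes it by linear interpolation between neighbouring frames. The function name `stretch_unit`, the frame representation, and the choice of linear interpolation are assumptions; the patent only specifies the target length ratio, not the stretching method.

```python
def stretch_unit(frames, factor):
    """Resize a speech unit along the time axis.

    frames: list of parameter frames, each a list of floats.
    factor: target length ratio, e.g. 11025 / 8000 to stretch.
    """
    if not frames:
        return []
    n_in = len(frames)
    n_out = max(1, round(n_in * factor))
    if n_in == 1 or n_out == 1:
        return [list(frames[0]) for _ in range(n_out)]
    out = []
    for i in range(n_out):
        # Map output frame i onto the input time axis, end to end.
        pos = i * (n_in - 1) / (n_out - 1)
        k = min(int(pos), n_in - 2)
        frac = pos - k
        out.append([a + (b - a) * frac
                    for a, b in zip(frames[k], frames[k + 1])])
    return out


# A toy speech unit of 4 frames with one coefficient per frame.
unit = [[0.0], [1.0], [2.0], [3.0]]

stretched = stretch_unit(unit, 11025 / 8000)
# 6 frames: the same trajectory from 0.0 to 3.0, spread over more frames.

compressed = stretch_unit(unit * 2, 11025 / 16000)
# 8 frames become 6 frames.
```

After the converted speech is played back at 11025 Hz, the stretched unit occupies roughly its original time span again, which is the cancellation this embodiment aims for.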

【０３１２】このように本実施形態においては、Ｄ／Ａ
変換器１８３への入力となる合成音声のサンプリング周
波数（サンプリング周波数変換処理後のサンプリング周
波数、第２の標本周期）とＤ／Ａ変換器１８３でのＤ／
Ａ変換のサンプリング周波数（第３の標本周期）を異な
らせたことによって生じる合成音声過渡部分の時間方向
の縮みや間延びを、予め音韻パラメータ生成時に音声素
片を伸縮させておくことで打ち消すことができる。As described above, in this embodiment, the temporal shrinking or stretching of the transient parts of the synthesized speech, which arises because the sampling frequency of the synthesized speech input to the D/A converter 183 (the sampling frequency after the sampling frequency conversion processing; second sampling period) differs from the sampling frequency of the D/A conversion in the D/A converter 183 (third sampling period), can be cancelled by stretching or compressing the speech units in advance when the phoneme parameters are generated.

【０３１３】［第１７の実施形態］前述の第１４乃至第
１６の実施形態は、ケプストラムやＬＰＣなどを利用し
た音声規則合成、即ち音声波形を分析して得られるパラ
メータを用いた音声規則合成だけではなく、波形合成
（による規則合成）にも応用は可能である。しかし、パ
ラメータを用いた音声規則合成では、前述の第１４乃至
第１６の実施形態を用いずとも、声質を変えながら、合
成音声のスピードを一定にし、かつ音声過渡部の縮み間
延びを起こさせない簡便な方法が適用可能である。[Seventeenth Embodiment] The fourteenth to sixteenth embodiments described above are applicable not only to speech rule synthesis using cepstra, LPC, or the like, that is, speech rule synthesis using parameters obtained by analyzing speech waveforms, but also to waveform synthesis (and rule synthesis based on it). However, for speech rule synthesis using parameters, a simpler method is available that changes the voice quality while keeping the speed of the synthesized speech constant and without shrinking or stretching the transient parts of the speech, even without using the fourteenth to sixteenth embodiments.

[0314] A seventeenth embodiment, in which this simple method is applied to speech rule synthesis using parameters, will now be described.

[0315] FIG. 21 is a block diagram showing the schematic configuration of a speech rule synthesizing apparatus according to the seventeenth embodiment of the present invention. The same parts as those in FIG. 17 are designated by the same reference numerals.

[0316] The configuration of FIG. 21 corresponds to that of the speech rule synthesizing apparatus of FIG. 10 according to the eighth embodiment. Instead of varying the sampling frequency of the D/A conversion in the D/A converter, as in the configuration of FIG. 10, it varies the sampling frequency of the synthesized speech (discrete speech signal) itself.

[0317] This embodiment has two key points. First, a synthesis filter processing unit 252, whose frame period at synthesis is controllable, replaces the synthesis filter processing unit 182 in FIG. 17. Second, a voice quality control unit 257, which controls not only the sampling frequency to which the sampling frequency conversion processing unit 188 converts but also the frame period at synthesis, replaces the voice quality control unit 187 in FIG. 17. In accordance with the designation from the voice quality switching unit 186, the voice quality control unit 257 controls the sampling frequency conversion processing in the sampling frequency conversion processing unit 188 and, at the same time, the frame period at synthesis in the synthesis filter processing unit 252. Here, the sampling frequency of the D/A conversion in the D/A converter 183 (third sampling period) is fixed and, as in the thirteenth embodiment, is assumed to match the sampling frequency of the speech at the time the speech units were created (first sampling period).

[0318] In this embodiment, as in the thirteenth embodiment, three kinds of voice quality can be designated by the voice quality switching unit 186. In accordance with the voice quality designated by the voice quality switching unit 186, the voice quality control unit 257 controls the sampling frequency conversion processing unit 188 so that it performs sampling frequency conversion processing to one of the sampling frequencies 11025 Hz (= unconverted), 8000 Hz, and 16000 Hz.

[0319] At the same time, the voice quality control unit 257 sets the frame period of the synthesis performed by the synthesis filter processing unit 252 in accordance with the voice quality designated by the voice quality switching unit 186. The frame period of synthesis is given by the following equation.

[0320]
(frame period) = (frame period at speech unit creation) × (sampling period after sampling frequency conversion) / (sampling period of D/A conversion)
= (frame period at speech unit creation) × (sampling frequency of D/A conversion) / (sampling frequency after sampling frequency conversion)

Therefore, when performing sampling frequency conversion processing to the same sampling frequency as that of the D/A conversion, that is, 11025 Hz (in other words, when no sampling frequency conversion is performed), the voice quality control unit 257 controls the synthesis filter processing unit 252, based on the above equation, so that synthesis is performed with a frame period of 10 msec, the same as when the cepstrum was created.

[0321] When performing sampling frequency conversion processing to a sampling frequency different from the sampling frequency of the speech at speech unit creation (a second sampling period different from the first sampling period), for example 8000 Hz, the voice quality control unit 257 performs control so that synthesis is carried out at a frame period of 10 [msec] × 11025 [Hz] / 8000 [Hz] = 13.8 [msec] (a second frame period different from the first frame period). When performing sampling frequency conversion processing to 16000 Hz, it performs control so that synthesis is carried out at a frame period of 10 [msec] × 11025 [Hz] / 16000 [Hz] = 6.9 [msec] (a second frame period different from the first frame period).
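The frame period rule of this embodiment can be checked numerically. The following is a minimal sketch in Python, not part of the patent text: the function name `synthesis_frame_period_ms` and the constant names are hypothetical, while the values (10 msec, 11025 Hz, 8000 Hz, 16000 Hz) are the ones given in the description.

```python
def synthesis_frame_period_ms(unit_frame_period_ms, da_rate_hz, converted_rate_hz):
    """Frame period for synthesis (second frame period).

    Implements: (frame period) = (frame period at speech unit creation)
                 x (D/A sampling frequency) / (sampling frequency after conversion)
    """
    return unit_frame_period_ms * da_rate_hz / converted_rate_hz


DA_RATE_HZ = 11025.0         # fixed D/A sampling frequency (matches the unit-creation rate)
UNIT_FRAME_PERIOD_MS = 10.0  # frame period used when the cepstrum was created

if __name__ == "__main__":
    for target_hz in (11025.0, 8000.0, 16000.0):
        period = synthesis_frame_period_ms(UNIT_FRAME_PERIOD_MS, DA_RATE_HZ, target_hz)
        print(f"{target_hz:7.0f} Hz -> {period:.1f} msec")
```

Rounded to one decimal place, the three cases give 10.0, 13.8, and 6.9 msec, matching the figures stated for the unconverted, 8000 Hz, and 16000 Hz settings.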

[0322] As described above, in this embodiment, the change in the speed of the synthesized speech that arises when the sampling frequency of the synthesized speech input to the D/A converter 183 (second sampling period) differs from the sampling frequency of the D/A conversion in the D/A converter 183 (third sampling period) can be offset by making the frame period of synthesis in the synthesis filter processing unit 252 (second frame period) different from the frame period at speech unit creation (first frame period).

[0323] Therefore, even when the sampling frequency is converted to 8000 Hz or 16000 Hz, which differ from the sampling frequency of the D/A conversion in the D/A converter 183 (11025 Hz), and D/A conversion is then performed at a sampling frequency of 11025 Hz, an analog speech signal of the same speed is obtained. At the same time, the shrinkage or stretching of the speech transient portions that occurs when D/A conversion is performed at a sampling frequency different from that at speech unit creation is also prevented.
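The cancellation of the speed change can be illustrated with a small calculation. This is a sketch under a simplifying assumption that is not spelled out in the text: sampling frequency conversion is treated as ideal resampling, which preserves the duration of a frame and only changes its sample count, and the D/A converter then plays those samples at its fixed rate. The function name is hypothetical.

```python
def played_frame_duration_ms(frame_period_ms, converted_rate_hz, da_rate_hz):
    """Duration of one synthesis frame as heard after D/A conversion.

    Assumption: ideal resampling, so a frame of frame_period_ms holds
    frame_period_ms / 1000 * converted_rate_hz samples after conversion.
    Playing those samples at da_rate_hz gives the audible duration.
    """
    samples_after_conversion = frame_period_ms / 1000.0 * converted_rate_hz
    return samples_after_conversion / da_rate_hz * 1000.0


DA_RATE_HZ = 11025.0  # fixed D/A sampling frequency from the description

if __name__ == "__main__":
    for converted_hz in (8000.0, 16000.0):
        uncompensated = played_frame_duration_ms(10.0, converted_hz, DA_RATE_HZ)
        second_frame_period = 10.0 * DA_RATE_HZ / converted_hz
        compensated = played_frame_duration_ms(second_frame_period, converted_hz, DA_RATE_HZ)
        print(converted_hz, uncompensated, compensated)
```

With the frame period left at 10 msec, a frame is heard for about 7.26 msec at 8000 Hz and about 14.51 msec at 16000 Hz; with the adjusted frame period each frame is heard for 10 msec again, which is the offsetting effect described above.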

[0324] [Eighteenth Embodiment] The thirteenth to seventeenth embodiments have one more problem: when the sampling frequency of the speech input to the D/A converter 183 (= the sampling frequency after the sampling frequency conversion processing) differs from the sampling frequency of the D/A conversion, the height of the voice, that is, the pitch of the speech, changes. For example, when (sampling frequency after sampling frequency conversion processing) > (sampling frequency of D/A conversion), the pitch of the synthesized speech becomes lower. Conversely, when (sampling frequency after sampling frequency conversion processing) < (sampling frequency of D/A conversion), the pitch of the synthesized speech becomes higher.

[0325] Such a difference in the pitch of the synthesized speech is not much of a problem when the sampling frequency after the sampling frequency conversion processing differs only to the extent seen in the thirteenth embodiment (91%, 109%).

[0326] However, if the ratio of the two sampling frequencies is set to, for example, about 50% or 200% in order to change the voice quality greatly, the pitch of the synthesized speech becomes 200% or 50%, respectively. Compared with the synthesized speech obtained with sampling frequency conversion processing to 11025 Hz (or with no sampling frequency conversion), the former setting yields speech whose pitch is 1 [oct] higher and the latter yields speech whose pitch is 1 [oct] lower, which makes the speech hard to listen to.
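The size of the pitch change can be expressed in octaves. The sketch below is an illustration only: the function name `pitch_shift_octaves` is hypothetical, and the octave value is simply the base-2 logarithm of the frequency ratio, which is the standard definition of an octave rather than a formula given in the text.

```python
import math


def pitch_shift_octaves(converted_rate_hz, da_rate_hz):
    """Pitch shift, in octaves, heard when samples at converted_rate_hz
    are played back by a D/A converter running at da_rate_hz.

    Positive means higher pitch, negative means lower pitch.
    """
    return math.log2(da_rate_hz / converted_rate_hz)


DA_RATE_HZ = 11025.0  # fixed D/A sampling frequency from the description

if __name__ == "__main__":
    # Ratio of 50%: one octave up. Ratio of 200%: one octave down.
    print(pitch_shift_octaves(0.5 * DA_RATE_HZ, DA_RATE_HZ))
    print(pitch_shift_octaves(2.0 * DA_RATE_HZ, DA_RATE_HZ))
    # The 8000 Hz and 16000 Hz settings used in the embodiments.
    print(pitch_shift_octaves(8000.0, DA_RATE_HZ))
    print(pitch_shift_octaves(16000.0, DA_RATE_HZ))
```

For the 8000 Hz and 16000 Hz settings the shift comes out at roughly +0.46 and -0.54 octaves, so the uncompensated pitch change is already substantial at those ratios.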

[0327] FIG. 22 is a block diagram showing the schematic configuration of a speech rule synthesizing apparatus according to the eighteenth embodiment of the present invention. The same parts as those in FIG. 17 are designated by the same reference numerals.

[0328] The configuration of FIG. 22 corresponds to that of the speech rule synthesizing apparatus of FIG. 11 according to the ninth embodiment. Instead of varying the sampling frequency of the D/A conversion in the D/A converter, as in the configuration of FIG. 11, it varies the sampling frequency of the synthesized speech (discrete speech signal) itself.

[0329] The key points of this embodiment are as follows. A synthesis filter processing unit 272, in which the frame period at synthesis and the pitch of the synthesized speech are controllable, replaces the synthesis filter processing unit 182 in FIG. 17. A pitch modulation processing unit 278 is provided between the pitch generation processing unit 179 and the synthesis filter processing unit 272; it converts the pitch pattern (fundamental frequency pattern) generated by the pitch generation processing unit 179 into another pitch pattern of different frequency (pitch modulation) and supplies it to the synthesis filter processing unit 272. Further, a voice quality control unit 277, which controls not only the sampling frequency to which the sampling frequency conversion processing unit 188 converts but also the frame period at synthesis and the pitch modulation, replaces the voice quality control unit 187 in FIG. 17. In accordance with the designation from the voice quality switching unit 186, the voice quality control unit 277 controls the sampling frequency conversion processing in the sampling frequency conversion processing unit 188 and the frame period of synthesis in the synthesis filter processing unit 272 and, at the same time, the pitch modulation in the pitch modulation processing unit 278. Here, the sampling frequency of the D/A conversion in the D/A converter 183 (third sampling period) is fixed and, as in the thirteenth embodiment, is assumed to match the sampling frequency of the speech at the time the speech units were created (first sampling period).

[0330] In this embodiment, as in the thirteenth embodiment, three kinds of voice quality can be designated by the voice quality switching unit 186. In accordance with the voice quality designated by the voice quality switching unit 186, the voice quality control unit 277 controls the sampling frequency conversion processing unit 188 so that it converts to one of the sampling frequencies, for example, 11025 Hz (= unconverted), 8000 Hz, and 16000 Hz.

[0331] At the same time, the voice quality control unit 277 sets the frame period of the synthesis performed by the synthesis filter processing unit 272 in accordance with the voice quality designated by the voice quality switching unit 186. The frame period of synthesis is given by the following equation.

[0332]
(frame period) = (frame period at speech unit creation) × (sampling period after sampling frequency conversion) / (sampling period of D/A conversion)
= (frame period at speech unit creation) × (sampling frequency of D/A conversion) / (sampling frequency after sampling frequency conversion)

The voice quality control unit 277 further controls the pitch modulation processing unit 278 so that the pitch supplied to the synthesis filter processing unit 272 satisfies

(pitch supplied to the synthesis filter processing unit 272) = (pitch for sampling frequency conversion to 11025 [Hz]) × (sampling period of D/A conversion) / (sampling period after sampling frequency conversion processing)
= (pitch for sampling frequency conversion to 11025 [Hz]) × (sampling frequency after sampling frequency conversion processing) / (sampling frequency of D/A conversion)

[0333] Therefore, when performing sampling frequency conversion processing to the same sampling frequency as that of the D/A conversion, that is, 11025 [Hz] (in other words, when no sampling frequency conversion is performed), the voice quality control unit 277 controls the pitch modulation processing unit 278 so that the pitch generated by the pitch generation processing unit 179 is supplied to the synthesis filter processing unit 272 as it is.

[0334] When performing sampling frequency conversion processing to 8000 Hz, the voice quality control unit 277 performs control so that the pitch generated by the pitch generation processing unit 179 is multiplied by (8000 [Hz] / 11025 [Hz]) and supplied to the synthesis filter processing unit 272. When performing sampling frequency conversion processing to 16000 Hz, it performs control so that the pitch generated by the pitch generation processing unit 179 is multiplied by (16000 [Hz] / 11025 [Hz]) and supplied to the synthesis filter processing unit 272.
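The pitch modulation rule and its effect can be sketched as follows. This is a hedged illustration, not the patent's implementation: both function names are hypothetical, and `heard_pitch_hz` assumes that playing a signal produced for one sampling frequency at another scales every frequency in it by the ratio of the two, which is the behavior the description attributes to the mismatch. The 120 Hz value is an arbitrary example pitch.

```python
def modulated_pitch_hz(generated_pitch_hz, converted_rate_hz, da_rate_hz):
    """Pitch to supply to the synthesis filter:
    generated pitch x (sampling frequency after conversion) / (D/A sampling frequency).
    """
    return generated_pitch_hz * converted_rate_hz / da_rate_hz


def heard_pitch_hz(synthesized_pitch_hz, converted_rate_hz, da_rate_hz):
    """Pitch heard after D/A conversion at da_rate_hz of a signal whose
    sampling frequency was converted to converted_rate_hz (assumed model)."""
    return synthesized_pitch_hz * da_rate_hz / converted_rate_hz


DA_RATE_HZ = 11025.0  # fixed D/A sampling frequency from the description

if __name__ == "__main__":
    generated = 120.0  # hypothetical pitch from the pitch generation stage
    for converted_hz in (11025.0, 8000.0, 16000.0):
        supplied = modulated_pitch_hz(generated, converted_hz, DA_RATE_HZ)
        heard = heard_pitch_hz(supplied, converted_hz, DA_RATE_HZ)
        print(converted_hz, supplied, heard)
```

In each case the heard pitch returns to the generated value, since the factor applied in advance is the reciprocal of the factor introduced at playback.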

[0335] As described above, in this embodiment, the pitch supplied to the synthesis filter processing unit 272 is modulated in advance under the control of the voice quality control unit 277. This offsets the change in the pitch of the synthesized speech that arises when the sampling frequency of the synthesized speech input to the D/A converter 183 (second sampling period) differs from the sampling frequency of the D/A conversion in the D/A converter 183 (third sampling period).

[0336] Therefore, even when the sampling frequency is converted to 8000 Hz or 16000 Hz and D/A conversion is then performed at a sampling frequency of 11025 Hz, an analog speech signal with the same voice pitch is obtained.

[0337] Embodiments of the present invention have been described above, but the present invention is not limited to these embodiments.

[0338] For example, all of the above embodiments use the cepstrum as the speech feature parameter, but the present invention is also applicable to other parameters such as LPC and PARCOR, with similar effects. The number of voice qualities is three in all of the embodiments, but may be two, or four or more. As for the language processing unit, inserting syntactic analysis or the like in addition to morphological analysis poses no problem, and the invention is applicable not only to Japanese TTS but also to TTS for English and other languages.

[0339] Likewise, the duration need not be determined by a method such as keeping the interval between CV transition points constant; control based on a statistical method may be used instead. As for pitch generation, the present invention is applicable even when a method other than the point-pitch method, for example the Fujisaki model, is used.

[0340] In short, the present invention can be modified and implemented in various ways without departing from its gist.

【０３４１】[0341]

[Effects of the Invention] As described in detail above, according to the present invention, the variety of voice qualities of synthesized speech can be increased easily, without recording an announcer's utterances or re-segmenting speech units. This is achieved either by controlling the sampling period of the synthesized discrete speech signal input to the digital/analog conversion means so that it differs from the sampling period (conversion period) at which the digital/analog conversion means converts that discrete speech signal into an analog speech signal, or by variably controlling the ratio between those two periods in accordance with the selected and designated voice quality.

[Brief description of drawings]

FIG. 1 is a block diagram showing a schematic configuration of a speech analysis/synthesis apparatus according to a first embodiment of the present invention.

FIG. 2 is a diagram for explaining the effect obtained by changing the sampling frequency of D/A conversion in the same embodiment.

FIG. 3 is a block diagram showing a schematic configuration of a speech analysis/synthesis apparatus according to a second embodiment of the present invention.

FIG. 4 is a block diagram showing a schematic configuration of a speech analysis/synthesis apparatus according to a third embodiment of the present invention.

FIG. 5 is a block diagram showing a schematic configuration of a speech rule synthesizing apparatus according to a fourth embodiment of the present invention.

FIG. 6 is a block diagram showing a configuration of a synthesis filter processing unit.

FIG. 7 is a block diagram showing a schematic configuration of a speech rule synthesizing apparatus according to a fifth embodiment of the present invention.

FIG. 8 is a block diagram showing a schematic configuration of a speech rule synthesizing apparatus according to a sixth embodiment of the present invention.

FIG. 9 is a block diagram showing a schematic configuration of a speech rule synthesizing apparatus according to a seventh embodiment of the present invention.

FIG. 10 is a block diagram showing a schematic configuration of a speech rule synthesizing apparatus according to an eighth embodiment of the present invention.

FIG. 11 is a block diagram showing a schematic configuration of a speech rule synthesizing apparatus according to a ninth embodiment of the present invention.

FIG. 12 is a block diagram showing a schematic configuration of a speech analysis/synthesis apparatus according to a tenth embodiment of the present invention.

FIG. 13 is a diagram for explaining the effect obtained by changing the sampling frequency of the synthesized speech input to the D/A converter in the same embodiment.

FIG. 14 is a diagram for explaining the configuration of a sampling frequency conversion processing unit together with its operation.

FIG. 15 is a block diagram showing a schematic configuration of a speech analysis/synthesis apparatus according to an eleventh embodiment of the present invention.

FIG. 16 is a block diagram showing a schematic configuration of a speech analysis/synthesis apparatus according to a twelfth embodiment of the present invention.

FIG. 17 is a block diagram showing a schematic configuration of a speech rule synthesizing apparatus according to a thirteenth embodiment of the present invention.

FIG. 18 is a block diagram showing a schematic configuration of a speech rule synthesizing apparatus according to a fourteenth embodiment of the present invention.

FIG. 19 is a block diagram showing a schematic configuration of a speech rule synthesizing apparatus according to a fifteenth embodiment of the present invention.

FIG. 20 is a block diagram showing a schematic configuration of a speech rule synthesizing apparatus according to a sixteenth embodiment of the present invention.

FIG. 21 is a block diagram showing a schematic configuration of a speech rule synthesizing apparatus according to a seventeenth embodiment of the present invention.

FIG. 22 is a block diagram showing a schematic configuration of a speech rule synthesizing apparatus according to an eighteenth embodiment of the present invention.

FIG. 23 is a block diagram showing a schematic configuration of a conventional speech analysis/synthesis apparatus.

FIG. 24 is a block diagram showing a schematic configuration of a conventional speech rule synthesizing apparatus.

[Explanation of symbols]

11, 141 ... memory (feature parameter storage means)
12, 142 ... memory (pitch pattern storage means)
13, 23, 33, 52, 112, 132, 143, 153, 163, 182, 252, 272 ... synthesis filter processing unit (synthesis means)
14, 53, 144, 183 ... D/A converter (digital/analog conversion means)
17, 56, 147, 186 ... voice quality switching unit (voice quality selection means)
18, 28, 38, 57, 67, 87, 97, 117, 137, 148, 158, 168, 187, 197, 217, 237, 257, 277 ... voice quality control unit
31, 138, 161, 278 ... pitch modulation processing unit (pitch pattern modulation means)
42, 62, 72, 92, 102, 122, 172, 192, 202, 222, 242, 262 ... speech synthesis unit (speech synthesis means)
47, 77, 177, 207 ... phoneme duration calculation processing unit
48, 68, 178, 198 ... speech rate control unit
49, 179 ... pitch generation processing unit
50, 90, 180, 230 ... phoneme parameter generation processing unit (synthesis parameter frame time-series generation means)
51, 181 ... speech unit memory (unit storage means)
149, 188 ... sampling frequency conversion processing unit (sampling period conversion means)

Claims

[Claims]

1. A speech synthesizer comprising: storage means for storing speech feature parameters obtained by analyzing a discrete speech signal sampled at a first sampling period; synthesis means for synthesizing a discrete speech signal using, as input, the speech feature parameters read from the storage means; and digital/analog conversion means for converting the discrete speech signal synthesized by the synthesis means into an analog speech signal at a second sampling period different from the first sampling period.

2. A speech synthesis method in which speech feature parameters obtained by analyzing a discrete speech signal sampled at a first sampling period are stored in storage means and a discrete speech signal is synthesized using, as input, the speech feature parameters read from the storage means, wherein the synthesized discrete speech signal is converted into an analog speech signal at a second sampling period different from the first sampling period.

3. A speech synthesizer comprising: storage means for storing speech feature parameters obtained by analyzing a discrete speech signal sampled at a first sampling period; synthesis means for synthesizing a discrete speech signal using, as input, the speech feature parameters read from the storage means; voice quality selection means for selecting and designating the voice quality of the speech to be synthesized; and digital/analog conversion means for converting the discrete speech signal synthesized by the synthesis means into an analog speech signal at a second sampling period determined according to the voice quality selected and designated by the voice quality selection means.

4. A speech synthesis method in which speech feature parameters obtained by analyzing a discrete speech signal sampled at a first sampling period are stored in storage means and a discrete speech signal is synthesized using, as input, the speech feature parameters read from the storage means, wherein a selection and designation of the voice quality of the speech to be synthesized is accepted, and the synthesized discrete speech signal is converted into an analog speech signal at a second sampling period determined according to the accepted voice quality.

5. A speech synthesizer comprising: synthesis means for synthesizing a discrete speech signal, at a second frame period different from a first frame period, from a time series of speech feature parameter frames obtained by analyzing a discrete speech signal sampled at a first sampling period while applying a time window at the first frame period; and digital/analog conversion means for converting the discrete speech signal synthesized by the synthesis means into an analog speech signal at a second sampling period different from the first sampling period.

6. A speech synthesis method wherein a discrete speech signal is synthesized, at a second frame period different from a first frame period, from a time series of speech feature parameter frames obtained by analyzing a discrete speech signal sampled at a first sampling period while applying a time window at the first frame period, and the discrete speech signal synthesized at the second frame period is converted into an analog speech signal at a second sampling period different from the first sampling period.

7. A speech synthesizer comprising: synthesis means for synthesizing a discrete speech signal from a time series of speech feature parameter frames obtained by analyzing a discrete speech signal sampled at a first sampling period while applying a time window at a first frame period, the synthesis being performed at a second frame period determined based on the first frame period, the first sampling period, and a second sampling period different from the first sampling period; and digital/analog conversion means for converting the discrete speech signal synthesized by the synthesis means into an analog speech signal at the second sampling period.

8. A speech synthesis method wherein a discrete speech signal is synthesized from a time series of speech feature parameter frames obtained by analyzing a discrete speech signal sampled at a first sampling period while applying a time window at a first frame period, the synthesis being performed at a second frame period determined based on the first frame period, the first sampling period, and a second sampling period different from the first sampling period, and the discrete speech signal synthesized at the second frame period is converted into an analog speech signal at the second sampling period.

9. A storage means for storing a time series of speech characteristic parameter frames obtained by analyzing a discrete voice signal sampled at a first sampling period by applying a time window at a first frame period, and synthesizing. A voice quality selection means for selectively designating the voice quality of the voice to be played, and a second frame cycle determined according to the voice quality selected and designated by the voice quality selection means from the time series of the characteristic parameter frames read from the storage means. A synthesizing unit for synthesizing the discrete voice signal, and a digital / converting unit for synthesizing the discrete voice signal synthesized by the synthesizing unit into an analog voice signal at a second sampling period determined according to the voice quality selected and designated by the voice quality selecting unit. A voice synthesizing device comprising: an analog converting means.

10. A speech synthesis method in which a time series of speech characteristic parameter frames, obtained by analyzing a discrete speech signal sampled at a first sampling period while applying a time window at a first frame period, is stored in storage means, and a discrete speech signal is synthesized from the time series of characteristic parameter frames read from the storage means, the method comprising: accepting a selection and designation of the voice quality of the speech to be synthesized; performing the processing that synthesizes the discrete speech signal from the time series of characteristic parameter frames at a second frame period determined according to the accepted voice quality; and converting the discrete speech signal synthesized at the second frame period into an analog speech signal at a second sampling period determined according to the accepted voice quality.

11. A speech synthesizer comprising: synthesizing means for synthesizing a discrete speech signal, from a time series of speech characteristic parameter frames obtained by analyzing a discrete speech signal sampled at a first sampling period while applying a time window at a first frame period, at a second frame period that is defined, based on the first frame period, the first sampling period, and a second sampling period different from the first sampling period, as second frame period = first frame period × first sampling period / second sampling period; and digital/analog conversion means for converting the discrete speech signal synthesized by the synthesizing means into an analog speech signal at the second sampling period.

12. A speech synthesis method comprising: synthesizing a discrete speech signal, from a time series of speech characteristic parameter frames obtained by analyzing a discrete speech signal sampled at a first sampling period while applying a time window at a first frame period, at a second frame period that is defined, based on the first frame period, the first sampling period, and a second sampling period different from the first sampling period, as second frame period = first frame period × first sampling period / second sampling period; and converting the discrete speech signal synthesized at the second frame period into an analog speech signal at the second sampling period.

13. A speech synthesizer comprising: characteristic parameter storage means for storing characteristic parameters of speech obtained by analyzing a discrete speech signal sampled at a first sampling period; pitch pattern storage means for storing a fundamental frequency pattern of speech; synthesizing means for synthesizing a discrete speech signal from the characteristic parameters read from the characteristic parameter storage means and the fundamental frequency pattern read from the pitch pattern storage means; digital/analog conversion means for converting the discrete speech signal synthesized by the synthesizing means into an analog speech signal, the conversion being selectively performable at the first sampling period or at a second sampling period different from the first sampling period; and pitch pattern generating means for generating, when the digital/analog conversion means performs the conversion into the analog speech signal at the second sampling period, a fundamental frequency pattern different from the fundamental frequency pattern read from the pitch pattern storage means, and supplying the generated pattern to the synthesizing means.

14. A speech synthesis method in which characteristic parameters of speech obtained by analyzing a discrete speech signal sampled at a first sampling period are stored in characteristic parameter storage means, a fundamental frequency pattern of speech is stored in pitch pattern storage means, a discrete speech signal is synthesized from the characteristic parameters read from the characteristic parameter storage means and the fundamental frequency pattern read from the pitch pattern storage means, and the synthesized discrete speech signal is converted into an analog speech signal, wherein the method has a first mode in which the conversion into the analog speech signal is performed at the first sampling period and a second mode in which the conversion is performed at a second sampling period different from the first sampling period, and wherein, in the synthesis processing in the second mode, a fundamental frequency pattern different from the fundamental frequency pattern read from the pitch pattern storage means is generated and used to synthesize the discrete speech signal.

15. A speech synthesizer comprising: characteristic parameter storage means for storing characteristic parameters of speech obtained by analyzing a discrete speech signal sampled at a first sampling period; pitch pattern storage means for storing a fundamental frequency pattern of speech; pitch pattern modulation means for modulating the fundamental frequency pattern read from the pitch pattern storage means based on the first sampling period and a second sampling period different from the first sampling period; synthesizing means for synthesizing a discrete speech signal from the characteristic parameters read from the characteristic parameter storage means and the fundamental frequency pattern modulated by the pitch pattern modulation means; and digital/analog conversion means for converting the discrete speech signal synthesized by the synthesizing means into an analog speech signal at the second sampling period.

16. A speech synthesis method comprising: storing, in characteristic parameter storage means, characteristic parameters of speech obtained by analyzing a discrete speech signal sampled at a first sampling period, and storing a fundamental frequency pattern of speech in pitch pattern storage means; modulating the fundamental frequency pattern read from the pitch pattern storage means based on the first sampling period and a second sampling period different from the first sampling period; synthesizing a discrete speech signal from the characteristic parameters read from the characteristic parameter storage means and the modulated fundamental frequency pattern; and converting the synthesized discrete speech signal into an analog speech signal at the second sampling period.

17. A speech synthesizer comprising: characteristic parameter storage means for storing characteristic parameters of speech obtained by analyzing a discrete speech signal sampled at a first sampling period; pitch pattern storage means for storing a fundamental frequency pattern of speech; voice quality selection means for selecting and designating the voice quality of the speech to be synthesized; pitch pattern modulation means for modulating the fundamental frequency pattern read from the pitch pattern storage means according to the voice quality selected and designated by the voice quality selection means; synthesizing means for synthesizing a discrete speech signal from the characteristic parameters read from the characteristic parameter storage means and the fundamental frequency pattern modulated by the pitch pattern modulation means; and digital/analog conversion means for converting the discrete speech signal synthesized by the synthesizing means into an analog speech signal at a second sampling period determined according to the voice quality selected and designated by the voice quality selection means.

18. A speech synthesis method in which characteristic parameters of speech obtained by analyzing a discrete speech signal sampled at a first sampling period are stored in characteristic parameter storage means, a fundamental frequency pattern of speech is stored in pitch pattern storage means, and a discrete speech signal is synthesized based on the characteristic parameters read from the characteristic parameter storage means and the fundamental frequency pattern read from the pitch pattern storage means, the method comprising: accepting a selection and designation of the voice quality of the speech to be synthesized; modulating the fundamental frequency pattern read from the pitch pattern storage means according to the accepted voice quality; synthesizing a discrete speech signal from the characteristic parameters read from the characteristic parameter storage means and the modulated fundamental frequency pattern; and converting the synthesized discrete speech signal into an analog speech signal at a second sampling period determined according to the accepted voice quality.

19. A speech synthesizer comprising: characteristic parameter storage means for storing characteristic parameters of speech obtained by analyzing a discrete speech signal sampled at a first sampling period; pitch pattern storage means for storing a fundamental frequency pattern of speech; pitch pattern modulation means for multiplying the fundamental frequency pattern read from the pitch pattern storage means by (second sampling period / first sampling period), based on the first sampling period and a second sampling period different from the first sampling period; synthesizing means for synthesizing a discrete speech signal from the characteristic parameters read from the characteristic parameter storage means and the fundamental frequency pattern modulated by the pitch pattern modulation means; and digital/analog conversion means for converting the discrete speech signal synthesized by the synthesizing means into an analog speech signal at the second sampling period.

20. A speech synthesis method comprising: storing, in characteristic parameter storage means, characteristic parameters of speech obtained by analyzing a discrete speech signal sampled at a first sampling period, and storing a fundamental frequency pattern of speech in pitch pattern storage means; multiplying the fundamental frequency pattern read from the pitch pattern storage means by (second sampling period / first sampling period), based on the first sampling period and a second sampling period different from the first sampling period; synthesizing a discrete speech signal from the characteristic parameters read from the characteristic parameter storage means and the modulated fundamental frequency pattern; and converting the synthesized discrete speech signal into an analog speech signal at the second sampling period.

21. A speech synthesizer comprising: speech synthesizing means for synthesizing a discrete speech signal by selecting, based on given phonological information, speech units created from a discrete speech signal sampled at a first sampling period, and connecting the selected speech units; and digital/analog conversion means for converting the discrete speech signal synthesized by the speech synthesizing means into an analog speech signal at a second sampling period different from the first sampling period.

22. A speech synthesis method comprising: synthesizing a discrete speech signal by selecting, based on given phonological information, speech units created from a discrete speech signal sampled at a first sampling period, and connecting the selected speech units; and converting the synthesized discrete speech signal into an analog speech signal at a second sampling period different from the first sampling period.

23. A speech synthesizer comprising: speech synthesizing means for synthesizing a discrete speech signal by selecting, based on given phonological information, speech units created from a discrete speech signal sampled at a first sampling period, and connecting the selected speech units; voice quality selection means for selecting and designating the voice quality of the speech to be synthesized; and digital/analog conversion means for converting the discrete speech signal synthesized by the speech synthesizing means into an analog speech signal at a second sampling period determined according to the voice quality selected and designated by the voice quality selection means.

24. A speech synthesis method for synthesizing a discrete speech signal by selecting, based on given phonological information, speech units created from a discrete speech signal sampled at a first sampling period, and connecting the selected speech units, the method comprising: accepting a selection and designation of the voice quality of the speech to be synthesized; and converting the synthesized discrete speech signal into an analog speech signal at a second sampling period determined according to the accepted voice quality.

25. A speech synthesizer comprising: speech synthesizing means for selecting, based on given phonological information, speech units created from a discrete speech signal sampled at a first sampling period, and synthesizing a discrete speech signal by connecting the selected speech units according to a speech rate parameter relating to the utterance speed or utterance time of the speech to be synthesized; and digital/analog conversion means for converting the discrete speech signal synthesized by the speech synthesizing means into an analog speech signal, the conversion being selectively performable at the first sampling period or at a second sampling period different from the first sampling period, wherein the speech synthesizing means is configured to use, when the digital/analog conversion means performs the conversion into the analog speech signal at the second sampling period, a speech rate parameter whose value differs from that used when the conversion is performed at the first sampling period.

26. A speech synthesis method in which speech units created from a discrete speech signal sampled at a first sampling period are selected based on given phonological information, a discrete speech signal is synthesized by connecting the selected speech units according to a speech rate parameter relating to the utterance speed or utterance time of the speech to be synthesized, and the synthesized discrete speech signal is converted into an analog speech signal, wherein the method has a first mode in which the conversion into the analog speech signal is performed at the first sampling period and a second mode in which the conversion is performed at a second sampling period different from the first sampling period, and wherein, in the synthesis processing in the second mode, a speech rate parameter is used whose value differs from that used when the conversion into the analog speech signal is performed at the first sampling period.

27. A speech synthesizer comprising: speech synthesizing means for selecting, based on given phonological information, speech units created from a discrete speech signal sampled at a first sampling period, and synthesizing a discrete speech signal by connecting the selected speech units according to a speech rate parameter relating to the utterance speed or utterance time of the speech to be synthesized, the speech synthesizing means determining the speech rate parameter to be used based on the first sampling period and a second sampling period different from the first sampling period; and digital/analog conversion means for converting the discrete speech signal synthesized by the speech synthesizing means into an analog speech signal at the second sampling period.

28. A speech synthesis method in which speech units created from a discrete speech signal sampled at a first sampling period are selected based on given phonological information, and a discrete speech signal is synthesized by connecting the selected speech units according to a speech rate parameter relating to the utterance speed or utterance time of the speech to be synthesized, the method comprising: determining the speech rate parameter used in synthesis based on the first sampling period and a second sampling period different from the first sampling period; and converting the synthesized discrete speech signal into an analog speech signal at the second sampling period.

29. A speech synthesizer comprising: speech synthesizing means for selecting, based on given phonological information, speech units created from a discrete speech signal sampled at a first sampling period, and synthesizing a discrete speech signal by connecting the selected speech units according to a speech rate parameter relating to the utterance speed or utterance time of the speech to be synthesized; voice quality selection means for selecting and designating the voice quality of the speech to be synthesized; and digital/analog conversion means for converting the discrete speech signal synthesized by the speech synthesizing means into an analog speech signal at a second sampling period determined according to the voice quality selected and designated by the voice quality selection means, wherein the speech synthesizing means is configured to determine the speech rate parameter to be used according to the voice quality selected and designated by the voice quality selection means.

30. A speech synthesis method in which speech units created from a discrete speech signal sampled at a first sampling period are selected based on given phonological information, and a discrete speech signal is synthesized by connecting the selected speech units according to a speech rate parameter relating to the utterance speed or utterance time of the speech to be synthesized, the method comprising: accepting a selection and designation of the voice quality of the speech to be synthesized; determining the speech rate parameter used in synthesis according to the accepted voice quality; and converting the synthesized discrete speech signal into an analog speech signal at a second sampling period determined according to the accepted voice quality.

31. A speech synthesizer comprising: speech synthesizing means for determining the duration of each phoneme included in given phonological information, selecting, based on the phonological information, speech units created from a discrete speech signal sampled at a first sampling period, and synthesizing a discrete speech signal by connecting the selected speech units based on the determined duration of each phoneme; and digital/analog conversion means for converting the discrete speech signal synthesized by the speech synthesizing means into an analog speech signal, the conversion being selectively performable at the first sampling period or at a second sampling period different from the first sampling period, wherein the speech synthesizing means is configured to determine the phoneme duration so that, when the digital/analog conversion means performs the conversion into the analog speech signal at the second sampling period, the phoneme duration differs from that used when the conversion is performed at the first sampling period.

32. A speech synthesis method in which the duration of each phoneme included in given phonological information is determined, speech units created from a discrete speech signal sampled at a first sampling period are selected based on the phonological information, a discrete speech signal is synthesized by connecting the selected speech units based on the determined duration of each phoneme, and the synthesized discrete speech signal is converted into an analog speech signal, wherein the method has a first mode in which the conversion into the analog speech signal is performed at the first sampling period and a second mode in which the conversion is performed at a second sampling period different from the first sampling period, and wherein, in the synthesis processing in the second mode, the phoneme duration is determined so as to differ from that used when the conversion into the analog speech signal is performed at the first sampling period.

33. A speech synthesizer comprising: speech synthesizing means for determining the duration of each phoneme included in given phonological information, selecting, based on the phonological information, speech units created from a discrete speech signal sampled at a first sampling period, and synthesizing a discrete speech signal by connecting the selected speech units based on the determined duration of each phoneme, the speech synthesizing means determining the phoneme duration to be used based on the first sampling period and a second sampling period different from the first sampling period; and digital/analog conversion means for converting the discrete speech signal synthesized by the speech synthesizing means into an analog speech signal at the second sampling period.

34. A speech synthesis method in which the duration of each phoneme included in given phonological information is determined, speech units created from a discrete speech signal sampled at a first sampling period are selected based on the phonological information, and a discrete speech signal is synthesized by connecting the selected speech units based on the determined duration of each phoneme, the method comprising: determining the phoneme duration used in synthesis based on the first sampling period and a second sampling period different from the first sampling period; and converting the synthesized discrete speech signal into an analog speech signal at the second sampling period.

35. A speech synthesizer comprising: speech synthesizing means for determining the duration of each phoneme included in given phonological information, selecting, based on the phonological information, speech units created from a discrete speech signal sampled at a first sampling period, and synthesizing a discrete speech signal by connecting the selected speech units based on the determined duration of each phoneme; voice quality selection means for selecting and designating the voice quality of the speech to be synthesized; and digital/analog conversion means for converting the discrete speech signal synthesized by the speech synthesizing means into an analog speech signal at a second sampling period determined according to the voice quality selected and designated by the voice quality selection means, wherein the speech synthesizing means is configured to determine the phoneme duration used in synthesis according to the voice quality selected and designated by the voice quality selection means.

36. A speech synthesis method in which the duration of each phoneme included in given phonological information is determined, speech units created from a discrete speech signal sampled at a first sampling period are selected based on the phonological information, and a discrete speech signal is synthesized by connecting the selected speech units based on the determined duration of each phoneme, the method comprising: accepting a selection and designation of the voice quality of the speech to be synthesized; determining the phoneme duration used in synthesis according to the accepted voice quality; and converting the synthesized discrete speech signal into an analog speech signal at a second sampling period determined according to the accepted voice quality.

37. The speech synthesizer according to claim 31, wherein the speech synthesizing means determines the phoneme duration so that, when the digital/analog conversion means performs the conversion into the analog speech signal at the second sampling period, the phoneme duration is (first sampling period / second sampling period) times the phoneme duration used when the conversion is performed at the first sampling period.

38. The speech synthesis method according to claim 32, wherein, in the synthesis processing in the second mode, the phoneme duration is determined so as to be (first sampling period / second sampling period) times the phoneme duration used when the conversion into the analog speech signal is performed at the first sampling period.

39. A speech synthesizer comprising: speech synthesizing means for selecting, based on given phonological information, speech units created from a discrete speech signal sampled at a first sampling period, and synthesizing a discrete speech signal by connecting the selected speech units while expanding or contracting them in the time-axis direction; and digital/analog conversion means for converting the discrete speech signal synthesized by the speech synthesizing means into an analog speech signal at a second sampling period different from the first sampling period.

40. A speech synthesis method comprising: selecting, based on given phonological information, speech units created from a discrete speech signal sampled at a first sampling period; synthesizing a discrete speech signal by connecting the selected speech units while expanding or contracting them in the time-axis direction; and converting the synthesized discrete speech signal into an analog speech signal at a second sampling period different from the first sampling period.

41. The speech synthesizer according to claim 39, wherein the speech synthesizing means, when connecting the selected speech units, connects them while expanding or contracting them in the time-axis direction by a degree determined based on the first sampling period and the second sampling period.

42. The speech synthesis method according to claim 40, wherein, when the selected speech units are connected, they are connected while being expanded or contracted in the time-axis direction by a degree determined based on the first sampling period and the second sampling period.

43. A speech synthesizer comprising: speech synthesizing means for selecting, based on given phonological information, speech units created from a discrete speech signal sampled at a first sampling period, and synthesizing a discrete speech signal by connecting the selected speech units while expanding or contracting them in the time-axis direction; voice quality selection means for selecting and designating the voice quality of the speech to be synthesized; and digital/analog conversion means for converting the discrete speech signal synthesized by the speech synthesizing means into an analog speech signal at a second sampling period determined according to the voice quality selected and designated by the voice quality selection means, wherein the speech synthesizing means is configured to connect the selected speech units while expanding or contracting them in the time-axis direction by a degree determined according to the voice quality selected and designated by the voice quality selection means.

44. A speech synthesis method in which speech units created from a discrete speech signal sampled at a first sampling period are selected based on given phonological information, a discrete speech signal is synthesized by connecting the selected speech units, and the synthesized discrete speech signal is converted into an analog speech signal, the method comprising: accepting a selection and designation of the voice quality of the speech to be synthesized; connecting the selected speech units while expanding or contracting them in the time-axis direction by a degree determined according to the accepted voice quality; and converting the synthesized discrete speech signal into an analog speech signal at a second sampling period determined according to the accepted voice quality.

45. The speech synthesizer according to claim 39, wherein the speech synthesizing means, when connecting the selected speech units, connects them while expanding or contracting them in the time-axis direction by a factor of (first sampling period / second sampling period).

46. The speech synthesis method according to claim 40, wherein, when the selected speech units are connected, they are connected while being expanded or contracted in the time-axis direction by a factor of (first sampling period / second sampling period).

47. A speech synthesizer comprising: speech unit storage means for storing a plurality of speech units cut out, in predetermined synthesis units, from a time series of speech characteristic parameter frames obtained by analyzing a discrete speech signal sampled at a first sampling period while applying a time window at a first frame period; synthesis parameter frame time series generating means for selecting speech units from the speech unit storage means based on input phonological information and connecting them to generate a time series of synthesis parameter frames; synthesizing means for synthesizing a discrete speech signal, from the time series of synthesis parameter frames generated by the synthesis parameter frame time series generating means, at a second frame period different from the first frame period; and digital/analog conversion means for converting the discrete speech signal synthesized by the synthesizing means into an analog speech signal at a second sampling period different from the first sampling period.

48. A speech synthesis method comprising: storing, in speech unit storage means, a plurality of speech units cut out, in predetermined synthesis units, from a time series of speech characteristic parameter frames obtained by analyzing a discrete speech signal sampled at a first sampling period while applying a time window at a first frame period; selecting speech units from the speech unit storage means based on input phonological information and connecting them to generate a time series of synthesis parameter frames; synthesizing a discrete speech signal, from the generated time series of synthesis parameter frames, at a second frame period different from the first frame period; and converting the synthesized discrete speech signal into an analog speech signal at a second sampling period different from the first sampling period.

49. A speech synthesizer comprising: speech unit storage means for storing a plurality of speech units cut out, in predetermined synthesis units, from a time series of speech characteristic parameter frames obtained by analyzing a discrete speech signal sampled at a first sampling period while applying a time window at a first frame period; synthesis parameter frame time series generating means for selecting speech units from the speech unit storage means based on input phonological information and connecting them to generate a time series of synthesis parameter frames; synthesizing means for synthesizing a discrete speech signal, from the time series of synthesis parameter frames generated by the synthesis parameter frame time series generating means, at a second frame period determined based on the first frame period, the first sampling period, and a second sampling period different from the first sampling period; and digital/analog conversion means for converting the discrete speech signal synthesized by the synthesizing means into an analog speech signal at the second sampling period.

50. A speech synthesis method comprising: storing, in speech unit storage means, a plurality of speech units cut out, in predetermined synthesis units, from a time series of speech characteristic parameter frames obtained by analyzing a discrete speech signal sampled at a first sampling period while applying a time window at a first frame period; selecting speech units from the speech unit storage means based on input phonological information and connecting them to generate a time series of synthesis parameter frames; synthesizing a discrete speech signal, from the generated time series of synthesis parameter frames, at a second frame period determined based on the first frame period, the first sampling period, and a second sampling period different from the first sampling period; and converting the synthesized discrete speech signal into an analog speech signal at the second sampling period.

51. A speech synthesis apparatus comprising: speech unit storage means for storing a plurality of speech units, each cut out in a predetermined synthesis unit from a time series of speech feature parameter frames obtained by analyzing a discrete speech signal, sampled at a first sampling period, with a time window applied at a first frame period; voice quality selecting means for selecting and designating the voice quality of the speech to be synthesized; synthesis parameter frame time series generating means for selecting speech units from the speech unit storage means based on input phoneme information and connecting them to generate a time series of synthesis parameter frames; synthesizing means for synthesizing a discrete speech signal from the time series of synthesis parameter frames generated by the synthesis parameter frame time series generating means, at a second frame period determined according to the voice quality selected and designated by the voice quality selecting means; and digital/analog converting means for converting the discrete speech signal synthesized by the synthesizing means into an analog speech signal at a second sampling period determined according to the voice quality selected and designated by the voice quality selecting means.

52. A speech synthesis method in which a plurality of speech units, each cut out in a predetermined synthesis unit from a time series of speech feature parameter frames obtained by analyzing a discrete speech signal, sampled at a first sampling period, with a time window applied at a first frame period, are stored in speech unit storage means, speech units are selected from the speech unit storage means based on input phoneme information and connected to generate a time series of synthesis parameter frames, and a discrete speech signal is synthesized from the generated time series of synthesis parameter frames, wherein: a selection and designation of the voice quality of the speech to be synthesized is accepted; the synthesis of the discrete speech signal from the time series of synthesis parameter frames is performed at a second frame period determined according to the accepted voice quality; and the synthesized discrete speech signal is converted into an analog speech signal at a second sampling period determined according to the accepted voice quality.

53. A speech synthesis apparatus comprising: speech unit storage means for storing a plurality of speech units, each cut out in a predetermined synthesis unit from a time series of speech feature parameter frames obtained by analyzing a discrete speech signal, sampled at a first sampling period, with a time window applied at a first frame period; synthesis parameter frame time series generating means for selecting speech units from the speech unit storage means based on input phoneme information and connecting them to generate a time series of synthesis parameter frames; synthesizing means for synthesizing a discrete speech signal from the time series of synthesis parameter frames generated by the synthesis parameter frame time series generating means, at a second frame period defined, based on the first frame period, the first sampling period, and a second sampling period different from the first sampling period, by second frame period = first frame period × first sampling period / second sampling period; and digital/analog converting means for converting the discrete speech signal synthesized by the synthesizing means into an analog speech signal at the second sampling period.

54. A speech synthesis method comprising: storing, in speech unit storage means, a plurality of speech units, each cut out in a predetermined synthesis unit from a time series of speech feature parameter frames obtained by analyzing a discrete speech signal, sampled at a first sampling period, with a time window applied at a first frame period; selecting speech units from the speech unit storage means based on input phoneme information and connecting them to generate a time series of synthesis parameter frames; synthesizing a discrete speech signal from the generated time series of synthesis parameter frames at a second frame period defined, based on the first frame period, the first sampling period, and a second sampling period different from the first sampling period, by second frame period = first frame period × first sampling period / second sampling period; and converting the synthesized discrete speech signal into an analog speech signal at the second sampling period.
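
As a numerical illustration of the relation stated in claims 53 and 54, the following minimal Python sketch computes the second frame period and checks that a frame synthesized at that period and output at the second sampling period lasts as long as the original analysis frame. The function names, the use of microseconds, and the example values are assumptions made for illustration and do not come from the claims. The duration check further assumes that the synthesizer counts frame length in samples of the first sampling period.

```python
from fractions import Fraction


def second_frame_period(first_frame_period, first_sampling_period, second_sampling_period):
    """Second frame period = first frame period x first sampling period / second sampling period."""
    return (Fraction(first_frame_period) * Fraction(first_sampling_period)
            / Fraction(second_sampling_period))


def playback_duration(frame_period, synthesis_sampling_period, playback_sampling_period):
    """Duration of one frame after D/A conversion (assumed model, not taken from the claims).

    The frame holds frame_period / synthesis_sampling_period samples, and each
    sample is output once every playback_sampling_period.
    """
    samples_per_frame = Fraction(frame_period) / Fraction(synthesis_sampling_period)
    return samples_per_frame * Fraction(playback_sampling_period)


# Hypothetical values in microseconds: 10 ms frames, 125 us analysis, 100 us playback.
T1, TS1, TS2 = 10_000, 125, 100
T2 = second_frame_period(T1, TS1, TS2)
print(T2)                                # 12500
print(playback_duration(T2, TS1, TS2))   # 10000
```

Under these assumptions the played-back frame duration equals the first frame period, so the speaking rate is unchanged even though the D/A conversion runs at a different sampling period.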

55. A speech synthesis apparatus that takes prosodic information and phoneme information as input, comprising: speech unit storage means storing a plurality of speech units created from a discrete speech signal sampled at a first sampling period; pitch pattern generating means for generating a fundamental frequency pattern of speech from the prosodic information; phoneme parameter generating means for generating phoneme parameters of speech by selectively reading speech units from the speech unit storage means based on the phoneme information and connecting them; synthesizing means for synthesizing a discrete speech signal from the phoneme parameters generated by the phoneme parameter generating means and the fundamental frequency pattern generated by the pitch pattern generating means; and digital/analog converting means for converting the discrete speech signal synthesized by the synthesizing means into an analog speech signal, the digital/analog converting means being capable of selectively performing the conversion at the first sampling period or at a second sampling period different from the first sampling period, wherein, when the digital/analog converting means performs the conversion into the analog speech signal at the second sampling period, the pitch pattern generating means generates a fundamental frequency pattern different from the fundamental frequency pattern generated when the conversion is performed at the first sampling period.

56. A speech synthesis method that takes prosodic information and phoneme information as input, in which a plurality of speech units created from a discrete speech signal sampled at a first sampling period are stored in speech unit storage means, a fundamental frequency pattern of speech is generated from the prosodic information, phoneme parameters of speech are generated by selectively reading speech units from the speech unit storage means based on the phoneme information and connecting them, a discrete speech signal is synthesized from the generated phoneme parameters and fundamental frequency pattern, and the synthesized discrete speech signal is converted into an analog speech signal, wherein: a first mode, in which the conversion into the analog speech signal is performed at the first sampling period, and a second mode, in which the conversion is performed at a second sampling period different from the first sampling period, are provided; and, as the fundamental frequency pattern used for the synthesis process in the second mode, a fundamental frequency pattern different from the fundamental frequency pattern used for the synthesis process in the first mode is generated.

57. A speech synthesis apparatus that takes prosodic information and phoneme information as input, comprising: speech unit storage means storing a plurality of speech units created from a discrete speech signal sampled at a first sampling period; pitch pattern generating means for generating a fundamental frequency pattern of speech from the prosodic information, based on the first sampling period and a second sampling period different from the first sampling period; phoneme parameter generating means for generating phoneme parameters of speech by selectively reading speech units from the speech unit storage means based on the phoneme information and connecting them; synthesizing means for synthesizing a discrete speech signal from the phoneme parameters generated by the phoneme parameter generating means and the fundamental frequency pattern generated by the pitch pattern generating means; and digital/analog converting means for converting the discrete speech signal synthesized by the synthesizing means into an analog speech signal at the second sampling period.

58. A speech synthesis method that takes prosodic information and phoneme information as input, comprising: storing, in speech unit storage means, a plurality of speech units created from a discrete speech signal sampled at a first sampling period; generating a fundamental frequency pattern of speech from the prosodic information, based on the first sampling period and a second sampling period different from the first sampling period; generating phoneme parameters of speech by selectively reading speech units from the speech unit storage means based on the phoneme information and connecting them; synthesizing a discrete speech signal from the generated phoneme parameters and fundamental frequency pattern; and converting the synthesized discrete speech signal into an analog speech signal at the second sampling period.

59. A speech synthesis apparatus that takes prosodic information and phoneme information as input, comprising: speech unit storage means storing a plurality of speech units created from a discrete speech signal sampled at a first sampling period; voice quality selecting means for selecting and designating the voice quality of the speech to be synthesized; pitch pattern generating means for generating a fundamental frequency pattern of speech from the prosodic information according to the voice quality selected and designated by the voice quality selecting means; phoneme parameter generating means for generating phoneme parameters of speech by selectively reading speech units from the speech unit storage means based on the phoneme information and connecting them; synthesizing means for synthesizing a discrete speech signal from the phoneme parameters generated by the phoneme parameter generating means and the fundamental frequency pattern generated by the pitch pattern generating means; and digital/analog converting means for converting the discrete speech signal synthesized by the synthesizing means into an analog speech signal at a second sampling period determined according to the voice quality selected and designated by the voice quality selecting means.

60. A speech synthesis method that takes prosodic information and phoneme information as input, in which a plurality of speech units created from a discrete speech signal sampled at a first sampling period are stored in speech unit storage means, a fundamental frequency pattern of speech is generated from the prosodic information, phoneme parameters of speech are generated by selectively reading speech units from the speech unit storage means based on the phoneme information and connecting them, and a discrete speech signal is synthesized from the generated phoneme parameters and fundamental frequency pattern, wherein: a selection and designation of the voice quality of the speech to be synthesized is accepted; the fundamental frequency pattern used for the synthesis process is generated from the prosodic information according to the accepted voice quality; and the synthesized discrete speech signal is converted into an analog speech signal at a second sampling period determined according to the accepted voice quality.

61. The speech synthesis apparatus wherein, when the digital/analog converting means performs the conversion into an analog speech signal at the second sampling period, the pitch pattern generating means generates a fundamental frequency pattern whose fundamental frequency is (second sampling period / first sampling period) times that of the fundamental frequency pattern generated when the conversion into an analog speech signal is performed at the first sampling period.

62. The speech synthesis method according to claim 56, wherein the fundamental frequency pattern used for the synthesis process in the second mode is generated so that its fundamental frequency is (second sampling period / first sampling period) times the fundamental frequency of the fundamental frequency pattern used for the synthesis process in the first mode.
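
The scaling in claims 61 and 62 can be checked with a small Python sketch. It multiplies each value of a fundamental frequency pattern by (second sampling period / first sampling period) and then applies an assumed playback model in which outputting samples at a different period scales every frequency by (synthesis period / playback period). The function names, the pattern values, and the sampling periods are hypothetical and chosen only for illustration.

```python
def compensated_f0(f0_pattern_hz, first_sampling_period, second_sampling_period):
    """Scale each fundamental frequency by (second sampling period / first sampling period)."""
    return [f0 * second_sampling_period / first_sampling_period for f0 in f0_pattern_hz]


def heard_f0(f0_hz, synthesis_sampling_period, playback_sampling_period):
    """Pitch after D/A conversion at a different period (assumed model, not from the claims)."""
    return f0_hz * synthesis_sampling_period / playback_sampling_period


# Hypothetical first-mode pattern in Hz and sampling periods in microseconds.
first_mode_pattern = [100.0, 120.0, 110.0]
TS1, TS2 = 125, 100

second_mode_pattern = compensated_f0(first_mode_pattern, TS1, TS2)
print(second_mode_pattern)                                    # [80.0, 96.0, 88.0]
print([heard_f0(f, TS1, TS2) for f in second_mode_pattern])   # [100.0, 120.0, 110.0]
```

With this model the pitch heard in the second mode matches the first-mode pattern, which is one plausible reading of why the claims prescribe this particular factor.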

63. A speech synthesis apparatus comprising: storage means for storing speech feature parameters obtained by analyzing a discrete speech signal sampled at a first sampling period; synthesizing means for synthesizing a discrete speech signal using, as input, the speech feature parameters read from the storage means; sampling period converting means for converting the sampling period of the discrete speech signal synthesized by the synthesizing means into a second sampling period different from the first sampling period; and digital/analog converting means for converting the discrete speech signal whose sampling period has been converted by the sampling period converting means into an analog speech signal at a third sampling period different from the second sampling period.

64. A speech synthesis method in which speech feature parameters obtained by analyzing a discrete speech signal sampled at a first sampling period are stored in storage means and a discrete speech signal is synthesized using, as input, the speech feature parameters read from the storage means, wherein: the sampling period of the synthesized discrete speech signal is converted into a second sampling period different from the first sampling period; and the discrete speech signal after the conversion into the second sampling period is converted into an analog speech signal at a third sampling period different from the second sampling period.
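
The two-stage processing of claims 63 and 64 (sampling period conversion followed by D/A conversion at a different period) can be sketched as follows. The claims do not say how the sampling period conversion is carried out; the linear interpolation used here is only a simple stand-in chosen for illustration, and a practical converter would normally be band-limited. The function names and sample values are hypothetical.

```python
def convert_sampling_period(samples, old_period, new_period):
    """Resample by linear interpolation (illustrative stand-in for a sampling period converter)."""
    if len(samples) < 2:
        return list(samples)
    duration = (len(samples) - 1) * old_period
    count = int(duration // new_period) + 1
    converted = []
    for n in range(count):
        t = n * new_period
        # Index of the input sample at or before time t, clamped to the last interval.
        i = min(int(t // old_period), len(samples) - 2)
        frac = (t - i * old_period) / old_period
        converted.append(samples[i] * (1 - frac) + samples[i + 1] * frac)
    return converted


def playback_scaling(second_period, third_period):
    """Return (frequency scale, duration scale) when samples spaced second_period
    apart are output once every third_period (assumed model)."""
    return second_period / third_period, third_period / second_period


signal = [0.0, 1.0, 2.0, 3.0, 4.0]            # hypothetical signal, period 2
converted = convert_sampling_period(signal, old_period=2, new_period=1)
print(len(converted))                          # 9
print(playback_scaling(100, 125))              # (0.8, 1.25)
```

In this model, outputting the converted signal at a longer third sampling period lowers all frequencies and lengthens the utterance by the same ratio, which is the effect the later claims compensate for through the frame period and the fundamental frequency pattern.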

65. A speech synthesis apparatus comprising: storage means for storing speech feature parameters obtained by analyzing a discrete speech signal sampled at a first sampling period; synthesizing means for synthesizing a discrete speech signal using, as input, the speech feature parameters read from the storage means; voice quality selecting means for selecting and designating the voice quality of the speech to be synthesized; sampling period converting means for converting the sampling period of the discrete speech signal synthesized by the synthesizing means into a second sampling period determined according to the voice quality selected and designated by the voice quality selecting means; and digital/analog converting means for converting the discrete speech signal whose sampling period has been converted by the sampling period converting means into an analog speech signal at a third sampling period that differs from the second sampling period (except when the second sampling period coincides with the first sampling period).

66. A speech synthesis method in which speech feature parameters obtained by analyzing a discrete speech signal sampled at a first sampling period are stored in storage means and a discrete speech signal is synthesized using, as input, the speech feature parameters read from the storage means, wherein: a selection and designation of the voice quality of the speech to be synthesized is accepted; the sampling period of the synthesized discrete speech signal is converted into a second sampling period determined according to the accepted voice quality; and the discrete speech signal after the conversion into the second sampling period is converted into an analog speech signal at a third sampling period that differs from the second sampling period (except when the second sampling period coincides with the first sampling period).

67. A speech synthesis apparatus comprising: synthesizing means for synthesizing a discrete speech signal, at a second frame period different from a first frame period, from a time series of speech feature parameter frames obtained by analyzing a discrete speech signal, sampled at a first sampling period, with a time window applied at the first frame period; sampling period converting means for converting the sampling period of the discrete speech signal synthesized by the synthesizing means into a second sampling period different from the first sampling period; and digital/analog converting means for converting the discrete speech signal whose sampling period has been converted by the sampling period converting means into an analog speech signal at a third sampling period different from the second sampling period.

68. A speech synthesis method comprising: synthesizing a discrete speech signal, at a second frame period different from a first frame period, from a time series of speech feature parameter frames obtained by analyzing a discrete speech signal, sampled at a first sampling period, with a time window applied at the first frame period; converting the sampling period of the discrete speech signal synthesized at the second frame period into a second sampling period different from the first sampling period; and converting the discrete speech signal after the conversion into the second sampling period into an analog speech signal at a third sampling period different from the second sampling period.

69. A speech synthesis apparatus comprising: synthesizing means for synthesizing a discrete speech signal from a time series of speech feature parameter frames obtained by analyzing a discrete speech signal, sampled at a first sampling period, with a time window applied at a first frame period, the synthesis being performed at a second frame period determined based on the first frame period, a second sampling period different from the first sampling period, and a third sampling period different from the second sampling period; sampling period converting means for converting the sampling period of the discrete speech signal synthesized by the synthesizing means into the second sampling period; and digital/analog converting means for converting the discrete speech signal whose sampling period has been converted by the sampling period converting means into an analog speech signal at the third sampling period.

70. A speech synthesis method comprising: synthesizing a discrete speech signal from a time series of speech feature parameter frames obtained by analyzing a discrete speech signal, sampled at a first sampling period, with a time window applied at a first frame period, the synthesis being performed at a second frame period determined based on the first frame period, a second sampling period different from the first sampling period, and a third sampling period different from the second sampling period; converting the sampling period of the discrete speech signal synthesized at the second frame period into the second sampling period; and converting the discrete speech signal after the conversion into the second sampling period into an analog speech signal at the third sampling period.

71. A speech synthesis apparatus comprising: storage means for storing a time series of speech feature parameter frames obtained by analyzing a discrete speech signal, sampled at a first sampling period, with a time window applied at a first frame period; voice quality selecting means for selecting and designating the voice quality of the speech to be synthesized; synthesizing means for synthesizing a discrete speech signal from the time series of feature parameter frames read from the storage means, at a second frame period determined according to the voice quality selected and designated by the voice quality selecting means; sampling period converting means for converting the sampling period of the discrete speech signal synthesized by the synthesizing means into a second sampling period determined according to the voice quality selected and designated by the voice quality selecting means; and digital/analog converting means for converting the discrete speech signal whose sampling period has been converted by the sampling period converting means into an analog speech signal at a third sampling period that differs from the second sampling period (except when the second sampling period coincides with the first sampling period).

72. A speech synthesis method in which a time series of speech feature parameter frames obtained by analyzing a discrete speech signal, sampled at a first sampling period, with a time window applied at a first frame period is stored in storage means and a discrete speech signal is synthesized from the time series of feature parameter frames read from the storage means, wherein: a selection and designation of the voice quality of the speech to be synthesized is accepted; the synthesis of the discrete speech signal from the time series of feature parameter frames is performed at a second frame period determined according to the accepted voice quality; the sampling period of the discrete speech signal synthesized at the second frame period is converted into a second sampling period determined according to the accepted voice quality; and the discrete speech signal after the conversion into the second sampling period is converted into an analog speech signal at a third sampling period that differs from the second sampling period (except when the second sampling period coincides with the first sampling period).

73. A speech synthesis apparatus comprising: synthesizing means for synthesizing a discrete speech signal from a time series of speech feature parameter frames obtained by analyzing a discrete speech signal, sampled at a first sampling period, with a time window applied at a first frame period, the synthesis being performed at a second frame period defined, based on the first frame period, a second sampling period different from the first sampling period, and a third sampling period different from the second sampling period, by second frame period = first frame period × second sampling period / third sampling period; sampling period converting means for converting the sampling period of the discrete speech signal synthesized by the synthesizing means into the second sampling period; and digital/analog converting means for converting the discrete speech signal whose sampling period has been converted by the sampling period converting means into an analog speech signal at the third sampling period.

74. A speech synthesis method comprising: synthesizing a discrete speech signal from a time series of speech feature parameter frames obtained by analyzing a discrete speech signal, sampled at a first sampling period, with a time window applied at a first frame period, the synthesis being performed at a second frame period defined, based on the first frame period, a second sampling period different from the first sampling period, and a third sampling period different from the second sampling period, by second frame period = first frame period × second sampling period / third sampling period; converting the sampling period of the discrete speech signal synthesized at the second frame period into the second sampling period; and converting the discrete speech signal after the conversion into the second sampling period into an analog speech signal at the third sampling period.
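
The frame period relation of claims 73 and 74 can be illustrated with a short Python sketch. It computes the second frame period from the second and third sampling periods and then checks the result against an assumed timing model: an ideal sampling period conversion leaves the duration of the signal unchanged, and D/A conversion at the third sampling period stretches it by (third sampling period / second sampling period). All names and numbers are hypothetical.

```python
from fractions import Fraction


def frame_period_for_converted_output(first_frame_period, second_sampling_period,
                                      third_sampling_period):
    """Second frame period = first frame period x second sampling period / third sampling period."""
    return (Fraction(first_frame_period) * Fraction(second_sampling_period)
            / Fraction(third_sampling_period))


def duration_after_da(duration, second_sampling_period, third_sampling_period):
    """Assumed model: samples spaced second_sampling_period apart are output
    once every third_sampling_period, so durations scale by third / second."""
    return Fraction(duration) * Fraction(third_sampling_period) / Fraction(second_sampling_period)


# Hypothetical values in microseconds.
T1, TS2, TS3 = 10_000, 100, 125
T2 = frame_period_for_converted_output(T1, TS2, TS3)
print(T2)                                # 8000
print(duration_after_da(T2, TS2, TS3))   # 10000
```

Under this model each frame again lasts for the first frame period once it reaches the analog output, so the change of sampling period alters the voice quality without altering the speaking rate.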

75. A speech synthesis apparatus comprising: feature parameter storage means for storing speech feature parameters obtained by analyzing a discrete speech signal sampled at a first sampling period; pitch pattern storage means for storing a fundamental frequency pattern of speech; synthesizing means for synthesizing a discrete speech signal from the feature parameters read from the feature parameter storage means and a fundamental frequency pattern; sampling period converting means for converting the sampling period of the discrete speech signal synthesized by the synthesizing means, the sampling period converting means being capable of selecting, as the sampling period after conversion, the first sampling period or a second sampling period different from the first sampling period; digital/analog converting means for converting the discrete speech signal whose sampling period has been converted by the sampling period converting means into an analog speech signal at a third sampling period different from the second sampling period; and pitch pattern generating means for generating, when the sampling period converting means performs the conversion into the second sampling period, a fundamental frequency pattern different from the fundamental frequency pattern read from the pitch pattern storage means and supplying it to the synthesizing means.

76. A speech synthesis method in which speech feature parameters obtained by analyzing a discrete speech signal sampled at a first sampling period are stored in feature parameter storage means, a fundamental frequency pattern of speech is stored in pitch pattern storage means, and a discrete speech signal is synthesized from the feature parameters read from the feature parameter storage means and the fundamental frequency pattern read from the pitch pattern storage means, wherein: a first mode, in which the sampling period of the synthesized discrete speech signal is kept at the first sampling period, and a second mode, in which it is converted into a second sampling period different from the first sampling period, are provided; and in the second mode, a fundamental frequency pattern different from the fundamental frequency pattern read from the pitch pattern storage means is generated and used for the synthesis of the discrete speech signal, the sampling period of the synthesized discrete speech signal is converted into the second sampling period, and the discrete speech signal after the conversion into the second sampling period is converted into an analog speech signal at a third sampling period different from the second sampling period.

77. A speech synthesis apparatus comprising: feature parameter storage means for storing speech feature parameters obtained by analyzing a discrete speech signal sampled at a first sampling period; pitch pattern storage means for storing a fundamental frequency pattern of speech; pitch pattern modulating means for modulating the fundamental frequency pattern read from the pitch pattern storage means based on a second sampling period different from the first sampling period and a third sampling period different from the second sampling period; synthesizing means for synthesizing a discrete speech signal from the feature parameters read from the feature parameter storage means and the fundamental frequency pattern modulated by the pitch pattern modulating means; sampling period converting means for converting the sampling period of the discrete speech signal synthesized by the synthesizing means into the second sampling period; and digital/analog converting means for converting the discrete speech signal whose sampling period has been converted by the sampling period converting means into an analog speech signal at the third sampling period.

78. A speech synthesis method comprising: storing, in feature parameter storage means, speech feature parameters obtained by analyzing a discrete speech signal sampled at a first sampling period; storing a fundamental frequency pattern of speech in pitch pattern storage means; modulating the fundamental frequency pattern read from the pitch pattern storage means based on a second sampling period different from the first sampling period and a third sampling period different from the second sampling period; synthesizing a discrete speech signal from the feature parameters read from the feature parameter storage means and the modulated fundamental frequency pattern; converting the sampling period of the synthesized discrete speech signal into the second sampling period; and converting the discrete speech signal after the conversion into the second sampling period into an analog speech signal at the third sampling period.

79. A speech synthesis apparatus comprising: feature parameter storage means for storing speech feature parameters obtained by analyzing a discrete speech signal sampled at a first sampling period; pitch pattern storage means for storing a fundamental frequency pattern of speech; voice quality selecting means for selecting and designating the voice quality of the speech to be synthesized; pitch pattern modulating means for modulating the fundamental frequency pattern read from the pitch pattern storage means according to the voice quality selected and designated by the voice quality selecting means; synthesizing means for synthesizing a discrete speech signal from the feature parameters read from the feature parameter storage means and the fundamental frequency pattern modulated by the pitch pattern modulating means; sampling period converting means for converting the sampling period of the discrete speech signal synthesized by the synthesizing means into a second sampling period determined according to the voice quality selected and designated by the voice quality selecting means; and digital/analog converting means for converting the discrete speech signal whose sampling period has been converted by the sampling period converting means into an analog speech signal at a third sampling period that differs from the second sampling period (except when the second sampling period coincides with the first sampling period).

80. A speech synthesis method in which speech feature parameters obtained by analyzing a discrete speech signal sampled at a first sampling period are stored in feature parameter storage means, a fundamental frequency pattern of speech is stored in pitch pattern storage means, and a discrete speech signal is synthesized based on the feature parameters read from the feature parameter storage means and the fundamental frequency pattern read from the pitch pattern storage means, wherein: a selection and designation of the voice quality of the speech to be synthesized is accepted; the fundamental frequency pattern read from the pitch pattern storage means is modulated according to the accepted voice quality; a discrete speech signal is synthesized from the feature parameters read from the feature parameter storage means and the modulated fundamental frequency pattern; the sampling period of the synthesized discrete speech signal is converted into a second sampling period determined according to the accepted voice quality; and the discrete speech signal after the conversion into the second sampling period is converted into an analog speech signal at a third sampling period that differs from the second sampling period (except when the second sampling period coincides with the first sampling period).

81. A speech synthesis apparatus comprising: feature parameter storage means for storing speech feature parameters obtained by analyzing a discrete speech signal sampled at a first sampling period; pitch pattern storage means for storing a fundamental frequency pattern of speech; pitch pattern modulating means for multiplying the fundamental frequency pattern read from the pitch pattern storage means by (third sampling period / second sampling period), based on a second sampling period different from the first sampling period and a third sampling period different from the second sampling period; synthesizing means for synthesizing a discrete speech signal from the feature parameters read from the feature parameter storage means and the fundamental frequency pattern modulated by the pitch pattern modulating means; sampling period converting means for converting the sampling period of the discrete speech signal synthesized by the synthesizing means into the second sampling period; and digital/analog converting means for converting the discrete speech signal whose sampling period has been converted by the sampling period converting means into an analog speech signal at the third sampling period.

82. A speech synthesis method characterized in that: characteristic parameters of speech, obtained by analyzing a discrete speech signal sampled at a first sampling period, are stored in a characteristic parameter storage means, and a fundamental frequency pattern of the speech is stored in a pitch pattern storage means; the fundamental frequency pattern read from the pitch pattern storage means is multiplied by (third sampling period / second sampling period), based on a second sampling period different from the first sampling period and a third sampling period different from the second sampling period; a discrete speech signal is synthesized from the characteristic parameters read from the characteristic parameter storage means and the modulated fundamental frequency pattern; the sampling period of the synthesized discrete speech signal is converted into the second sampling period; and the discrete speech signal after the conversion to the second sampling period is converted into an analog speech signal at the third sampling period.
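
The (third sampling period / second sampling period) scaling recited in claims 81 and 82 can be illustrated with a minimal sketch. This is not the patented implementation: the function names `modulate_f0` and `f0_after_playback`, the sample contour, and the period values are hypothetical, and the sketch assumes that outputting samples prepared for period T2 at period T3 scales every frequency by T2/T3.

```python
def modulate_f0(f0_hz, t2, t3):
    """Scale a fundamental frequency pattern by (T3 / T2), as in claims 81 and 82."""
    if t2 <= 0 or t3 <= 0:
        raise ValueError("sampling periods must be positive")
    return [f * t3 / t2 for f in f0_hz]


def f0_after_playback(f0_hz, t2, t3):
    """Assumed playback model: samples spaced for T2 but output at T3
    have all frequencies multiplied by T2 / T3."""
    return [f * t2 / t3 for f in f0_hz]


# Hypothetical values: T2 = 100 us (10 kHz), T3 = 125 us (8 kHz).
pattern = [100.0, 120.0, 110.0]   # F0 contour in Hz (illustrative only)
T2, T3 = 100.0, 125.0             # sampling periods in microseconds

modulated = modulate_f0(pattern, T2, T3)        # contour raised by a factor of 1.25
heard = f0_after_playback(modulated, T2, T3)    # contour after D/A output at T3
```

Under the stated assumption, the pitch shift introduced by the D/A conversion at the third sampling period cancels against the modulation, so `heard` matches `pattern` up to rounding.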

83. A speech synthesizing apparatus comprising: a speech synthesizing means for selecting, based on given phonological information, speech units created from a discrete speech signal sampled at a first sampling period, connecting them to synthesize a discrete speech signal, and converting the sampling period of the synthesized discrete speech signal into a second sampling period different from the first sampling period; and a digital/analog converting means for converting the discrete speech signal, synthesized by the speech synthesizing means and having its sampling period converted into the second sampling period, into an analog speech signal at a third sampling period different from the second sampling period.

84. A voice unit produced from a discrete voice signal sampled at a first sampling period is selected and connected based on given phoneme information to synthesize a discrete voice signal, and the synthesized discrete voice signal. The sampling period of the audio signal is converted into a second sampling period different from the first sampling period, and the discrete audio signal after the sampling period conversion into the second sampling period is different from the second sampling period. A voice synthesizing method characterized by converting into an analog voice signal in a third sampling period.

85. A speech synthesizing device comprising: a voice quality selecting means for selectively specifying the voice quality of the speech to be synthesized; a speech synthesizing means for synthesizing a discrete speech signal by selecting, based on given phoneme information, speech units created from a discrete speech signal sampled at a first sampling period and connecting them, the speech synthesizing means converting the sampling period of the synthesized discrete speech signal into a second sampling period determined according to the voice quality selected and designated by the voice quality selecting means; and a digital/analog converting means for converting the discrete speech signal output from the speech synthesizing means into an analog speech signal at a third sampling period different from the second sampling period (except when the second sampling period matches the first sampling period).

86. A voice synthesizing method for synthesizing a discrete voice signal by selecting and connecting a voice unit created from a discrete voice signal sampled at a first sampling period based on given phoneme information, The selection and designation of the voice quality of the voice to be synthesized is accepted, the sampling period of the synthesized discrete voice signal is converted into the second sampling period determined according to the accepted voice quality, and the sampling to the second sampling period is performed. The discrete voice signal after the period conversion is converted into an analog voice signal at a third sampling period which is different from the second sampling period (except when the second sampling period matches the first sampling period). A speech synthesis method characterized by the above.

87. A speech synthesis apparatus comprising: a speech synthesizing means for synthesizing a discrete speech signal by selecting, based on given phonological information, speech units created from a discrete speech signal sampled at a first sampling period and connecting the selected speech units according to a speech rate parameter relating to the utterance speed or utterance time of the speech to be synthesized, the speech synthesizing means being capable of converting the sampling period of the synthesized discrete speech signal into a second sampling period different from the first sampling period; and a digital/analog converting means for converting the discrete speech signal output from the speech synthesizing means into an analog speech signal at a third sampling period different from the second sampling period, wherein the speech synthesizing means is configured, when performing synthesis accompanied by the sampling period conversion to the second sampling period, to use the speech rate parameter with a value different from that used when the sampling period is not converted.

88. A speech synthesis method for synthesizing a discrete speech signal by selecting, based on given phonological information, speech units created from a discrete speech signal sampled at a first sampling period and connecting the selected speech units according to a speech rate parameter relating to the utterance speed or utterance time of the speech to be synthesized, characterized in that: a first mode, in which the sampling period of the synthesized discrete speech signal is kept at the first sampling period, and a second mode, in which it is set to a second sampling period different from the first sampling period, are provided; and in the second mode, the discrete speech signal is synthesized using the speech rate parameter with a value different from that in the first mode, the sampling period of the synthesized discrete speech signal is converted into the second sampling period, and the discrete speech signal after the conversion to the second sampling period is converted into an analog speech signal at a third sampling period different from the second sampling period.

89. A speech synthesis apparatus comprising: a speech synthesizing means for synthesizing a discrete speech signal by selecting, based on given phonological information, speech units created from a discrete speech signal sampled at a first sampling period and connecting the selected speech units according to a speech rate parameter relating to the utterance speed or utterance time of the speech to be synthesized, the speech synthesizing means being capable of converting the sampling period of the synthesized discrete speech signal into a second sampling period different from the first sampling period; and a digital/analog converting means for converting the discrete speech signal output from the speech synthesizing means into an analog speech signal at a third sampling period different from the second sampling period, wherein the speech synthesizing means is configured to determine the speech rate parameter to be used based on the second sampling period and the third sampling period.

90. A speech synthesis method for synthesizing a discrete speech signal by selecting, based on given phonological information, speech units created from a discrete speech signal sampled at a first sampling period and connecting the selected speech units according to a speech rate parameter relating to the utterance speed or utterance time of the speech to be synthesized, characterized in that: a first mode, in which the sampling period of the synthesized discrete speech signal is kept at the first sampling period, and a second mode, in which it is set to a second sampling period different from the first sampling period, are provided; and in the second mode, the speech rate parameter to be used is determined based on the second sampling period and a third sampling period different from the second sampling period, the discrete speech signal is synthesized, the sampling period of the synthesized discrete speech signal is converted into the second sampling period, and the discrete speech signal after the conversion to the second sampling period is converted into an analog speech signal at the third sampling period.

91. A speech synthesizing device comprising: a voice quality selection means for selectively designating the voice quality of the speech to be synthesized; a speech synthesizing means for synthesizing a discrete speech signal by selecting, based on given phoneme information, speech units created from a discrete speech signal sampled at a first sampling period and connecting the selected speech units according to a speech rate parameter relating to the utterance speed or utterance time of the speech to be synthesized, the speech synthesizing means converting the sampling period of the synthesized discrete speech signal into a second sampling period determined according to the voice quality selected and designated by the voice quality selection means; and a digital/analog conversion means for converting the discrete speech signal output from the speech synthesizing means into an analog speech signal at a third sampling period different from the second sampling period (except when the second sampling period matches the first sampling period), wherein the speech synthesizing means is configured to determine the speech rate parameter to be used according to the voice quality selected and designated by the voice quality selection means.

92. A speech synthesis method for synthesizing a discrete speech signal by selecting, based on given phoneme information, speech units created from a discrete speech signal sampled at a first sampling period and connecting the selected speech units according to a speech rate parameter relating to the utterance speed or utterance time of the speech to be synthesized, characterized in that: a selection designation of the voice quality of the speech to be synthesized is accepted; the speech rate parameter to be used is determined according to the accepted voice quality and the discrete speech signal is synthesized; the sampling period of the synthesized discrete speech signal is converted into a second sampling period determined according to the accepted voice quality; and the discrete speech signal after the conversion to the second sampling period is converted into an analog speech signal at a third sampling period different from the second sampling period (except when the second sampling period matches the first sampling period).

93. A speech synthesis apparatus comprising: a speech synthesizing means for synthesizing a discrete speech signal by determining the duration of each phoneme included in given phoneme information, selecting, based on the phoneme information, speech units created from a discrete speech signal sampled at a first sampling period, and connecting the selected speech units based on the determined duration of each phoneme, the speech synthesizing means being capable of converting the sampling period of the synthesized discrete speech signal into a second sampling period different from the first sampling period; and a digital/analog converting means for converting the discrete speech signal output from the speech synthesizing means into an analog speech signal at a third sampling period different from the second sampling period, wherein the speech synthesizing means is configured, when performing the sampling period conversion to the second sampling period, to determine the phoneme duration so that it differs from the phoneme duration used when the sampling period is not converted.

94. A speech synthesis method for synthesizing a discrete speech signal by determining the duration of each phoneme included in given phoneme information, selecting, based on the phoneme information, speech units created from a discrete speech signal sampled at a first sampling period, and connecting the selected speech units based on the determined duration of each phoneme, characterized in that: a first mode, in which the sampling period of the synthesized discrete speech signal is kept at the first sampling period, and a second mode, in which it is set to a second sampling period different from the first sampling period, are provided; and in the second mode, the phoneme duration is determined so as to differ from that in the first mode and is used in synthesizing the discrete speech signal, the sampling period of the synthesized discrete speech signal is converted into the second sampling period, and the discrete speech signal after the conversion to the second sampling period is converted into an analog speech signal at a third sampling period different from the second sampling period.

95. The duration of each phoneme included in the given phoneme information is determined, while the phoneme segment created from the discrete audio signal sampled at the first sampling period is selected based on the phoneme information. A voice synthesizing unit for synthesizing a discrete voice signal by connecting the selected speech units based on the determined duration of each phoneme, wherein the sampling period of the synthesized discrete voice signal is the first A voice synthesizer capable of converting into a second sample period different from the sample period, and a discrete voice signal outputted from the voice synthesizer with a third sample period different from the second sample period. A digital / analog converting means for converting into a voice signal, wherein the voice synthesizing means determines the phoneme duration to be used at the time of synthesizing based on the second sampling period and the third sampling period. Speech synthesis apparatus characterized by being configured to.

96. The duration of each phoneme included in the given phoneme information is determined, while the phoneme segment created from the discrete audio signal sampled at the first sampling period is selected based on the phoneme information. In the speech synthesis method for synthesizing a discrete speech signal by connecting the selected speech units based on the determined duration of each phoneme, the phonological duration used during synthesis is defined as the first sampling period. Is determined based on a different second sampling period and a third sampling period different from the second sampling period, and the sampling period of the synthesized discrete audio signal is converted into the second sampling period, A voice synthesizing method, characterized in that a discrete speech signal after conversion of a sampling period into a second sampling period is converted into an analog speech signal at the third sampling period.

97. A speech synthesizing apparatus comprising: a voice quality selection means for selectively designating the voice quality of the speech to be synthesized; a speech synthesizing means for synthesizing a discrete speech signal by determining the duration of each phoneme included in given phoneme information, selecting, based on the phoneme information, speech units created from a discrete speech signal sampled at a first sampling period, and connecting the selected speech units based on the determined duration of each phoneme, the speech synthesizing means converting the sampling period of the synthesized discrete speech signal into a second sampling period determined according to the voice quality selected and designated by the voice quality selection means; and a digital/analog converting means for converting the discrete speech signal output from the speech synthesizing means into an analog speech signal at a third sampling period different from the second sampling period (except when the second sampling period matches the first sampling period), wherein the speech synthesizing means is configured to determine the phoneme duration to be used at the time of synthesis according to the voice quality selected and designated by the voice quality selection means.

98. A speech synthesis method for synthesizing a discrete speech signal by determining the duration of each phoneme included in given phoneme information, selecting, based on the phoneme information, speech units created from a discrete speech signal sampled at a first sampling period, and connecting the selected speech units based on the determined duration of each phoneme, characterized in that: a selection designation of the voice quality of the speech to be synthesized is accepted; the phoneme duration to be used is determined according to the accepted voice quality and the discrete speech signal is synthesized; the sampling period of the synthesized discrete speech signal is converted into a second sampling period determined according to the accepted voice quality; and the discrete speech signal after the conversion to the second sampling period is converted into an analog speech signal at a third sampling period different from the second sampling period (except when the second sampling period matches the first sampling period).

99. A speech synthesis apparatus comprising: a speech synthesizing means for synthesizing a discrete speech signal by determining the duration of each phoneme included in given phoneme information, selecting, based on the phoneme information, speech units created from a discrete speech signal sampled at a first sampling period, and connecting the selected speech units based on the determined duration of each phoneme, the speech synthesizing means being capable of converting the sampling period of the synthesized discrete speech signal into a second sampling period different from the first sampling period; and a digital/analog converting means for converting the discrete speech signal output from the speech synthesizing means into an analog speech signal at a third sampling period different from the second sampling period, wherein the speech synthesizing means is configured, when performing synthesis accompanied by the sampling period conversion to the second sampling period, to use a phoneme duration that is (second sampling period / third sampling period) times the phoneme duration determined when the sampling period is not converted.

100. A speech synthesis method for synthesizing a discrete speech signal by determining the duration of each phoneme included in given phoneme information, selecting, based on the phoneme information, speech units created from a discrete speech signal sampled at a first sampling period, and connecting the selected speech units based on the determined duration of each phoneme, characterized in that: a first mode, in which the sampling period of the synthesized discrete speech signal is kept at the first sampling period, and a second mode, in which it is set to a second sampling period different from the first sampling period, are provided; and in the second mode, using the second sampling period and a third sampling period different from the second sampling period, the phoneme duration is determined so as to be (second sampling period / third sampling period) times the phoneme duration determined in the first mode and is used in synthesizing the discrete speech signal, the sampling period of the synthesized discrete speech signal is converted into the second sampling period, and the discrete speech signal after the conversion to the second sampling period is converted into an analog speech signal at the third sampling period.
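
The duration scaling recited in claims 99 and 100 can be sketched as follows. This is only an illustration under stated assumptions: the names `compensate_durations` and `durations_after_playback` and all numeric values are hypothetical, and the sketch assumes that outputting samples prepared for period T2 at period T3 stretches time by T3/T2.

```python
def compensate_durations(durations_ms, t2, t3):
    """Scale phoneme durations by (T2 / T3), as in claims 99 and 100."""
    if t2 <= 0 or t3 <= 0:
        raise ValueError("sampling periods must be positive")
    return [d * t2 / t3 for d in durations_ms]


def durations_after_playback(durations_ms, t2, t3):
    """Assumed playback model: samples spaced for T2 but output at T3
    last (T3 / T2) times as long."""
    return [d * t3 / t2 for d in durations_ms]


# Hypothetical values: T2 = 100 us, T3 = 125 us, durations in milliseconds.
first_mode_durations = [50.0, 80.0, 120.0]   # durations as determined without conversion
T2, T3 = 100.0, 125.0

second_mode_durations = compensate_durations(first_mode_durations, T2, T3)
audible_durations = durations_after_playback(second_mode_durations, T2, T3)
```

With these values the second-mode durations are shortened to 0.8 times the first-mode durations, and under the assumed playback model the audible durations return to the first-mode values, so the utterance speed is preserved.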

101. A speech synthesizer comprising: a speech synthesizing means for synthesizing a discrete speech signal by selecting, based on given phonological information, speech units created from a discrete speech signal sampled at a first sampling period and connecting the selected speech units while expanding or contracting them in the time axis direction, the speech synthesizing means being capable of converting the sampling period of the synthesized discrete speech signal into a second sampling period different from the first sampling period; and a digital/analog converting means for converting the discrete speech signal output from the speech synthesizing means into an analog speech signal at a third sampling period different from the second sampling period.

102. A speech unit created from a discrete speech signal sampled at a first sampling period is selected based on given phonological information, and the selected speech unit is expanded or contracted in the time axis direction. The discrete speech signal is synthesized by connecting, the sampling period of the synthesized discrete speech signal is converted into a second sampling period different from the first sampling period, and the sampling period is converted into the second sampling period. A voice synthesizing method, characterized in that the subsequent discrete voice signal is converted into an analog voice signal at a third sampling period different from the second sampling period.

103. The speech synthesis device according to claim 101, wherein the speech synthesizing means, when connecting the selected speech units, connects them while expanding or contracting them in the time axis direction by a degree determined based on the second sampling period and the third sampling period.

104. The speech synthesis method according to claim 102, wherein, when connecting the selected speech units, the speech units are connected while being expanded or contracted in the time axis direction by a degree determined based on the second sampling period and the third sampling period.

105. A speech synthesizing device comprising: a voice quality selection means for selectively designating the voice quality of the speech to be synthesized; a speech synthesizing means for synthesizing a discrete speech signal by selecting, based on given phoneme information, speech units created from a discrete speech signal sampled at a first sampling period and connecting the selected speech units while expanding or contracting them in the time axis direction by a degree determined according to the voice quality selected and designated by the voice quality selection means, the speech synthesizing means converting the sampling period of the synthesized discrete speech signal into a second sampling period determined according to the voice quality selected and designated by the voice quality selection means; and a digital/analog converting means for converting the discrete speech signal output from the speech synthesizing means into an analog speech signal at a third sampling period different from the second sampling period (except when the second sampling period matches the first sampling period).

106. A speech synthesis method for synthesizing a discrete speech signal by selecting, based on given phonological information, speech units created from a discrete speech signal sampled at a first sampling period and connecting the selected speech units, and for converting the synthesized discrete speech signal into an analog speech signal, characterized in that: a selection designation of the voice quality of the speech to be synthesized is accepted; when connecting the selected speech units, the speech units are connected while being expanded or contracted in the time axis direction by a degree determined according to the accepted voice quality, whereby the discrete speech signal is synthesized; the sampling period of the synthesized discrete speech signal is converted into a second sampling period determined according to the accepted voice quality; and the discrete speech signal after the conversion to the second sampling period is converted into an analog speech signal at a third sampling period different from the second sampling period (except when the second sampling period matches the first sampling period).

107. The speech synthesis device according to claim 103, wherein the speech synthesizing means, when connecting the selected speech units, connects them while expanding or contracting them in the time axis direction by a factor of (second sampling period / third sampling period).

108. The speech synthesis method according to claim 104, wherein, when connecting the selected speech units, the speech units are connected while being expanded or contracted in the time axis direction by a factor of (second sampling period / third sampling period).

109. A discrete speech signal sampled at a first sampling period is cut out in a predetermined synthesis unit from a time series of speech characteristic parameter frames obtained by analysis by applying a time window of a first frame period. A speech unit accumulating means for accumulating a plurality of speech units, and a synthesis parameter frame for selecting and connecting the speech units based on the input phoneme information from the speech unit accumulation means to generate a time series of synthesis parameter frames A sequence generating means, and a synthesizing means for synthesizing a discrete audio signal in a second frame cycle different from the first frame cycle from the time series of the synthetic parameter frames generated by the synthetic parameter frame time series generating means, A sampling period converter for converting the sampling period of the discrete speech signal synthesized by the synthesizing unit into a second sampling period different from the first sampling period. And a digital / analog converting means for converting the discrete voice signal whose sample period is converted by the sample period converting means into an analog voice signal at a third sample period different from the second sample period. A speech synthesizer characterized by.

110. A speech synthesis method characterized in that: a plurality of speech units, cut out in a predetermined synthesis unit from a time series of speech characteristic parameter frames obtained by analyzing a discrete speech signal sampled at a first sampling period with a time window applied at a first frame period, are stored in a speech unit storage means; the speech units are selected from the speech unit storage means based on input phoneme information and connected to generate a time series of synthesis parameter frames; a discrete speech signal is synthesized from the generated time series of synthesis parameter frames at a second frame period different from the first frame period; the sampling period of the discrete speech signal synthesized at the second frame period is converted into a second sampling period different from the first sampling period; and the discrete speech signal after the conversion to the second sampling period is converted into an analog speech signal at a third sampling period different from the second sampling period.

111. A speech synthesizer comprising: a speech unit accumulating means for accumulating a plurality of speech units cut out in a predetermined synthesis unit from a time series of speech characteristic parameter frames obtained by analyzing a discrete speech signal sampled at a first sampling period with a time window applied at a first frame period; a synthesis parameter frame time series generating means for selecting the speech units from the speech unit accumulating means based on input phoneme information and connecting them to generate a time series of synthesis parameter frames; a synthesizing means for synthesizing a discrete speech signal, from the time series of synthesis parameter frames generated by the synthesis parameter frame time series generating means, at a second frame period determined based on the first frame period, a second sampling period different from the first sampling period, and a third sampling period different from the second sampling period; a sampling period converting means for converting the sampling period of the discrete speech signal synthesized by the synthesizing means into the second sampling period; and a digital/analog converting means for converting the discrete speech signal whose sampling period has been converted by the sampling period converting means into an analog speech signal at the third sampling period.

112. A speech synthesis method characterized in that: a plurality of speech units, cut out in a predetermined synthesis unit from a time series of speech characteristic parameter frames obtained by analyzing a discrete speech signal sampled at a first sampling period with a time window applied at a first frame period, are stored in a speech unit storage means; the speech units are selected from the speech unit storage means based on input phoneme information and connected to generate a time series of synthesis parameter frames; a discrete speech signal is synthesized from the generated time series of synthesis parameter frames at a second frame period determined based on the first frame period, a second sampling period different from the first sampling period, and a third sampling period different from the second sampling period; the sampling period of the discrete speech signal synthesized at the second frame period is converted into the second sampling period; and the discrete speech signal after the conversion to the second sampling period is converted into an analog speech signal at the third sampling period.

113. A speech synthesizer comprising: a speech unit accumulating means for accumulating a plurality of speech units cut out in a predetermined synthesis unit from a time series of speech characteristic parameter frames obtained by analyzing a discrete speech signal sampled at a first sampling period with a time window applied at a first frame period; a voice quality selecting means for selectively designating the voice quality of the speech to be synthesized; a synthesis parameter frame time series generating means for selecting the speech units from the speech unit accumulating means based on input phoneme information and connecting them to generate a time series of synthesis parameter frames; a synthesizing means for synthesizing a discrete speech signal, from the time series of synthesis parameter frames generated by the synthesis parameter frame time series generating means, at a second frame period determined according to the voice quality selected and designated by the voice quality selecting means; a sampling period converting means for converting the sampling period of the discrete speech signal synthesized by the synthesizing means into a second sampling period determined according to the voice quality selected and designated by the voice quality selecting means; and a digital/analog converting means for converting the discrete speech signal whose sampling period has been converted by the sampling period converting means into an analog speech signal at a third sampling period different from the second sampling period (except when the second sampling period matches the first sampling period).

114. A speech synthesis method in which a plurality of speech units, cut out in a predetermined synthesis unit from a time series of speech characteristic parameter frames obtained by analyzing a discrete speech signal sampled at a first sampling period with a time window applied at a first frame period, are stored in a speech unit storage means, the speech units are selected from the speech unit storage means based on input phoneme information and connected to generate a time series of synthesis parameter frames, and a discrete speech signal is synthesized from the generated time series of synthesis parameter frames, characterized in that: a selection designation of the voice quality of the speech to be synthesized is accepted; the synthesis of the discrete speech signal from the time series of synthesis parameter frames is performed at a second frame period determined according to the accepted voice quality; the sampling period of the synthesized discrete speech signal is converted into a second sampling period determined according to the accepted voice quality; and the discrete speech signal after the conversion to the second sampling period is converted into an analog speech signal at a third sampling period different from the second sampling period (except when the second sampling period matches the first sampling period).

115. A speech synthesizer comprising: speech unit storage means for storing a plurality of speech units, each cut out in a predetermined synthesis unit from a time series of speech feature parameter frames obtained by analyzing, with a time window applied at a first frame period, a discrete speech signal sampled at a first sampling period; synthesis parameter frame time series generating means for selecting speech units from the speech unit storage means on the basis of input phoneme information and connecting them to generate a time series of synthesis parameter frames; synthesizing means for synthesizing a discrete speech signal from the time series of synthesis parameter frames generated by the synthesis parameter frame time series generating means, at a second frame period defined, from a second sampling period different from the first sampling period and a third sampling period different from the second sampling period, by second frame period = first frame period × second sampling period / third sampling period; sampling period conversion means for converting the sampling period of the discrete speech signal synthesized by the synthesizing means into the second sampling period; and digital/analog conversion means for converting the discrete speech signal whose sampling period has been converted by the sampling period conversion means into an analog speech signal at the third sampling period.

116. A speech synthesis method characterized in that: a plurality of speech units, each cut out in a predetermined synthesis unit from a time series of speech feature parameter frames obtained by analyzing, with a time window applied at a first frame period, a discrete speech signal sampled at a first sampling period, are stored in speech unit storage means; speech units are selected from the speech unit storage means on the basis of input phoneme information and connected to generate a time series of synthesis parameter frames; a discrete speech signal is synthesized from the generated time series of synthesis parameter frames at a second frame period defined, from a second sampling period different from the first sampling period and a third sampling period different from the second sampling period, by second frame period = first frame period × second sampling period / third sampling period; the sampling period of the discrete speech signal synthesized at the second frame period is converted into the second sampling period; and the discrete speech signal after conversion to the second sampling period is converted into an analog speech signal at the third sampling period.
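
The frame period relation recited in claims 115 and 116 can be checked with a small calculation. The following is a minimal sketch and not part of the claimed apparatus; the function names and the example values (a 10 ms first frame period, a 100 µs second sampling period, and a 125 µs third sampling period) are assumptions chosen only for illustration.

```python
def second_frame_period(first_frame_period, second_sampling_period, third_sampling_period):
    """Frame period used during synthesis, following the relation in claims 115 and 116:
    second frame period = first frame period * second sampling period / third sampling period.
    All arguments must use the same time unit."""
    if min(first_frame_period, second_sampling_period, third_sampling_period) <= 0:
        raise ValueError("periods must be positive")
    return first_frame_period * second_sampling_period / third_sampling_period


def frame_duration_at_output(frame_period, second_sampling_period, third_sampling_period):
    """Duration of one synthesized frame once its samples, nominally spaced by the
    second sampling period, are output at the third sampling period instead."""
    return frame_period * third_sampling_period / second_sampling_period


# Illustrative values only (hypothetical), all in microseconds.
FIRST_FRAME_PERIOD_US = 10_000
SECOND_SAMPLING_PERIOD_US = 100
THIRD_SAMPLING_PERIOD_US = 125

frame_us = second_frame_period(
    FIRST_FRAME_PERIOD_US, SECOND_SAMPLING_PERIOD_US, THIRD_SAMPLING_PERIOD_US
)
restored_us = frame_duration_at_output(
    frame_us, SECOND_SAMPLING_PERIOD_US, THIRD_SAMPLING_PERIOD_US
)
```

Under these assumed values the synthesis runs at a shorter frame period, and outputting the samples at the longer third sampling period stretches each frame back to the first frame period, so the duration of each frame at the output is unchanged.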

117. A speech synthesizer that takes prosody information and phonological information as inputs, comprising: speech unit storage means for storing a plurality of speech units created from a discrete speech signal sampled at a first sampling period; pitch pattern generating means for generating a fundamental frequency pattern of speech from the prosody information; phonological parameter generating means for generating phonological parameters of the speech by selectively reading speech units from the speech unit storage means on the basis of the phonological information and connecting them; synthesizing means for synthesizing a discrete speech signal from the phonological parameters generated by the phonological parameter generating means and the fundamental frequency pattern generated by the pitch pattern generating means; sampling period conversion means for converting the sampling period of the discrete speech signal synthesized by the synthesizing means, capable of selectively performing conversion to the first sampling period or to a second sampling period different from the first sampling period; and digital/analog conversion means for converting the discrete speech signal whose sampling period has been converted by the sampling period conversion means into an analog speech signal at a third sampling period different from the second sampling period, wherein the pitch pattern generating means is configured to generate, when the sampling period conversion means performs conversion to the second sampling period, a fundamental frequency pattern different from the one generated when conversion to the first sampling period is performed.

118. A speech synthesis method that takes prosody information and phonological information as inputs, in which a plurality of speech units created from a discrete speech signal sampled at a first sampling period are stored in speech unit storage means, a fundamental frequency pattern of speech is generated from the prosody information, phonological parameters of the speech are generated by selectively reading speech units from the speech unit storage means on the basis of the phonological information and connecting them, a discrete speech signal is synthesized from the generated phonological parameters and fundamental frequency pattern, and the synthesized discrete speech signal is converted into an analog speech signal, the method being characterized by: providing a first mode in which the sampling period of the synthesized discrete speech signal remains the first sampling period and a second mode in which it is converted to a second sampling period different from the first sampling period; in the second mode, generating the fundamental frequency pattern so that it differs from the fundamental frequency pattern in the first mode and using it to synthesize the discrete speech signal; converting the sampling period of the synthesized discrete speech signal into the second sampling period; and converting the discrete speech signal after conversion to the second sampling period into an analog speech signal at a third sampling period different from the second sampling period.

119. A speech synthesizer that takes prosody information and phonological information as inputs, comprising: speech unit storage means for storing a plurality of speech units created from a discrete speech signal sampled at a first sampling period; pitch pattern generating means for generating a fundamental frequency pattern of speech from the prosody information on the basis of a second sampling period different from the first sampling period and a third sampling period different from the second sampling period; phonological parameter generating means for generating phonological parameters of the speech by selectively reading speech units from the speech unit storage means on the basis of the phonological information and connecting them; synthesizing means for synthesizing a discrete speech signal from the phonological parameters generated by the phonological parameter generating means and the fundamental frequency pattern generated by the pitch pattern generating means; sampling period conversion means for converting the sampling period of the discrete speech signal synthesized by the synthesizing means into the second sampling period; and digital/analog conversion means for converting the discrete speech signal whose sampling period has been converted by the sampling period conversion means into an analog speech signal at the third sampling period.

120. A speech synthesis method that takes prosody information and phonological information as inputs, characterized in that: a plurality of speech units created from a discrete speech signal sampled at a first sampling period are stored in speech unit storage means; a fundamental frequency pattern of speech is generated from the prosody information on the basis of a second sampling period different from the first sampling period and a third sampling period different from the second sampling period; phonological parameters of the speech are generated by selectively reading speech units from the speech unit storage means on the basis of the phonological information and connecting them; a discrete speech signal is synthesized from the generated phonological parameters and fundamental frequency pattern; the sampling period of the synthesized discrete speech signal is converted into the second sampling period; and the discrete speech signal after conversion to the second sampling period is converted into an analog speech signal at the third sampling period.

121. A speech synthesizer that takes prosody information and phonological information as inputs, comprising: speech unit storage means for storing a plurality of speech units created from a discrete speech signal sampled at a first sampling period; voice quality selecting means for selecting and designating the voice quality of the speech to be synthesized; pitch pattern generating means for generating a fundamental frequency pattern of speech from the prosody information according to the voice quality selected and designated by the voice quality selecting means; phonological parameter generating means for generating phonological parameters of the speech by selectively reading speech units from the speech unit storage means on the basis of the phonological information and connecting them; synthesizing means for synthesizing a discrete speech signal from the phonological parameters generated by the phonological parameter generating means and the fundamental frequency pattern generated by the pitch pattern generating means; sampling period conversion means for converting the sampling period of the discrete speech signal synthesized by the synthesizing means into a second sampling period determined according to the voice quality selected and designated by the voice quality selecting means; and digital/analog conversion means for converting the discrete speech signal whose sampling period has been converted by the sampling period conversion means into an analog speech signal at a third sampling period different from the second sampling period (except when the second sampling period coincides with the first sampling period).

122. A speech synthesis method that takes prosody information and phonological information as inputs, in which a plurality of speech units created from a discrete speech signal sampled at a first sampling period are stored in speech unit storage means, a fundamental frequency pattern of speech is generated from the prosody information, phonological parameters of the speech are generated by selectively reading speech units from the speech unit storage means on the basis of the phonological information and connecting them, and a discrete speech signal is synthesized from the generated phonological parameters and fundamental frequency pattern, the method being characterized by: accepting a selection and designation of the voice quality of the speech to be synthesized; generating the fundamental frequency pattern used in the synthesis processing from the prosody information according to the accepted voice quality; converting the sampling period of the synthesized discrete speech signal into a second sampling period determined according to the accepted voice quality; and converting the discrete speech signal after conversion to the second sampling period into an analog speech signal at a third sampling period different from the second sampling period (except when the second sampling period coincides with the first sampling period).

123. The speech synthesizer according to claim 117, wherein, when the sampling period conversion means performs conversion to the second sampling period, the pitch pattern generating means generates a fundamental frequency pattern whose fundamental frequency is (third sampling period / second sampling period) times that of the fundamental frequency pattern generated when conversion to the first sampling period is performed.

124. The speech synthesis method according to claim 118, wherein the fundamental frequency pattern used in the synthesis processing in the second mode has a fundamental frequency that is (third sampling period / second sampling period) times that of the fundamental frequency pattern used in the synthesis processing in the first mode.
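
The (third sampling period / second sampling period) factor in claims 123 and 124 can be illustrated numerically. The sketch below is an assumption-laden illustration, not the patented implementation: the function names, the list representation of a fundamental frequency pattern, and the example periods are all hypothetical. It relies only on the fact that outputting samples at a period other than their nominal one scales every frequency by (nominal period / output period).

```python
def compensated_f0(f0_pattern_hz, second_sampling_period, third_sampling_period):
    """Scale a fundamental frequency pattern by (third / second sampling period),
    the factor stated in claims 123 and 124."""
    if second_sampling_period <= 0 or third_sampling_period <= 0:
        raise ValueError("sampling periods must be positive")
    return [f * third_sampling_period / second_sampling_period for f in f0_pattern_hz]


def f0_after_output(f0_pattern_hz, second_sampling_period, third_sampling_period):
    """Frequencies heard when samples nominally spaced by the second sampling
    period are output at the third sampling period instead."""
    return [f * second_sampling_period / third_sampling_period for f in f0_pattern_hz]


# Hypothetical example: target pitch contour in Hz, periods in microseconds.
target_f0 = [100.0, 120.0, 160.0]
generated = compensated_f0(target_f0, 100, 125)
heard = f0_after_output(generated, 100, 125)
```

With these assumed values the generated pattern is raised by a factor of 1.25, and the slower output lowers it again, so the pitch heard matches the target contour while the spectral envelope is still shifted by the period mismatch.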

125. A speech synthesizer comprising: storage means for storing speech feature parameters; synthesizing means for synthesizing a discrete speech signal with the speech feature parameters read from the storage means as input; digital/analog conversion means for converting the discrete speech signal synthesized by the synthesizing means into an analog speech signal; and control means for controlling so that the sampling period of the discrete speech signal input to the digital/analog conversion means differs from the conversion period at which the digital/analog conversion means converts the discrete speech signal into an analog speech signal.

126. A speech synthesis method in which speech feature parameters are stored in storage means, a discrete speech signal is synthesized with the speech feature parameters read from the storage means as input, and the synthesized discrete speech signal is converted into an analog speech signal, the method being characterized in that the sampling period of the discrete speech signal to be converted into the analog speech signal is made different from the conversion period at which the discrete speech signal is converted into the analog speech signal.
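
The effect of making the conversion period differ from the sampling period, as in claims 125 and 126, can be simulated with a pure tone. This is a minimal sketch under stated assumptions: the helper names, the 200 Hz test tone, the 0.25 rad phase offset, and the 10 kHz and 8 kHz rates are hypothetical, and the zero-crossing count is only a rough frequency estimate.

```python
import math


def synthesize_tone(freq_hz, sampling_period_s, n_samples, phase=0.25):
    """Discrete sine tone whose samples are nominally spaced by sampling_period_s."""
    return [
        math.sin(2.0 * math.pi * freq_hz * n * sampling_period_s + phase)
        for n in range(n_samples)
    ]


def estimate_frequency(samples, conversion_period_s):
    """Rough frequency estimate: count upward zero crossings and divide by the
    duration the samples occupy when output one per conversion_period_s."""
    crossings = sum(1 for a, b in zip(samples, samples[1:]) if a < 0.0 <= b)
    duration_s = (len(samples) - 1) * conversion_period_s
    return crossings / duration_s


SAMPLING_PERIOD_S = 1.0 / 10_000    # period assumed at synthesis (hypothetical)
CONVERSION_PERIOD_S = 1.0 / 8_000   # period assumed at D/A conversion (hypothetical)

tone = synthesize_tone(200.0, SAMPLING_PERIOD_S, 10_000)
matched_hz = estimate_frequency(tone, SAMPLING_PERIOD_S)
mismatched_hz = estimate_frequency(tone, CONVERSION_PERIOD_S)
```

When the two periods match, the estimate stays near 200 Hz; with the longer conversion period it drops to roughly 160 Hz, that is, by the ratio of the sampling period to the conversion period. The same scaling applies to every spectral component of a synthesized signal, which is why the mismatch changes the perceived voice quality.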

127. A speech synthesizer comprising: storage means for storing speech feature parameters; synthesizing means for synthesizing a discrete speech signal with the speech feature parameters read from the storage means as input; digital/analog conversion means for converting the discrete speech signal synthesized by the synthesizing means into an analog speech signal; voice quality selecting means for selecting and designating the voice quality of the speech to be synthesized; and control means for varying, according to the voice quality selected and designated by the voice quality selecting means, the ratio between the sampling period of the discrete speech signal input to the digital/analog conversion means and the conversion period at which the digital/analog conversion means converts the discrete speech signal into an analog speech signal.

128. A speech synthesis method in which speech feature parameters are stored in storage means, a discrete speech signal is synthesized with the speech feature parameters read from the storage means as input, and the synthesized discrete speech signal is converted into an analog speech signal, the method being characterized by: accepting a selection and designation of the voice quality of the speech to be synthesized; and varying, according to the accepted voice quality, the ratio between the sampling period of the discrete speech signal to be converted into the analog speech signal and the conversion period at which the discrete speech signal is converted into the analog speech signal.
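
Claims 127 and 128 vary the ratio between the sampling period and the conversion period according to the selected voice quality. One way such a control step might look is sketched below; the voice quality labels, the ratio values, and the function name are purely hypothetical, since the claims do not specify any particular table or values.

```python
# Hypothetical table: voice quality label -> (conversion period / sampling period).
# A ratio above 1 slows the output and lowers the spectrum; below 1 does the opposite.
VOICE_QUALITY_RATIO = {
    "standard": 1.0,
    "deeper": 1.25,
    "lighter": 0.75,
}


def conversion_period_for(voice_quality, sampling_period):
    """Return the D/A conversion period for the selected voice quality,
    given the sampling period of the discrete speech signal (same time unit)."""
    if sampling_period <= 0:
        raise ValueError("sampling period must be positive")
    try:
        ratio = VOICE_QUALITY_RATIO[voice_quality]
    except KeyError:
        raise ValueError(f"unknown voice quality: {voice_quality!r}") from None
    return sampling_period * ratio


# Example with an assumed sampling period of 100 microseconds.
deeper_period_us = conversion_period_for("deeper", 100)
lighter_period_us = conversion_period_for("lighter", 100)
```

A lookup table keeps the voice quality selection separate from the synthesis itself, which matches the structure of the claims: only the ratio changes when a different voice quality is selected, while the stored parameters stay the same.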

129. A speech synthesizer comprising: storage means for storing speech units created in advance; synthesizing means for synthesizing a discrete speech signal by selecting speech units from the storage means on the basis of given phoneme information and connecting them; digital/analog conversion means for converting the discrete speech signal synthesized by the synthesizing means into an analog speech signal; and control means for controlling so that the sampling period of the discrete speech signal input to the digital/analog conversion means differs from the conversion period at which the digital/analog conversion means converts the discrete speech signal into an analog speech signal.

130. A speech synthesis method in which speech units created in advance are stored in storage means, a discrete speech signal is synthesized by selecting speech units from the storage means on the basis of given phoneme information and connecting them, and the synthesized discrete speech signal is converted into an analog speech signal, the method being characterized in that the sampling period of the discrete speech signal to be converted into the analog speech signal is made different from the conversion period at which the discrete speech signal is converted into the analog speech signal.

131. A speech synthesizer comprising: storage means for storing speech units created in advance; synthesizing means for synthesizing a discrete speech signal by selecting speech units from the storage means on the basis of given phoneme information and connecting them; digital/analog conversion means for converting the discrete speech signal synthesized by the synthesizing means into an analog speech signal; voice quality selecting means for selecting and designating the voice quality of the speech to be synthesized; and control means for varying, according to the voice quality selected and designated by the voice quality selecting means, the ratio between the sampling period of the discrete speech signal input to the digital/analog conversion means and the conversion period at which the digital/analog conversion means converts the discrete speech signal into an analog speech signal.

132. A speech synthesis method in which speech units created in advance are stored in storage means, a discrete speech signal is synthesized by selecting speech units from the storage means on the basis of given phoneme information and connecting them, and the synthesized discrete speech signal is converted into an analog speech signal, the method being characterized by: accepting a selection and designation of the voice quality of the speech to be synthesized; and varying, according to the accepted voice quality, the ratio between the sampling period of the discrete speech signal to be converted into the analog speech signal and the conversion period at which the discrete speech signal is converted into the analog speech signal.