JPH10301599A

JPH10301599A - Voice synthesizer

Info

Publication number: JPH10301599A
Application number: JP9112642A
Authority: JP
Inventors: Atsushi Wakao; 淳若尾
Original assignee: NEC Corp
Current assignee: NEC Corp
Priority date: 1997-04-30
Filing date: 1997-04-30
Publication date: 1998-11-13

Abstract

PROBLEM TO BE SOLVED: To provide a voice synthesizer capable of synthesizing speech with a variety of voice qualities. SOLUTION: The device comprises a waveform storage part 3 that stores elemental waveforms as the minimum unit of synthesis, each being a one-pitch waveform for a voiced sound or a one-phoneme waveform for an unvoiced sound; a waveform conversion part 2 that converts the elemental waveforms stored in the waveform storage part 3; and a synthetic part 1 that edits the elemental waveforms converted by the waveform conversion part 2 and generates a synthetic sound. Because each elemental waveform is converted by the waveform conversion part 2 and the synthetic voice is generated from the converted elemental waveforms, the voice quality of the synthetic sound can be changed independently of the pitch and the uttering speed.

Description

DETAILED DESCRIPTION OF THE INVENTION

【０００１】[0001]

【発明の属する技術分野】本発明は、波形編集方式の音
声合成装置に関し、特に、様々な声質の合成音声を生成
することができる音声合成装置に関する。BACKGROUND OF THE INVENTION 1. Field of the Invention The present invention relates to a speech synthesizer of a waveform editing system, and more particularly to a speech synthesizer capable of generating synthesized speech having various voice qualities.

【０００２】[0002]

【従来の技術】テキスト音声合成における合成音の品質
は、年々向上している。それに伴い、利用者は、「多彩
な声質」の合成音声を求め始めている。人間の声は、声
帯等の音源で発生した音が声道を通過する際に、様々な
音色（スペクトル包絡）を付与されることで生じる。声
質は、このスペクトル包絡によって決定される。したが
って、このスペクトル包絡を変更することで、声質を変
更することが可能である。2. Description of the Related Art The quality of synthesized speech in text-to-speech synthesis is improving year by year. Along with this, users have begun to seek synthesized voices of “various voice qualities”. A human voice is generated by giving various timbres (spectral envelopes) when a sound generated from a sound source such as a vocal cord passes through a vocal tract. Voice quality is determined by this spectral envelope. Therefore, by changing the spectrum envelope, it is possible to change the voice quality.

【０００３】一般に、音声合成装置は、波形編集方式と
パラメータ分析合成方式の二つに大別される。[0003] In general, speech synthesizers are broadly classified into two types: a waveform editing system and a parameter analysis / synthesis system.

【０００４】パラメータ分析合成方式においては、スペ
クトル包絡はモデル化され、ホルマントなどの声質に影
響の大きいパラメータとして扱われる。このため、この
ようなパラメータを変更することによって、声質を容易
に変更できる。パラメータ分析合成方式における声質変
換法は、例えば、特開平８−８３０９８号公報、特開平
８−２４８９９４号公報、特開昭５７−１０５７９６号
公報、特開昭５６−０７２５００号公報、特開昭５８−
１６８０９８号公報等にて開示されている。しかし、今
のところ、パラメータ分析合成方式は、波形編集方式と
比較して音質が悪いため、実用的ではない。In the parameter analysis/synthesis method, the spectral envelope is modeled and handled as parameters that strongly influence voice quality, such as formants. Voice quality can therefore be changed easily by changing these parameters. Voice quality conversion methods for the parameter analysis/synthesis method are disclosed in, for example, JP-A-8-83098, JP-A-8-248994, JP-A-57-105796, JP-A-56-072500, and JP-A-58-168098. At present, however, the parameter analysis/synthesis method is not practical because its sound quality is lower than that of the waveform editing method.

【０００５】これに対し、比較的音質の良い波形編集方
式においては、スペクトル包絡は、モデル化されず、音
声波形の状態で扱われる。このため、波形編集方式にお
いては、声質の変更は容易でない。[0005] In the waveform editing method, which offers comparatively good sound quality, the spectral envelope is not modeled but is handled as a raw speech waveform. Changing the voice quality is therefore not easy in the waveform editing method.

【０００６】波形編集方式音声合成方式におけるスペク
トル包絡の変更方法は、例えば、特開平８−１５２９０
０号公報にて開示されている。図２０は、この公報にて
開示されている音声合成装置の構成を示すブロック図で
ある。図２０を参照して、入力テキストが合成部９０１
に入力されると、波形記憶部９０３から適切な波形を選
択、編集し、合成音を生成する。この合成音を波形伸縮
部９０２において時間伸縮（サンプリング周波数変換）
することで、合成音のスペクトル包絡が変化する。これ
により、声質を変化させる。図２１は、有声音の伸縮の
様子を示した例であり、（Ａ）は伸縮前の波形、（Ｂ）
は伸縮後の波形、（Ｃ）は伸縮前のスペクトル包絡、
（Ｄ）は伸縮後のスペクトル包絡である。有声音は、図
２１の（Ａ）のように、複数の１ピッチ波形の並んだも
のとして構成される。その間隔（中心間の距離）がピッ
チ周期（周波数の逆数）、全体の長さが発声時間、１ピ
ッチ波形をフーリエ変換したものがスペクトル包絡
（Ｃ）である。波形の時間伸縮を行うと、そのスペクト
ルも伸縮される。例えば、図２１の例では、波形が伸張
されることにより、伸縮前のスペクトル包絡（Ｄ）は、
伸縮後のスペクトル包絡（Ｃ）と比べ、低周波数方向に
縮退している。このようにスペクトル包絡が変化するこ
とにより、声質は変更される。しかし、伸縮を行うこと
により、そのピッチ、発声時間も変化する。図２１にお
いて、（Ａ）と（Ｂ）とを比較すると、（Ｂ）の伸縮後
の波形は、ピッチ周期、発声時間とも倍の長さとなって
しまう。A method of changing the spectral envelope in a waveform-editing speech synthesis system is disclosed in, for example, Japanese Patent Application Laid-Open No. 8-152900. FIG. 20 is a block diagram showing the configuration of the speech synthesizer disclosed in that publication. Referring to FIG. 20, when input text is supplied to the synthesis unit 901, the synthesis unit selects appropriate waveforms from the waveform storage unit 903 and edits them to generate a synthesized sound. The waveform expansion/contraction unit 902 then time-stretches this synthesized sound (sampling frequency conversion), which changes its spectral envelope and thereby the voice quality. FIG. 21 shows an example of stretching a voiced sound: (A) is the waveform before stretching, (B) is the waveform after stretching, (C) is the spectral envelope before stretching, and (D) is the spectral envelope after stretching. As shown in (A) of FIG. 21, a voiced sound consists of a sequence of one-pitch waveforms. Their spacing (the distance between centers) is the pitch period (the reciprocal of the frequency), the total length is the utterance time, and the Fourier transform of a one-pitch waveform is the spectral envelope (C). Time-stretching a waveform also stretches or compresses its spectrum. In the example of FIG. 21, because the waveform is expanded, the spectral envelope after stretching (D) is compressed toward lower frequencies compared with the spectral envelope before stretching (C). This change in the spectral envelope changes the voice quality. However, stretching also changes the pitch and the utterance time. Comparing (A) and (B) in FIG. 21, the stretched waveform (B) has twice the pitch period and twice the utterance time.

【０００７】これに対処するため、波形を編集する際、
即ち、図２１における（Ａ）の段階でピッチ周期、発声
時間を通常の半分とする必要がある。一般に、波形編集
後に何らかの変換を行う場合は、スペクトル包絡だけで
なく、ピッチや発声時間にも影響を与える。波形編集時
には、この影響を考慮する必要がある。To compensate for this, the pitch period and the utterance time must be set to half their normal values when the waveform is edited, that is, at stage (A) of FIG. 21. In general, any conversion performed after waveform editing affects not only the spectral envelope but also the pitch and the utterance time, and this effect must be taken into account during waveform editing.
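The side effect of stretching an already-edited waveform, and the compensation it forces at editing time, can be checked with simple arithmetic. The following is only an illustrative sketch: the function names and the example values (a 10 ms pitch period, a 200 ms utterance, formants at 1000 Hz and 3000 Hz) are assumptions and do not appear in the publication.

```python
def stretch_effects(pitch_period_ms, duration_ms, formants_hz, ratio):
    """Effect of time-stretching an already-edited waveform by `ratio`.

    ratio > 1 lengthens the waveform: every time span is multiplied by
    `ratio` and every spectral frequency is divided by `ratio`.
    """
    return {
        "pitch_period_ms": pitch_period_ms * ratio,
        "duration_ms": duration_ms * ratio,
        "formants_hz": [f / ratio for f in formants_hz],
    }


def precompensate(target_pitch_period_ms, target_duration_ms, ratio):
    """Values to use at editing time so the stretched output hits the targets."""
    return target_pitch_period_ms / ratio, target_duration_ms / ratio
```

With a ratio of 2, `stretch_effects(10.0, 200.0, [1000.0, 3000.0], 2.0)` doubles the pitch period and duration and halves both formants, while `precompensate(10.0, 200.0, 2.0)` gives the halved editing-time values that the prior-art method would need.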

【０００８】[0008]

【発明が解決しようとする課題】図２０に示した手法で
は、伸縮に合わせて、波形編集時に発声速度やピッチを
変更するための余分な計算が必要となる。また、合成音
声のスペクトル包絡は、音韻によらず一律な規則で変更
されている。このような変更は、声道の形状に対応して
いると考えられる。しかし、人間の音声は、声帯等の音
源で生成される音が声道を通る際に音色（スペクトル包
絡）を付与されて生成する。したがって、スペクトル包
絡は、音源で発生した音が通過した声道の形状によって
決定される。しかし、音源の位置は音韻により異なる。
例えば、母音の場合には音源は喉の奥の声帯であるが、
摩擦音“ｓ”の場合には舌を固い上蓋に近づけ、ここを
音源としている。つまり、声道の形状が同じように変化
しても、音源から発生した音が声道を通る距離等は異な
るため、声道の形状の変化の影響は音韻によって異な
る。したがって、主に音源から発生した音が通過する声
道の形状によって決まるスペクトル包絡の変化は、音韻
により異なるべきである。また、人間の音声には、ホル
マントと呼ばれるスペクトル包絡のピークが複数存在す
る。このようなホルマントは、声質に大きな影響を及ぼ
す。このため、多彩な声質を得るためには、複数のホル
マントをそれぞれ操作できることが望まれる。しかし、
図２１における（Ｃ）のスペクトル包絡と（Ｄ）のスペ
クトル包絡とを比較すると、（Ｃ）の２つのホルマント
（スペクトル包絡のピーク）は、（Ｄ）において、いず
れも半分の周波数の位置にシフトしている。The technique shown in FIG. 20 requires extra calculation during waveform editing to adjust the utterance speed and pitch to match the stretching. In addition, the spectral envelope of the synthesized speech is changed by a uniform rule regardless of the phoneme. Such a change can be regarded as corresponding to a change in the shape of the vocal tract. Human speech, however, is produced when sound generated at a source such as the vocal cords is given a timbre (spectral envelope) as it passes through the vocal tract, so the spectral envelope is determined by the shape of the vocal tract through which the source sound has passed. The position of the sound source differs from phoneme to phoneme. For a vowel, for example, the source is the vocal cords deep in the throat, whereas for the fricative "s" the tongue is brought close to the hard palate and the source is located there. Thus, even when the vocal tract shape changes in the same way, the distance the source sound travels through the vocal tract differs, and so the effect of the change differs by phoneme. The change in the spectral envelope, which is determined mainly by the shape of the vocal tract through which the source sound passes, should therefore differ by phoneme. Human speech also has several spectral envelope peaks called formants, which strongly affect voice quality. To obtain a variety of voice qualities, it is therefore desirable to be able to manipulate the individual formants separately. However, comparing the spectral envelopes (C) and (D) in FIG. 21, both formants (spectral envelope peaks) in (C) are shifted to half their frequency in (D).

【０００９】本発明の課題は、多彩な声質の合成音声を
合成できる音声合成装置を提供することである。An object of the present invention is to provide a speech synthesizer capable of synthesizing a synthesized speech having various voice qualities.

【００１０】[0010]

【課題を解決するための手段】本発明の第１の音声合成
装置は、合成の最小単位となる素片波形（有声：１ピッ
チ毎、無声：音素毎）を記憶する波形記憶部と前記波形
記憶部に記憶された素片波形を変換する波形変換部と前
記波形変換部で変換された素片波形を編集して合成音を
生成する合成部とを備え、波形変換部において各素片波
形を変換し、変換された素片波形を用いて合成音声を生
成することで合成音の声質をピッチ、発声長と独立に変
更することを特徴とする。A first speech synthesizer of the present invention comprises a waveform storage unit that stores unit waveforms serving as the minimum unit of synthesis (voiced: one per pitch period; unvoiced: one per phoneme), a waveform conversion unit that converts the unit waveforms stored in the waveform storage unit, and a synthesis unit that edits the unit waveforms converted by the waveform conversion unit to generate a synthesized sound. The waveform conversion unit converts each unit waveform, and synthesized speech is generated from the converted unit waveforms, so that the voice quality of the synthesized sound is changed independently of the pitch and the utterance length.

【００１１】本発明の第２の音声合成装置は、合成の最
小単位となる素片波形（有声：１ピッチ毎、無声：音素
毎）を記憶する波形記憶部と前記波形記憶部に記憶され
た素片波形を変換する波形変換部と前記波形変換部で変
換された素片波形を編集して合成音を生成する合成部と
を備え、前記波形変換部が素片波形を時間伸縮する波形
伸縮部とから構成され、波形伸縮部において各素片波形
を伸縮し、伸縮された素片波形を用いて合成音声を生成
することを特徴とする。A second speech synthesizer of the present invention comprises a waveform storage unit that stores unit waveforms serving as the minimum unit of synthesis (voiced: one per pitch period; unvoiced: one per phoneme), a waveform conversion unit that converts the unit waveforms stored in the waveform storage unit, and a synthesis unit that edits the unit waveforms converted by the waveform conversion unit to generate a synthesized sound. The waveform conversion unit includes a waveform expansion/contraction unit that time-stretches the unit waveforms; each unit waveform is stretched in the waveform expansion/contraction unit, and synthesized speech is generated from the stretched unit waveforms.

【００１２】本発明の第３の音声合成装置は、合成の最
小単位となる素片波形（有声：ピッチ毎、無声：音素
毎）を記憶する波形記憶部と前記波形記憶部に記憶され
た素片波形を変換する波形変換部と前記波形変換部で変
換された素片波形を編集して合成音を生成する合成部と
を備え、前記波形変換部が素片波形をフーリエ変換する
周波数変換部と前記周波数変換部でフーリエ変換された
素片波形スペクトルの周波数をシフトする周波数シフト
部と前記周波数シフト部で周波数シフトする際に必要な
パラメータを記憶したパラメータ記憶部と前記周波数シ
フト部で周波数シフトされた素片波形スペクトルをフー
リエ逆変換する周波数逆変換部から構成され、周波数シ
フト部において各素片波形スペクトルを周波数シフトす
ることで合成音の声質を変更することを特徴とする。A third speech synthesizer of the present invention comprises a waveform storage unit that stores unit waveforms serving as the minimum unit of synthesis (voiced: one per pitch period; unvoiced: one per phoneme), a waveform conversion unit that converts the unit waveforms stored in the waveform storage unit, and a synthesis unit that edits the unit waveforms converted by the waveform conversion unit to generate a synthesized sound. The waveform conversion unit consists of a frequency conversion unit that Fourier-transforms a unit waveform, a frequency shift unit that shifts the frequencies of the resulting unit waveform spectrum, a parameter storage unit that stores the parameters needed for the frequency shift, and a frequency inverse conversion unit that applies the inverse Fourier transform to the frequency-shifted unit waveform spectrum. The voice quality of the synthesized sound is changed by frequency-shifting each unit waveform spectrum in the frequency shift unit.

【００１３】本発明の第４の音声合成装置は、合成の最
小単位となる素片波形（有声：ピッチ毎、無声：音素
毎）を記憶する波形記憶部と前記波形記憶部に記憶され
た素片波形を変換する波形変換部と前記波形変換部で変
換された素片波形を編集して合成音を生成する合成部と
を備え、前記波形変換部が素片波形をラプラス変換する
周波数変換部と前記周波数変換部でラプラス変換された
素片波形スペクトルの周波数をシフトする周波数シフト
部と前記周波数シフト部で周波数シフトする際に必要な
パラメータを記憶するパラメータ記憶部と前記周波数シ
フト部で周波数シフトされた素片波形スペクトルを逆ラ
プラス変換する周波数逆変換部から構成され、周波数シ
フト部において各素片波形スペクトルを周波数シフトす
ることで合成音の声質を変更することを特徴とする。A fourth speech synthesizer of the present invention comprises a waveform storage unit that stores unit waveforms serving as the minimum unit of synthesis (voiced: one per pitch period; unvoiced: one per phoneme), a waveform conversion unit that converts the unit waveforms stored in the waveform storage unit, and a synthesis unit that edits the unit waveforms converted by the waveform conversion unit to generate a synthesized sound. The waveform conversion unit consists of a frequency conversion unit that Laplace-transforms a unit waveform, a frequency shift unit that shifts the frequencies of the resulting unit waveform spectrum, a parameter storage unit that stores the parameters needed for the frequency shift, and a frequency inverse conversion unit that applies the inverse Laplace transform to the frequency-shifted unit waveform spectrum. The voice quality of the synthesized sound is changed by frequency-shifting each unit waveform spectrum in the frequency shift unit.

【００１４】本発明の第５の音声合成装置は、合成の最
小単位となる素片波形（有声：ピッチ毎、無声：音素
毎）を記憶する波形記憶部と前記波形記憶部に記憶され
た素片波形を変換する波形変換部と前記波形変換部で変
換された素片波形を編集して合成音を生成する合成部と
を備え、前記波形変換部が前記波形記憶部に記憶された
素片波形の標本値列ベクトルに行列を積算する行列演算
部と前記行列演算部において用いる行列を記憶するパラ
メータ記憶部から構成され、素片波形の標本値列ベクト
ルに行列を積算することで第３、第４の発明と等価な変
換を実現することを特徴とする。A fifth speech synthesizer of the present invention comprises a waveform storage unit that stores unit waveforms serving as the minimum unit of synthesis (voiced: one per pitch period; unvoiced: one per phoneme), a waveform conversion unit that converts the unit waveforms stored in the waveform storage unit, and a synthesis unit that edits the unit waveforms converted by the waveform conversion unit to generate a synthesized sound. The waveform conversion unit consists of a matrix operation unit that multiplies the sample-value column vector of a unit waveform stored in the waveform storage unit by a matrix, and a parameter storage unit that stores the matrix used by the matrix operation unit. Multiplying the sample-value column vector of the unit waveform by the matrix realizes a conversion equivalent to that of the third and fourth inventions.

【００１５】本発明の第６の音声合成装置は、第１から
第５のいずれかの発明の音声合成装置において素片波形
をいくつかのグループに分け、このグループ毎に異なる
パラメータで変換をすることを特徴とする。A sixth speech synthesizer of the present invention is the speech synthesizer of any one of the first to fifth inventions in which the unit waveforms are divided into several groups and each group is converted with different parameters.

【００１６】本発明の第７の音声合成装置は、第６の発
明の音声合成装置において特に素片波形を有声音と無声
音の二つに分け、有声音のみを変換することを特徴とす
る。A seventh speech synthesizing apparatus according to the present invention is characterized in that, in the speech synthesizing apparatus according to the sixth invention, in particular, the unit waveform is divided into a voiced sound and an unvoiced sound, and only the voiced sound is converted.

【００１７】本発明の第８の音声合成装置は、第１から
第７のいずれかの発明の音声合成装置にさらにパラメー
タの入力部を備え、パラメータを任意に設定できること
を特徴とする。An eighth speech synthesizing apparatus according to the present invention is characterized in that the speech synthesizing apparatus according to any one of the first to seventh aspects further comprises a parameter input section and can set parameters arbitrarily.

【００１８】本発明の第９の音声合成装置は、第一から
第七のいずれかの音声合成装置において、前記波形変換
部が入力テキスト中に明示されたパラメータを用いて素
片波形を変換することを特徴とする。A ninth speech synthesizer of the present invention is the speech synthesizer of any one of the first to seventh inventions in which the waveform conversion unit converts the unit waveforms using parameters specified explicitly in the input text.

【００１９】本発明の第１０の音声合成装置は、第１か
ら第９のいずれかの発明の音声合成装置において前記波
形変換部で変換された素片波形を記憶する波形記憶部を
備え前記波形記憶部に記憶された素片波形を用いて合成
音声を生成することを特徴とする。According to a tenth speech synthesizer of the present invention, the speech synthesizer according to any one of the first to ninth aspects further comprises a waveform storage unit for storing the unit waveform converted by the waveform conversion unit. A synthesized speech is generated using the unit waveform stored in the storage unit.

【００２０】本発明の第１１の音声合成装置は、第１か
ら第９のいずれかの発明の音声合成装置において前記波
形変換部が前記合成部の韻律情報からパラメータを変更
することを特徴とする。An eleventh speech synthesizer of the present invention is the speech synthesizer of any one of the first to ninth inventions in which the waveform conversion unit changes its parameters according to prosody information from the synthesis unit.

【００２１】[0021]

【発明の実施の形態】以下、図面を参照して、本発明の
実施の形態による音声合成装置を説明する。DETAILED DESCRIPTION OF THE PREFERRED EMBODIMENTS A speech synthesizing apparatus according to an embodiment of the present invention will be described below with reference to the drawings.

【００２２】［実施の形態１］図１は、本発明の実施の
形態１の構成を示すブロック図である。図１を参照し
て、本発明の実施の形態１は、テキスト等の入力に基づ
いて波形記憶部３から選んだ素片波形を編集して合成音
を生成する合成部１と、素片波形を変換する波形変換部
２と、音素毎の波形と当該波形のラベルを記憶する波形
記憶部３とを有している。[First Embodiment] FIG. 1 is a block diagram showing the configuration of the first embodiment of the present invention. Referring to FIG. 1, the first embodiment includes a synthesis unit 1 that edits unit waveforms selected from a waveform storage unit 3 on the basis of input such as text to generate a synthesized sound, a waveform conversion unit 2 that converts the unit waveforms, and the waveform storage unit 3, which stores a waveform for each phoneme together with a label for that waveform.

【００２３】ここで、素片波形とは、有声音の場合は１
ピッチ分の波形、無声音の場合は１音素分の波形を示
す。また、ラベルとは、当該波形がどの音であるかを示
す音素情報や当該波形がどのようなテキストから得られ
たのかを示す音素環境情報等を示す。また、当該音素が
有声音である場合には複数の素片波形から構成されてい
るため、本発明では、これを素片波形毎に分離する必要
がある。このため、有声音波形には、このような分離を
行うためのラベルをも用意される。Here, a unit waveform is a waveform of one pitch period for a voiced sound and a waveform of one phoneme for an unvoiced sound. A label carries information such as phoneme information indicating which sound the waveform represents and phoneme environment information indicating the text from which the waveform was obtained. A voiced phoneme consists of a plurality of unit waveforms, which the present invention must separate into individual unit waveforms. Voiced waveforms are therefore also given a label for performing this separation.

【００２４】図２は、図１の波形変換部２をより詳細に
示すブロック図である。図２を参照して、波形変換部２
は、伸縮率を記憶するパラメータ記憶部２ａと、パラメ
ータ記憶部２ａに記憶された伸縮率を用いて素片波形を
時間伸縮する波形伸縮部２ｂとを備えている。FIG. 2 is a block diagram showing the waveform conversion unit 2 of FIG. 1 in more detail. Referring to FIG. 2, the waveform conversion unit 2 includes a parameter storage unit 2a that stores an expansion/contraction ratio and a waveform expansion/contraction unit 2b that time-stretches unit waveforms using the ratio stored in the parameter storage unit 2a.

【００２５】次に、図１および図２を参照して、本発明
の実施の形態１の動作について説明する。Next, the operation of the first embodiment of the present invention will be described with reference to FIGS. 1 and 2.

【００２６】合成したいテキストあるいは発音記号な
ど、それに準ずる発音情報が合成部１に入力されると、
合成部１は、このテキストを合成するために必要な素片
波形を波形記憶部３から選択する。When the text to be synthesized, or equivalent pronunciation information such as phonetic symbols, is input to the synthesis unit 1, the synthesis unit 1 selects from the waveform storage unit 3 the unit waveforms needed to synthesize that text.

【００２７】波形記憶部３は、例えば音素を単位として
区切られた波形と、そのラベルとを記憶している。ラベ
ルは、例えば「ｋｏＮｎ；ｉ；ｃｈｉｗａ」というよう
に与えられる。ここで、最初のセミコロン（；）の前の
「ｋｏＮｎ」は当該波形の直前の音素連鎖、２つのセミ
コロンの間の「ｉ」は当該波形の音素、２つ目のセミコ
ロンの後の「ｃｈｉｗａ」は当該波形の直後の音素連鎖
を示す。The waveform storage unit 3 stores waveforms segmented, for example, phoneme by phoneme, together with their labels. A label is given in a form such as "koNn;i;chiwa". Here, "koNn" before the first semicolon (;) is the phoneme sequence immediately preceding the waveform, "i" between the two semicolons is the phoneme of the waveform, and "chiwa" after the second semicolon is the phoneme sequence immediately following the waveform.
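The three-field label format can be illustrated with a short parser. This is a minimal sketch under the assumption that a label always contains exactly two semicolons; the names `ContextLabel` and `parse_context_label` are hypothetical and are not taken from the publication.

```python
from typing import NamedTuple


class ContextLabel(NamedTuple):
    left: str      # phoneme sequence immediately before the waveform
    phoneme: str   # phoneme of the waveform itself
    right: str     # phoneme sequence immediately after the waveform


def parse_context_label(label: str) -> ContextLabel:
    """Split a label of the form 'left;phoneme;right' into its three fields."""
    parts = label.split(";")
    if len(parts) != 3:
        raise ValueError(f"expected exactly two semicolons in {label!r}")
    return ContextLabel(*parts)
```

For the label given in the text, `parse_context_label("koNn;i;chiwa")` yields the fields `koNn`, `i`, and `chiwa`.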

【００２８】また、当該波形が有声音である場合には、
前述のように素片波形毎に分離するためのラベルも持
つ。このラベルは、例えば素片波形の始端と終端の時刻
を「１０；２１；３１；４０；…」という形で与える。
この場合、有声音波形のうち０ｍｓｅｃから１０ｍｓｅ
ｃの部分は最初の１ピッチ波形、１０ｍｓｅｃから２１
ｍｓｅｃは次の１ピッチ波形…であることを示してい
る。When the waveform is a voiced sound, it also has a label for separating it into unit waveforms, as described above. This label gives, for example, the start and end times of the unit waveforms in the form "10;21;31;40;…". In this case, the portion of the voiced waveform from 0 msec to 10 msec is the first one-pitch waveform, the portion from 10 msec to 21 msec is the next one-pitch waveform, and so on.
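A boundary label of this kind can be turned into per-unit sample ranges as sketched below. The sketch assumes the times are in milliseconds measured from the start of the phoneme waveform and that the first unit starts at 0 msec, as in the example; the function names and the `sample_rate_hz` parameter are illustrative assumptions.

```python
def unit_intervals(label, start_ms=0.0):
    """Convert a boundary label such as '10;21;31;40' into (start, end) pairs in ms."""
    marks = [float(field) for field in label.split(";") if field.strip()]
    intervals = []
    previous = start_ms
    for mark in marks:
        intervals.append((previous, mark))
        previous = mark
    return intervals


def split_units(samples, label, sample_rate_hz):
    """Cut a voiced phoneme waveform into one-pitch unit waveforms."""
    units = []
    for start_ms, end_ms in unit_intervals(label):
        first = int(round(start_ms * sample_rate_hz / 1000.0))
        last = int(round(end_ms * sample_rate_hz / 1000.0))
        units.append(samples[first:last])
    return units
```

For the label "10;21;31;40" this gives the intervals 0 to 10, 10 to 21, 21 to 31, and 31 to 40 msec, matching the reading given in the text.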

【００２９】素片波形の選択は、例えば「ａ」が合成に
必要な場合、音素が「ａ」である音素波形を探す。この
候補が複数ある場合は、音素環境情報を用いて合成した
い「ａ」と前後の音素が同じ物を探す。例えば、「ｋａ
ｋｉｇｏｕｒｉ」の「ａ」を合成したい場合、ａの前後
の音素がｋであることを示す「^*ｋ；ａ；ｋ^*」となる
ラベルを持つ波形を探す。ここで、「＊」は任意の音素
系列を示す。さらに候補が複数ある場合は、前後２つと
いうように増やして行く。また、前後どちらかのみ一
致、もしくは完全に一致するものがない場合に備え、同
一音素の音素波形には、予め優先順位を付けておく。ま
た、有声音の音素波形は、選択後、ラベルを用いて素片
波形に分割する。To select a unit waveform when, for example, "a" is needed for synthesis, the phoneme waveforms whose phoneme is "a" are searched for. If there are several candidates, the phoneme environment information is used to find one whose preceding and following phonemes are the same as those of the "a" to be synthesized. For example, to synthesize the "a" of "kakigouri", a waveform is sought whose label has the form "*k;a;k*", indicating that the phonemes before and after "a" are both "k". Here, "*" denotes an arbitrary phoneme sequence. If several candidates still remain, the matched context is widened to two phonemes on each side, and so on. In addition, the phoneme waveforms of the same phoneme are assigned priorities in advance, to cover cases where only the preceding or only the following context matches or where nothing matches completely. After selection, the phoneme waveform of a voiced sound is divided into unit waveforms using its label.
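The selection procedure can be sketched as a progressive narrowing of the candidate set. This is only one possible reading, with several assumptions: contexts are represented as lists of phonemes, a lower `priority` value means a more preferred waveform, and the fallback to priority is applied whenever no candidate matches on both sides. The dictionary keys and the function name `select_unit` are hypothetical.

```python
def select_unit(candidates, left, phoneme, right):
    """Pick the stored phoneme waveform whose context best matches the target.

    candidates: list of dicts with keys 'phoneme', 'left', 'right', 'priority',
                where 'left' and 'right' are lists of neighbouring phonemes.
    left/right: phonemes before and after the target phoneme in the input.
    """
    pool = [c for c in candidates if c["phoneme"] == phoneme]
    if not pool:
        return None
    # Widen the matched context one phoneme at a time while ties remain.
    for width in range(1, max(len(left), len(right)) + 1):
        if len(pool) == 1:
            break
        narrowed = [
            c for c in pool
            if c["left"][-width:] == left[-width:]
            and c["right"][:width] == right[:width]
        ]
        if not narrowed:
            break  # nothing matches this much context; keep the current pool
        pool = narrowed
    # Remaining ties (or no context match at all) are resolved by priority.
    return min(pool, key=lambda c: c["priority"])
```

For the "a" of "kakigouri", the target context would be `left=["k"]` and `right=list("kigouri")`, so a stored waveform labelled with a preceding "k" and a following "k", "i" would be preferred over one followed by "k", "u".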

【００３０】選択が行われると、波形伸縮部２ｂは、選
択された素片波形を波形記憶部３から読み出し、パラメ
ータ記憶部２ａに記憶されている伸縮率で時間伸縮す
る。時間伸縮は、例えば、サンプリング周波数変換の手
法で行われ、波形の元の周波数が２０ｋＨｚで伸縮率が
０．５であれば２０ｋＨｚから１０ｋＨｚにサンプリン
グ周波数変換を行う。Once the selection is made, the waveform expansion/contraction unit 2b reads the selected unit waveforms from the waveform storage unit 3 and time-stretches them by the expansion/contraction ratio stored in the parameter storage unit 2a. Time stretching is performed by, for example, sampling frequency conversion: if the original sampling frequency of the waveform is 20 kHz and the ratio is 0.5, the sampling frequency is converted from 20 kHz to 10 kHz.
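The effect of stretching a single unit waveform by sampling frequency conversion can be sketched as follows. The publication does not specify the interpolation method, so linear interpolation is an assumption here, as are the function names; a practical converter would normally add an anti-aliasing filter, which this sketch omits.

```python
import cmath
import math


def resample(samples, ratio):
    """Time-stretch a unit waveform by linear interpolation.

    The output has round(len(samples) * ratio) samples, so ratio 0.5
    corresponds to converting 20 kHz material to 10 kHz and playing the
    result back at the original rate. No anti-aliasing filter is applied.
    """
    if not samples or ratio <= 0:
        raise ValueError("need a non-empty waveform and a positive ratio")
    n_out = int(round(len(samples) * ratio))
    last = len(samples) - 1
    out = []
    for i in range(n_out):
        pos = min(i / ratio, last)
        lo = int(pos)
        hi = min(lo + 1, last)
        frac = pos - lo
        out.append((1.0 - frac) * samples[lo] + frac * samples[hi])
    return out


def peak_frequency(samples, sample_rate_hz):
    """Frequency in Hz of the strongest DFT bin below the Nyquist frequency."""
    n = len(samples)

    def magnitude(k):
        return abs(sum(x * cmath.exp(-2j * math.pi * k * i / n)
                       for i, x in enumerate(samples)))

    best = max(range(1, n // 2), key=magnitude)
    return best * sample_rate_hz / n
```

Applying `resample` with ratio 0.5 to a windowed 500 Hz tone and measuring both with `peak_frequency` at the same playback rate shows the spectral peak moving to 1000 Hz while the unit becomes half as long. Because this is done per unit waveform, the pitch period and utterance length are still chosen freely when the units are arranged afterwards.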

【００３１】伸縮された素片波形は、合成部１で所望の
ピッチ・発声速度となるように並べられ、合成音声とし
て出力される。The expanded and contracted unit waveforms are arranged in the synthesizing unit 1 so as to have a desired pitch and utterance speed, and are output as synthesized speech.

【００３２】具体的には、例えば有声音の場合、図１４
のように波形記憶部３に記憶された素片波形（１ピッチ
分の波形）（Ａ）を波形伸縮部２において時間伸縮し、
素片波形（Ｂ）を生成する。次に、合成部１は、伸縮し
た素片波形（Ｂ）を所望のピッチ周期の間隔で所望の長
さとなるまで並べ、合成音（Ｃ）を得る。一方、無声音
の場合は、例えば、図１６（ａ）および（ｂ）に示すよ
うに、素片波形（１音素分の波形）を伸縮し、伸縮後の
波形が所望の長さとなるように調整する。無声音の場合
の長さの調整法としては、例えば、次のような方法が挙
げられる。図１６（ａ）のように、伸縮後の波形（Ａ）
の長さが所望の長さより短い場合は、伸縮後の波形
（Ａ）からその一部（Ｂ）を抽出し、これを接続するこ
とで望む長さの波形（Ｃ）を得る。また、図１６（ｂ）
のように、伸縮後の波形（Ａ）の長さが所望の長さより
長い場合は、伸縮後の波形（Ａ）からその一部（Ｄ）を
削除することで、望む長さの波形（Ｅ）を得る。Specifically, in the case of a voiced sound, for example, the unit waveform (a one-pitch waveform) (A) stored in the waveform storage unit 3 is time-stretched by the waveform expansion/contraction unit 2b to generate the unit waveform (B), as shown in FIG. 14. The synthesis unit 1 then arranges the stretched unit waveform (B) at intervals of the desired pitch period until the desired length is reached, yielding the synthesized sound (C). In the case of an unvoiced sound, on the other hand, the unit waveform (a one-phoneme waveform) is stretched and then adjusted so that the stretched waveform has the desired length, as shown in FIGS. 16(a) and 16(b). The length of an unvoiced sound can be adjusted, for example, as follows. When the stretched waveform (A) is shorter than the desired length, as in FIG. 16(a), a part (B) is extracted from the stretched waveform (A) and appended to obtain a waveform (C) of the desired length. When the stretched waveform (A) is longer than the desired length, as in FIG. 16(b), a part (D) is deleted from the stretched waveform (A) to obtain a waveform (E) of the desired length.
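The arrangement step for voiced and unvoiced sounds can be sketched as below. Two choices here are assumptions rather than anything stated in the publication: overlapping units are summed (overlap-add), and for unvoiced sounds the extracted part is taken from the start of the waveform while the deleted part is the tail. Lengths and the pitch period are given in samples, and the function names are hypothetical.

```python
def place_units(unit, pitch_period, total_length):
    """Place copies of a stretched one-pitch unit every `pitch_period` samples."""
    if pitch_period <= 0:
        raise ValueError("pitch_period must be positive")
    out = [0.0] * total_length
    for start in range(0, total_length, pitch_period):
        for offset, value in enumerate(unit):
            if start + offset >= total_length:
                break
            out[start + offset] += value  # overlapping units are summed
    return out


def fit_unvoiced(wave, target_length):
    """Adjust a stretched unvoiced waveform to exactly `target_length` samples."""
    if not wave:
        raise ValueError("wave must not be empty")
    out = list(wave[:target_length])            # too long: drop the tail
    while len(out) < target_length:             # too short: append a leading part
        out.extend(wave[: target_length - len(out)])
    return out
```

Because `pitch_period` and `total_length` are chosen at this stage, after the unit has already been stretched, the stretch ratio does not enter the pitch or duration calculation.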

【００３３】ここで、比較例として、特開平８−１５２
９００号公報にて開示された従来例では、図１５のよう
に、素片波形（１ピッチ波形）（Ａ）を並べた波形
（Ｄ）を生成し、これを伸縮することで望む声質、ピッ
チ、発声長の合成音（Ｃ）を生成している。しかし、こ
の手法では、（Ｄ）と（Ｃ）とを比較すればわかるよう
に、伸縮によってピッチ周期、発声長が変化する。この
ため、最終的な出力（Ｃ）の段階で望むピッチ周期、発
生長を得るには、素片波形（Ａ）を並べて波形（Ｄ）を
作る段階で伸縮の影響を考慮して波形を並べる必要があ
る。一方、本発明では、図１４のように、素片波形
（Ａ）を予め伸縮し、伸縮された素片波形（Ｂ）を並べ
て合成音（Ｃ）を生成している。このため、素片波形
（Ｂ）を並べて合成音（Ｃ）を作る際に、望むピッチ周
期、発声長となるように並べればよく、余分な計算は必
要ない。As a comparative example, in the conventional technique disclosed in Japanese Patent Application Laid-Open No. 8-152900, a waveform (D) is generated by arranging unit waveforms (one-pitch waveforms) (A), as shown in FIG. 15, and this waveform is then stretched to generate a synthesized sound (C) with the desired voice quality, pitch, and utterance length. With this technique, however, as a comparison of (D) and (C) shows, stretching changes the pitch period and the utterance length. To obtain the desired pitch period and utterance length in the final output (C), the unit waveforms (A) must therefore be arranged into waveform (D) with the effect of the stretching taken into account. In the present invention, by contrast, the unit waveform (A) is stretched in advance and the stretched unit waveforms (B) are arranged to generate the synthesized sound (C), as shown in FIG. 14. When arranging the unit waveforms (B) into the synthesized sound (C), it suffices to arrange them at the desired pitch period and utterance length, and no extra calculation is needed.

【００３４】［実施の形態２］本発明の実施の形態２
は、実施の形態１と基本的に同じ構成である。異なる点
は、波形変換部２の構成ならびに動作である。[Embodiment 2] The second embodiment of the present invention has basically the same configuration as the first embodiment. The difference lies in the configuration and operation of the waveform conversion unit 2.

【００３５】図３は、実施の形態２における図１の波形
変換部２を、より詳細に示すブロック図である。図３を
参照して、波形変換部２は、素片波形をフーリエ変換す
る周波数変換部２ｃと、フーリエ変換された素片波形ス
ペクトルの周波数をシフトする周波数シフト部２ｄと、
周波数シフトの際のパラメータを記憶するパラメータ記
憶部２ａと、周波数シフトされた素片波形スペクトルを
フーリエ逆変換する周波数逆変換部２ｅとを備えてい
る。FIG. 3 is a block diagram showing the waveform conversion unit 2 of FIG. 1 in more detail for the second embodiment. Referring to FIG. 3, the waveform conversion unit 2 includes a frequency conversion unit 2c that Fourier-transforms a unit waveform, a frequency shift unit 2d that shifts the frequencies of the Fourier-transformed unit waveform spectrum, a parameter storage unit 2a that stores the parameters used for the frequency shift, and a frequency inverse conversion unit 2e that applies the inverse Fourier transform to the frequency-shifted unit waveform spectrum.

【００３６】次に、図１および図３を参照して、本発明
の実施の形態２の動作について説明する。Next, the operation of the second embodiment of the present invention will be described with reference to FIGS. 1 and 3.

【００３７】合成したいテキストあるいは発音記号など
それに準ずる発音情報が合成部１に入力されると、合成
部１は、このテキストを合成するために必要な素片波形
を波形記憶部３から選択する。選択方法は、実施の形態
１と同様である。選択が行われると、周波数変換部２ｃ
は、選択された素片波形ｆ（ｔ）を読み出してフーリエ
変換を行って素片波形スペクトルＳ（ω）を計算し、周
波数シフト部２ｄに渡す。フーリエ変換は、以下の数式
１で表される。When the text to be synthesized, or equivalent pronunciation information such as phonetic symbols, is input to the synthesis unit 1, the synthesis unit 1 selects from the waveform storage unit 3 the unit waveforms needed to synthesize that text. The selection method is the same as in the first embodiment. Once the selection is made, the frequency conversion unit 2c reads the selected unit waveform f(t), applies the Fourier transform to compute the unit waveform spectrum x(ω), and passes it to the frequency shift unit 2d. The Fourier transform is given by Equation 1 below.

【００３８】[0038]

【数１】周波数シフト部２ｄは、素片波形スペクトルｘ（ω）を
受け取ると、パラメータ記憶部２ａから周波数の対応を
表す関数ｗ（ω）を読み出し、周波数シフトを実行す
る。周波数シフトは、以下の数式２で与えられる。(Equation 1) Upon receiving the unit waveform spectrum x (ω), the frequency shift unit 2d reads a function w (ω) representing the correspondence of the frequency from the parameter storage unit 2a, and executes the frequency shift. The frequency shift is given by Equation 2 below.

【００３９】[0039]

【数２】図１７に、周波数シフトの例を示す。図１７において、
（Ｂ）は元の素片波形スペクトルである。この素片波形
スペクトルを図１７の（Ａ）で表される関数を用いて周
波数シフトすると、例えばω＝２の場合、図１７の
（Ａ）からｗ（２）＝１となるので、ｘ（ｗ（２）＝ｘ
（１）となり、図１７の（Ｂ）のω＝１のホルマントが
ω＝２にシフトし、その結果、図１７の（Ｃ）のような
スペクトルとなる。周波数シフトされた素片波形スペク
トルｘ′Ｓは、周波数逆変換部２ｅに渡される。周波数
逆変換部２ｅは、周波数シフトされた素片波形スペクト
ルｘ′をフーリエ逆変換する。フーリエ逆変換は以下の
数式３で表される。(Equation 2) FIG. 17 shows an example of the frequency shift. In FIG. 17, (B) is the original unit waveform spectrum. When this spectrum is frequency-shifted using the function shown in (A) of FIG. 17, then for ω = 2, for example, (A) gives w(2) = 1, so x(w(2)) = x(1); the formant at ω = 1 in (B) of FIG. 17 is thus shifted to ω = 2, producing a spectrum like (C) of FIG. 17. The frequency-shifted unit waveform spectrum x′ is passed to the frequency inverse conversion unit 2e, which applies the inverse Fourier transform to it. The inverse Fourier transform is given by Equation 3 below.

【００４０】[0040]

【数３】フーリエ逆変換された素片波形ｆ（ｔ）は、合成部１で
所望のピッチ・発声速度となるように並べられ、合成音
声として出力される。(Equation 3) The unit waveform f (t) subjected to the inverse Fourier transform is arranged in the synthesizing unit 1 so as to have a desired pitch and utterance speed, and is output as synthesized speech.
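The images for Equations 1 to 3 are not reproduced in this text. Assuming the standard Fourier transform pair, and writing f′(t) for the converted unit waveform and x′(ω) for the shifted spectrum, they would plausibly take the following form. The normalization convention is an assumption; the form of Equation 2 follows from the worked example x(w(2)) = x(1) given in the description.

```latex
\begin{align}
x(\omega)  &= \int_{-\infty}^{\infty} f(t)\, e^{-j\omega t}\, dt \tag{1} \\
x'(\omega) &= x\bigl(w(\omega)\bigr) \tag{2} \\
f'(t)      &= \frac{1}{2\pi} \int_{-\infty}^{\infty} x'(\omega)\, e^{j\omega t}\, d\omega \tag{3}
\end{align}
```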

【００４１】ここで、比較例の場合は、図２１の（Ｃ）
の変換前のスペクトルの２つのホルマント（スペクトル
のピーク）は、図２１の（Ｄ）の変換後、いずれも半分
の周波数の位置にある。このように従来は２つのホルマ
ントを同じようにしか変化できなかった。これに対し、
本発明の場合、図１７の（Ｂ）のω＝１，３の２つのホ
ルマントは、図１７の（ｃ）ではω＝２，３．５とな
り、各々別の割合で変化させることができる。In the comparative example, the two formants (spectral peaks) of the pre-conversion spectrum in (C) of FIG. 21 both lie at half their original frequency after conversion, in (D) of FIG. 21. Conventionally, then, the two formants could only be changed in the same way. In the present invention, by contrast, the two formants at ω = 1 and ω = 3 in (B) of FIG. 17 move to ω = 2 and ω = 3.5 in (C) of FIG. 17, so each can be changed by a different ratio.
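The independent movement of the two formants can be reproduced with a piecewise-linear frequency correspondence. The exact shape of the function in (A) of FIG. 17 is not available here, so the sketch below assumes a piecewise-linear w passing through the two points implied by the text, w(2) = 1 and w(3.5) = 3, with assumed end points (0, 0) and (5, 5). The example spectrum with narrow peaks at ω = 1 and ω = 3 is likewise an assumption.

```python
import math


def make_warp(points):
    """Return w(omega), linearly interpolated through (omega, w(omega)) points.

    `points` must be sorted by omega; each pair says which source frequency
    is read when evaluating the shifted spectrum at that omega.
    """
    def w(omega):
        for (x0, y0), (x1, y1) in zip(points, points[1:]):
            if x0 <= omega <= x1:
                return y0 + (y1 - y0) * (omega - x0) / (x1 - x0)
        raise ValueError("omega outside the domain of the warp")
    return w


def shifted_spectrum(x, w):
    """Shifted spectrum x'(omega) = x(w(omega))."""
    return lambda omega: x(w(omega))


def example_spectrum(omega):
    """Assumed test spectrum with formant-like peaks at omega = 1 and omega = 3."""
    return (math.exp(-((omega - 1.0) ** 2) / 0.02)
            + math.exp(-((omega - 3.0) ** 2) / 0.02))
```

With `w = make_warp([(0, 0), (2, 1), (3.5, 3), (5, 5)])`, the shifted spectrum peaks at ω = 2 and ω = 3.5, a factor of 2 for the first formant and about 1.17 for the second, which a single uniform stretch cannot produce.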

【００４２】尚、以上の説明では、波形が連続値を取る
ものとして説明したが、離散値を取る場合にも実現でき
る。この場合、フーリエ変換、フーリエ逆変換はそれぞ
れ、離散フーリエ変換、離散フーリエ逆変換に置き換え
る。この場合、変換前の素片波形をＦ［ｎ］、変換前の
素片波形スペクトルをＸ［ｋ］、変換後の素片波形を
Ｆ′［ｎ］、変換前の素片波形スペクトルをＸ′
［ｋ］、周波数シフトを示す関数をＷ［ｋ］とすると、
（１）式、（２）式、（３）式は、以下の数式４および
数式５に置き換えられる。The above description assumed that the waveform takes continuous values, but the invention can also be realized when it takes discrete values. In that case, the Fourier transform and inverse Fourier transform are replaced by the discrete Fourier transform and inverse discrete Fourier transform, respectively. Letting F[n] be the unit waveform before conversion, X[k] the unit waveform spectrum before conversion, F′[n] the unit waveform after conversion, X′[k] the unit waveform spectrum after conversion, and W[k] the function describing the frequency shift, Equations (1), (2), and (3) are replaced by Equations 4 and 5 below.

【００４３】[0043]

【数４】 (Equation 4)

【００４４】[0044]

【数５】ここで、Ｎは素片波形のサンプル数、Ｋは素片波形スペ
クトルのサンプル数である。以上の計算は、ＦＦＴによ
って行うことが可能である。(Equation 5) Here, N is the number of samples of the unit waveform, and K is the number of samples of the unit waveform spectrum. The above calculation can be performed by FFT.
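As a rough illustration of the discrete processing described above (DFT, frequency shift, inverse DFT), the following Python sketch may help. It is not the patent's implementation: Equations 4 and 5 are not reproduced in this text, so the sketch assumes that the shift takes the form X′[k] = X[W[k]], sets K = N, and uses a naive DFT where an FFT would normally be used. The names `dft`, `idft`, and `shift_unit_waveform` are hypothetical.

```python
import cmath
import math


def dft(samples):
    """Naive discrete Fourier transform (an FFT would be used in practice)."""
    n = len(samples)
    return [
        sum(samples[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n))
        for k in range(n)
    ]


def idft(spectrum):
    """Naive inverse discrete Fourier transform."""
    n = len(spectrum)
    return [
        sum(spectrum[k] * cmath.exp(2j * math.pi * k * t / n) for k in range(n)) / n
        for t in range(n)
    ]


def shift_unit_waveform(waveform, w):
    """Return the waveform whose spectrum is X'[k] = X[w[k]] (assumed form).

    w lists the source bin for each output bin k = 0 .. N//2; the remaining
    bins are filled by conjugate symmetry so that the result stays real.
    """
    n = len(waveform)
    spectrum = dft(waveform)
    shifted = [0j] * n
    for k in range(n // 2 + 1):
        shifted[k] = spectrum[w[k]]
    for k in range(1, (n + 1) // 2):
        shifted[n - k] = shifted[k].conjugate()
    # Any residual imaginary part (rounding, or a complex DC/Nyquist bin) is dropped.
    return [value.real for value in idft(shifted)]


# An 8-sample cosine with its single spectral peak at bin 1 ...
unit = [math.cos(2 * math.pi * t / 8) for t in range(8)]
# ... is moved to bin 2 by reading output bin 2 from source bin 1.
moved = shift_unit_waveform(unit, [0, 0, 1, 3, 4])
```

Because `w` is an arbitrary table, different peaks can be moved by different amounts, which is the property the description contrasts with uniform expansion or contraction.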

【００４５】また、周波数シフトは、２変数の関数ｙ
（ω，ｌ）（離散値の場合はＹ［ｋ，ｈ］）を用いて、
以下の数式６のように表現することもできる。The frequency shift can also be expressed using a function of two variables, y(ω, l) (Y[k, h] in the discrete case), as in Equation 6 below.

【００４６】[0046]

【数６】また、フーリエ変換、フーリエ逆変換の代わりに、ラプ
ラス変換、ラプラス逆変換を用いることもできる。この
場合、変換前の素片波形をｆ（ｔ）、変換前の素片波形
スペクトルをｘ（ｓ）、変換後の素片波形をｆ′
（ｔ）、変換前の素片波形スペクトルをｘ′（ｓ）、周
波数シフトを示す関数をｗ（ｓ）とすると（１）式、
（２）式、（３）式は、以下の数式７に置き換えられ
る。(Equation 6) The Laplace transform and the inverse Laplace transform can also be used instead of the Fourier transform and the inverse Fourier transform. In this case, letting f(t) be the unit waveform before conversion, x(s) the unit waveform spectrum before conversion, f′(t) the unit waveform after conversion, x′(s) the unit waveform spectrum after conversion, and w(s) the function representing the frequency shift, Equations (1), (2), and (3) are replaced by Equation 7 below.

【００４７】[0047]

【数７】ここで、ｓ＝δ＋ｊωである。この場合、周波数シフト
を表す関数Ｗ（ｓ）は複素関数となり、より詳細な変換
を実現できる。(Equation 7) Here, s = δ + jω. In this case, the function w(s) representing the frequency shift is a complex function, so finer-grained conversion can be realized.

【００４８】同様に、波形が離散値を取る場合は、ラプ
ラス変換、ラプラス逆変換はそれぞれ、Ｚ変換、逆Ｚ変
換に置き換える。この場合、変換前の素片波形をＦ
［ｎ］、変換前の素片波形スペクトルをＸ［ｚ］、変換
後の素片波形をＦ′［ｎ］、変換前の素片波形スペクト
ルをＸ′［ｚ］、周波数シフトを示す関数をＷ［ｚ］と
すると、（１）式、（２）式、（３）式は、以下の数式
８に置き換えられる。Similarly, when the waveform takes discrete values, the Laplace transform and the inverse Laplace transform are replaced by the Z-transform and the inverse Z-transform, respectively. In this case, letting F[n] be the unit waveform before conversion, X[z] the unit waveform spectrum before conversion, F′[n] the unit waveform after conversion, X′[z] the unit waveform spectrum after conversion, and W[z] the function representing the frequency shift, Equations (1), (2), and (3) are replaced by Equation 8 below.

【００４９】[0049]

【数８】［実施の形態３］本発明の実施の形態３は、実施の形態
１と基本的に同じ構成である。異なる点は、波形変換部
２の構成ならびに動作である。(Equation 8) [Third Embodiment] The third embodiment of the present invention has basically the same configuration as the first embodiment. The difference lies in the configuration and operation of the waveform conversion unit 2.

【００５０】図４は、実施の形態３における図１の波形
変換部２をより詳細に示すブロック図である。図４を参
照して、波形変換部２は、素片波形の標本値列スペクト
ルに行列を積算する行列演算部２ｆと、行列演算部２ｆ
で積算する行列を記憶するパラメータ記憶部２ａとを備
えている。FIG. 4 is a block diagram showing the waveform conversion unit 2 of FIG. 1 in the third embodiment in more detail. Referring to FIG. 4, the waveform conversion unit 2 includes a matrix operation unit 2f that multiplies the sample-value-sequence spectrum of a unit waveform by a matrix, and a parameter storage unit 2a that stores the matrix used by the matrix operation unit 2f.

【００５１】次に、図１および図４を参照して、本発明
の実施の形態３の動作について説明する。Next, the operation of the third embodiment of the present invention will be described with reference to FIGS. 1 and 4.

【００５２】合成したいテキストあるいは発音記号など
それに準ずる発音情報が合成部１に入力されると、合成
部１は、このテキストを合成するために必要な素片波形
を波形記憶部３から選択する。選択方法は、実施の形態
１と同様である。選択が行われると、行列演算部２ｆ
は、パラメータ記憶部２ａから周波数シフトを表す行列
の要素Ａ［ｉ，ｎ］を読み出して素片波形Ｆ［ｎ］に積
算する。積算された素片波形Ｆ′［ｎ′］は、合成部１
で所望のピッチ・発音速度となるように並べられ、合成
音声として出力される。周波数シフトされた波形Ｆ′
［ｎ′］は、以下の数式９のように表せる。When text to be synthesized, or equivalent pronunciation information such as phonetic symbols, is input to the synthesis unit 1, the synthesis unit 1 selects the unit waveforms necessary for synthesizing the text from the waveform storage unit 3. The selection method is the same as in the first embodiment. Once the selection is made, the matrix operation unit 2f reads the elements A[i, n] of the matrix representing the frequency shift from the parameter storage unit 2a and multiplies the unit waveform F[n] by that matrix. The resulting unit waveforms F′[n′] are arranged by the synthesis unit 1 so as to achieve the desired pitch and utterance speed, and are output as synthesized speech. The frequency-shifted waveform F′[n′] can be expressed as in Equation 9 below.

【００５３】[0053]

【数９】予めＡ［ｎ′，ｎ］を計算し、行列波形記憶部に記憶す
ることで、合成音生成時の計算量を削減することが可能
である。(Equation 9) By calculating A[n′, n] in advance and storing it in the matrix waveform storage unit, it is possible to reduce the amount of calculation when generating synthesized speech.

【００５４】特に、Ｗ［ｋ］の解像度はＫによって決ま
るが、式の計算はＫに無関係であるので、Ｋの値を十分
大きくすることができる。In particular, the resolution of W[k] is determined by K, but because the calculation of the equation does not depend on K, the value of K can be made sufficiently large.
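The idea of collapsing the transform, shift, and inverse transform into one precomputed matrix can be sketched as follows. This is only an illustration under stated assumptions: Equation 9 is not reproduced in this text, the shift is assumed to be X′[k] = X[W[k]], K is set equal to N, and the names `shift_matrix` and `apply_matrix` are hypothetical.

```python
import cmath
import math


def shift_matrix(w):
    """Build A with F'[m] = sum_t A[m][t] * F[t], assuming X'[k] = X[w[k]].

    w lists a source bin for every output bin k = 0 .. N-1. For the matrix to
    be real, w[0] must be 0 and w[N - k] must equal (N - w[k]) % N.
    """
    n = len(w)
    matrix = []
    for m in range(n):
        row = []
        for t in range(n):
            # Inverse-DFT kernel times DFT kernel, summed over all bins k.
            total = sum(
                cmath.exp(2j * math.pi * (k * m - w[k] * t) / n) for k in range(n)
            )
            row.append((total / n).real)
        matrix.append(row)
    return matrix


def apply_matrix(matrix, waveform):
    """Multiply the unit waveform (a list of samples) by the stored matrix."""
    return [sum(a * x for a, x in zip(row, waveform)) for row in matrix]


# Computed once, ahead of synthesis; output bin 2 reads from source bin 1.
A = shift_matrix([0, 0, 1, 3, 4, 5, 7, 0])
unit = [math.cos(2 * math.pi * t / 8) for t in range(8)]
moved = apply_matrix(A, unit)
```

At synthesis time only `apply_matrix` runs, so the per-waveform cost is a single N×N matrix-vector product regardless of how the matrix was derived.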

【００５５】［実施の形態４］本発明の実施の形態４
は、実施の形態１〜３のものと基本的に同じ構成であ
る。異なる点は、波形変換部２において、素片波形をそ
のラベルを用いていくつかのグループに分け、グループ
毎に伸縮率または周波数シフト関数等の変換パラメータ
を割り当てる点である。グループの分け方としては、例
えば音素の種類（母音、摩擦音、破裂音、等）、音素中
の素片波形の位置（有声音の場合）等によって分ける。[Embodiment 4] Embodiment 4 of the present invention has basically the same configuration as Embodiments 1 to 3. The difference is that the waveform conversion unit 2 divides the unit waveforms into several groups using their labels and assigns a conversion parameter, such as an expansion/contraction ratio or a frequency shift function, to each group. The groups are formed according to, for example, the type of phoneme (vowel, fricative, plosive, etc.) or the position of the unit waveform within the phoneme (for voiced sounds).

【００５６】比較例の場合、図１５のように、素片波形
（Ａ）を並べた後で変換（伸縮）を行っている。このた
め、例えば波形（Ｃ）の（Ｃ１）と（Ｃ２）とで異なる
変換パラメータ（伸縮率）とすることは、不可能であ
る。これに対し、本発明では、素片波形（Ａ）を変換
（伸縮）する際に、違う変換パラメータ（伸縮率）を用
いて（Ｂ１）、（Ｂ２）を生成すれば、合成音において
（Ｃ１）と（Ｃ２）とで違う変換パラメータ（伸縮率）
とすることができる。したがって、従来と比べ、より変
化に富んだ声質を実現することが可能となる。また、発
明が解決しようとする課題の項で述べたように、声道の
形状の変化の影響は、音韻により異なる。この影響を加
味して音韻毎に変換パラメータ（伸縮率）を設定するこ
とで、より自然な声質変換が可能となる。In the comparative example, as shown in FIG. 15, conversion (expansion/contraction) is performed after the unit waveforms (A) have been arranged. For this reason it is impossible, for example, to use different conversion parameters (expansion/contraction ratios) for (C1) and (C2) of the waveform (C). In contrast, in the present invention, if (B1) and (B2) are generated using different conversion parameters (expansion/contraction ratios) when the unit waveform (A) is converted (expanded or contracted), then (C1) and (C2) in the synthesized sound can have different conversion parameters (expansion/contraction ratios). It is therefore possible to realize more varied voice quality than with the conventional approach. Further, as described in the section on the problems to be solved by the invention, the effect of a change in vocal tract shape differs from phoneme to phoneme. Setting a conversion parameter (expansion/contraction ratio) for each phoneme with this effect taken into account enables more natural voice quality conversion.

【００５７】次に、図１および図２を参照して、本発明
の実施の形態４の動作について説明する。尚、以下では
実施の形態１に即して説明するが、実施の形態２、３に
即した形での実現も無論可能である。Next, the operation of the fourth embodiment of the present invention will be described with reference to FIGS. 1 and 2. The following description is based on the first embodiment, but implementation based on the second or third embodiment is of course also possible.

【００５８】まず、予め、ラベルに応じたグループ分け
を定義する。ここでは、例えばラベルが「ａ」「ｉ」
「ｕ」「ｅ」「ｏ」である場合と、そうでない場合との
２つに分ける。そして、この２つのグループの伸縮率を
１．２、１．１とし、伸縮率をパラメータ記憶部２ａに
記憶する。合成したいテキストあるいは発音記号などそ
れに準ずる発音情報が合成部１に入力されると、合成部
１はこのテキストを合成するために必要な素片波形を波
形記憶部３から選択する。選択方法は、実施の形態１と
同様である。選択が行われると、波形伸縮部２ｂは、選
択された素片波形を波形記憶部３から読み出す。波形伸
縮部２ｂは、読み出された素片波形のラベルからグルー
プ分けを行い、そのグループの伸縮率をパラメータ記憶
部２ａから選択する。先の例では、ラベルが「ａ」
「ｉ」「ｕ」「ｅ」「ｏ」である場合は伸縮率１．２、
それ以外の場合は伸縮率１．１を選択する。波形伸縮部
２ｂは、選択された伸縮率で素片波形を伸縮する。伸縮
された素片波形は合成部１で所望のピッチ・発声速度と
なるように並べられ、合成音声として出力される。First, the grouping according to labels is defined in advance. Here, for example, the unit waveforms are divided into two groups: those whose label is "a", "i", "u", "e", or "o", and all others. The expansion/contraction ratios of these two groups are set to 1.2 and 1.1, respectively, and are stored in the parameter storage unit 2a. When text to be synthesized, or equivalent pronunciation information such as phonetic symbols, is input to the synthesis unit 1, the synthesis unit 1 selects the unit waveforms necessary for synthesizing the text from the waveform storage unit 3. The selection method is the same as in the first embodiment. Once the selection is made, the waveform expansion/contraction unit 2b reads the selected unit waveforms from the waveform storage unit 3. The waveform expansion/contraction unit 2b assigns each unit waveform to a group based on its label and selects that group's expansion/contraction ratio from the parameter storage unit 2a. In the example above, a ratio of 1.2 is selected when the label is "a", "i", "u", "e", or "o", and a ratio of 1.1 otherwise. The waveform expansion/contraction unit 2b expands or contracts the unit waveform at the selected ratio. The expanded or contracted unit waveforms are arranged by the synthesis unit 1 so as to achieve the desired pitch and utterance speed, and are output as synthesized speech.
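The label-based selection described above can be sketched in Python as follows. The ratios 1.2 and 1.1 and the vowel labels come from the description; everything else is an assumption made for illustration, in particular the use of linear-interpolation resampling as the expansion/contraction method and the names `group_of`, `stretch`, and `convert`.

```python
VOWEL_LABELS = {"a", "i", "u", "e", "o"}

# Stand-in for the parameter storage unit 2a: one ratio per group.
RATIOS = {"vowel": 1.2, "other": 1.1}


def group_of(label):
    """Assign a unit waveform to a group from its label."""
    return "vowel" if label in VOWEL_LABELS else "other"


def stretch(waveform, ratio):
    """Resample a unit waveform to round(len * ratio) samples.

    Linear interpolation is an assumption; the description does not fix
    the resampling method in this section.
    """
    n_in = len(waveform)
    n_out = max(1, round(n_in * ratio))
    if n_in == 1 or n_out == 1:
        return [waveform[0]] * n_out
    out = []
    for i in range(n_out):
        pos = i * (n_in - 1) / (n_out - 1)  # position in the input waveform
        lo = int(pos)
        hi = min(lo + 1, n_in - 1)
        frac = pos - lo
        out.append(waveform[lo] * (1 - frac) + waveform[hi] * frac)
    return out


def convert(label, waveform):
    """Pick the group's ratio and apply it to this unit waveform."""
    return stretch(waveform, RATIOS[group_of(label)])


ramp = [float(v) for v in range(10)]
vowel_unit = convert("a", ramp)  # stretched at ratio 1.2
other_unit = convert("k", ramp)  # stretched at ratio 1.1
```

Because the ratio is applied per unit waveform before arrangement, neighboring units in the final output can carry different ratios.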

【００５９】［実施の形態５］本発明の実施の形態５
は、実施の形態１〜３のものと基本的に同じ構成であ
る。異なる点は、波形変換部２において、素片波形を有
声音、無声音の２つのグループに分け、有声音のみを変
換している点である。[Embodiment 5] Embodiment 5 of the present invention has basically the same configuration as Embodiments 1 to 3. The difference is that the waveform conversion unit 2 divides the unit waveforms into two groups, voiced and unvoiced, and converts only the voiced ones.

【００６０】次に、図１および図２を参照して、本発明
の実施の形態５の動作について説明する。尚、以下では
実施の形態１に即して説明するが、実施の形態２、３に
即した形での実現も無論可能である。Next, the operation of the fifth embodiment of the present invention will be described with reference to FIGS. 1 and 2. The following description is based on the first embodiment, but implementation based on the second or third embodiment is of course also possible.

【００６１】合成したいテキストあるいは発音記号など
それに準ずる発音情報が合成部１に入力されると、合成
部１は、このテキストを合成するために必要な素片波形
を波形記憶部３から選択する。選択方法は、実施の形態
１と同様である。選択が行われると、波形伸縮部２ｂ
は、選択された素片波形を波形記憶部３から読み出す。
波形伸縮部２ｂは、読み出された素片波形のラベルから
有声音、無声音を判別する。そして、有声であれば、素
片波形をパラメータ記憶部２ａに記憶された伸縮率で伸
縮を行い、合成部１に出力する。一方、無声であれば、
そのまま合成部１に出力する。素片波形は合成部１で所
望のピッチ・発声速度となるように並べられ、合成音声
として出力される。When text to be synthesized, or equivalent pronunciation information such as phonetic symbols, is input to the synthesis unit 1, the synthesis unit 1 selects the unit waveforms necessary for synthesizing the text from the waveform storage unit 3. The selection method is the same as in the first embodiment. Once the selection is made, the waveform expansion/contraction unit 2b reads the selected unit waveforms from the waveform storage unit 3. The waveform expansion/contraction unit 2b determines from the label of each unit waveform whether it is voiced or unvoiced. If it is voiced, the unit waveform is expanded or contracted at the expansion/contraction ratio stored in the parameter storage unit 2a and output to the synthesis unit 1. If it is unvoiced, it is output to the synthesis unit 1 unchanged. The unit waveforms are arranged by the synthesis unit 1 so as to achieve the desired pitch and utterance speed, and are output as synthesized speech.

【００６２】有声音は、図１４の（Ｃ）のように、複数
の素片波形（１ピッチ分の波形）が並んだものとして構
成される。したがって、有声音の発声長は個数と間隔
（ピッチ周期）を変えることにより調整する。これに対
し、無声音の場合、１つの素片波形（１音素分の波形）
から構成される。よって、発声長は素片波形の長さにの
み依存する。このため、素片波形を伸縮すると、発声長
も変化する。発声長を変えないためには、例えば先に図
１６（ａ）および（ｂ）で示したような処理が必要とな
る。しかし、このような波形の接続、削除は接続点での
スペクトル的な不連続が起こりやすく、音質劣化につな
がる。そこで、本発明においては、無声音の伸縮を行わ
ないことで、音質劣化を抑えることができる。A voiced sound is composed of a sequence of unit waveforms (each one pitch period long), as shown in FIG. 14(C). The utterance length of a voiced sound is therefore adjusted by changing the number of unit waveforms and their spacing (the pitch period). An unvoiced sound, in contrast, consists of a single unit waveform (a waveform for one phoneme), so its utterance length depends only on the length of that unit waveform. Consequently, expanding or contracting the unit waveform also changes the utterance length. Keeping the utterance length unchanged requires, for example, processing such as that shown earlier in FIGS. 16(a) and 16(b). However, splicing or deleting portions of a waveform in this way tends to cause spectral discontinuities at the joins, which degrades sound quality. In the present invention, sound quality degradation is therefore suppressed by not expanding or contracting unvoiced sounds.
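The voiced/unvoiced branch of this embodiment amounts to a pass-through for unvoiced unit waveforms. A minimal sketch follows; the label set `UNVOICED_LABELS`, the function name `convert_if_voiced`, and the stand-in conversion are all hypothetical, since the description does not list which labels count as unvoiced.

```python
# Hypothetical label set; the description does not enumerate unvoiced labels.
UNVOICED_LABELS = {"s", "k", "t", "h"}


def convert_if_voiced(label, waveform, convert):
    """Apply `convert` to voiced unit waveforms; pass unvoiced ones through.

    `convert` is any callable taking and returning a list of samples, for
    example an expansion/contraction routine.
    """
    if label in UNVOICED_LABELS:
        return list(waveform)  # unchanged, so the utterance length is preserved
    return convert(waveform)


def toy_conversion(samples):
    """Stand-in conversion that repeats each sample (doubles the length)."""
    return [v for v in samples for _ in range(2)]


voiced_out = convert_if_voiced("a", [1, 2, 3], toy_conversion)
unvoiced_out = convert_if_voiced("s", [1, 2, 3], toy_conversion)
```

Leaving the unvoiced waveform untouched avoids the splice-and-delete step that would otherwise be needed to restore its length.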

【００６３】［実施の形態６］図５は、本発明の実施の
形態６の構成を示すブロック図である。図５を参照し
て、本発明の実施の形態６は、実施の形態１〜５に加
え、外部からの変換パラメータの入力を受け付け、波形
変換部２に変換パラメータを指定する入力部４を有して
いる。合成部１、波形変換部２、および波形記憶部３
は、実施の形態１〜５と同様な動作をする。[Sixth Embodiment] FIG. 5 is a block diagram showing the configuration of the sixth embodiment of the present invention. Referring to FIG. 5, the sixth embodiment adds to the first to fifth embodiments an input unit 4 that accepts conversion parameters from outside and specifies them to the waveform conversion unit 2. The synthesis unit 1, the waveform conversion unit 2, and the waveform storage unit 3 operate in the same manner as in the first to fifth embodiments.

【００６４】入力部４は、変換パラメータの入力を受け
付け、その情報を波形変換部２中のパラメータ記憶部２
ａに書き込む。ここで、変換パラメータは、１つまたは
複数指定することができる。複数指定する場合は、この
変換パラメータは、実施の形態４における素片波形のグ
ループに対応している。また、変換パラメータのうち一
部または全部が指定されなかった場合、指定されなかっ
た変換パラメータは、パラメータ記憶部２ａが予め記憶
している値とする。これにより、例えば利用者が、自分
の好みの合成音を得ることができる。The input unit 4 accepts conversion parameters and writes them to the parameter storage unit 2a in the waveform conversion unit 2. One or more conversion parameters can be specified. When several are specified, they correspond to the unit waveform groups of the fourth embodiment. When some or all of the conversion parameters are not specified, the unspecified parameters take the values stored in advance in the parameter storage unit 2a. This allows a user, for example, to obtain synthesized speech to his or her own taste.

【００６５】次に、図５を参照して、本発明の実施の形
態６の動作について説明する。尚、以下では実施の形態
２、４に即して説明するが、他の実施に即した形での実
現も無論可能である。Next, the operation of the sixth embodiment of the present invention will be described with reference to FIG. In the following, description will be given in accordance with the second and fourth embodiments, but it is of course possible to realize the present invention in a form according to another embodiment.

【００６６】合成を行う前に、入力部４を用いて伸縮率
をパラメータ記憶部２ａに予め書き込む。伸縮率の指定
の方法としては、例えば、ツールバーを用いることもで
きる。合成したいテキストあるいは発音記号などそれに
準ずる発音情報が合成部１に入力されると、合成部１
は、このテキストを合成するために必要な素片波形を波
形記憶部３から選択する。選択は、実施の形態１と同様
である。選択が行われると、波形変換部２は、選択され
た素片波形を波形記憶部３から読み出して伸縮を行う。
例えば、入力部４に母音１．５、子音１．２というよう
に入力されると、波形記憶部３におけるラベルが「ａ」
「ｉ」「ｕ」「ｅ」「ｏ」である素片波形は伸縮率１．
５、その他は伸縮率１．２で伸縮を行う。伸縮された素
片波形は合成部１で所望のピッチ、発声速度となるよう
に並べられ、合成音声として出力される。Before synthesis, the expansion/contraction ratios are written to the parameter storage unit 2a in advance using the input unit 4. The ratios can be specified using, for example, a toolbar. When text to be synthesized, or equivalent pronunciation information such as phonetic symbols, is input to the synthesis unit 1, the synthesis unit 1 selects the unit waveforms necessary for synthesizing the text from the waveform storage unit 3. The selection is the same as in the first embodiment. Once the selection is made, the waveform conversion unit 2 reads the selected unit waveforms from the waveform storage unit 3 and expands or contracts them. For example, if "vowels 1.5, consonants 1.2" is input to the input unit 4, unit waveforms whose label in the waveform storage unit 3 is "a", "i", "u", "e", or "o" are expanded or contracted at a ratio of 1.5, and all others at a ratio of 1.2. The expanded or contracted unit waveforms are arranged by the synthesis unit 1 so as to achieve the desired pitch and utterance speed, and are output as synthesized speech.

【００６７】［実施の形態７］図６は、本発明の実施の
形態７の構成を示すブロック図である。図６を参照し
て、本発明の実施の形態７は、実施の形態１〜５に加
え、入力テキスト等の情報から変換パラメータ情報を分
離するパラメータ抽出部６を有している。合成部１、波
形変換部２、および波形記憶部３は実施の形態１〜５と
同様な動作をする。[Seventh Embodiment] FIG. 6 is a block diagram showing a configuration of a seventh embodiment of the present invention. Referring to FIG. 6, a seventh embodiment of the present invention has a parameter extracting unit 6 for separating conversion parameter information from information such as input text in addition to the first to fifth embodiments. The synthesizing unit 1, the waveform converting unit 2, and the waveform storing unit 3 operate in the same manner as in the first to fifth embodiments.

【００６８】パラメータ抽出部６は、入力テキスト中に
記された変換パラメータを抽出して波形変換部２中のパ
ラメータ記憶部２ａに書き込む。変換パラメータは、入
力テキスト中に一つまたは複数記述することができる。
複数記述した場合、変換パラメータは実施の形態４にお
ける素片波形のグループに対応している。また、変換パ
ラメータのうち一部または全部が記述されなかった場
合、記述されなかった変換パラメータは、パラメータ記
憶部２ａが予め記憶している値とする。これにより、前
述の入力部４を持たなくても、利用者が自分の好みの合
成音を得ることができる。The parameter extracting section 6 extracts the conversion parameters described in the input text and writes them into the parameter storage section 2a in the waveform converting section 2. One or more conversion parameters can be described in the input text.
When a plurality of parameters are described, the conversion parameters correspond to the unit waveform groups in the fourth embodiment. If some or all of the conversion parameters are not described, the conversion parameters that are not described are values stored in advance in the parameter storage unit 2a. Thus, the user can obtain his / her favorite synthesized sound without having the input unit 4 described above.

【００６９】次に、図６を参照して、本発明の実施の形
態７の動作について説明する。尚、以下では実施の形態
２、４に即して説明するが、他の実施に即した形での実
現も無論可能である。Next, the operation of the seventh embodiment of the present invention will be described with reference to FIG. In the following, description will be given in accordance with the second and fourth embodiments, but it is of course possible to realize the present invention in a form according to another embodiment.

【００７０】合成したいテキストあるいは発音記号など
それに準ずる発音情報がパラメータ抽出部６に入力され
ると、パラメータ抽出部６は、それをテキスト情報と伸
縮率情報に分離する。ここで、例えば「＼ｖ１．１＼ｓ
１．２次の交差点を右に曲がってください」と入力され
たとする。ここで、「＼」は特殊記号、その後のアルフ
ァベットは素片波形のグループを表すラベル（ｖ：母
音、ｓ：子音）、その後の数字は伸縮率である。この場
合、母音の伸縮率は１．１、子音の伸縮率は１．２とい
う指定に相当する。このテキストが入力された場合、パ
ラメータ抽出部６は、伸縮率とテキストを分離し、母音
で伸縮率１．１、子音で伸縮率１．２をパラメータ記憶
部２ａに、テキスト「次の交差点を右に曲がって下さ
い」を合成部１に渡す。When text to be synthesized, or equivalent pronunciation information such as phonetic symbols, is input to the parameter extracting unit 6, the parameter extracting unit 6 separates it into text information and expansion/contraction ratio information. Suppose, for example, that "＼v1.1＼s1.2 Please turn right at the next intersection" is input. Here, "＼" is a special symbol, the letter that follows it is a label denoting a group of unit waveforms (v: vowel, s: consonant), and the number after that is the expansion/contraction ratio. This input therefore specifies an expansion/contraction ratio of 1.1 for vowels and 1.2 for consonants. When this text is input, the parameter extracting unit 6 separates the ratios from the text, passes the ratios (1.1 for vowels, 1.2 for consonants) to the parameter storage unit 2a, and passes the text "Please turn right at the next intersection" to the synthesis unit 1.
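The separation of ratio tags from the text can be sketched with a regular expression. This is an illustration, not the patent's parser: the function name `extract_parameters`, the acceptance of both the ASCII backslash and the fullwidth "＼", and the restriction to ASCII letters and digits are assumptions. Unspecified groups keep their stored defaults, as the description of the parameter storage unit 2a requires.

```python
import re

# A tag is the special symbol, one group letter, then a decimal number.
_TAG = re.compile(r"[\\＼]([A-Za-z])([0-9]+(?:\.[0-9]+)?)")


def extract_parameters(text, defaults):
    """Split `text` into (ratios, plain_text).

    `defaults` maps group letters to the ratios held in advance; tags found
    in the text override them, and groups without a tag keep their default.
    """
    ratios = dict(defaults)

    def take(match):
        ratios[match.group(1)] = float(match.group(2))
        return ""  # remove the tag from the text passed on for synthesis

    plain_text = _TAG.sub(take, text)
    return ratios, plain_text


stored = {"v": 1.0, "s": 1.0}
ratios, plain = extract_parameters(
    "\\v1.1\\s1.2次の交差点を右に曲がってください", stored
)
```

With this input, `ratios` holds 1.1 for `v` and 1.2 for `s`, and `plain` is the sentence with both tags removed.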

【００７１】合成部１は、このテキストを合成するため
に必要な素片波形を波形記憶部３から選択する。選択
は、実施の形態１と同様である。選択が行われると、波
形伸縮部２ｂは、選択された素片波形を波形記憶部３か
ら読み出して、パラメータ記憶部２ａに記憶された伸縮
率で伸縮を行う。伸縮された素片波形は合成部１で所望
のピッチ、発声速度となるように並べられ、合成音声と
して出力される。The synthesizing unit 1 selects a unit waveform necessary for synthesizing the text from the waveform storage unit 3. The selection is the same as in the first embodiment. When the selection is performed, the waveform expansion / contraction unit 2b reads the selected segment waveform from the waveform storage unit 3 and performs expansion / contraction at the expansion / contraction rate stored in the parameter storage unit 2a. The expanded and contracted unit waveforms are arranged in the synthesizer 1 so as to have a desired pitch and utterance speed, and are output as synthesized speech.

【００７２】［実施の形態８］図７は、本発明の実施の
形態８の構成を示すブロック図である。図７を参照し
て、本発明の実施の形態８は、実施の形態１〜５に加
え、変換後の素片波形とそのラベルを記憶する第２の波
形記憶部５を有している。また、実施の形態６、７と同
様に、パラメータ抽出部６、入力部４を有する構成も可
能である。このような場合の構成は、図８および図９と
なる。[Eighth Embodiment] FIG. 7 is a block diagram showing the configuration of the eighth embodiment of the present invention. Referring to FIG. 7, the eighth embodiment adds to the first to fifth embodiments a second waveform storage unit 5 that stores the converted unit waveforms and their labels. As in the sixth and seventh embodiments, a configuration having the parameter extracting unit 6 and the input unit 4 is also possible. The configurations in those cases are shown in FIGS. 8 and 9.

【００７３】次に、図７を参照して、本発明の実施の形
態８の動作について説明する。尚、以下では実施の形態
２、４に即して説明するが、他の実施に即した形での実
現も無論可能である。Next, the operation of the eighth embodiment of the present invention will be described with reference to FIG. In the following, description will be given in accordance with the second and fourth embodiments, but it is of course possible to realize the present invention in a form according to another embodiment.

【００７４】実施の形態８では、伸縮率が定義された時
点（合成を行う前）に波形変換部２において波形記憶部
３から読み出した素片波形を変換し、第２の波形記憶部
５に記憶する。第２の波形記憶部５は、波形変換部２に
おいて変換された素片波形を記憶しておく。また、第２
の波形記憶部５は、ラベルも波形記憶部３のものをコピ
ーして記憶する。合成音作成の際には、合成したいテキ
ストあるいは発音記号などそれに準ずる発音情報が合成
部１に入力されると、合成部１は、このテキストを合成
するために必要な素片波形を第２の波形記憶部５から選
択する。選択は、実施の形態１と同様である。この素片
波形は合成部１で所望のピッチ・発声速度となるように
並べられ、合成音声として出力される。In the eighth embodiment, at the point when the expansion/contraction ratio is defined (before synthesis), the waveform conversion unit 2 converts the unit waveforms read from the waveform storage unit 3 and stores them in the second waveform storage unit 5. The second waveform storage unit 5 holds the unit waveforms converted by the waveform conversion unit 2, together with labels copied from the waveform storage unit 3. When synthesized speech is generated and text to be synthesized, or equivalent pronunciation information such as phonetic symbols, is input to the synthesis unit 1, the synthesis unit 1 selects the unit waveforms necessary for synthesizing the text from the second waveform storage unit 5. The selection is the same as in the first embodiment. These unit waveforms are arranged by the synthesis unit 1 so as to achieve the desired pitch and utterance speed, and are output as synthesized speech.
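The split between an up-front conversion pass and a conversion-free synthesis pass can be sketched as follows. The dictionary-based stores, the function names `build_second_store` and `synthesize`, and the plain concatenation used in place of pitch and speed arrangement are all simplifying assumptions.

```python
def build_second_store(first_store, convert):
    """Convert every unit waveform once and keep it under its original label.

    first_store maps label -> list of samples (standing in for the waveform
    storage unit 3); the result stands in for the second waveform storage unit 5.
    """
    return {label: convert(label, wave) for label, wave in first_store.items()}


def synthesize(labels, store):
    """Look up already-converted unit waveforms; no conversion happens here.

    Real synthesis would arrange the units for pitch and speed; this sketch
    only concatenates them.
    """
    out = []
    for label in labels:
        out.extend(store[label])
    return out


def toy_convert(label, wave):
    """Stand-in conversion: repeat each sample of vowel units twice."""
    if label in {"a", "i", "u", "e", "o"}:
        return [v for v in wave for _ in range(2)]
    return list(wave)


first = {"a": [1, 2], "k": [9]}
second = build_second_store(first, toy_convert)  # done when ratios are defined
speech = synthesize(["k", "a", "k"], second)     # done at synthesis time
```

The conversion cost is paid once per stored unit waveform rather than once per use, at the price of holding a second copy of the inventory.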

【００７５】このように、素片波形の変換を予め行って
おくため、実施の形態７では、合成音生成の際の計算量
の削減ならびに処理の高速化が可能である。As described above, because the unit waveforms are converted in advance, the eighth embodiment can reduce the amount of calculation and speed up processing when generating synthesized speech.

【００７６】［実施の形態９］図１０は、本発明の実施
の形態９の構成を示すブロック図である。図１０を参照
して、本発明の実施の形態９は、入力を元に適当な素片
波形を編集して合成音を生成する合成部１と、素片波形
を韻律情報を用いて決定した変換パラメータで変換する
変換部７と、伸縮前の素片波形とそのラベルを記憶する
波形記憶装置部３とを有している。また、図１３を参照
して、変換部７は、１つまたは複数の変換パラメータを
記憶するパラメータ記憶部２ａと、パラメータ変更部７
ａから得られる変換パラメータから変換を行う波形変換
部７ｃと、過去に変換を行った素片波形のラベルを記憶
し、これをパラメータ変更部７ａに渡す進行記憶部７ｂ
と、韻律付与部１ｂ、進行記憶部７ｂからの出力を元に
パラメータ記憶部２ａに記憶された変換パラメータを変
更して波形変換部７ｃに渡すパラメータ変更部２ｅとを
備えている。波形変換部７ｃは、図２〜図４で示した波
形変換部からパラメータ記憶部を除いた構成を取る。合
成部１は、入力テキストの構文解析を行う構文解析部１
ａと、構文解析されたテキストに韻律を付与する韻律付
与部１ｂと、韻律を付与されたテキストから適当な素片
波形を編集・合成する波形編集部１ｃとを備えている。
また、実施の形態６、７と同様に、入力部４、パラメー
タ抽出部６を有する構成も可能である。このような場合
の構成は、図１１および図１２となる。[Ninth Embodiment] FIG. 10 is a block diagram showing the configuration of the ninth embodiment of the present invention. Referring to FIG. 10, the ninth embodiment includes a synthesis unit 1 that edits appropriate unit waveforms based on the input to generate synthesized speech, a conversion unit 7 that converts unit waveforms with conversion parameters determined using prosodic information, and a waveform storage unit 3 that stores the unit waveforms before expansion/contraction and their labels. Referring to FIG. 13, the conversion unit 7 includes a parameter storage unit 2a that stores one or more conversion parameters; a waveform conversion unit 7c that performs conversion using the conversion parameters obtained from a parameter changing unit 7a; a progress storage unit 7b that stores the labels of the unit waveforms converted so far and passes them to the parameter changing unit 7a; and the parameter changing unit 7a, which changes the conversion parameters stored in the parameter storage unit 2a based on the outputs of the prosody provision unit 1b and the progress storage unit 7b and passes them to the waveform conversion unit 7c. The waveform conversion unit 7c has the configuration of the waveform conversion unit shown in FIGS. 2 to 4 with the parameter storage unit removed. The synthesis unit 1 includes a syntax analysis unit 1a that parses the input text, a prosody provision unit 1b that assigns prosody to the parsed text, and a waveform editing unit 1c that edits and synthesizes appropriate unit waveforms from the prosody-annotated text. As in the sixth and seventh embodiments, a configuration having the input unit 4 and the parameter extracting unit 6 is also possible. The configurations in those cases are shown in FIGS. 11 and 12.

【００７７】次に、図１０および図１３を参照して、本
発明の実施の形態９の動作について説明する。尚、以下
では実施の形態２、４に即して説明するが、他の実施に
即した形での実現も無論可能である。Next, the operation of the ninth embodiment of the present invention will be described with reference to FIGS. 10 and 13. The following description is based on the second and fourth embodiments, but implementation based on the other embodiments is of course also possible.

【００７８】合成したいテキストあるいは発音記号など
それに準ずる物が合成部１に入力されると、構文解析部
１ａは、これに構文解析を行う。ここでは、例えば「次
の交差点を右に曲がってください」というテキストが入
力されたとすると、構文解析部１ａは、「次の／交差点
を／右に／曲がって／下さい」というように形態素と呼
ばれる単位に分割する。ここで、「／」は分割の位置を
示す。次に、その出力に対応し韻律付与部１ｂでアクセ
ント、イントネーション、ポーズ、リズム等の韻律情報
を付加する。韻律付与部１ｂでは例えば構文解析部１ａ
の出力「次の／交差点を／右に／曲がって／下さい」に
対し、「つぎ’の／こうさてんを／みぎに／まがって／
くださ’い」という読みとアクセントを付与する。ここ
で「’」はアクセントの位置を示す。韻律付与部１ｂの
出力は、波形編集部１ｃとパラメータ変更部７ａとに送
られる。パラメータ変更部７ａでは、この出力をもと
に、合成音声の各部分の伸縮率を決定する。例えば、パ
ラメータ記憶部２ａには、各音素毎に図１８のように伸
縮率が予め記憶されていたとする。また、伸縮率を変更
する規則として、「隣り合う音素間では大きな伸縮率の
違いが起きないようにする」、「アクセントのある音素
の伸縮率は高くする」と決まっていたとする。この場
合、韻律付与部１ｂの出力を用いない従来の合成装置の
場合には、図１９の変更前の伸縮率で各音素の素片波形
は伸縮される。これに対し、本発明では「つぎ’の／こ
うさてんを／みぎに／まがって／くださ’い」という韻
律付与部１ｂの出力を用いて、図１９の変更１、変更２
のように伸縮率を変更する。この例では、変更１は、
「隣り合う音素間では大きな伸縮率の違いが起きないよ
うにする」という規則から、／ｕｓａ／の間の／ｓ／の
伸縮率を左右の音素／ｕ／、／ａ／に近づけている。ま
た、変更２は、「アクセントのある音素は伸縮率を高く
する」という規則から、アクセントのある／ｉ／の伸縮
率を高くしている。下線の部分が変更を施された部分で
ある。したがって、パラメータ変更部７ａは、図１９の
変更２を最終的な伸縮率として保持する。波形編集部１
ｃは、テキストを合成するために必要な素片波形を波形
記憶部３から選択する。素片波形が選択されると、進行
記憶部７ｂは、選択された素片波形のラベルを蓄える。
そして、蓄えられたラベルをパラメータ変更部７ａに出
力する。パラメータ変更部７ａは、この蓄えられたラベ
ルから、現在、図１９のどの部分を処理しているかを把
握し、その部分の伸縮率を波形変換部７ｃに渡す。波形
変換部７ｃは、パラメータ変更部７ａから渡された伸縮
率で素片波形を伸縮し、波形編集部１ｃに渡す。波形編
集部１ｃは、この素片波形を所望のピッチ、発声長にな
るように並び変えて、合成音として出力する。When text to be synthesized, or something equivalent such as phonetic symbols, is input to the synthesis unit 1, the syntax analysis unit 1a parses it. Suppose, for example, that the text "Please turn right at the next intersection" is input. The syntax analysis unit 1a divides it into units called morphemes, as in "next / intersection / right / turn / please", where "/" marks a division point. Next, the prosody provision unit 1b adds prosodic information such as accent, intonation, pauses, and rhythm to this output. For example, for the output "next / intersection / right / turn / please" of the syntax analysis unit 1a, the prosody provision unit 1b assigns the reading and accents "tsugi'no / kousaten o / migi ni / magatte / kudasa'i", where "'" marks the position of an accent. The output of the prosody provision unit 1b is sent to the waveform editing unit 1c and the parameter changing unit 7a. Based on this output, the parameter changing unit 7a determines the expansion/contraction ratio for each part of the synthesized speech. Suppose, for example, that the parameter storage unit 2a stores an expansion/contraction ratio for each phoneme in advance, as shown in FIG. 18, and that the rules for changing the ratios are "avoid large differences in expansion/contraction ratio between adjacent phonemes" and "raise the expansion/contraction ratio of accented phonemes". A conventional synthesizer, which does not use the output of the prosody provision unit 1b, would expand or contract the unit waveforms of each phoneme at the pre-change ratios in FIG. 19. In contrast, the present invention uses the output "tsugi'no / kousaten o / migi ni / magatte / kudasa'i" of the prosody provision unit 1b to change the ratios as shown in Change 1 and Change 2 of FIG. 19. In this example, Change 1 applies the rule "avoid large differences in expansion/contraction ratio between adjacent phonemes" and brings the ratio of the /s/ in /usa/ closer to those of the neighboring phonemes /u/ and /a/. Change 2 applies the rule "raise the expansion/contraction ratio of accented phonemes" and raises the ratio of the accented /i/. The underlined portions are those that were changed. The parameter changing unit 7a therefore holds Change 2 of FIG. 19 as the final expansion/contraction ratios. The waveform editing unit 1c selects the unit waveforms necessary for synthesizing the text from the waveform storage unit 3. When a unit waveform is selected, the progress storage unit 7b stores its label and outputs the stored labels to the parameter changing unit 7a. From the stored labels, the parameter changing unit 7a determines which part of FIG. 19 is currently being processed and passes the ratio for that part to the waveform conversion unit 7c. The waveform conversion unit 7c expands or contracts the unit waveform at the ratio received from the parameter changing unit 7a and passes it to the waveform editing unit 1c. The waveform editing unit 1c arranges these unit waveforms so as to achieve the desired pitch and utterance length, and outputs them as synthesized speech.
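The two rules applied by the parameter changing unit 7a can be sketched as simple list operations. The numeric values of FIGS. 18 and 19 are not available in this text, so every number below is hypothetical, as are the function names, the threshold `max_diff`, the choice of the neighbors' mean as the smoothed value, and the additive `boost`.

```python
def smooth_neighbors(ratios, max_diff=0.2):
    """Rule 1: avoid large ratio differences between adjacent phonemes.

    A phoneme whose ratio differs from both neighbors by more than max_diff
    is moved to the mean of its neighbors (one possible smoothing choice).
    """
    out = list(ratios)
    for i in range(1, len(ratios) - 1):
        left, right = ratios[i - 1], ratios[i + 1]
        if abs(ratios[i] - left) > max_diff and abs(ratios[i] - right) > max_diff:
            out[i] = (left + right) / 2
    return out


def boost_accented(ratios, accented, boost=0.1):
    """Rule 2: raise the ratio of accented phonemes (given by index)."""
    return [r + boost if i in accented else r for i, r in enumerate(ratios)]


# Hypothetical per-phoneme ratios for /u/, /s/, /a/ and an accent on index 2.
phonemes = ["u", "s", "a"]
before = [1.5, 1.0, 1.5]
change1 = smooth_neighbors(before)
change2 = boost_accented(change1, accented={2})
```

Here `change1` pulls the middle consonant up to its vowel neighbors and `change2` then raises only the accented position, mirroring the order of Change 1 and Change 2 in the description.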

【００７９】実施の形態９は、他の実施の形態とは異な
り、同じ素片波形が韻律情報の違いにより異なる変換パ
ラメータで変換されるので、より詳細に声質を変更する
ことが可能となることである。The ninth embodiment differs from the other embodiments in that the same unit waveform is converted with different conversion parameters depending on the prosodic information, which makes it possible to change the voice quality in finer detail.

【００８０】[0080]

【発明の効果】本発明による音声合成装置は、以下の効
果を奏する。The speech synthesizer according to the present invention has the following effects.

【００８１】第１に、従来方式とは異なり、素片波形を
編集する際、発声速度・ピッチを変換に合わせて計算す
る必要がない。First, unlike the conventional method, when editing the unit waveform, there is no need to calculate the utterance speed and pitch in accordance with the conversion.

【００８２】第２に、素片波形スペクトルを周波数シフ
トすることで声質をより多彩に変化させることができ
る。Second, the voice quality can be varied more by shifting the frequency of the unit waveform spectrum.

【００８３】第３に、変換が音質劣化につながるような
音韻の変換を行わない事によって合成音声の音質劣化を
抑えることができる。Third, deterioration of sound quality of synthesized speech can be suppressed by not performing conversion of phonemes such that conversion leads to sound quality deterioration.

【００８４】第４に、音韻の種類や、韻律情報に応じて
変換パラメータを決定する事により、声質をより多彩に
変化させることができる。Fourth, by determining the conversion parameters according to the types of phonemes and the prosody information, the voice quality can be varied more.

[0085] Fifth, because a waveform storage unit holds unit waveforms that were converted in advance and no conversion is performed at synthesis time, the amount of calculation during synthesis is kept to a minimum.
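As an illustration of the second effect above, the following sketch frequency-shifts the spectrum of one unit waveform: Fourier transform, shift, inverse transform. It is a sketch under stated assumptions rather than the method of the embodiments: the shift is assumed to move every positive-frequency bin up by a whole number of bins, the DC bin is left in place, bins shifted out of range are discarded, and the function names `dft`, `idft` and `shift_spectrum` are invented for this example. A naive DFT is used only to keep the example self-contained.

```python
import cmath
import math


def dft(x):
    """Naive discrete Fourier transform of a real or complex sequence."""
    n = len(x)
    return [sum(x[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n))
            for k in range(n)]


def idft(spectrum):
    """Naive inverse discrete Fourier transform."""
    n = len(spectrum)
    return [sum(spectrum[k] * cmath.exp(2j * math.pi * k * t / n) for k in range(n)) / n
            for t in range(n)]


def shift_spectrum(unit, shift_bins):
    """Shift the spectrum of one unit waveform by shift_bins DFT bins.

    Assumption: each positive-frequency bin k moves to k + shift_bins, and the
    mirrored negative-frequency bin is rebuilt as its complex conjugate so that
    the result stays real. The Nyquist bin is dropped for simplicity.
    """
    n = len(unit)
    spectrum = dft(unit)
    top = (n - 1) // 2            # highest bin strictly below Nyquist
    shifted = [0j] * n
    shifted[0] = spectrum[0]      # keep the DC component in place
    for k in range(1, top + 1):
        target = k + shift_bins
        if 1 <= target <= top:    # discard bins shifted out of range
            shifted[target] = spectrum[k]
            shifted[n - target] = spectrum[k].conjugate()
    return [value.real for value in idft(shifted)]
```

For example, a 16-sample cosine occupying bin 2 comes back as a cosine occupying bin 3 when `shift_bins` is 1. A practical implementation would normally use an FFT routine instead of the quadratic-time loops shown here.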

[Brief description of the drawings]

FIG. 1 is a block diagram showing the configuration of a speech synthesizer according to Embodiments 1 to 5 of the present invention.

FIG. 2 is a diagram illustrating Embodiment 1 of the present invention, and is a block diagram showing the waveform conversion unit shown in FIG. 1.

FIG. 3 is a diagram illustrating Embodiment 2 of the present invention, and is a block diagram showing the waveform conversion unit shown in FIG. 1.

FIG. 4 is a diagram illustrating Embodiment 3 of the present invention, and is a block diagram showing the waveform conversion unit shown in FIG. 1.

FIG. 5 is a block diagram showing the configuration of a speech synthesizer according to Embodiment 6 of the present invention.

FIG. 6 is a block diagram showing the configuration of a speech synthesizer according to Embodiment 7 of the present invention.

FIG. 7 is a block diagram showing the configuration of a speech synthesizer according to Embodiment 8 of the present invention.

FIG. 8 is a block diagram showing the configuration of another example of Embodiment 8 of the present invention.

FIG. 9 is a block diagram showing the configuration of still another example of Embodiment 8 of the present invention.

FIG. 10 is a block diagram showing the configuration of a speech synthesizer according to Embodiment 9 of the present invention.

FIG. 11 is a block diagram showing the configuration of another example of Embodiment 9 of the present invention.

FIG. 12 is a block diagram showing the configuration of still another example of Embodiment 9 of the present invention.

FIG. 13 is a block diagram showing the conversion unit shown in FIG. 10.

FIG. 14 is a diagram for explaining the operation of the present invention when expanding or contracting a voiced sound waveform.

FIG. 15 is a diagram for explaining the operation of a comparative example when expanding or contracting a voiced sound waveform.

FIG. 16 is a diagram showing an example of a method of lengthening or shortening the duration of an unvoiced sound waveform in the speech synthesizer according to Embodiment 1 of the present invention.

FIG. 17 is a diagram showing how the unit waveform spectrum is expanded and contracted in the speech synthesizer according to Embodiment 2 of the present invention.

FIG. 18 is a table showing an example of the expansion/contraction ratio for each phoneme in the speech synthesizer according to Embodiment 9 of the present invention.

FIG. 19 is a table showing an example of changing the expansion/contraction ratio in the speech synthesizer according to Embodiment 9 of the present invention.

FIG. 20 is a block diagram showing the configuration of a conventional speech synthesizer.

FIG. 21 is a diagram for explaining the operation of the speech synthesizer shown in FIG. 20.

[Explanation of symbols]

1 synthesis unit
1a syntax analysis unit
1b prosody giving unit
1c waveform editing unit
2 waveform conversion unit
2a parameter storage unit
2b waveform expansion/contraction unit
2c frequency conversion unit
2d frequency shift unit
2e frequency inverse conversion unit
3 waveform storage unit
4 input unit
5 second waveform storage unit
6 parameter extraction unit
7 conversion unit
7a parameter changing unit
7b progress storage unit
7c waveform conversion unit

Claims

1. A speech synthesizer comprising: a waveform storage unit that stores unit waveforms, each being a minimum unit of synthesis that is a waveform for one pitch period for voiced sounds and a waveform for one phoneme for unvoiced sounds; a waveform conversion unit that converts the unit waveforms stored in the waveform storage unit; and a synthesis unit that edits the unit waveforms converted by the waveform conversion unit to generate a synthesized sound, wherein each unit waveform is converted individually and the synthesized sound is generated using the converted unit waveforms, so that the voice quality of the synthesized sound is changed independently of the pitch and the utterance speed.

2. A speech synthesizer comprising: a waveform storage unit that stores unit waveforms, each being a minimum unit of synthesis that is a waveform for one pitch period for voiced sounds and a waveform for one phoneme for unvoiced sounds; a waveform conversion unit that converts the unit waveforms stored in the waveform storage unit; and a synthesis unit that edits the unit waveforms converted by the waveform conversion unit to generate a synthesized sound, wherein the waveform conversion unit comprises a waveform expansion/contraction unit that expands or contracts a unit waveform in time, and a parameter storage unit that stores a parameter indicating the ratio of time expansion/contraction in the waveform expansion/contraction unit, and wherein the voice quality of the synthesized sound is changed by expanding or contracting the unit waveform in time in the waveform expansion/contraction unit.

3. A speech synthesizer comprising: a waveform storage unit that stores unit waveforms, each being a minimum unit of synthesis that is a waveform for one pitch period for voiced sounds and a waveform for one phoneme for unvoiced sounds; a waveform conversion unit that converts the unit waveforms stored in the waveform storage unit; and a synthesis unit that edits the unit waveforms converted by the waveform conversion unit to generate a synthesized sound, wherein the waveform conversion unit comprises a frequency conversion unit that performs a Fourier transform on a unit waveform, a frequency shift unit that frequency-shifts the unit waveform spectrum obtained by the Fourier transform in the frequency conversion unit, a parameter storage unit that stores a parameter necessary for the frequency shift in the frequency shift unit, and a frequency inverse conversion unit that performs an inverse Fourier transform on the unit waveform spectrum frequency-shifted by the frequency shift unit, and wherein the voice quality of the synthesized sound is changed by frequency-shifting the unit waveform spectrum.

4. A speech synthesizer comprising: a waveform storage unit that stores unit waveforms, each being a minimum unit of synthesis that is a waveform for one pitch period for voiced sounds and a waveform for one phoneme for unvoiced sounds; a waveform conversion unit that converts the unit waveforms stored in the waveform storage unit; and a synthesis unit that edits the unit waveforms converted by the waveform conversion unit to generate a synthesized sound, wherein the waveform conversion unit comprises a frequency conversion unit that performs a Laplace transform on a unit waveform, a frequency shift unit that frequency-shifts the unit waveform spectrum obtained by the Laplace transform in the frequency conversion unit, a parameter storage unit that stores a parameter necessary for the frequency shift in the frequency shift unit, and a frequency inverse conversion unit that performs an inverse Laplace transform on the unit waveform spectrum frequency-shifted by the frequency shift unit, and wherein the voice quality of the synthesized sound is changed by frequency-shifting the unit waveform spectrum.

5. A speech synthesizer comprising: a waveform storage unit that stores unit waveforms, each being a minimum unit of synthesis that is a waveform for one pitch period for voiced sounds and a waveform for one phoneme for unvoiced sounds; a waveform conversion unit that converts the unit waveforms stored in the waveform storage unit; and a synthesis unit that edits the unit waveforms converted by the waveform conversion unit to generate a synthesized sound, wherein the waveform conversion unit comprises a matrix operation unit that multiplies the sample value sequence vector of a unit waveform stored in the waveform storage unit by a matrix, and a parameter storage unit that stores the matrix used in the matrix operation unit, and wherein the unit waveform is converted by multiplying its sample value sequence vector by the matrix.

6. The speech synthesizer according to claim 1, wherein the unit waveforms are divided into a plurality of groups, and conversion is performed using different parameters for each group.

7. The speech synthesizer according to claim 6, wherein the unit waveforms are divided into voiced sounds and unvoiced sounds, and only the voiced sounds are converted.

8. The speech synthesizer according to claim 1, further comprising a parameter input unit, wherein the parameters can be set arbitrarily.

9. The speech synthesizer according to claim 1, wherein the waveform conversion unit converts a unit waveform using a parameter specified in an input text.

10. The speech synthesizer according to claim 1, further comprising a second waveform storage unit that stores the unit waveforms converted by the waveform conversion unit, wherein the synthesized sound is generated using the unit waveforms stored in the second waveform storage unit.

11. The speech synthesizer according to claim 1, wherein the waveform conversion unit changes the parameter according to prosody information of the synthesizer.