JPH09179576A

JPH09179576A - Voice synthesizing method

Info

Publication number: JPH09179576A
Application number: JP7333322A
Authority: JP
Inventors: Takehiko Kagoshima; 岳彦籠嶋; Masami Akamine; 政巳赤嶺; Kimio Miseki; 公生三関; Masahiro Oshikiri; 正浩押切; Ko Amada; 皇天田; Akinobu Yamashita; 明延山下
Original assignee: Toshiba Corp
Current assignee: Toshiba Corp
Priority date: 1995-12-21
Filing date: 1995-12-21
Publication date: 1997-07-11

Abstract

PROBLEM TO BE SOLVED: To arbitrarily vary the voice quality of synthesized speech in text-to-speech synthesis. SOLUTION: A speech signal generation unit 12 generates a synthesized speech signal 107 by connecting information 106 selected from the information on a plurality of speech synthesis units stored in a synthesis unit set storage unit 13. A speech signal 201 is input from a speech input unit 21, a speech analysis unit 23 generates speech synthesis unit information 204 from that speech signal and stores it in the synthesis unit set storage unit 13, and the synthesized speech signal 107 is generated using this information.

Description

Detailed Description of the Invention

[0001]

[Technical Field of the Invention] The present invention relates to a speech synthesis method that generates a synthesized speech signal from information such as a phoneme symbol string, pitch, and phoneme durations in a text-to-speech synthesis system.

[0002]

[Description of the Related Art] Artificially producing a speech signal from arbitrary text is called text-to-speech synthesis. A text-to-speech synthesis system normally consists of three components, a language processing unit 10, a prosody processing unit 11, and a speech signal generation unit 12, as shown in FIG. 12.

[0003] The input text 101 first undergoes morphological analysis, syntactic analysis, and the like in the language processing unit 10. The prosody processing unit 11 then performs accent and intonation processing and outputs information such as a phoneme symbol string 103, phoneme durations 104, and a pitch pattern 105. Finally, the speech signal generation unit 12 selects feature parameters of small basic units (speech synthesis units) such as syllables, phonemes, and one-pitch periods, which are stored in the synthesis unit set storage unit 13, according to the phoneme symbol string 103, the phoneme durations 104, the pitch pattern 105, and similar information, and connects them while controlling pitch and duration to generate the synthesized speech signal 107.

[0004] The voice quality (speaker individuality) of synthesized speech generated in this way is determined mainly by the feature parameters that constitute the speech synthesis units. Therefore, when text-to-speech synthesis must produce speech in several different voice qualities, a plurality of synthesis unit set storage units 13 and 14, one per voice quality, are prepared as shown in FIG. 13, and switching between them produces synthesized speech signals 107 of different voice qualities.

[0005] On the other hand, methods for synthesizing speech of various voice qualities without storing multiple synthesis unit sets include changing the pitch pattern and changing the sound quality by filtering the synthesized speech signal.

[0006] The former method, changing the pitch pattern, works as shown in FIG. 14: pitch average information 109 is input to the prosody processing unit 15 to change the average value of the pitch pattern and thereby the height of the voice, or intonation magnitude information 110 is input to change the range of pitch variation and thereby the strength of the intonation. The latter method, using a filter, changes the sound quality of the synthesized speech by emphasizing the low or high frequencies of the synthesized speech signal 107 with a filter unit 16 whose characteristics can be controlled according to filter characteristic information 111, as shown in FIG. 15.
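The pitch-pattern manipulation described above can be sketched as follows. This is a minimal illustration, assuming the pitch pattern is simply a list of values in Hz; the function name `reshape_pitch` and its parameters `mean_shift_hz` and `range_scale` are hypothetical stand-ins for the pitch average information 109 and the intonation magnitude information 110, and do not appear in the publication.

```python
def reshape_pitch(pitch_hz, mean_shift_hz=0.0, range_scale=1.0):
    """Shift the average pitch and scale the excursions around the average.

    pitch_hz      : pitch pattern as a list of values in Hz (assumed representation)
    mean_shift_hz : hypothetical stand-in for pitch average information 109
    range_scale   : hypothetical stand-in for intonation magnitude information 110
    """
    mean = sum(pitch_hz) / len(pitch_hz)
    return [mean + mean_shift_hz + range_scale * (p - mean) for p in pitch_hz]


contour = [100.0, 120.0, 140.0, 120.0]
higher = reshape_pitch(contour, mean_shift_hz=20.0)   # same shape, higher voice
livelier = reshape_pitch(contour, range_scale=2.0)    # same average, stronger intonation
```

Both operations leave the spectral envelope of the synthesis units untouched, which is consistent with the limitation the publication goes on to describe.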

[0007]

[Problems to Be Solved by the Invention] The conventional speech synthesis methods described above cannot generate synthesized speech of an arbitrary voice quality. That is, the method of storing multiple synthesis unit sets can synthesize speech in several voice qualities by increasing the number of sets, but synthesizing speech of an arbitrary voice quality is practically impossible, and storing many synthesis unit sets increases the required memory capacity and therefore the cost.

[0008] Changing the pitch pattern alters intonation and accent but cannot, in essence, change the voice quality of the synthesized speech, that is, its speaker individuality. Changing the voice quality with a filter can make the voice brighter or darker, but it likewise cannot change the individuality, because information on individuality resides not only in formant amplitudes but also in formant center frequencies, bandwidths, and the like.

[0009] The present invention has been made to solve the above problems, and its object is to provide a speech synthesis method that can arbitrarily change the voice quality of synthesized speech in text-to-speech synthesis.

[0010]

[Means for Solving the Problems] To solve the above problems, the present invention provides a speech synthesis method that generates a synthesized speech signal by connecting information selected from the information on a plurality of speech synthesis units, wherein the information on the plurality of speech synthesis units needed to generate the synthesized speech signal is generated or processed according to at least one of a speech signal and voice quality information. Specifically, the invention takes the following forms.

[0011] (1) A speech signal is input, and information on a plurality of speech synthesis units is generated according to this speech signal. If the speech synthesis unit information generated from the speech signal is stored and used to generate the synthesized speech signal, the voice quality of the synthesized speech matches that of the speaker who supplied the speech signal, so the voice quality of the synthesized speech changes when the speaker changes.

[0012] (2) A speech signal is input, and previously stored information on a plurality of speech synthesis units is processed according to this speech signal. If the existing speech synthesis unit information is processed according to the input speech signal and used to generate the synthesized speech signal, the voice quality of the synthesized speech again matches that of the speaker who supplied the speech signal, and changes when the speaker changes.

[0013] (3) Voice quality information is input, and previously stored information on a plurality of speech synthesis units is processed according to this voice quality information. Here, voice quality information is information that specifies the voice quality of the synthesized speech. If the existing speech synthesis unit information is processed according to this voice quality information and used to generate the synthesized speech signal, the voice quality of the synthesized speech changes according to the voice quality information.

[0014] (4) A speech signal and voice quality information are input; information on a plurality of speech synthesis units is generated from the speech signal and is then processed according to the voice quality information. If the speech synthesis unit information is generated from the speech signal, processed according to the voice quality information, and then used to generate the synthesized speech signal, the voice quality of the synthesized speech varies even more widely, following both the input speech signal and the voice quality information.

[0015] (5) A speech signal and voice quality information are input, and previously stored information on a plurality of speech synthesis units is processed according to both. In this case as well, the voice quality of the synthesized speech varies widely, following both the input speech signal and the voice quality information.

[0016] (6) A plurality of pieces of voice quality information are stored, and previously stored speech synthesis unit information is processed according to this voice quality information. In this way, synthesized speech in several voice qualities is obtained with less memory than would be needed to store synthesis unit information for each voice quality.

[0017] (7) A plurality of pieces of voice quality information are stored, and speech synthesis unit information is processed according to voice quality information generated by interpolating between them. In this way, synthesized speech is obtained in a wide variety of voice qualities that sound like blends of the voice qualities given by the stored voice quality information.
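Form (7) can be illustrated with a small sketch. It assumes, purely for illustration, that each piece of voice quality information is a dictionary of numeric parameters and that the interpolation is linear; the publication does not specify either. The names `interpolate_voice_quality`, `formant_shift`, and `gain_db` are hypothetical.

```python
def interpolate_voice_quality(a, b, weight):
    """Blend two stored voice quality descriptions.

    weight = 0.0 returns a, weight = 1.0 returns b; values in between mix them.
    Linear interpolation is an assumption, not something the publication states.
    """
    if a.keys() != b.keys():
        raise ValueError("both descriptions must define the same parameters")
    return {k: (1.0 - weight) * a[k] + weight * b[k] for k in a}


# Hypothetical stored voice quality information.
bright = {"formant_shift": 1.25, "gain_db": 4.0}
dark = {"formant_shift": 0.75, "gain_db": -4.0}

blend = interpolate_voice_quality(bright, dark, 0.5)
```

Only the two endpoint descriptions need to be stored; every intermediate voice quality is computed on demand.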

[0018] In the present invention, the information on the plurality of speech synthesis units may be processed for all speech synthesis units, or only for speech synthesis units that contain a vowel, since vowels carry most of the voice quality information. The processing of the speech synthesis unit information is performed, for example, by applying at least one of amplitude mapping and frequency mapping to the spectral information.

[0019]

BEST MODE FOR CARRYING OUT THE INVENTION

(First Embodiment) FIG. 1 shows the configuration of a speech synthesizer according to a first embodiment, to which the first speech synthesis method of the present invention is applied. This speech synthesizer adds a speech input unit 21, a speech recognition unit 22, and a speech analysis unit 23 to the conventional speech synthesizer shown in FIG. 12, which has the language processing unit 10, the prosody processing unit 11, the speech signal generation unit 12, and the synthesis unit set storage unit 13.

[0020] The basic operation of this speech synthesizer is the same as in the conventional system. That is, the input text 101 first undergoes morphological analysis, syntactic analysis, and the like in the language processing unit 10. Based on the analysis result 102 from the language processing unit 10, the prosody processing unit 11 then performs accent and intonation processing and outputs information such as the phoneme symbol string 103, the phoneme durations 104, and the pitch pattern 105. The speech signal generation unit 12 then selects feature parameters of small basic units (speech synthesis units) such as syllables, phonemes, and one-pitch periods, which are stored in the synthesis unit set storage unit 13, according to the phoneme symbol string 103, the phoneme durations 104, the pitch pattern 105, and similar information, and connects them after controlling their pitch and duration, thereby generating the synthesized speech signal 107.

[0021] Next, the characteristic configuration of this embodiment is described. The speech input unit 21 is configured, for example, to convert speech uttered by a speaker into an electrical signal with a microphone and output it as the speech signal 201 after suitable processing such as amplification, or to accept a speech signal supplied from an external device either as it is or after suitable processing.

[0022] The speech recognition unit 22 associates the speech signal 201 input from the speech input unit 21 with phoneme symbols and, at the same time, determines the interval of the signal that corresponds to the phoneme indicated by each phoneme symbol, and outputs phoneme symbols 202 and phoneme boundary time information 203.

[0023] According to the phoneme boundary time information 203 and the phoneme symbols 202 input from the speech recognition unit 22, the speech analysis unit 23 analyzes the intervals of the speech signal 201 from the speech input unit 21 that serve as speech synthesis units, and outputs feature parameters 204 of the speech synthesis units.
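The way the phoneme symbols 202 and the phoneme boundary time information 203 delimit the intervals to be analyzed can be sketched as follows. This is only an illustration of the segmentation step, assuming the boundaries are given in seconds and the signal is a list of samples; the function name `cut_units` is hypothetical, and the actual feature analysis applied to each interval is not shown.

```python
def cut_units(samples, sample_rate, phonemes, boundaries_s):
    """Split a speech signal into labeled intervals.

    phonemes     : list of phoneme symbols (corresponding to 202)
    boundaries_s : boundary times in seconds (corresponding to 203);
                   one more entry than there are phonemes (assumed layout)
    """
    if len(boundaries_s) != len(phonemes) + 1:
        raise ValueError("need one more boundary than phonemes")
    units = []
    for i, phoneme in enumerate(phonemes):
        start = round(boundaries_s[i] * sample_rate)
        end = round(boundaries_s[i + 1] * sample_rate)
        units.append((phoneme, samples[start:end]))
    return units


signal = list(range(10))          # toy signal: 10 samples at 10 samples per second
units = cut_units(signal, 10, ["k", "a"], [0.0, 0.5, 1.0])
```

Each returned interval would then be passed to whatever analysis produces the feature parameters 204.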

[0024] The feature parameters 204 output from the speech analysis unit 23 are stored in the synthesis unit set storage unit 13 as speech synthesis unit information. One example of such feature parameters is LPC coefficients paired with a residual waveform. Various other parameters, such as formants and cepstra, can also be used.

[0025] In the speech synthesizer configured as described above, this embodiment first inputs, through the speech input unit 21, a speech signal 201 that contains all the speech synthesis units, stores the feature parameters of all the speech synthesis units in the synthesis unit set storage unit 13, and then performs text-to-speech synthesis. In this case, the synthesized speech obtained by playing the synthesized speech signal 107 through a loudspeaker has the same voice quality as the speech that was input through the speech input unit 21 to generate the synthesis unit set. To change the voice quality of the synthesized speech, the speech synthesis unit set is simply regenerated from the speech of a speaker with the target voice quality.

[0026] Thus, according to this embodiment, text-to-speech synthesis can generate synthesized speech of an arbitrary voice quality that matches the voice quality of the speaker who inputs the speech signal through the speech input unit 21.

[0027] (Second Embodiment) FIG. 2 shows the configuration of a speech synthesizer according to a second embodiment, to which the speech synthesis method of the present invention is applied. In FIG. 2, elements with the same reference numerals as in FIG. 1 have the same functions as in FIG. 1, and their description is omitted.

[0028] This embodiment provides a phoneme presentation unit 28 that presents the phoneme symbols the speaker should utter. The speaker utters the phonemes indicated by the phoneme symbols presented by the phoneme presentation unit 28, and the resulting speech signal input from the speech input unit 21 is supplied to the speech analysis unit 23. The phoneme presentation unit 28 may, for example, display the phoneme symbols visually on a display, or present them as speech through a loudspeaker.

[0029] This embodiment provides the same effects as the first embodiment. In addition, because the speech analysis unit 23 already knows the phonemes of the speech signal input from the speech input unit 21, the speech recognition unit 22 of FIG. 1 can be omitted. Considering that the recognition rate of speech recognizers is not yet sufficiently high, this embodiment is effective in improving the reliability of the synthesis unit set stored in the synthesis unit set storage unit 13.

[0030] (Third Embodiment) FIG. 3 shows the configuration of a speech synthesizer according to a third embodiment, to which the speech synthesis method of the present invention is applied. In FIG. 3, elements with the same reference numerals as in FIG. 1 have the same functions as in FIG. 1, and their description is omitted.

[0031] In addition to the configuration of FIG. 1, this embodiment provides a spectrum comparison unit 24, a synthesis unit information conversion unit 25, and a further synthesis unit set storage unit 27. The first synthesis unit set storage unit 26 stores predetermined feature parameters of speech synthesis units, and the feature parameters whose voice quality has been converted by processing these feature parameters are stored in the second synthesis unit set storage unit 27.

[0032] The spectrum comparison unit 24 compares the power spectrum of the spectral parameters in the feature parameters 204, obtained by analyzing the speech signal 201 from the speech input unit 21 in the speech analysis unit 23, with the power spectrum of the spectral parameters in the feature parameters 205 stored in the first synthesis unit set storage unit 26, and outputs conversion parameters 206. The synthesis unit information conversion unit 25 converts the feature parameters 207 based on the conversion parameters 206 from the spectrum comparison unit 24, outputs feature parameters 208 of speech synthesis units whose voice quality has been converted, and stores them in the second synthesis unit set storage unit 27.

[0033] The processing of the synthesis unit information conversion unit 25 can be realized, for example, by frequency and amplitude mapping of the spectral parameters. In this case, the spectrum comparison unit 24 outputs the frequency and amplitude mapping functions as the conversion parameters 206. With f denoting frequency, the conversion from the power spectrum s(f) in the feature parameters 205 to the power spectrum s′(f) in the feature parameters 204 is expressed by the following equation.

[0034]

s′(f) = m(s(ω(f)))    (1)

where ω() is a function representing the frequency mapping and m() is a function representing the amplitude mapping.
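Equation (1) can be evaluated numerically as in the sketch below. It assumes the power spectrum is sampled at a list of frequencies and that values between samples are obtained by linear interpolation; both are illustrative choices, not details from the publication. The helper `interp` and the function `convert_spectrum` are hypothetical names.

```python
def interp(x, xs, ys):
    """Piecewise-linear interpolation over increasing xs, clamped at both ends."""
    if x <= xs[0]:
        return ys[0]
    if x >= xs[-1]:
        return ys[-1]
    for i in range(1, len(xs)):
        if x <= xs[i]:
            t = (x - xs[i - 1]) / (xs[i] - xs[i - 1])
            return ys[i - 1] + t * (ys[i] - ys[i - 1])


def convert_spectrum(freqs, s, omega, m):
    """Evaluate s'(f) = m(s(omega(f))) at every frequency in freqs."""
    return [m(interp(omega(f), freqs, s)) for f in freqs]


freqs = [0.0, 1000.0, 2000.0, 3000.0, 4000.0]   # Hz, toy grid
s = [1.0, 4.0, 2.0, 8.0, 1.0]                   # toy power spectrum

# Toy mappings: omega reads the source at half the frequency, m doubles the power.
s_prime = convert_spectrum(freqs, s, lambda f: 0.5 * f, lambda p: 2.0 * p)
```

With these toy mappings, the peak that sits at 1000 Hz in s appears at 2000 Hz in s′ with twice the power, which shows that ω() moves spectral features along the frequency axis while m() rescales them.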

[0035] For example, suppose s(f) and s′(f) are as shown in FIGS. 4(a) and 4(b), respectively. Here F₁, F₂, F′₁, and F′₂ are formant frequencies, and P₁, P₂, P′₁, and P′₂ are the powers of the corresponding formants. In this case, the frequency mapping function ω() corresponding to the conversion parameters 206 is a function that maps F₁ to F′₁ and F₂ to F′₂, and the amplitude mapping function m() is a function that maps P₁ to P′₁ and P₂ to P′₂. Examples of these functions ω() and m() are shown in FIGS. 5(a) and 5(b), respectively.
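One way to build such mapping functions from formant measurements is a piecewise-linear curve through the anchor points, sketched below. The piecewise-linear shape, the endpoints at 0 and at an upper limit, and the names `interp` and `make_mapping` are all assumptions made for illustration; FIG. 5 is not available in this text. Note also that, with equation (1) evaluated as written, ω() receives the output frequency, so the sketch constructs it so that ω(F′ₖ) = Fₖ, which places the formant of s at Fₖ at F′ₖ in s′.

```python
def interp(x, xs, ys):
    """Piecewise-linear interpolation over increasing xs, clamped at both ends."""
    if x <= xs[0]:
        return ys[0]
    if x >= xs[-1]:
        return ys[-1]
    for i in range(1, len(xs)):
        if x <= xs[i]:
            t = (x - xs[i - 1]) / (xs[i] - xs[i - 1])
            return ys[i - 1] + t * (ys[i] - ys[i - 1])


def make_mapping(inputs, outputs, upper):
    """Piecewise-linear map through (0, 0), the anchor pairs, and (upper, upper).

    Assumes the anchor inputs are distinct and lie strictly between 0 and upper.
    """
    pairs = sorted(zip(inputs, outputs))
    xs = [0.0] + [a for a, _ in pairs] + [upper]
    ys = [0.0] + [b for _, b in pairs] + [upper]
    return lambda x: interp(x, xs, ys)


# Toy formant data (hypothetical numbers): F = source, F_prime = target.
F, F_prime = [500.0, 1500.0], [600.0, 1800.0]
P, P_prime = [10.0, 4.0], [8.0, 6.0]

omega = make_mapping(F_prime, F, upper=4000.0)   # omega(F'_k) = F_k
m = make_mapping(P, P_prime, upper=20.0)         # m(P_k) = P'_k
```

The resulting `omega` and `m` can be passed directly to an evaluation of equation (1).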

[0036] In this embodiment, the conversion of feature parameters in the synthesis unit information conversion unit 25 can be done, for example, by obtaining conversion parameters for every speech synthesis unit and applying a different conversion to each feature parameter. Alternatively, the speech synthesis units can be classified into several classes, with common conversion parameters obtained for each class. For example, common mapping functions ω() and m() that convert a plurality of power spectra s_i(f) (i = 1, 2, 3, …, N) into the corresponding power spectra s′_i(f) (i = 1, 2, 3, …, N) are obtained by using a known optimization method to minimize the conversion error function E(ω, m) given by the following equation.

[0037]

[Equation 1] Equation (2): the conversion error function E(ω, m)
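The body of this equation is not reproduced in the text. As a hedged illustration only, the sketch below assumes a sum of squared differences between each target spectrum s′_i and the converted spectrum m(s_i(ω(f))), accumulated over a sampled frequency grid, and uses a small grid search in place of the "known optimization method". The exact form of E(ω, m), the single-parameter warp ω(f) = αf, and the names `interp`, `conversion_error`, and `error_for` are assumptions.

```python
def interp(x, xs, ys):
    """Piecewise-linear interpolation over increasing xs, clamped at both ends."""
    if x <= xs[0]:
        return ys[0]
    if x >= xs[-1]:
        return ys[-1]
    for i in range(1, len(xs)):
        if x <= xs[i]:
            t = (x - xs[i - 1]) / (xs[i] - xs[i - 1])
            return ys[i - 1] + t * (ys[i] - ys[i - 1])


def conversion_error(pairs, freqs, omega, m):
    """Assumed error: squared difference summed over all pairs and frequencies."""
    total = 0.0
    for s, s_target in pairs:
        for k, f in enumerate(freqs):
            predicted = m(interp(omega(f), freqs, s))
            total += (s_target[k] - predicted) ** 2
    return total


freqs = [0.0, 1000.0, 2000.0, 3000.0, 4000.0]
source = [1.0, 4.0, 2.0, 8.0, 1.0]
# Toy target produced by a known warp, so the search has a recoverable answer.
target = [interp(0.5 * f, freqs, source) for f in freqs]


def error_for(alpha):
    return conversion_error([(source, target)], freqs,
                            lambda f: alpha * f, lambda p: p)


best_alpha = min([0.5, 0.75, 1.0], key=error_for)
```

In this toy setup the search selects the warp factor that generated the target; a practical system would search a richer family of ω() and m() over many spectrum pairs.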

[0038] Here, fs is the sampling frequency. Because voice quality information is contained mainly in vowels, the voice quality can also be converted by converting the feature parameters only of speech synthesis units that contain a vowel and using the other synthesis units unchanged. This reduces the amount of processing in the synthesis unit information conversion unit 25.
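The vowel-only variant can be sketched as a filter over the unit set. The sketch assumes that units are keyed by romanized phoneme labels such as "ka" or "s" and that a unit contains a vowel when its label contains one of a, i, u, e, o; this labeling scheme and the names `VOWELS` and `convert_unit_set` are hypothetical.

```python
VOWELS = set("aiueo")  # assumed romanized vowel symbols


def convert_unit_set(units, convert):
    """Apply `convert` only to units whose label contains a vowel.

    units   : dict mapping a phoneme label to its feature parameters
    convert : function applied to the feature parameters of vowel-bearing units
    """
    return {
        label: convert(params) if VOWELS & set(label) else params
        for label, params in units.items()
    }


units = {"ka": [1.0, 2.0], "s": [3.0], "a": [4.0]}
converted = convert_unit_set(units, lambda params: [2.0 * x for x in params])
```

Units without a vowel pass through untouched, so the conversion cost scales with the number of vowel-bearing units only.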

[0039] The synthesis unit information conversion unit 25 can also perform a conversion such as the following:

s′(f) = g(f) s(ω(f))    (3)

This is an example in which the amplitude conversion is realized as multiplication by a function g(f).

[0040] In this embodiment, both the speech synthesis unit set stored in advance in the first synthesis unit set storage unit 26 and the speech synthesis unit set generated by voice quality conversion and stored in the second synthesis unit set storage unit 27 can be used by the speech signal generation unit 12 as the feature parameters 106 of the speech synthesis units. Therefore, by switching between the first synthesis unit set storage unit 26 and the second synthesis unit set storage unit 27 according to voice quality selection information 108, which specifies the voice quality of the synthesized speech, speech can be synthesized in a variety of voice qualities.

[0041] This embodiment can also be modified, as in the second embodiment, to provide a phoneme presentation unit and omit the speech recognition unit.

(Fourth Embodiment) FIG. 6 shows the configuration of a speech synthesizer according to a fourth embodiment, to which the speech synthesis method of the present invention is applied. In FIG. 6, elements with the same reference numerals as in FIG. 3 have the same functions as in FIG. 3, and their description is omitted.

[0042] In the third embodiment, a speech synthesis unit set with converted voice quality is created and stored in advance. In this embodiment, the voice quality information is instead stored in the form of conversion parameters, and at synthesis time the feature parameters are converted each time they are needed and supplied to the speech signal generation unit.

[0043] That is, in this embodiment the second synthesis unit set storage unit 27 of FIG. 3, which stores the speech synthesis unit set with converted voice quality, is removed, and a conversion parameter storage unit 32 is provided in its place. The conversion parameter storage unit 32 stores the conversion parameters 308 obtained by the spectrum comparison unit 24 and, when speech is synthesized, outputs the conversion parameters 206 corresponding to the speech synthesis unit being used. The synthesis unit information conversion unit 25 converts the feature parameters 207 based on these conversion parameters 206, outputs the feature parameters 208, and supplies them to the speech signal generation unit 12.

[0044] According to this embodiment, the conversion parameter storage unit 32 needs far less memory than the synthesis unit set storage unit 27 of FIG. 3, which stores the speech synthesis units with converted voice quality, so the memory required by the speech synthesizer can be reduced compared with the third embodiment.
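The trade-off between storing a converted unit set and converting at synthesis time can be sketched as follows. The data layout (a dictionary of feature lists, one gain value per unit as the conversion parameter) and the names `synthesize_units`, `base_units`, and `params_by_label` are hypothetical, and the sizes are toy values chosen only to make the comparison visible.

```python
def synthesize_units(labels, base_units, params_by_label, convert):
    """Convert each requested unit on demand instead of storing a converted set."""
    return [convert(base_units[label], params_by_label[label]) for label in labels]


# Toy base unit set (as in storage unit 26) and per-unit conversion parameters
# (as in storage unit 32); a single gain per unit is an illustrative assumption.
base_units = {"a": [1.0, 2.0, 3.0, 4.0], "ka": [2.0, 2.0, 1.0, 1.0]}
params_by_label = {"a": 2.0, "ka": 0.5}


def apply_gain(features, gain):
    return [gain * x for x in features]


converted = synthesize_units(["ka", "a"], base_units, params_by_label, apply_gain)

# Numbers that would have to be stored for one additional voice quality:
stored_if_converted_set = sum(len(v) for v in base_units.values())
stored_if_parameters = len(params_by_label)
```

The saving comes at the cost of running the conversion every time a unit is used, which is the exchange this embodiment makes.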

[0045] (Fifth Embodiment) FIG. 7 shows the configuration of a speech synthesizer according to a fifth embodiment, to which the speech synthesis method of the present invention is applied. In FIG. 7, elements with the same reference numerals as in FIG. 3 have the same functions as in FIG. 3, and their description is omitted.

[0046] This embodiment provides a voice quality input unit 31, a conversion parameter generation unit 30, a conversion parameter storage unit 36, and a conversion parameter input/output unit 37. The voice quality input unit 31 outputs voice quality information 301, which specifies, on scales the user can understand, how the quality of the synthesized speech is to be converted. Such scales include, for example, "brighter voice / darker voice", "thicker voice / thinner voice", and "child's voice / adult's voice", among many others. The conversion parameter generation unit 30 generates conversion parameters 302 so that voice quality conversion is performed according to the voice quality information 301.

[0047] The conversion parameter storage unit 36 stores the conversion parameters 302 and the conversion parameters 310 input from the conversion parameter input/output unit 37, and outputs conversion parameters 309 according to voice quality selection information 306. The conversion parameter input/output unit 37 outputs the stored conversion parameters 311 to the outside or accepts the conversion parameters 310 from the outside, and is realized, for example, as an interface to a storage device such as a magnetic storage device or a CD-ROM, or to a network.

[0048] As described above, according to this embodiment, speech of a desired voice quality can be synthesized using the conversion parameter 302 generated by the conversion parameter generation unit 30 on the basis of the voice quality information 301, which is input from the voice quality input unit 31 by any of various methods of instruction. In addition, because the generated conversion parameter 302 is stored in the conversion parameter storage unit 36 and conversion parameters can be exchanged with the outside, the generated conversion parameters can be provided to other users and, conversely, voice quality conversion can be performed using conversion parameters generated by other users.

[0049] (Sixth Embodiment) FIG. 8 shows the configuration of a speech synthesis apparatus according to the sixth embodiment to which the speech synthesis method of the present invention is applied. In FIG. 8, elements given the same reference numerals as those in FIGS. 1 and 7 have the same functions as those in FIGS. 1 and 7, and a description thereof will be omitted.

[0050] In this embodiment, a synthesis unit set of a desired voice quality is generated by specifying how the voice quality of the input speech is to be changed. That is, the feature parameter 204, obtained when the speech analysis unit 23 analyzes the speech signal 201 input from the speech input unit 21, is input to the synthesis unit information conversion unit 25. The synthesis unit information conversion unit 25 converts the voice quality according to the conversion parameter 302, which the conversion parameter generation unit 30 generates on the basis of the voice quality information 301 input from the voice quality input unit 31. The resulting feature parameter 208 of the speech synthesis units is stored in the synthesis unit set storage unit 13.

[0051] As described above, according to this embodiment, the feature parameter 204 of the speech synthesis units is generated according to the speech signal 201 and then processed according to the voice quality information 301, and the result is used to generate the synthesized speech signal 107. The voice quality of the synthesized speech can thus be varied widely according to both the input speech signal 201 and the voice quality information 301.

[0052] (Seventh Embodiment) FIG. 9 shows the configuration of a speech synthesis apparatus according to the seventh embodiment to which the speech synthesis method of the present invention is applied. In FIG. 9, elements given the same reference numerals as those in FIGS. 2 and 7 have the same functions as those in FIGS. 2 and 7, and a description thereof will be omitted.

[0053] In this embodiment, the feature parameter 207 of the speech synthesis units stored in advance in the first synthesis unit set storage unit 26 is first converted by the synthesis unit information conversion unit 25 so as to approach the voice quality of the input speech signal 201. The converted feature parameter 303 is then subjected, in a second synthesis unit information conversion unit 33, to the voice quality conversion specified by the voice quality information 301 from the voice quality input unit 31. This produces the feature parameter 208, which constitutes a synthesis unit set of the desired voice quality and is stored in the second synthesis unit set storage unit 27. Therefore, as in the sixth embodiment, the voice quality of the synthesized speech can be varied widely according to both the input speech signal 201 and the voice quality information 301.
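The two-stage conversion of the seventh embodiment (stored feature parameter 207, then intermediate feature parameter 303, then final feature parameter 208) can be pictured with the following Python sketch. It is deliberately simplified and rests on assumptions that are not in the text: each speech synthesis unit is represented only by a short spectral envelope in dB, and each conversion is modelled as a per-band gain in dB. The function and argument names are hypothetical.

```python
def apply_band_gains(envelope_db, gains_db):
    """Add a per-band gain (dB) to a spectral envelope (dB)."""
    if len(envelope_db) != len(gains_db):
        raise ValueError("envelope and gains must have the same length")
    return [e + g for e, g in zip(envelope_db, gains_db)]


def convert_unit_set(unit_set, speaker_gains_db, quality_gains_db):
    """Apply the two conversions in sequence to every unit in the set.

    unit_set maps a unit name to its envelope (role of parameter 207).
    The first stage moves it toward the input speaker (role of 303),
    the second applies the requested voice quality (role of 208).
    The input set is left unmodified.
    """
    converted = {}
    for unit_name, envelope_db in unit_set.items():
        adapted = apply_band_gains(envelope_db, speaker_gains_db)
        converted[unit_name] = apply_band_gains(adapted, quality_gains_db)
    return converted
```

Because the first set is never overwritten, the same stored units can be converted again for a different speaker or a different voice quality request.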

[0054] (Eighth Embodiment) FIG. 10 shows the configuration of a speech synthesis apparatus according to the eighth embodiment to which the speech synthesis method of the present invention is applied. In FIG. 10, elements given the same reference numerals as those in FIG. 3 have the same functions as those in FIG. 3, and a description thereof will be omitted.

[0055] The conversion parameter storage unit 36 stores in advance conversion parameters corresponding to several voice qualities, and outputs the conversion parameter 302 selected according to the voice quality selection information 306. According to this embodiment, the same kind of voice quality conversion can be realized with less memory than when speech synthesis unit sets corresponding to a plurality of voice qualities are stored in advance. As a modification of this embodiment, sample speech corresponding to each voice quality may be stored and presented so that the user can select a voice quality.
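The memory argument can be made concrete with a small calculation. The sketch below compares storing one complete speech synthesis unit set per voice quality with storing a single unit set plus one conversion parameter per voice quality. The sizes used in the usage note are hypothetical illustration values, not figures from the specification, and the function names are likewise assumptions.

```python
def bytes_for_full_sets(num_qualities, units_per_set, values_per_unit,
                        bytes_per_value=4):
    """Memory when a complete unit set is stored for every voice quality."""
    return num_qualities * units_per_set * values_per_unit * bytes_per_value


def bytes_for_shared_set(num_qualities, units_per_set, values_per_unit,
                         values_per_conversion, bytes_per_value=4):
    """Memory for one unit set plus one conversion parameter per quality."""
    base = units_per_set * values_per_unit * bytes_per_value
    params = num_qualities * values_per_conversion * bytes_per_value
    return base + params
```

For example, with 5 voice qualities, 1000 units of 20 values each, and 40 values per conversion parameter, `bytes_for_full_sets(5, 1000, 20)` gives 400000 bytes while `bytes_for_shared_set(5, 1000, 20, 40)` gives 80800 bytes. The saving grows with the number of voice qualities as long as a conversion parameter is much smaller than a unit set.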

[0056] (Ninth Embodiment) FIG. 11 shows the configuration of a speech synthesis apparatus according to the ninth embodiment to which the speech synthesis method of the present invention is applied. In FIG. 11, elements given the same reference numerals as those in FIG. 10 have the same functions as those in FIG. 10, and a description thereof will be omitted.

[0057] The conversion parameter storage unit 34 selects and outputs a plurality of conversion parameters 312 according to the voice quality selection information 305. The conversion parameter interpolation unit 35 interpolates these conversion parameters 312 according to the weighting coefficients 307 and outputs the conversion parameter 302. According to this embodiment, it is possible to synthesize speech whose voice quality is a mixture, in an arbitrary ratio, of several voice qualities whose conversion parameters are stored in advance.
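One plausible reading of the interpolation performed by the conversion parameter interpolation unit 35 is a normalized weighted average. The sketch below assumes that each conversion parameter 312 can be written as a list of numbers of equal length and that the weighting coefficients 307 are non-negative; both points, and the function name, are assumptions, since the text does not specify the interpolation formula.

```python
def interpolate_conversion_parameters(params, weights):
    """Return the weighted average of several conversion parameters.

    params  : list of equal-length lists of floats (role of parameters 312)
    weights : one non-negative weight per parameter (role of coefficients 307)
    """
    if not params or len(params) != len(weights):
        raise ValueError("need one weight per conversion parameter")
    if any(w < 0 for w in weights):
        raise ValueError("weights must be non-negative")
    total = sum(weights)
    if total <= 0:
        raise ValueError("at least one weight must be positive")
    length = len(params[0])
    if any(len(p) != length for p in params):
        raise ValueError("all conversion parameters must have the same length")
    # Normalizing by the total lets the caller pass any mixing ratio, e.g. 1:3.
    return [
        sum(w * p[i] for p, w in zip(params, weights)) / total
        for i in range(length)
    ]
```

Mixing two stored voice qualities in a 1:3 ratio, `interpolate_conversion_parameters([[0.0, 2.0], [4.0, 6.0]], [1, 3])` yields `[3.0, 5.0]`, which lies three quarters of the way from the first parameter to the second.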

[0058]

[Effects of the Invention] As described above, according to the speech synthesis method of the present invention, a speech signal or voice quality information is input, and speech having a voice quality similar to that of the speech signal, or speech having the voice quality specified by the voice quality information, can be synthesized. The voice quality of the synthesized speech can therefore be changed arbitrarily in text-to-speech synthesis.

[Brief description of the drawings]

FIG. 1 is a block diagram showing the configuration of a speech synthesis apparatus according to a first embodiment of the present invention.

FIG. 2 is a block diagram showing the configuration of a speech synthesis apparatus according to a second embodiment of the present invention.

FIG. 3 is a block diagram showing the configuration of a speech synthesis apparatus according to a third embodiment of the present invention.

FIG. 4 is a diagram showing examples of the power spectra of the feature parameter obtained by analyzing input speech and of the stored feature parameter in the same embodiment.

FIG. 5 is a diagram showing examples of frequency and amplitude mapping functions corresponding to conversion parameters in the same embodiment.

FIG. 6 is a block diagram showing the configuration of a speech synthesis apparatus according to a fourth embodiment of the present invention.

FIG. 7 is a block diagram showing the configuration of a speech synthesis apparatus according to a fifth embodiment of the present invention.

FIG. 8 is a block diagram showing the configuration of a speech synthesis apparatus according to a sixth embodiment of the present invention.

FIG. 9 is a block diagram showing the configuration of a speech synthesis apparatus according to a seventh embodiment of the present invention.

FIG. 10 is a block diagram showing the configuration of a speech synthesis apparatus according to an eighth embodiment of the present invention.

FIG. 11 is a block diagram showing the configuration of a speech synthesis apparatus according to a ninth embodiment of the present invention.

FIG. 12 is a block diagram showing the configuration of a conventional speech synthesis apparatus.

FIG. 13 is a block diagram showing the configuration of a conventional speech synthesis apparatus that changes the voice quality of synthesized speech by storing a plurality of speech synthesis unit sets.

FIG. 14 is a block diagram showing the configuration of a conventional speech synthesis apparatus that changes the voice quality of synthesized speech by changing the pitch pattern.

FIG. 15 is a block diagram showing the configuration of a conventional speech synthesis apparatus that changes the voice quality by means of a filter.

[Explanation of symbols]

10 ... language processing unit
11 ... prosody processing unit
12 ... speech signal generation unit
13, 14 ... synthesis unit set storage unit
15 ... prosody processing unit
16 ... filter unit
21 ... speech input unit
22 ... speech recognition unit
23 ... speech analysis unit
24 ... spectrum comparison unit
25 ... synthesis unit information conversion unit
26, 27 ... synthesis unit set storage unit
28 ... phoneme presentation unit
30 ... conversion parameter generation unit
31 ... speech input unit
32 ... conversion parameter storage unit
33 ... synthesis unit information conversion unit
34 ... conversion parameter storage unit
35 ... conversion parameter interpolation unit
36 ... conversion parameter storage unit
37 ... conversion parameter input/output unit
101 ... input text
102 ... analysis data
103 ... phoneme symbol string
104 ... phoneme duration
105 ... pitch pattern
106 ... feature parameter of speech synthesis unit
107 ... synthesized speech signal
108 ... voice quality conversion information
201 ... speech signal
202 ... phoneme symbol
203 ... phoneme boundary time information
204, 205 ... feature parameter
206 ... conversion parameter
207, 208 ... feature parameter
301 ... voice quality information
302 ... conversion parameter
303 ... feature parameter
305, 306 ... voice quality selection information
307 ... weighting coefficient
308, 309, 310, 311, 312 ... conversion parameter

Continuation of front page
(72) Inventor: Masahiro Oshikiri, c/o Toshiba Corporation Research and Development Center, 1 Komukai Toshiba-cho, Saiwai-ku, Kawasaki-shi, Kanagawa
(72) Inventor: Ko Amada, c/o Toshiba Corporation Research and Development Center, 1 Komukai Toshiba-cho, Saiwai-ku, Kawasaki-shi, Kanagawa
(72) Inventor: Akinobu Yamashita, c/o Toshiba Corporation Research and Development Center, 1 Komukai Toshiba-cho, Saiwai-ku, Kawasaki-shi, Kanagawa

Claims


1. A speech synthesis method for generating a synthesized speech signal by connecting information selected from information of a plurality of speech synthesis units, characterized in that a speech signal is input and the information of the plurality of speech synthesis units is generated according to the speech signal.

2. A speech synthesis method for generating a synthesized speech signal by connecting information selected from information of a plurality of speech synthesis units, characterized in that a speech signal is input and information of a plurality of speech synthesis units stored in advance is processed according to the speech signal.

3. A speech synthesis method for generating a synthesized speech signal by connecting information selected from information of a plurality of speech synthesis units, characterized in that voice quality information is input and information of a plurality of speech synthesis units stored in advance is processed according to the voice quality information.

4. A speech synthesis method for generating a synthesized speech signal by connecting information selected from information of a plurality of speech synthesis units, characterized in that a speech signal and voice quality information are input, information of a plurality of speech synthesis units is generated according to the speech signal, and the information of the plurality of speech synthesis units is processed according to the voice quality information.

5. A speech synthesis method for generating a synthesized speech signal by connecting information selected from information of a plurality of speech synthesis units, characterized in that a speech signal and voice quality information are input, and information of a plurality of speech synthesis units stored in advance is processed according to the speech signal and the voice quality information.

6. A speech synthesis method for generating a synthesized speech signal by connecting information selected from information of a plurality of speech synthesis units, characterized in that a plurality of pieces of voice quality information are stored, and information of speech synthesis units stored in advance is processed according to the voice quality information.

7. A voice synthesizing method for generating a synthesized voice signal by connecting information selected from a plurality of stored voice synthesizing unit information, storing a plurality of voice quality information and interpolating the plurality of voice quality information. A method of synthesizing voice, characterized by processing information of a voice synthesis unit according to voice quality information generated by