JPH11249679A

JPH11249679A - Voice synthesizer

Info

Publication number: JPH11249679A
Application number: JP10052361A
Authority: JP
Inventors: Tetsuya Sakayori; 哲也酒寄
Original assignee: Ricoh Co Ltd
Current assignee: Ricoh Co Ltd
Priority date: 1998-03-04
Filing date: 1998-03-04
Publication date: 1999-09-17

Abstract

PROBLEM TO BE SOLVED: To make the synthesized speech closer to natural speech when a fixed-form phrase is read aloud from an input character string. SOLUTION: A speech synthesizer is provided with a phoneme duration storage unit M2, a fundamental frequency storage unit M3, and an amplitude storage unit M4, which respectively store a phoneme duration sequence, a fundamental frequency sequence, and an amplitude or power sequence extracted from a phrase uttered by a human, and with a phoneme segment storage unit M1, which stores phonological information as phoneme segments, taking a phoneme or a phoneme chain as the phonological unit. A phoneme segment sequence read from the phoneme segment storage unit 2 is stretched or compressed according to the phoneme duration sequence read from the prosody pattern storage unit 1 and connected. A fundamental frequency is then assigned according to the fundamental frequency sequence read from the prosody pattern storage unit 1, with normalization applied so that it falls within the fundamental frequency range. Finally, an amplitude is assigned according to the amplitude or power sequence read from the prosody pattern storage unit, and speech is synthesized.

Description

DETAILED DESCRIPTION OF THE INVENTION

[0001]

[Technical field of the invention] The present invention relates to a speech synthesizer that can be used in applications such as converting text containing a mixture of fixed-form and free-form sentences into speech.

[0002]

[Prior art] Japanese Unexamined Patent Publication No. H5-27789 describes a "speech synthesizer" that applies overlap processing at the junctions when synthesized speech produced by different methods, such as rule-based synthesis, analysis-synthesis, and recording/playback, is used together. Japanese Unexamined Patent Publication No. H8-63187 discloses a "speech synthesizer" that uses, for the fixed-form part, a fundamental frequency and phoneme durations extracted from natural speech.

[0003]

[Problems to be solved by the invention] Conventional speech synthesizers use either a recording-and-editing method or a rule-based synthesis method. In the former, an announcer or other speaker records speech phrase by phrase, and messages are created by selecting and joining these recordings as needed. This can yield good speech close to a natural voice, but the data volume is large, phrases that were not recorded cannot be handled, and the same speaker must be available whenever new phrases are added. The latter stores speech data in small units such as phonemes or syllables and can therefore synthesize arbitrary vocabulary, but its sound quality is inferior to that of the recording-and-editing method. In particular, because prosody patterns such as fundamental frequency, phoneme duration, and amplitude are assigned by rule, the result inevitably sounds mechanical and unnatural.

[0004] For this reason, the recording-and-editing method, with its good sound quality, is used for fixed-form messages that do not require arbitrary vocabulary, and rule-based synthesis is used where text must be converted to speech. However, there are many applications in which some arbitrary vocabulary is embedded in a fixed-form sentence, such as a place name embedded in a fixed-form message in car-navigation voice guidance. In such cases one is forced to choose among three options: adopting rule-based synthesis and lowering the overall sound quality for the sake of a small amount of arbitrary vocabulary, giving up the output of arbitrary vocabulary and using the recording-and-editing method, or adopting a mixed method in which the fixed-form part is produced by recording and editing and only the arbitrary vocabulary is produced by rule-based synthesis.

[0005] The problem with mixing recording-and-editing and rule-based synthesis is that the voice quality of the two methods is completely different, so the output not only sounds incongruous but is also very hard to follow. The "speech synthesizer" described in Japanese Unexamined Patent Publication No. H5-27789 addresses this problem by overlapping the output speech of the different methods where they are joined. Even so, the speaker unavoidably changes between the fixed-form part and the arbitrary-word part, so the basic problem remains unsolved. Moreover, the overlapped portion sounds as if two speakers were talking at the same time, which can make it harder to understand.

[0006] In contrast, the "speech synthesizer" described in Japanese Unexamined Patent Publication No. H8-63187 generates phonological parameters for the fixed-form sentence as well by joining phonemes, syllables, or the like in the manner of rule-based synthesis, and assigns to them a fundamental frequency and phoneme durations extracted from natural speech. This improves naturalness while preserving speaker continuity with the arbitrary-word part. However, the various prosodic and phonological parameters are interrelated, and the rule set is built so that they balance as a whole. Transplanting only part of it (fundamental frequency and phoneme duration) from speech with entirely different speaker characteristics and speaking style can produce unexpected mismatches and impair overall naturalness. For example, the fundamental frequency may fall sharply toward the end of a sentence, and unless the amplitude is also reduced sufficiently at that point, the low voice stands out unnaturally. In such a case the mouth opening would also become smaller and the speech spectrum itself would change, so using phoneme segment data that is too clearly articulated also sounds incongruous. Furthermore, phoneme segment data can accommodate only a limited range of fundamental frequencies, and the fundamental frequency pattern of natural speech usually has a wider dynamic range than this, so forcing a fundamental frequency onto the segments may reduce clarity. The object of the present invention is therefore to solve these problems and improve the naturalness of synthesized speech for fixed-form phrases.

[0007]

[Means for solving the problems] The invention of claim 1 is a speech synthesizer comprising a phoneme duration storage unit, a fundamental frequency storage unit, and an amplitude storage unit, which respectively store a phoneme duration sequence, a fundamental frequency sequence, and an amplitude or power sequence extracted from a phrase uttered by a human, and a phoneme segment storage unit, which stores phonological information as phoneme segments, taking a phoneme or a phoneme chain as the phonological unit. The phoneme segment sequence read from the phoneme segment storage unit and arranged according to an input character string is stretched or compressed according to the phoneme duration sequence read from the prosody pattern storage unit and connected. A fundamental frequency is assigned according to the fundamental frequency sequence read from the prosody pattern storage unit, and an amplitude is assigned according to the amplitude or power sequence read from the prosody pattern storage unit, whereby speech is synthesized.

[0008] The invention of claim 2 is the speech synthesizer according to claim 1, further comprising a fundamental frequency range storage unit that stores the range of fundamental frequencies that can be synthesized without strain using the phoneme segment set stored in the phoneme segment storage unit, wherein the fundamental frequency sequence read from the fundamental frequency storage unit is normalized so as to fall within the fundamental frequency range read from the fundamental frequency range storage unit, and speech is synthesized.

[0009] The invention of claim 3 is the speech synthesizer according to claim 1, wherein the phoneme segment storage unit stores, for each phonological unit, a plurality of phoneme segments, one for each fundamental frequency range to which it should be applied, and speech is synthesized by selectively using the phoneme segment corresponding to the fundamental frequency read from the fundamental frequency storage unit.

[0010] The invention of claim 4 is the speech synthesizer according to claim 1, wherein the phoneme segment storage unit stores, for each phonological unit, a plurality of phoneme segments, one for each amplitude range to which it should be applied, and speech is synthesized by selectively using the phoneme segment corresponding to the amplitude or power read from the fundamental frequency storage unit.

[0011] The invention of claim 5 is the speech synthesizer according to claim 1, wherein the phoneme segment storage unit stores, for each phonological unit, a plurality of phoneme segments, one for each phoneme duration range to which it should be applied, and speech is synthesized by selectively using the phoneme segment corresponding to the phoneme duration read from the phoneme duration storage unit.

[0012]

[Embodiments of the invention] An embodiment of the speech synthesizer of the present invention is described below. FIG. 1 shows the configuration of this embodiment. In FIG. 1, M1 is a phoneme segment storage unit that stores phonological information as phoneme segments, taking a phoneme or a phoneme chain as the phonological unit. It stores, for each phonological unit, a plurality of phoneme segments, one for each fundamental frequency range to which it should be applied, and the phoneme segment corresponding to the fundamental frequency read from the fundamental frequency range storage unit M5 is used selectively. M2, M3, and M4 denote a phoneme duration storage unit, a fundamental frequency storage unit, and an amplitude storage unit, which respectively store a phoneme duration sequence, a fundamental frequency sequence, and an amplitude or power sequence extracted from a phrase uttered by a human. M5 is a fundamental frequency range storage unit that stores the range of fundamental frequencies that can be synthesized without strain using the phoneme segment set stored in the phoneme segment storage unit M1.

[0013] The operation of each unit is described below. The prosody pattern selection unit 1 selects the phoneme duration, fundamental frequency, and amplitude patterns from the input prosody pattern ID (an identifier of a prosody pattern, for example a number that identifies the corresponding pattern). The phoneme segment selection unit 2 obtains phoneme segment labels from the input character string and, referring to the ranges of the phoneme duration, fundamental frequency, and amplitude patterns selected by the prosody pattern selection unit 1, retrieves the required phoneme segments from the phoneme segment storage unit M1 on the basis of this information.
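The lookup performed by the prosody pattern selection unit 1 can be illustrated with a minimal Python sketch. The store contents, the function names `select_prosody_pattern` and `pattern_ranges`, and the units (milliseconds, hertz, relative amplitude) are all assumptions made for illustration; the patent only states that a pattern ID selects one pattern from each of M2, M3, and M4 and that the ranges of those patterns are passed on as a reference for segment selection.

```python
# Hypothetical contents of the three stores, keyed by prosody pattern ID.
# Each series has one value per phonological unit of the recorded phrase.
M2_DURATIONS_MS = {1: [90, 120, 150]}        # phoneme duration storage unit
M3_F0_HZ = {1: [220.0, 180.0, 110.0]}        # fundamental frequency storage unit
M4_AMPLITUDE = {1: [1.0, 0.8, 0.4]}          # amplitude storage unit


def select_prosody_pattern(pattern_id):
    """Fetch the three series that belong to one recorded phrase."""
    durations = M2_DURATIONS_MS[pattern_id]
    f0 = M3_F0_HZ[pattern_id]
    amplitude = M4_AMPLITUDE[pattern_id]
    # The series come from the same utterance, so they must line up.
    if not (len(durations) == len(f0) == len(amplitude)):
        raise ValueError("series extracted from one phrase must align")
    return {"duration_ms": durations, "f0_hz": f0, "amplitude": amplitude}


def pattern_ranges(pattern):
    """(min, max) of each series, used as a hint for segment selection."""
    return {name: (min(series), max(series)) for name, series in pattern.items()}
```

Keeping the three series under one ID reflects the point made for claim 1: all three prosodic parameters are taken from the same utterance rather than mixed from different sources.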

[0014] FIG. 2 shows an example of the data structure of the phoneme segment storage unit M1. For the same phoneme label, for example "ア" (a), multiple data entries with different applicable prosodic parameter ranges are stored. As shown there, a prosodic parameter range may be categorical, such as long or short for the duration range and large or small for the amplitude range, or may give a lower and an upper limit, as for the fundamental frequency range, and the two forms may be mixed. The data column of the table actually holds phoneme segment data, waveform data, and spectral parameters. From these entries, the one whose label matches and whose prosodic parameter ranges are closest is selected.
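A table of this kind, and the "closest range" selection, might be sketched in Python as follows. The entries, the field names, and the `mismatch` scoring function are hypothetical: FIG. 2 is not reproduced here, and the patent does not define how closeness between a target and a stored range is measured.

```python
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass(frozen=True)
class SegmentEntry:
    label: str
    duration_class: Optional[str]               # "long" / "short", or None
    amplitude_class: Optional[str]              # "large" / "small", or None
    f0_range_hz: Optional[Tuple[float, float]]  # (lower, upper), or None
    data: str  # placeholder for waveform data / spectral parameters


# Hypothetical table contents; several entries share the label "a".
TABLE = [
    SegmentEntry("a", "long", "large", (150.0, 250.0), "a_1"),
    SegmentEntry("a", "short", "small", (80.0, 150.0), "a_2"),
    SegmentEntry("ka", "short", "large", (150.0, 250.0), "ka_1"),
]


def mismatch(entry, duration_class, amplitude_class, f0_hz):
    """Assumed distance: 1 per categorical mismatch plus relative F0 overshoot."""
    score = 0.0
    if entry.duration_class is not None and entry.duration_class != duration_class:
        score += 1.0
    if entry.amplitude_class is not None and entry.amplitude_class != amplitude_class:
        score += 1.0
    if entry.f0_range_hz is not None:
        low, high = entry.f0_range_hz
        outside = max(low - f0_hz, 0.0, f0_hz - high)  # 0 when inside the range
        score += outside / (high - low)
    return score


def select_segment(label, duration_class, amplitude_class, f0_hz):
    """Among entries with a matching label, pick the smallest mismatch."""
    candidates = [e for e in TABLE if e.label == label]
    if not candidates:
        raise KeyError(label)
    return min(candidates,
               key=lambda e: mismatch(e, duration_class, amplitude_class, f0_hz))
```

For example, `select_segment("a", "short", "small", 100.0)` returns the `a_2` entry, because both categories match and 100 Hz lies inside its fundamental frequency range.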

[0015] The phoneme segment stretching and connection unit 3 takes the phoneme segment sequence selected by the phoneme segment selection unit 2 according to the input character string and, referring to the phoneme duration range selected by the prosody pattern selection unit 1, stretches or compresses each segment according to the phoneme duration pattern selected from the phoneme duration storage unit M2 and connects the segments. The fundamental frequency range storage unit M5 stores the fundamental frequency range covered by the entire phoneme segment data set stored in the phoneme segment storage unit M1. When the fundamental frequency pattern selected from the fundamental frequency range storage unit M5 falls outside this range, the fundamental frequency pattern normalization unit 4 normalizes the selected pattern to fit the range. The fundamental frequency assignment unit 5 assigns the normalized fundamental frequency pattern to the phoneme segment sequence pattern connected by the phoneme segment stretching and connection unit 3.
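The patent does not say how the fundamental frequency pattern normalization unit 4 computes its result. One possible reading is sketched below in Python, under the assumption that a pattern already inside the range is left untouched, a pattern narrow enough to fit is shifted, and a pattern wider than the range is compressed linearly. The function name `normalize_f0` and the linear-in-hertz mapping are illustrative choices, not the patent's method.

```python
def normalize_f0(f0_hz, low, high):
    """Fit an F0 contour into [low, high] (assumed policy, see lead-in)."""
    lo_in, hi_in = min(f0_hz), max(f0_hz)
    if low <= lo_in and hi_in <= high:
        return list(f0_hz)                 # already within the supported range
    span_in = hi_in - lo_in
    span_out = high - low
    if span_in <= span_out:
        # Narrow enough to fit: shift the whole contour, keeping its shape.
        shift = (low - lo_in) if lo_in < low else (high - hi_in)
        return [v + shift for v in f0_hz]
    # Too wide: compress linearly so the extremes map onto the range limits.
    scale = span_out / span_in
    return [low + (v - lo_in) * scale for v in f0_hz]
```

With a supported range of 100 to 250 Hz, the contour `[80.0, 200.0, 320.0]` is compressed to `[100.0, 175.0, 250.0]`. The relative shape of the contour is preserved, which matters because the contour was taken from natural speech.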

[0016] The amplitude assignment unit 6 assigns, to the connected phoneme segment sequence pattern to which the fundamental frequency has been assigned, the amplitude pattern selected from the amplitude storage unit M4, referring to the amplitude pattern range selected by the prosody pattern selection unit 1, and thereby creates the synthesized speech. General techniques of rule-based speech synthesis can be used for stretching and connecting phoneme segments and for assigning the fundamental frequency and amplitude, so a detailed description is omitted here.
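Since the patent leaves amplitude assignment to general techniques, the following Python sketch shows only one simple possibility: scaling each segment so that its root-mean-square (RMS) level matches the target taken from the amplitude pattern. The function names, the use of RMS as the amplitude measure, and the representation of segments as lists of samples are assumptions.

```python
import math


def rms(samples):
    """Root-mean-square level of a non-empty list of samples."""
    return math.sqrt(sum(s * s for s in samples) / len(samples))


def apply_amplitude(segments, target_rms):
    """Scale each segment to its target RMS and concatenate the result."""
    if len(segments) != len(target_rms):
        raise ValueError("one target amplitude is needed per segment")
    out = []
    for samples, target in zip(segments, target_rms):
        if not samples:
            continue                       # nothing to scale
        current = rms(samples)
        gain = target / current if current > 0 else 0.0
        out.extend(s * gain for s in samples)
    return out
```

Uniform gain of this kind is exactly the "volume knob" change that the effect of claim 4 describes as mechanical when used alone, which is why the patent pairs it with selecting segments recorded at a matching amplitude.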

[0017]

[Effects of the invention] Effect corresponding to claim 1: By using the three basic prosodic parameters (fundamental frequency, phoneme duration, and amplitude) extracted from the same natural speech phrase, mismatches among the prosodic parameters are suppressed and the naturalness of the synthesized speech can be improved.

[0018] Effect corresponding to claim 2: By normalizing the fundamental frequency to the range that the phoneme segment database can accommodate, forced fundamental frequency assignment is avoided and a loss of clarity in the synthesized speech can be prevented.

[0019] Effect corresponding to claim 3: By selectively using the phoneme segment data that corresponds to the fundamental frequency to be assigned, the wide-dynamic-range fundamental frequency of natural speech can be assigned, and naturalness can be improved without loss of clarity.

[0020] Effect corresponding to claim 4: If only the amplitude is changed while the same phoneme segment data is used, the change sounds mechanical, as if a loudspeaker's volume control were being turned, and naturalness is impaired. By selectively using the phoneme segment data that corresponds to the amplitude to be assigned, the phonological characteristics can vary with the loudness of the voice. The wide-dynamic-range amplitude variation of natural speech can therefore be assigned, and synthesized speech closer to the human voice is obtained.

[0021] Effect corresponding to claim 5: By selectively using the phoneme segment data that corresponds to the phoneme duration to be set, it is possible to avoid losing the characteristic part of a consonant through forced truncation of a phoneme segment, and to avoid the mechanical sound quality caused by repeating a short steady-state vowel portion. The wide-dynamic-range tempo variation of natural speech can therefore be assigned without loss of clarity, and synthesized speech closer to the human voice is obtained.

[Brief description of the drawings]

[FIG. 1] A block diagram showing the configuration of the speech synthesizer according to the present invention.

[FIG. 2] A diagram showing the data structure of the phoneme segment storage unit shown in FIG. 1.

[Explanation of symbols]

1: prosody pattern selection unit, 2: phoneme segment selection unit, 3: phoneme segment stretching and connection unit, 4: fundamental frequency pattern normalization unit, 5: fundamental frequency assignment unit, 6: amplitude assignment unit, M1: phoneme segment storage unit, M2: phoneme duration storage unit, M3: fundamental frequency storage unit, M4: amplitude storage unit, M5: fundamental frequency range storage unit.

Claims

[Claims]

1. A speech synthesizer comprising: a phoneme duration storage unit, a fundamental frequency storage unit, and an amplitude storage unit that respectively store a phoneme duration sequence, a fundamental frequency sequence, and an amplitude or power sequence extracted from a phrase uttered by a human; and a phoneme segment storage unit that stores phonological information as phoneme segments, taking a phoneme or a phoneme chain as the phonological unit, wherein a phoneme segment sequence read from the phoneme segment storage unit and arranged according to an input character string is stretched or compressed according to the phoneme duration sequence read from the prosody pattern storage unit and connected, a fundamental frequency is assigned according to the fundamental frequency sequence read from the prosody pattern storage unit, and an amplitude is assigned according to the amplitude or power sequence read from the prosody pattern storage unit, whereby speech is synthesized.

2. The speech synthesizer according to claim 1, further comprising a fundamental frequency range storage unit that stores the range of fundamental frequencies that can be synthesized without strain using the phoneme segment set stored in the phoneme segment storage unit, wherein the fundamental frequency sequence read from the fundamental frequency storage unit is normalized so as to fall within the fundamental frequency range read from the fundamental frequency range storage unit, and speech is synthesized.

3. The speech synthesizer according to claim 1, wherein the phoneme segment storage unit stores, for each phonological unit, a plurality of phoneme segments, one for each fundamental frequency range to which it should be applied, and speech is synthesized by selectively using the phoneme segment corresponding to the fundamental frequency read from the fundamental frequency storage unit.

4. The speech synthesizer according to claim 1, wherein the phoneme segment storage unit stores, for each phonological unit, a plurality of phoneme segments, one for each amplitude range to which it should be applied, and speech is synthesized by selectively using the phoneme segment corresponding to the amplitude or power read from the fundamental frequency storage unit.

5. The speech synthesizer according to claim 1, wherein the phoneme segment storage unit stores, for each phonological unit, a plurality of phoneme segments, one for each phoneme duration range to which it should be applied, and speech is synthesized by selectively using the phoneme segment corresponding to the phoneme duration read from the phoneme duration storage unit.