JPH0887297A

JPH0887297A - Voice synthesis system

Info

Publication number: JPH0887297A
Application number: JP6225396A
Authority: JP
Inventors: Tatsuro Matsumoto; 達郎松本
Original assignee: Fujitsu Ltd
Current assignee: Fujitsu Ltd
Priority date: 1994-09-20
Filing date: 1994-09-20
Publication date: 1996-04-02

Abstract

PURPOSE: To obtain highly natural synthesized speech whose clarity and rhythm are close to those of real speech. CONSTITUTION: When a text or a phonetic symbol string is input, a voice information search unit 1 searches a voice information database 2, which stores voice feature quantities extracted by analyzing real speech together with the corresponding utterance contents, for utterance content that matches the input text or input phonetic symbol string. If matching utterance content exists, it is passed to a synthetic speech generation unit 3, which generates synthesized speech by processing the corresponding voice information. If no matching utterance content exists, the input text or input phonetic symbol string is passed unchanged to the unit 3, which generates synthesized speech based on a synthetic speech generation rule 4.

Description

Detailed Description of the Invention

[0001]

[Field of Industrial Application] The present invention relates to a speech synthesis system that converts arbitrary input text, or an input phonetic symbol string, into synthesized speech and outputs it.

[0002]

[Prior Art] FIG. 10 is a block diagram showing the main configuration of a conventional speech synthesis system, for the case where arbitrary input text is converted into synthesized speech and output. In the figure, reference numeral 304 denotes a language processing unit. The language processing unit 304 applies a word dictionary 303 to the input text to determine the readings and accent positions of the words in it, analyzes the sentence structure to create control information for controlling intonation, and passes the result to a voice length pattern generation unit 300. The word readings, accent positions, and intonation control information are normally all expressed as a symbol string called a phonetic symbol string.

[0003] The voice length pattern generation unit 300 applies a voice length generation rule 400 to convert the phonetic symbol string into a pattern of duration information for each sound (a voice length pattern), and passes this voice length pattern and the phonetic symbol string to an F0 pattern generation unit 301. The F0 pattern generation unit 301 applies an F0 generation rule 401 and, based on the phonetic symbol string and the voice length pattern, generates the time-variation pattern of the fundamental frequency (F0), the physical quantity corresponding to accent and intonation. This is the so-called F0 pattern. The unit then passes the phonetic symbol string, the voice length pattern, and the F0 pattern to a voice parameter pattern generation unit 310.

[0004] The voice parameter pattern generation unit 310 applies a voice parameter generation rule 410 to generate a voice parameter pattern from the phonetic symbol string (in particular the reading information), the voice length pattern, and the F0 pattern, and passes it to a waveform generation unit 311. Here the voice parameter pattern is normally the time-variation pattern of voice feature quantities, such as PARCOR coefficients (coefficients corresponding to the vocal tract cross-sectional area) or formant (vocal tract resonance) frequencies, and of the sound source signal. In the so-called waveform editing method, which generates synthesized speech by concatenating short units of speech waveform, the pattern is instead connection information such as the types and connection timing of the segment waveforms (the short waveform units).

[0005] The waveform generation unit 311 generates the actual digital speech waveform from the voice parameter pattern it receives and passes it to a DA conversion unit 5. For example, when the voice parameter pattern consists of PARCOR coefficients, the waveform generation unit 311 comprises a PARCOR filter and a sound source generation unit, and drives the filter with the sound source signal. In the waveform editing method, it places the segment waveforms at the appropriate positions and joins them smoothly. The DA conversion unit 5 converts the digital speech waveform generated by the waveform generation unit 311 into an analog speech waveform and outputs it as synthesized speech.
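The PARCOR case described above, a filter driven by a sound source signal, can be illustrated with a minimal sketch of an all-pole lattice synthesis filter. The function name `parcor_synthesize`, the sign convention of the reflection coefficients, and the use of fixed rather than time-varying coefficients are assumptions made for illustration. The patent does not specify the filter structure.

```python
from typing import List, Sequence


def parcor_synthesize(excitation: Sequence[float], k: Sequence[float]) -> List[float]:
    """Drive an all-pole lattice filter with an excitation signal.

    k[i-1] is the reflection (PARCOR) coefficient of stage i.
    Assumed convention: f[i-1] = f[i] - k[i] * b[i-1] (delayed by one sample).
    """
    order = len(k)
    backward = [0.0] * (order + 1)  # backward[i] holds b_i from the previous sample
    output = []
    for sample in excitation:
        forward = sample  # f_p[n] is the excitation itself
        new_backward = [0.0] * (order + 1)
        for stage in range(order, 0, -1):
            forward = forward - k[stage - 1] * backward[stage - 1]
            new_backward[stage] = backward[stage - 1] + k[stage - 1] * forward
        new_backward[0] = forward  # b_0[n] equals the output sample
        backward = new_backward
        output.append(forward)
    return output


# A unit impulse through a first-order filter with k = 0.5
impulse_response = parcor_synthesize([1.0, 0.0, 0.0], [0.5])
print(impulse_response)  # [1.0, -0.5, 0.25]
```

In a real synthesizer the coefficients would change from frame to frame according to the voice parameter pattern, and the excitation would be a pulse train for voiced sounds or noise for unvoiced sounds.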

[0006]

[Problems to be Solved by the Invention] In such a conventional speech synthesis system, all acoustic and prosodic processing is performed according to rules prepared in advance. Because the synthesized speech is generated from acoustic information in a compressed state, degradation of its quality is unavoidable. In addition, because the voice length pattern, F0 pattern, and voice parameter pattern are generated by rule, the accent and intonation can become unnatural.

[0007] A first object of the present invention is to improve the quality of the reproduced speech by preparing several kinds of voice information databases, such as an acoustic information database and a prosody information database, and, when the input text is registered in these databases, reconstructing and outputting the speech from the information retrieved from them.

[0008] Another object of the present invention is to enable speech synthesis for a wide variety of input texts or input phonetic symbol strings by using a voice waveform database and/or a voice parameter database as the acoustic information database, and a voice length database and/or a voice length/voice intensity database and/or a voice length/F0 pattern database as the prosody information database.

[0009]

[Means for Solving the Problems] The principle of the present invention will now be described. FIG. 1 is a diagram showing the first principle of the present invention. In the figure, 1 denotes a voice information search unit and 3 denotes a synthetic speech generation unit.

[0010] When a text or phonetic symbol string is input, the voice information search unit 1 searches a voice information database 2 for utterance content that matches the input text or input phonetic symbol string. The voice information database 2 stores various voice feature quantities extracted from real speech (speech uttered by a person) together with the corresponding utterance content (a label indicating what was uttered in the speech).

[0011] Here, voice information means the speech waveform, voice parameters, F0, voice intensity, voice length, and the like. Voice feature quantities are physical features of speech, the so-called voice parameters. They generally refer to frequency-domain features of speech, and include the spectrum (frequency intensity), formants (vocal tract resonance frequencies), LPC (linear prediction coefficients), and PARCOR coefficients (coefficients corresponding to the vocal tract cross-sectional area).

[0012] If the search finds matching utterance content, it is passed to the synthetic speech generation unit 3. If no matching utterance content exists, the input text or input phonetic symbol string is passed unchanged to the synthetic speech generation unit 3, which generates synthesized speech from it based on a synthetic speech generation rule 4.
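The lookup-with-fallback flow of the first principle can be sketched as follows. This is a minimal illustration, not the patent's implementation: the database is modeled as a plain dictionary keyed by utterance content, and `synthesize`, `from_features`, and `by_rule` are hypothetical names standing in for the search unit 1 and the two paths through the synthetic speech generation unit 3.

```python
from typing import Callable, Dict, List, Tuple

Features = List[float]  # placeholder for stored voice feature quantities
Waveform = List[float]  # placeholder for generated speech samples


def synthesize(
    text: str,
    database: Dict[str, Features],
    from_features: Callable[[Features], Waveform],
    by_rule: Callable[[str], Waveform],
) -> Tuple[str, Waveform]:
    """Return which path was taken and the resulting waveform."""
    features = database.get(text)  # exact match on utterance content
    if features is not None:
        # Hit: build speech from feature quantities extracted from real speech.
        return "database", from_features(features)
    # Miss: pass the input through unchanged and synthesize by rule.
    return "rule", by_rule(text)


# Toy stand-ins so the sketch runs; real generators would produce audio.
demo_db = {"konnichiwa": [0.1, 0.2, 0.3]}
path, _ = synthesize("konnichiwa", demo_db, lambda f: list(f), lambda t: [0.0] * len(t))
print(path)  # database
```

Exact string matching is the simplest reading of "matching utterance content". The patent does not say whether normalization or partial matching is applied.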

[0013] The first invention is based on this principle. It is a speech synthesis system that converts input text or an input phonetic symbol string into synthesized speech and outputs it, comprising: a voice information database storing voice feature quantities extracted from real speech; search means for searching the voice information database for the voice feature quantities corresponding to the input text or input phonetic symbol string; means for constructing speech from those voice feature quantities when the search finds them in the voice information database; and means for generating synthesized speech according to predetermined rules when it does not.

[0014] FIG. 2 is a diagram showing the second principle of the present invention. In the figure, 10 denotes an acoustic information search unit, 30 a prosody information generation unit, and 31 an acoustic information generation unit. Acoustic information means, within the voice information, time-domain information such as the speech waveform and frequency-domain information such as the spectrum. Acoustic feature quantities are the parts of this information that are significant for speech synthesis. Prosody information means, within the voice information, information about prosody (intonation, accent, rhythm, and intensity). Prosodic feature quantities are the physical features of prosody: the fundamental frequency (F0), which corresponds to intonation and accent; the voice length, which corresponds to rhythm; and the voice intensity, which corresponds to intensity.

[0015] When a text or phonetic symbol string is input, the acoustic information search unit 10 searches an acoustic information database 20, which stores various acoustic feature quantities extracted from real speech together with the corresponding utterance content, for utterance content that matches the input text or input phonetic symbol string. If matching utterance content exists, it is output directly as acoustic information. If not, the input text or input phonetic symbol string is passed unchanged to the prosody information generation unit 30.

[0016] The prosody information generation unit 30 generates prosody information from the text or phonetic symbol string based on a prosody information generation rule 40 and passes it to the acoustic information generation unit 31. The acoustic information generation unit 31 generates acoustic information from the prosody information based on an acoustic information generation rule 41.

[0017] The second invention is based on this principle, and is characterized in that an acoustic information database, which stores the acoustic feature quantities among the voice feature quantities extracted from real speech, is used as the voice information database.

[0018] FIG. 3 is a diagram showing the third principle of the present invention. In the figure, 11 denotes a prosody information search unit. When a text or phonetic symbol string is input, the prosody information search unit 11 searches a prosody information database 21, which stores various prosodic feature quantities extracted from real speech together with the corresponding utterance content, for utterance content that matches the input text or input phonetic symbol string. If matching utterance content exists, the prosody information obtained is passed to the acoustic information generation unit 31. If not, the input text or input phonetic symbol string is passed unchanged to the prosody information generation unit 30.

[0019] The prosody information generation unit 30 generates prosody information from the input text or input phonetic symbol string based on the prosody information generation rule 40 and passes it to the acoustic information generation unit 31. The acoustic information generation unit 31 generates acoustic information, based on the acoustic information generation rule 41, from the prosody information received from either the prosody information search unit 11 or the prosody information generation unit 30.

[0020] The third invention is based on this principle, and is characterized in that a prosody information database, which stores the prosodic feature quantities among the voice feature quantities extracted from real speech, is used as the voice information database.

[0021] FIG. 4 is a diagram showing the fourth principle of the present invention. In the figure, 10 denotes the acoustic information search unit. When a text or phonetic symbol string is input, the acoustic information search unit 10 searches the acoustic information database 20, which stores various acoustic feature quantities extracted from real speech together with the corresponding utterance content, for utterance content that matches the input text or input phonetic symbol string.

[0022] If matching utterance content exists, it is output directly as acoustic information. If not, the input text or input phonetic symbol string is passed unchanged to the prosody information search unit 11. The prosody information search unit 11 searches the prosody information database 21, which stores various prosodic feature quantities extracted from real speech together with the corresponding utterance content, for utterance content that matches the input text or input phonetic symbol string.

[0023] If matching utterance content exists, the prosody information, including the prosodic feature quantities, is passed to the acoustic information generation unit 31. If not, the input text or input phonetic symbol string is passed unchanged to the prosody information generation unit 30. The prosody information generation unit 30 generates prosody information from the input text or input phonetic symbol string based on the prosody information generation rule 40 and passes it to the acoustic information generation unit 31. The acoustic information generation unit 31 generates acoustic information, based on the acoustic information generation rule 41, from the prosody information received from either the prosody information search unit 11 or the prosody information generation unit 30.

[0024] The fourth invention is based on this principle, and is characterized in that both an acoustic information database storing acoustic feature quantities extracted from real speech and a prosody information database storing prosodic feature quantities extracted from real speech are used as the voice information database.

[0025] The fifth invention is based on the second and fourth principles, and is characterized in that a voice waveform database storing speech waveforms is used as the acoustic information database.

[0026] The sixth invention is likewise based on the second and fourth principles, and is characterized in that a voice parameter database storing the spectrum, the vocal tract cross-sectional area, or formant frequencies is used as the acoustic information database.

[0027] The seventh invention is likewise based on the second and fourth principles, and is characterized in that both a voice waveform database and a voice parameter database are used as the acoustic information database.

[0028] The eighth invention is based on the third and fourth principles, and is characterized in that a database storing, out of voice length, voice intensity, and fundamental frequency, either the voice length alone or any two or more of them is used as the prosody information database.

[0029] The ninth invention is likewise based on the third and fourth principles, and is characterized in that a voice length/voice intensity database storing voice length and voice intensity is used as the prosody information database.

[0030] The tenth invention is likewise based on the third and fourth principles, and is characterized in that a voice length/F0 database storing voice length and fundamental frequency is used as the prosody information database.

[0031] The eleventh invention is likewise based on the third and fourth principles, and is characterized in that a voice length database storing only voice length is used as the prosody information database.

[0032] The twelfth invention is likewise based on the third and fourth principles, and is characterized in that any two or more of a voice length/voice intensity/F0 database, a voice length/voice intensity database, a voice length/F0 database, and a voice length database are used as the prosody information database.

[0033]

[Operation] In the first invention, using a database that stores voice feature quantities makes it possible to output high-quality synthesized speech.

[0034] In the second invention, using acoustic feature quantities yields highly intelligible synthesized speech close to real speech.

[0035] In the third invention, using prosodic feature quantities yields highly natural speech close to real speech.

[0036] The fourth invention combines the functions of the second and third inventions.

[0037] In the fifth invention, using speech waveforms as the acoustic information data yields speech of high naturalness and intelligibility.

[0038] In the sixth invention, although intelligibility is lower, highly natural synthesized speech is obtained with a small amount of data.

[0039] The seventh invention combines the effects of the fifth and sixth inventions.

[0040] In the eighth invention, using one or more of voice length, voice intensity, and F0 yields synthesized speech with highly natural prosody.

[0041] In the ninth invention, synthesized speech with a highly natural rhythm is obtained.

[0042] In the tenth invention, synthesized speech with highly natural rhythm, intonation, and accent is obtained.

[0043] In the eleventh invention, synthesized speech with a highly natural rhythm is obtained with a small amount of data.

[0044] In the twelfth invention, synthesized speech with natural prosody is obtained.

[0045]

[Embodiments] The present invention will now be described concretely with reference to the drawings showing its embodiments. (Embodiment 1) Embodiment 1 embodies the first, second, and third principles. FIG. 5 is a block diagram showing the configuration of Embodiment 1 of the present invention. In FIG. 5, 100 denotes a voice waveform search unit.

[0046] When a text or phonetic symbol string is input, the voice waveform search unit 100 searches a voice waveform database 200. The voice waveform database 200 stores multiple pairs of a phonetic symbol string indicating the utterance content and the corresponding speech waveform data (PCM data, or data compressed by a coding technique such as ADPCM). If utterance content matching the input text or input phonetic symbol string exists among them, the corresponding speech waveform is passed immediately to the DA conversion unit 5. If not, the input text or input phonetic symbol string is passed unchanged to a voice parameter search unit 101.

[0047] The voice parameter search unit 101 searches a voice parameter database 201, which stores voice parameters extracted from real speech (for example PARCOR, LSP, and formant frequencies) together with the corresponding utterance content, for utterance content that matches the input text or input phonetic symbol string.

[0048] If matching utterance content exists, its voice parameters are passed to the waveform generation unit 311. If not, the input text or input phonetic symbol string is passed unchanged to the prosody information search unit 11. The prosody information search unit 11 searches the prosody information database 21, which stores various prosodic feature quantities extracted from real speech together with the corresponding utterance content, for utterance content that matches the input text or phonetic symbol string. If matching utterance content exists, its prosody information is passed to the voice parameter pattern generation unit 310. If not, the input text or input phonetic symbol string is passed unchanged to the prosody information generation unit 30.

[0049] The prosody information generation unit 30 generates prosody information from the input text or input phonetic symbol string based on the prosody information generation rule 40 and passes it to the voice parameter pattern generation unit 310. Based on the voice parameter generation rule 410, the voice parameter pattern generation unit 310 generates a voice parameter pattern from the prosody information it receives, namely the phonetic symbol string (in particular the reading information), the voice length and voice intensity patterns, and the F0 pattern, and passes it to the waveform generation unit 311. Concretely, the generated voice parameter pattern is the time-variation pattern of PARCOR coefficients or formant frequencies and of the sound source signal or, in the so-called waveform editing method, connection information such as the types and connection timing of the segment waveforms (the short units of speech waveform).
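The routing in Embodiment 1 amounts to a cascade of lookups in which each hit lets the system skip more of the rule-based pipeline. The sketch below is a hedged illustration of that routing only. The function `route_embodiment1` and the dictionary-based databases are hypothetical, and the unit names in the returned lists are taken from the description above.

```python
from typing import Any, Dict, List, Tuple


def route_embodiment1(
    text: str,
    waveform_db: Dict[str, Any],
    parameter_db: Dict[str, Any],
    prosody_db: Dict[str, Any],
) -> Tuple[str, Any, List[str]]:
    """Return (hit type, payload, units that still have to run)."""
    if text in waveform_db:  # voice waveform search unit 100
        return "waveform", waveform_db[text], ["DA conversion unit 5"]
    if text in parameter_db:  # voice parameter search unit 101
        return "parameters", parameter_db[text], [
            "waveform generation unit 311",
            "DA conversion unit 5",
        ]
    if text in prosody_db:  # prosody information search unit 11
        return "prosody", prosody_db[text], [
            "voice parameter pattern generation unit 310",
            "waveform generation unit 311",
            "DA conversion unit 5",
        ]
    # No database hit: the input itself goes through the full rule-based path.
    return "rule", text, [
        "prosody information generation unit 30",
        "voice parameter pattern generation unit 310",
        "waveform generation unit 311",
        "DA conversion unit 5",
    ]


kind, _, remaining = route_embodiment1("ohayou", {}, {"ohayou": [0.4, 0.1]}, {})
print(kind, len(remaining))  # parameters 2
```

Ordering the lookups from the richest stored data (waveforms) to the leanest (prosody only) means the highest-quality source available is always preferred.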

[0050] The waveform generation unit 311 generates the actual speech waveform from the voice parameter pattern received from the voice parameter search unit 101 or the voice parameter pattern generation unit 310, and passes it to the DA conversion unit 5. When the voice parameter pattern consists of PARCOR coefficients, the waveform generation unit 311 comprises a PARCOR filter and a sound source generation unit and drives the PARCOR filter with the sound source signal. In the waveform editing method, it places the segment waveforms at the appropriate positions and joins them smoothly. The DA conversion unit 5 converts the digital speech waveform, whether generated by the waveform generation unit 311 or retrieved from the voice waveform database 200 by the voice waveform search unit 100, into an analog speech waveform and outputs it as synthesized speech.
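For the waveform editing method, "joining segments smoothly" can be illustrated with a linear crossfade over a short overlap region. This is only one common smoothing technique, chosen here as an assumption. The patent does not say how the segments are joined, and `crossfade_concat` and its `overlap` parameter are hypothetical names.

```python
from typing import List, Sequence


def crossfade_concat(segments: Sequence[Sequence[float]], overlap: int) -> List[float]:
    """Concatenate waveform segments, blending `overlap` samples at each joint."""
    if not segments:
        return []
    out = list(segments[0])
    for seg in segments[1:]:
        if overlap > len(out) or overlap > len(seg):
            raise ValueError("overlap is longer than a segment")
        start = len(out) - overlap  # first sample of the blended region
        for i in range(overlap):
            weight = (i + 1) / (overlap + 1)  # ramps toward the new segment
            out[start + i] = (1.0 - weight) * out[start + i] + weight * seg[i]
        out.extend(seg[overlap:])  # append the part that was not blended
    return out


joined = crossfade_concat([[1.0, 1.0, 1.0, 1.0], [0.0, 0.0, 0.0, 0.0]], overlap=2)
print(len(joined))  # 6
```

With `overlap=0` the function reduces to plain concatenation, which makes the effect of the blend easy to compare.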

[0051] In Embodiment 1, using the voice waveform database 200 and the voice parameter database 201 makes it possible to generate high-quality synthesized speech, and using the prosody information database 21 makes it possible to generate natural synthesized speech.

[0052] (Embodiment 2) Embodiment 2 embodies the third principle. FIG. 6 is a block diagram showing the configuration of Embodiment 2 of the present invention. In FIG. 6, 110 denotes a voice length/voice intensity/F0 pattern search unit. When a text or phonetic symbol string is input, the voice length/voice intensity/F0 pattern search unit 110 searches a voice length/voice intensity/F0 database 210, which stores voice length, voice intensity, and F0 patterns extracted from real speech together with the corresponding utterance content, for utterance content that matches the input text or input phonetic symbol string.

[0053] If matching utterance content exists, its voice length, voice intensity, and F0 patterns are passed to the acoustic information generation unit 31. If not, the input text or input phonetic symbol string is passed unchanged to a voice length/F0 pattern search unit 111. The voice length/voice intensity/F0 database stores multiple sets, each consisting of information indicating the utterance content and the corresponding voice length pattern (the duration of each synthesis unit, such as a phoneme or syllable), voice intensity pattern (the time-variation pattern of the voice intensity), and F0 pattern (the time-variation pattern of the fundamental frequency).
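One possible in-memory shape for an entry of the voice length/voice intensity/F0 database is sketched below. The class name `ProsodyEntry`, the field names, the units (milliseconds, hertz), and the sample values are all invented for illustration. The patent specifies only that each set holds the utterance content plus the three patterns.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class ProsodyEntry:
    utterance: str                # label identifying what was spoken
    voice_length_ms: List[float]  # one duration per synthesis unit (phoneme or syllable)
    intensity: List[float]        # voice intensity over time
    f0_hz: List[float]            # fundamental frequency over time

    def total_duration_ms(self) -> float:
        """Total utterance duration implied by the voice length pattern."""
        return sum(self.voice_length_ms)


# Made-up example values, keyed by utterance content for direct lookup.
database_210: Dict[str, ProsodyEntry] = {
    "a-me": ProsodyEntry(
        utterance="a-me",
        voice_length_ms=[120.0, 180.0],
        intensity=[0.6, 0.8, 0.5],
        f0_hz=[140.0, 150.0, 120.0],
    )
}

print(database_210["a-me"].total_duration_ms())  # 300.0
```

The leaner databases of this embodiment (voice length/F0, voice length/voice intensity, voice length only) could reuse the same shape with the unused patterns left empty.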

[0054] The voice length/FO pattern search unit 111 searches the voice length/FO database 211 to determine whether it contains utterance content that matches the input text or the input phonetic symbol string. If matching utterance content exists, the unit passes its voice length/FO pattern data to the voice intensity pattern generation unit 302. If not, it passes the input text or the input phonetic symbol string unchanged to the voice length/voice intensity pattern search unit 112. The voice length/voice intensity pattern search unit 112 searches the voice length/voice intensity database 212 in the same way. If matching utterance content exists, the unit passes its voice length/voice intensity pattern to the FO pattern generation unit 301. If not, it passes the input text or the input phonetic symbol string unchanged to the voice length pattern search unit 113.

[0055] The voice length pattern search unit 113 searches the voice length database 213, which stores voice length patterns (the time length of each synthesis unit such as a phoneme or a syllable) together with the corresponding utterance contents, to determine whether it contains the input text or the input phonetic symbol string. If it does, the unit passes the corresponding voice length pattern to the FO pattern generation unit 301. If not, it passes the input text or the input phonetic symbol string unchanged to the voice length pattern generation unit 300. The voice length pattern generation unit 300 converts the input text or the input phonetic symbol string into a voice length pattern based on the voice length generation rule 400 and passes the result to the FO pattern generation unit 301.

[0056] Based on the FO generation rule 401, the FO pattern generation unit 301 generates an FO pattern, which is the physical quantity corresponding to accent and intonation, from the phonetic symbol string and either the voice length/voice intensity pattern or the voice length pattern, and passes it together with the phonetic symbol string to the voice intensity pattern generation unit 302. Based on the voice intensity generation rule 402, the voice intensity pattern generation unit 302 generates a voice intensity pattern, which is the temporal change pattern of the voice intensity, from the phonetic symbol string and either the voice length/FO pattern or the FO pattern, and passes it to the acoustic information generation unit 31. The acoustic information generation unit 31 generates the actual acoustic information based on the acoustic information generation rule 41.

[0057] In Embodiment 2, using the voice length/voice intensity/FO database 210, the voice length/FO database 211, the voice length/voice intensity database 212, the voice length database 213, and the like makes it possible to generate synthesized voice with natural intonation, accent, and rhythm.
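The tiered search of Embodiment 2 can be sketched as follows. This is a minimal illustration rather than the implementation of the specification: the dictionary-based databases, the tier names `db210` to `db213`, the component labels, and the function `plan_prosody` are assumptions introduced here. The sketch only reports which patterns a hit supplies and which remain to be generated by rule.

```python
from typing import Dict, List, Optional, Tuple

# Search order mirrors search units 110 -> 111 -> 112 -> 113.
# Each tier lists the prosody components its database stores.
TIERS: List[Tuple[str, Tuple[str, ...]]] = [
    ("db210", ("length", "intensity", "fo")),  # voice length/voice intensity/FO
    ("db211", ("length", "fo")),               # voice length/FO
    ("db212", ("length", "intensity")),        # voice length/voice intensity
    ("db213", ("length",)),                    # voice length only
]
ALL_COMPONENTS: Tuple[str, ...] = ("length", "intensity", "fo")


def plan_prosody(
    utterance: str, databases: Dict[str, Dict[str, dict]]
) -> Tuple[Optional[str], Tuple[str, ...], Tuple[str, ...]]:
    """Return (matching tier or None, components retrieved, components left to rules)."""
    for name, components in TIERS:
        if utterance in databases.get(name, {}):
            missing = tuple(c for c in ALL_COMPONENTS if c not in components)
            return name, components, missing
    # No database contains the utterance: every component is generated by rule.
    return None, (), ALL_COMPONENTS


# Hypothetical contents: only the voice length/FO database knows "ohayou".
demo_databases = {"db211": {"ohayou": {"length": [], "fo": []}}}
print(plan_prosody("ohayou", demo_databases))
print(plan_prosody("sayounara", demo_databases))
```

Searching the richest database first means that a hit replaces as many rule-generated patterns as possible with patterns extracted from real speech.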

[0058] (Embodiment 3) Embodiment 3 embodies the first, second, third, and fourth principles, and FIG. 7 is a block diagram showing the configuration of Embodiment 3. Reference numeral 304 in FIG. 7 denotes a language processing unit. When text is input, the language processing unit 304 refers to the word dictionary 303, analyzes the readings of the words in the input text, their accent information, and the structure of the sentence, creates a phonetic symbol string that includes control information for controlling intonation, and passes it to the voice waveform search unit 100.

[0059] Using the phonetic symbol string created by the language processing unit as a key, the voice waveform search unit 100 searches the voice waveform database 200 for voice waveform data whose utterance content matches the key. If matching utterance content exists, the unit passes the voice waveform data to the DA conversion unit 5. If not, it passes the input phonetic symbol string unchanged to the voice parameter search unit 101. Using the input phonetic symbol string as a key, the voice parameter search unit 101 searches the voice parameter database 201 for voice parameter data whose utterance content matches the key. If matching utterance content exists, the unit passes the voice parameter data to the waveform generation unit 311. If not, it passes the input phonetic symbol string unchanged to the voice length/FO pattern search unit 110.

[0060] Using the input phonetic symbol string as a key, the voice length/FO pattern search unit 110 searches the voice length/FO database 210 for voice length and FO data whose utterance content matches the key. If matching utterance content exists, the unit passes the voice length/FO data to the voice parameter pattern generation unit 310. If not, it passes the input phonetic symbol string unchanged to the voice length pattern search unit 111.

[0061] The voice length/FO database 210 stores a plurality of sets, each consisting of a phonetic symbol string indicating the utterance content, the corresponding voice length pattern (the time length of each synthesis unit such as a phoneme or a syllable), and an FO pattern (the temporal change pattern of the fundamental frequency). Using the input phonetic symbol string as a key, the voice length pattern search unit 111 searches the voice length database 211 for voice length data whose utterance content matches the key. If matching utterance content exists, the unit passes the voice length data to the FO pattern generation unit 301. If not, it passes the input phonetic symbol string unchanged to the voice length pattern generation unit 300. The voice length pattern generation unit 300 converts the input phonetic symbol string into a voice length pattern based on the voice length generation rule 400 and passes the pattern, together with the input phonetic symbol string, to the FO pattern generation unit 301.

[0062] Based on the FO generation rule 401, the FO pattern generation unit 301 generates an FO pattern, which is the physical quantity corresponding to accent and intonation, from the input phonetic symbol string and the voice length pattern, and passes it together with the input phonetic symbol string to the voice parameter pattern generation unit 310. Based on the voice parameter generation rule 410, the voice parameter pattern generation unit 310 generates a voice parameter pattern from the input phonetic symbol string and either the voice length/FO pattern or the FO pattern, and passes it to the waveform generation unit 311.

[0063] The waveform generation unit 311 digitally generates a voice waveform from the voice parameter pattern passed from the voice parameter search unit 101 or the voice parameter pattern generation unit 310, and passes it to the DA conversion unit 5. The DA conversion unit 5 converts the digital voice waveform, whether generated by the waveform generation unit 311 or retrieved by the voice waveform search unit 100, into an analog voice waveform and outputs it as synthesized voice.

[0064] In Embodiment 3, language processing by the language processing unit 304 is performed first, and the voice waveform database is searched only after the input text has been converted into a phonetic symbol string indicating reading and accent. Consequently, even when the same word is written in various notations, such as kanji, hiragana, katakana, or different okurigana, the variants resolve to the same phonetic symbol string, which has the effect of reducing the required capacity of the database.
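The capacity saving described for Embodiment 3 can be illustrated with the sketch below, in which several written forms of one word share a single database entry. Everything here is assumed for illustration: `TOY_READINGS` is a hypothetical stand-in for the language processing unit 304 and the word dictionary 303, the romanized keys are an arbitrary choice of phonetic notation, and the stored bytes are placeholders rather than real waveform data.

```python
from typing import Dict, Optional

# Hypothetical stand-in for language processing: several written forms of
# one word map to a single phonetic key.
TOY_READINGS: Dict[str, str] = {
    "ありがとう": "arigatou",  # hiragana
    "有難う": "arigatou",      # kanji with okurigana
    "アリガトウ": "arigatou",  # katakana
}


def to_phonetic(text: str) -> str:
    """Return the phonetic key for text; unknown text is returned unchanged."""
    return TOY_READINGS.get(text, text)


# Voice waveform database keyed by phonetic string; the bytes are placeholders.
WAVEFORM_DB: Dict[str, bytes] = {"arigatou": b"\x00\x01\x02"}


def find_waveform(text: str) -> Optional[bytes]:
    """Convert the text to its phonetic key first, then search the database."""
    return WAVEFORM_DB.get(to_phonetic(text))
```

With text-keyed storage the three spellings would need three entries; with phonetic keys one entry serves all of them.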

[0065] (Embodiment 4) Embodiment 4 embodies the first, second, third, and fourth principles, and FIG. 8 is a block diagram showing the configuration of Embodiment 4. Embodiment 4 has substantially the same configuration as Embodiment 3 shown in FIG. 7, except that the language processing unit 304 is positioned between the voice length pattern search unit 111 and the voice length pattern generation unit 300. The processing in Embodiment 4 is described concretely below for the case where three sentences are input: "Konnichiwa." ("Hello."), "Kyou wa sangatsu mikka desu." ("Today is March 3."), and "Arigatou gozaimashita." ("Thank you very much.").

[0066] Assume that the voice waveform data for "Konnichiwa" is stored in the voice waveform database 200, that the voice length/FO pattern for "Arigatou gozaimashita" is stored in the voice length/FO database 210, and that "Kyou wa sangatsu mikka desu." is stored in none of the databases. When the three sentences above are input and the sentence "Konnichiwa" arrives, the voice waveform search unit 100 searches the voice waveform database 200, retrieves the voice waveform data stored there, and sends it directly to the DA conversion unit 5, which outputs the synthesized voice "konnichiwa".

[0067] For "Arigatou gozaimashita", the voice waveform search unit 100 and the voice parameter search unit 101 find no match, but the voice length/FO pattern search unit 110 finds the sentence by searching the voice length/FO database 210 and passes the retrieved pattern to the voice parameter pattern generation unit 310, and the synthesized voice "arigatou gozaimashita" is output via the waveform generation unit 311. "Kyou wa sangatsu mikka desu", on the other hand, is stored in none of the databases and therefore reaches the language processing unit 304, which determines the readings and accent positions of the words, analyzes the structure of the sentence, and passes control information for controlling intonation to the voice length pattern generation unit 300. The voice length pattern generation unit 300 generates a voice length pattern based on the voice length generation rule 400 and passes it to the FO pattern generation unit 301.

[0068] Thereafter, as in Embodiment 3, the data passes through the FO pattern generation unit 301, the voice parameter pattern generation unit 310, and the waveform generation unit 311 to the DA conversion unit 5, and the synthesized voice "kyou wa sangatsu mikka desu" is output.

[0069] In Embodiment 4, the voice waveform database 200, the voice parameter database 201, the voice length/FO database 210, and the voice length database 211 are searched first. When the input text exists in one of these databases, the subsequent search processing need not be performed, so synthesized voice can be generated at high speed.
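The routing of the three example sentences through Embodiment 4 can be traced with the sketch below. It is a simplified illustration under stated assumptions: the databases are modeled as plain sets of romanized sentences without punctuation, the function name `trace_units` is hypothetical, and the paths for a hit in database 201 or 211 are carried over from the description of Embodiment 3 rather than stated for Embodiment 4.

```python
from typing import List, Set

# Assumed database contents, following the example in the text.
WAVEFORM_DB_200: Set[str] = {"konnichiwa"}
PARAMETER_DB_201: Set[str] = set()
LENGTH_FO_DB_210: Set[str] = {"arigatou gozaimashita"}
LENGTH_DB_211: Set[str] = set()


def trace_units(text: str) -> List[str]:
    """Return the reference numerals of the units that handle the input, in order."""
    if text in WAVEFORM_DB_200:
        # Waveform found: straight to DA conversion.
        return ["100", "5"]
    if text in PARAMETER_DB_201:
        # Parameters found: waveform generation, then DA conversion.
        return ["100", "101", "311", "5"]
    if text in LENGTH_FO_DB_210:
        # Voice length/FO found: parameters are generated by rule.
        return ["100", "101", "110", "310", "311", "5"]
    if text in LENGTH_DB_211:
        # Voice length found: FO and parameters are generated by rule.
        return ["100", "101", "110", "111", "301", "310", "311", "5"]
    # Nothing found: language processing, then full rule-based generation.
    return ["100", "101", "110", "111", "304", "300", "301", "310", "311", "5"]


for sentence in ("konnichiwa", "arigatou gozaimashita", "kyou wa sangatsu mikka desu"):
    print(sentence, "->", " > ".join(trace_units(sentence)))
```

The shorter the trace, the more of the output comes from stored real-speech data and the less work is done per sentence.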

[0070] (Embodiment 5) Embodiment 5 embodies the first, second, third, and fourth principles, and FIG. 9 is a block diagram showing the configuration of Embodiment 5. In FIG. 9, 10a and 10b denote acoustic information search units, and 11a and 11b denote prosody information search units. When text is input, the acoustic information search unit 10a searches the acoustic information database 20 using the input text as a key. If utterance content that matches the input text exists, the unit outputs the corresponding acoustic information directly. If no matching utterance content exists, it passes the input text unchanged to the prosody information search unit 11a.

[0071] The prosody information search unit 11a searches the prosody information database 21 using the input text as a key. If utterance content that matches the input text exists, the unit passes the corresponding prosody information to the acoustic information generation unit 31. If no matching utterance content exists, it passes the input text unchanged to the language processing unit 304. The language processing unit 304 analyzes the input text with reference to the word dictionary 303, converts it into a phonetic symbol string, and passes the string to the acoustic information search unit 10b.

[0072] The acoustic information search unit 10b searches the acoustic information database 20 using the phonetic symbol string as a key. If utterance content that matches the phonetic symbol string exists, the unit directly outputs the acoustic information retrieved from the acoustic information database 20, without searching the prosody information database or generating prosody information. If no matching utterance content exists, it passes the phonetic symbol string unchanged to the prosody information search unit 11b.

[0073] The prosody information search unit 11b searches the prosody information database 21 using the phonetic symbol string as a key. If utterance content that matches the phonetic symbol string exists, the unit passes the corresponding prosody information directly to the acoustic information generation unit 31. If not, it passes the phonetic symbol string unchanged to the prosody information generation unit 30. The prosody information generation unit 30 generates prosody information from the input phonetic symbol string based on the prosody information generation rule 40 and passes it to the acoustic information generation unit 31. The acoustic information generation unit 31 generates acoustic information from the prosody information using the acoustic information generation rule 41 and outputs it.

[0074] In Embodiment 5, the acoustic information database 20 and the prosody information database 21 are searched first, so when the input text exists in either database the subsequent processing can be omitted. In addition, language processing is performed after that first search, and the acoustic information database and the prosody information database are searched again once the input text has been converted into a phonetic symbol string. As a result, even when the same word is written in different notations, it can still be matched against the utterance contents stored in the databases.
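The two-pass search of Embodiment 5 can be sketched as follows: a first pass keyed by the raw input text, then language processing, then a second pass keyed by the phonetic symbol string. This is a sketch under stated assumptions, not the implementation of the specification: the function `resolve`, the set-based databases, and the caller-supplied `to_phonetic` converter are hypothetical, and the returned labels simply reuse the reference numerals of the units that would handle the input.

```python
from typing import Callable, Set, Tuple


def resolve(
    text: str,
    acoustic_db: Set[str],
    prosody_db: Set[str],
    to_phonetic: Callable[[str], str],
) -> Tuple[str, str]:
    """Return (unit that handled the input, key that was used)."""
    # First pass: search with the raw input text (units 10a and 11a).
    if text in acoustic_db:
        return "10a", text
    if text in prosody_db:
        return "11a", text
    # Language processing (unit 304) converts the text to a phonetic key.
    key = to_phonetic(text)
    # Second pass: search again with the phonetic key (units 10b and 11b).
    if key in acoustic_db:
        return "10b", key
    if key in prosody_db:
        return "11b", key
    # No match in either pass: prosody is generated by rule (unit 30).
    return "30", key


# Hypothetical data: databases keyed by romanized phonetic strings.
readings = {"コンニチハ": "konnichiwa", "有難う": "arigatou"}
convert = lambda t: readings.get(t, t)
print(resolve("コンニチハ", {"konnichiwa"}, {"arigatou"}, convert))
```

The first pass avoids language processing entirely for inputs already stored verbatim, while the second pass recovers matches that differ only in notation.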

[0075]

EFFECTS OF THE INVENTION According to the first invention, using a voice information database that stores voice feature amounts extracted by analyzing real speech makes it possible to output high-quality synthesized voice. In addition, when the input is not present in the database, voice synthesis by rule is performed, so synthesized voice can be generated from a wide range of input texts and input phonetic symbol strings.

[0076] According to the second invention, using acoustic feature amounts extracted by analyzing real speech yields synthesized voice with high intelligibility, close to that of real speech.

[0077] According to the third invention, using prosodic feature amounts extracted by analyzing real speech yields synthesized voice with high naturalness, close to that of real speech.

[0078] The fourth invention provides the effects of both the second and third inventions.

[0079] According to the fifth invention, using voice waveforms as the acoustic information database yields synthesized voice with high naturalness and intelligibility.

[0080] According to the sixth invention, although intelligibility is lower than in the fifth invention, synthesized voice with high intelligibility is obtained with a small amount of data.

[0081] The seventh invention can combine the effects of the fifth and sixth inventions.

[0082] According to the eighth invention, using voice length, voice intensity, and FO as the prosody information yields synthesized voice with highly natural prosody.

[0083] According to the ninth invention, using voice length and voice intensity as the prosody information yields synthesized voice with a highly natural rhythm.

[0084] According to the tenth invention, using voice length/FO patterns as the prosody information data yields synthesized voice with highly natural rhythm, intonation, and accent.

[0085] According to the eleventh invention, using voice length as the prosody information data yields synthesized voice with a highly natural rhythm from a small amount of data.

[0086] According to the twelfth invention, synthesized voice with more natural prosody is obtained.

[Brief description of drawings]

FIG. 1 is a principle diagram showing the principle of the present invention.

FIG. 2 is a principle diagram showing another principle of the present invention.

FIG. 3 is a principle diagram showing still another principle of the present invention.

FIG. 4 is a principle diagram showing still another principle of the present invention.

FIG. 5 is a block diagram showing the configuration of Embodiment 1 of the present invention.

FIG. 6 is a block diagram showing the configuration of Embodiment 2 of the present invention.

FIG. 7 is a block diagram showing the configuration of Embodiment 3 of the present invention.

FIG. 8 is a block diagram showing the configuration of Embodiment 4 of the present invention.

FIG. 9 is a block diagram showing the configuration of Embodiment 5 of the present invention.

FIG. 10 is a block diagram showing the configuration of a conventional voice synthesis system.

[Explanation of symbols]

1: Voice information search unit
2: Voice information database
3: Synthesized voice generation unit
4: Synthesized voice generation rule
5: DA conversion unit
10: Acoustic information search unit
11: Prosody information search unit
20: Acoustic information database
21: Prosody information database
30: Prosody information generation unit
31: Acoustic information generation unit
40: Prosody information generation rule
41: Acoustic information generation rule
100: Voice waveform search unit
101: Voice parameter search unit
110: Voice length/voice intensity/FO pattern search unit
111: Voice length/FO pattern search unit
112: Voice length/voice intensity pattern search unit
113: Voice length pattern search unit
200: Voice waveform database
201: Voice parameter database
210: Voice length/voice intensity/FO database
301: FO pattern generation unit
302: Voice intensity pattern generation unit
310: Voice parameter pattern generation unit
311: Waveform generation unit

Claims

[Claims]

1. A voice synthesis system for converting an input text or an input phonetic symbol string into synthesized voice and outputting the synthesized voice, comprising: a voice information database storing voice feature amounts extracted from real speech; search means for searching the voice information database for a voice feature amount corresponding to the input text or the input phonetic symbol string; means for generating synthesized voice based on the corresponding voice feature amount when the search finds one in the voice information database; and means for generating synthesized voice according to a predetermined rule when no corresponding voice feature amount exists.

2. The voice synthesis system according to claim 1, wherein an acoustic information database storing, among the voice feature amounts, acoustic feature amounts extracted from real speech is used as the voice information database.

3. The voice synthesis system as described, wherein a prosody information database storing, among the voice feature amounts, prosodic feature amounts extracted from real speech is used as the voice information database.

4. The voice synthesis system according to claim 1, wherein an acoustic information database storing acoustic feature amounts extracted from real speech and a prosody information database storing prosodic feature amounts extracted from real speech are used as the voice information database.

5. The voice synthesis system according to claim 2, wherein a voice waveform database storing voice waveforms is used as the acoustic information database.

6. The voice synthesis system according to claim 2, wherein a voice parameter database that stores a spectrum, vocal tract cross-sectional area, or formant frequency is used as the acoustic information database.

7. The voice synthesis system according to claim 2, wherein a voice waveform database and a voice parameter database are used as the acoustic information database.

8. The voice synthesis system according to claim 3, wherein a database storing, out of voice length, voice intensity, and fundamental frequency, either voice length alone or any two or more of them is used as the prosody information database.

9. The voice synthesis system according to claim 3, wherein a voice length/voice intensity database storing voice length and voice intensity is used as the prosody information database.

10. The voice synthesis system according to claim 3, wherein a voice length / FO database storing a voice length and a fundamental frequency is used as the prosody information database.

11. The voice synthesis system according to claim 3, wherein a voice length database storing only voice length is used as the prosody information database.

12. The voice synthesis system according to claim 3 or 4, wherein the prosody information database includes at least two of a voice length/voice intensity/FO database, a voice length/voice intensity database, a voice length/FO database, and a voice length database.