JPH0962286A

JPH0962286A - Voice synthesizer and the method thereof

Info

Publication number: JPH0962286A
Application number: JP7213134A
Authority: JP
Inventors: Masato Shimakawa; 真人島川; Tetsuya Kagami; 徹也加賀美; Makoto Akaha; 誠赤羽; Koji Asano; 康治浅野
Original assignee: Sony Corp
Current assignee: Sony Corp
Priority date: 1995-08-22
Filing date: 1995-08-22
Publication date: 1997-03-07

Abstract

PROBLEM TO BE SOLVED: To generate synthesized speech with appropriate prosody for the input information or sentence. SOLUTION: An input sentence is analyzed by a sentence analysis part 2, and the result is supplied to a prosody information generation part 14. The part 14 stores several kinds of pause setting rules for setting pauses in the synthesized speech. Initial information is entered, such as the field of the input sentence, the average length of one sentence (the reference sentence length), and the application of the synthesized speech (e.g., reading aloud or proofreading of sentences), and one of the above rules is selected based on this initial information. Pauses are then set in the synthesized speech according to the selected rule.

Description

Detailed Description of the Invention

[0001]

BACKGROUND OF THE INVENTION 1. Field of the Invention: The present invention relates to a speech synthesizer and a speech synthesis method. More particularly, it relates to a speech synthesizer and a speech synthesis method that can provide appropriate synthesized speech by selecting one of a plurality of types of prosody information generation rules for generating prosody information of synthesized speech, and generating the synthesized speech based on the selected rule.

[0002]

2. Description of the Related Art: In a conventional speech synthesizer that generates synthesized speech corresponding to an input sentence (text), such as a sentence mixing kanji and kana, the input sentence undergoes natural language processing to obtain its phonological information and prosody information, and synthesized speech corresponding to the input sentence is generated based on that phonological and prosody information.

[0003] In such a speech synthesizer, a predetermined prosody information generation rule is used to generate the prosody information so that the synthesized speech approximates human speech. The prosody information generation rule is obtained (determined), for example, based on the positions of punctuation marks, or by statistically analyzing various sentences.

[0004]

Problems to be Solved by the Invention: In the conventional speech synthesizer, however, prosody information is generated for every input sentence based on a single fixed prosody information generation rule built into the device. As a result, prosody information suited to the characteristics of the input sentence is not always generated, and appropriate synthesized speech may not be obtained.

[0005] For example, when the input sentence belongs to the field of weather forecasts, information is conveyed to the user more accurately if specific words such as "sunny", "rain", "cloudy", or place names, and numerical values such as the probability of precipitation, are emphasized. Furthermore, a long input sentence is easier to follow if the speech rate is increased and the pauses are spaced out, whereas a short input sentence is easier to follow if the speech rate is not made too fast.

[0006] However, because the conventional speech synthesizer generates prosody information based on a single prosody information generation rule, it has been difficult to provide, for every input sentence, synthesized speech that conveys the information accurately and is easy to follow.

[0007] Furthermore, the conventional speech synthesizer does not always generate synthesized speech with the prosody the user desires. For example, when the user uses the speech synthesizer to read a book aloud, smooth synthesized speech with a fast speech rate is desirable, whereas when the user uses it to proofread a document, synthesized speech in which the entire sentence is uttered clearly and specific words are emphasized is desirable. The conventional speech synthesizer, however, does not always provide the synthesized speech the user desires in these cases.

[0008] The present invention has been made in view of these circumstances, and makes it possible to provide the user with appropriate synthesized speech.

[0009]

Means for Solving the Problems: A speech synthesizer according to claim 1 comprises: rule storage means that stores a plurality of types of prosody information generation rules for generating prosody information of synthesized speech; selection information input means for inputting selection information for selecting one of the plurality of types of prosody information generation rules stored in the rule storage means; analysis means that analyzes an input sentence; and prosody information generation means that, based on the selection information input from the selection information input means, selects one of the plurality of types of prosody information generation rules stored in the rule storage means and uses that rule to generate prosody information from the analysis result of the analysis means.
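To make the structure of claim 1 concrete, here is a minimal Python sketch, assuming the stored rules are pause setting rules (as in the abstract) and that the analysis result is a list of phrases, each paired with the punctuation that follows it. The rule functions, the `"reading"`/`"proofreading"` selection keys, and the pause durations in milliseconds are all hypothetical; the specification does not give the contents of its rules.

```python
from typing import Callable, Dict, List, Tuple

# Hypothetical analysis result: (phrase text, following punctuation or "").
Phrase = Tuple[str, str]
# A pause setting rule returns one pause length in ms per phrase (assumed unit).
PauseRule = Callable[[List[Phrase]], List[int]]


def punctuation_only(phrases: List[Phrase]) -> List[int]:
    """Pause only at punctuation: longer after a full stop than after a comma."""
    return [400 if p == "。" else 200 if p == "、" else 0 for _, p in phrases]


def every_phrase(phrases: List[Phrase]) -> List[int]:
    """Also insert a short pause at every remaining phrase boundary."""
    return [400 if p == "。" else 250 if p == "、" else 100 for _, p in phrases]


# Rule storage means: several stored rules, keyed by selection information.
RULE_STORE: Dict[str, PauseRule] = {
    "reading": punctuation_only,
    "proofreading": every_phrase,
}


def generate_pauses(selection_info: str, phrases: List[Phrase]) -> List[int]:
    """Prosody information generation means: apply the rule the user selected."""
    rule = RULE_STORE[selection_info]
    return rule(phrases)


phrases = [("今日は", ""), ("晴れ", "、"), ("明日は雨", "。")]
print(generate_pauses("reading", phrases))       # [0, 200, 400]
print(generate_pauses("proofreading", phrases))  # [100, 250, 400]
```

Keeping the rules as interchangeable functions behind one lookup is one way to re-voice the same analysis result with different prosody without touching the analysis step.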

[0010] A speech synthesizer according to claim 7 comprises: rule storage means that stores a plurality of types of prosody information generation rules for generating prosody information of synthesized speech; feature detection means that detects features of an input sentence; analysis means that analyzes the input sentence; and prosody information generation means that, based on the features of the input sentence detected by the feature detection means, selects one of the plurality of types of prosody information generation rules stored in the rule storage means and uses that rule to generate prosody information from the analysis result of the analysis means.
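Claim 7 replaces the user's selection with features detected from the input itself. A minimal sketch follows, assuming the detected features are the average sentence length in morae and the field of the text. The 40-mora threshold, the rate factors, and the set of fields that trigger emphasis are hypothetical values for illustration; [0005] only states that long sentences are easier to follow when spoken faster and that weather-related text benefits from emphasis.

```python
from statistics import mean
from typing import Dict, List

LONG_SENTENCE_MORAE = 40            # hypothetical threshold
FIELDS_WITH_EMPHASIS = {"weather"}  # hypothetical

# Stored rules keyed by the detected length class (hypothetical contents).
RULES: Dict[str, Dict[str, float]] = {
    "long": {"rate_scale": 1.2},   # speak long sentences somewhat faster
    "short": {"rate_scale": 1.0},  # keep short sentences at the default rate
}


def detect_length_class(mora_counts: List[int]) -> str:
    """Feature detection: classify the text by its average sentence length."""
    return "long" if mean(mora_counts) > LONG_SENTENCE_MORAE else "short"


def select_rule(mora_counts: List[int], field: str) -> Dict[str, object]:
    """Select a stored rule from detected features rather than from user input."""
    return {
        **RULES[detect_length_class(mora_counts)],
        "emphasize_keywords": field in FIELDS_WITH_EMPHASIS,
    }


print(select_rule([50, 60], "weather"))   # {'rate_scale': 1.2, 'emphasize_keywords': True}
print(select_rule([10, 20], "politics"))  # {'rate_scale': 1.0, 'emphasize_keywords': False}
```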

[0011] A speech synthesis method according to claim 13 analyzes an input sentence, selects one of a plurality of types of prosody information generation rules stored in rule storage means, uses the selected rule to generate prosody information from the analysis result of the input sentence, and generates synthesized speech based on that prosody information.

[0012] A speech synthesizer according to claim 14 comprises: parameter storage means that stores a plurality of prosody parameters for controlling the prosody of synthesized speech; selection information input means for inputting selection information for selecting one of the plurality of prosody parameters stored in the parameter storage means; analysis means that analyzes an input sentence; selection means that, based on the selection information input from the selection information input means, selects one of the plurality of prosody parameters stored in the parameter storage means; and synthesized speech generation means that, based on the analysis result of the analysis means, generates synthesized speech with prosody corresponding to the prosody parameter selected by the selection means.
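Claim 14 stores whole prosody parameter sets rather than rules. The sketch below assumes a parameter set consists of a speech rate, a pitch-range factor, and an emphasis gain, chosen to echo [0007] (fast, smooth speech for reading aloud; slower, clearer speech with emphasized words for proofreading). The field names, the numbers, the default key, and the `scale_durations` helper are hypothetical.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass(frozen=True)
class ProsodyParams:
    rate: float              # speech rate relative to a default of 1.0
    pitch_range: float       # scale factor on F0 excursions
    emphasis_gain_db: float  # extra level given to emphasized words


# Parameter storage means (cf. prosody parameter dictionary 35); values hypothetical.
PARAMETER_DICTIONARY: Dict[str, ProsodyParams] = {
    "reading": ProsodyParams(rate=1.3, pitch_range=1.0, emphasis_gain_db=0.0),
    "proofreading": ProsodyParams(rate=0.9, pitch_range=1.2, emphasis_gain_db=3.0),
}
DEFAULT_KEY = "reading"


def select_params(selection_info: str) -> ProsodyParams:
    """Selection means: fall back to a default set for unknown selection info."""
    return PARAMETER_DICTIONARY.get(selection_info, PARAMETER_DICTIONARY[DEFAULT_KEY])


def scale_durations(durations_ms: List[float], params: ProsodyParams) -> List[int]:
    """Apply the selected speech rate to phoneme durations (faster rate, shorter units)."""
    return [round(d / params.rate) for d in durations_ms]


print(scale_durations([130, 260], select_params("reading")))  # [100, 200]
```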

[0013] In a speech synthesis method according to claim 19, when selection information for selecting one of a plurality of prosody parameters stored in parameter storage means is input, one of the plurality of prosody parameters stored in the parameter storage means is selected based on the selection information, and synthesized speech with prosody corresponding to that prosody parameter is generated.

[0014] A speech synthesizer according to claim 20 comprises: parameter storage means that stores a plurality of prosody parameters for controlling the prosody of synthesized speech; feature detection means that detects features of an input sentence; analysis means that analyzes the input sentence; selection means that, based on the features of the input sentence detected by the feature detection means, selects one of the plurality of prosody parameters stored in the parameter storage means; and synthesized speech generation means that, based on the analysis result of the analysis means, generates synthesized speech with prosody corresponding to the prosody parameter selected by the selection means.

[0015] A speech synthesis method according to claim 24 detects features of an input sentence, selects, based on those features, one of a plurality of prosody parameters stored in parameter storage means, and generates synthesized speech with prosody corresponding to that prosody parameter.

[0016] In the speech synthesizer according to claim 1, the rule storage means stores a plurality of types of prosody information generation rules for generating prosody information of synthesized speech, and the selection information input means allows input of selection information for selecting one of the plurality of types of prosody information generation rules stored in the rule storage means. The analysis means analyzes the input sentence, and the prosody information generation means, based on the selection information input from the selection information input means, selects one of the plurality of types of prosody information generation rules stored in the rule storage means and uses that rule to generate prosody information from the analysis result of the analysis means.

[0017] In the speech synthesizer according to claim 7, the rule storage means stores a plurality of types of prosody information generation rules for generating prosody information of synthesized speech, and the feature detection means detects features of the input sentence. The analysis means analyzes the input sentence, and the prosody information generation means, based on the features of the input sentence detected by the feature detection means, selects one of the plurality of types of prosody information generation rules stored in the rule storage means and uses that rule to generate prosody information from the analysis result of the analysis means.

[0018] In the speech synthesis method according to claim 13, the input sentence is analyzed, one of the plurality of types of prosody information generation rules stored in the rule storage means is selected, prosody information is generated from the analysis result of the input sentence using that rule, and synthesized speech is generated based on the prosody information.

[0019] In the speech synthesizer according to claim 14, the parameter storage means stores a plurality of prosody parameters for controlling the prosody of synthesized speech, and the selection information input means allows input of selection information for selecting one of the plurality of prosody parameters stored in the parameter storage means. The analysis means analyzes the input sentence, and the selection means selects one of the plurality of prosody parameters stored in the parameter storage means based on the selection information input from the selection information input means. The synthesized speech generation means generates, based on the analysis result of the analysis means, synthesized speech with prosody corresponding to the prosody parameter selected by the selection means.

[0020] In the speech synthesis method according to claim 19, when selection information for selecting one of the plurality of prosody parameters stored in the parameter storage means is input, one of those prosody parameters is selected based on the selection information, and synthesized speech with prosody corresponding to the selected prosody parameter is generated.

[0021] In the speech synthesizer according to claim 20, the parameter storage means stores a plurality of prosody parameters for controlling the prosody of synthesized speech, and the feature detection means detects features of the input sentence. The analysis means analyzes the input sentence, and the selection means selects one of the plurality of prosody parameters stored in the parameter storage means based on the features of the input sentence detected by the feature detection means. The synthesized speech generation means generates, based on the analysis result of the analysis means, synthesized speech with prosody corresponding to the prosody parameter selected by the selection means.

[0022] In the speech synthesis method according to claim 24, features of the input sentence are detected, one of the plurality of prosody parameters stored in the parameter storage means is selected based on those features, and synthesized speech with prosody corresponding to that prosody parameter is generated.

[0023]

DESCRIPTION OF THE PREFERRED EMBODIMENTS: Embodiments of the present invention are described below. First, to clarify the correspondence between each means of the invention recited in the claims and the following embodiments, the features of the present invention are described with a corresponding embodiment (given only as one example) added in parentheses after each means.

[0024] That is, the speech synthesizer according to claim 1 is a speech synthesizer that generates synthesized speech corresponding to an input sentence, and comprises: rule storage means (for example, the pause setting rule dictionary 38A shown in FIG. 11) that stores a plurality of types of prosody information generation rules for generating prosody information of the synthesized speech; selection information input means (for example, the initial information setting unit 1 shown in FIG. 1) for inputting selection information for selecting one of the plurality of types of prosody information generation rules stored in the rule storage means; analysis means (for example, the language analysis unit 2 shown in FIG. 1) that analyzes the input sentence; prosody information generation means (for example, the pause setting unit 38 shown in FIG. 11) that, based on the selection information input from the selection information input means, selects one of the plurality of types of prosody information generation rules stored in the rule storage means and uses that rule to generate prosody information from the analysis result of the analysis means; and synthesized speech generation means (for example, the acoustic processing unit 15 shown in FIG. 1) that generates synthesized speech based on the prosody information generated by the prosody information generation means.

[0025] The speech synthesizer according to claim 4 further comprises emphasized word determination means (for example, the emphasized word selection unit 11 shown in FIG. 1) that determines the words to be emphasized based on selection information that is information about the field to which the input sentence belongs, and the prosody information generation means generates prosody information such that the words determined by the emphasized word determination means are emphasized.
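A minimal sketch of the emphasized word determination in claim 4, assuming the field has already been entered as selection information and that the analysis step yields a list of words. The vocabularies follow the weather example in [0005] (weather words, place names, and numbers such as the probability of precipitation). The specific word lists, the regular expression, and the use of English tokens are illustrative assumptions.

```python
import re
from typing import Dict, List, Set

# Hypothetical per-field vocabularies of words worth emphasizing.
EMPHASIS_WORDS: Dict[str, Set[str]] = {
    "weather": {"sunny", "rain", "cloudy", "Tokyo", "Osaka"},
}
# Numbers such as "10%" (e.g. a precipitation probability); emphasized for weather only.
NUMBER = re.compile(r"\d+%?")


def choose_emphasized(words: List[str], field: str) -> List[str]:
    """Return, in order, the words that should be emphasized for this field."""
    vocab = EMPHASIS_WORDS.get(field, set())
    return [
        w for w in words
        if w in vocab or (field == "weather" and NUMBER.fullmatch(w))
    ]


sentence = "Tokyo will be sunny with a 10% chance of rain".split()
print(choose_emphasized(sentence, "weather"))   # ['Tokyo', 'sunny', '10%', 'rain']
print(choose_emphasized(sentence, "politics"))  # []
```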

[0026] The speech synthesizer according to claim 7 is a speech synthesizer that generates synthesized speech corresponding to an input sentence, and comprises: rule storage means (for example, the pause setting rule dictionary 38A shown in FIG. 11) that stores a plurality of types of prosody information generation rules for generating prosody information of the synthesized speech; feature detection means (for example, the field determination unit 6 or the reference sentence length determination unit 9 shown in FIG. 1) that detects features of the input sentence; analysis means (for example, the language analysis unit 2 shown in FIG. 1) that analyzes the input sentence; prosody information generation means (for example, the pause setting unit 38 shown in FIG. 11) that, based on the features of the input sentence detected by the feature detection means, selects one of the plurality of types of prosody information generation rules stored in the rule storage means and uses that rule to generate prosody information from the analysis result of the analysis means; and synthesized speech generation means (for example, the acoustic processing unit 15 shown in FIG. 1) that generates synthesized speech based on the prosody information generated by the prosody information generation means.

[0027] The speech synthesizer according to claim 9 further comprises emphasized word determination means (for example, the emphasized word selection unit 11 shown in FIG. 11) that determines the words to be emphasized based on the field, detected by the feature detection means, to which the input sentence belongs, and the prosody information generation means generates prosody information such that the words determined by the emphasized word determination means are emphasized.

[0028] The speech synthesizer according to claim 10 further comprises keyword detection means (for example, the keyword extraction unit 3 shown in FIG. 1) that detects predetermined keywords in the input sentence, and the feature detection means detects the field to which the input sentence belongs based on the keywords detected by the keyword detection means.

[0029] The speech synthesis method according to claim 13 is a speech synthesis method for a speech synthesizer comprising rule storage means (for example, the pause setting rule dictionary 38A shown in FIG. 11) that stores a plurality of types of prosody information generation rules for generating prosody information of synthesized speech corresponding to an input sentence. The method analyzes the input sentence, selects one of the plurality of types of prosody information generation rules stored in the rule storage means, uses that rule to generate prosody information from the analysis result of the input sentence, and generates synthesized speech based on the prosody information.

[0030] The speech synthesizer according to claim 14 is a speech synthesizer that generates synthesized speech corresponding to an input sentence, and comprises: parameter storage means (for example, the prosody parameter dictionary 35 shown in FIG. 8) that stores a plurality of prosody parameters for controlling the prosody of the synthesized speech; selection information input means (for example, the initial information setting unit 1 shown in FIG. 1) for inputting selection information for selecting one of the plurality of prosody parameters stored in the parameter storage means; analysis means (for example, the language analysis unit 2 shown in FIG. 1) that analyzes the input sentence; selection means (for example, the prosody parameter selection unit 37 shown in FIG. 8) that, based on the selection information input from the selection information input means, selects one of the plurality of prosody parameters stored in the parameter storage means; and synthesized speech generation means (for example, the acoustic processing unit 15 shown in FIG. 1) that, based on the analysis result of the analysis means, generates synthesized speech with prosody corresponding to the prosody parameter selected by the selection means.

[0031] The speech synthesis method according to claim 19 is a speech synthesis method for a speech synthesizer comprising parameter storage means (for example, the prosody parameter dictionary 35 shown in FIG. 8) that stores a plurality of prosody parameters for controlling the prosody of synthesized speech corresponding to an input sentence. In this method, when selection information for selecting one of the plurality of prosody parameters stored in the parameter storage means is input, one of those prosody parameters is selected based on the selection information, and synthesized speech with prosody corresponding to that prosody parameter is generated.

[0032] The speech synthesizer according to claim 20 is a speech synthesizer that generates synthesized speech corresponding to an input sentence, and comprises: parameter storage means (for example, the prosody parameter dictionary 35 shown in FIG. 8) that stores a plurality of prosody parameters for controlling the prosody of the synthesized speech; feature detection means (for example, the field determination unit 6 or the reference sentence length determination unit 9 shown in FIG. 1) that detects features of the input sentence; analysis means (for example, the language analysis unit 2 shown in FIG. 1) that analyzes the input sentence; selection means (for example, the prosody parameter selection unit 37 shown in FIG. 8) that, based on the features of the input sentence detected by the feature detection means, selects one of the plurality of prosody parameters stored in the parameter storage means; and synthesized speech generation means (for example, the acoustic processing unit 15 shown in FIG. 1) that, based on the analysis result of the analysis means, generates synthesized speech with prosody corresponding to the prosody parameter selected by the selection means.

[0033] The speech synthesis method according to claim 24 is a speech synthesis method for a speech synthesizer comprising parameter storage means (for example, the prosody parameter dictionary 35 shown in FIG. 8) that stores a plurality of prosody parameters for controlling the prosody of synthesized speech corresponding to an input sentence. The method detects features of the input sentence, selects, based on those features, one of the plurality of prosody parameters stored in the parameter storage means, and generates synthesized speech with prosody corresponding to that prosody parameter.

[0034] Of course, this description does not mean that each means is limited to the examples given above.

[0035] FIG. 1 shows the configuration of an embodiment of a speech synthesizer to which the present invention is applied. This speech synthesizer can generate synthesized speech that reflects the characteristics of the input sentence and also meets the user's wishes.

[0036] That is, the initial information setting unit 1 allows input of initial information (selection information) for selecting one of, for example, the pause setting rules (priority rules and narrowing-down rules) that serve as the plurality of types of prosody information generation rules stored in the pause setting rule dictionary 38A (FIG. 11) described later. By operating the initial information setting unit 1, the user can therefore input as initial information, for example, information about the field to which the input sentence belongs (such as information that the input sentence concerns weather forecasts or politics), information about the reference sentence length of the input sentences, and information about the application of the synthesized speech output from this speech synthesizer (such as information that the synthesized speech is used for reading aloud or for proofreading a document).

[0037] Here, the reference sentence length of the input sentences means the average length of the input sentences supplied to the language analysis unit 2, described next. Sentence length means the prosodic length of an input sentence, and here it is assumed to equal the number of morae (beats) in the input sentence.
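[0037] defines sentence length as a mora count. A minimal sketch of that count and of the reference (average) sentence length follows, assuming the language analysis has already produced a kana reading for each sentence. The specification does not say how readings are obtained or how small kana are handled, so the standard Japanese mora-counting convention used below is an assumption.

```python
from typing import List

# Small kana (ya/yu/yo, small vowels, small wa) merge with the preceding kana
# and do not form a mora of their own. The small tsu (っ/ッ) is NOT listed:
# it counts as one mora, as do ん/ン and the long-vowel mark.
SMALL_KANA = set("ゃゅょぁぃぅぇぉゎャュョァィゥェォヮ")
LONG_VOWEL_MARK = "\u30fc"


def is_kana(ch: str) -> bool:
    """True for hiragana, katakana, or the long-vowel mark."""
    return "\u3041" <= ch <= "\u3096" or "\u30a1" <= ch <= "\u30fa" or ch == LONG_VOWEL_MARK


def count_morae(reading: str) -> int:
    """Count morae in a kana reading, ignoring punctuation and other symbols."""
    return sum(1 for ch in reading if is_kana(ch) and ch not in SMALL_KANA)


def reference_sentence_length(readings: List[str]) -> float:
    """Average sentence length in morae over a set of input sentences."""
    return sum(count_morae(r) for r in readings) / len(readings)


print(count_morae("トウキョウ。"))                          # 4
print(reference_sentence_length(["トウキョウ", "キッテ"]))  # 3.5
```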

[0038] The initial information entered by operating the initial information setting unit 1 is supplied to four units: the field determination unit 6, the reference sentence length determination unit 9, the prosody information generation rule selection unit 13, and the prosody information generation unit 14.

[0039] The input sentence to be converted into synthesized speech is input to the language analysis unit 2, for example in the form of text data. The input sentence need not be text data. It may take another form, for example output data from another program.

[0040] The language analysis unit 2 performs natural language processing, that is, linguistic analysis, on the input sentence. The analysis result is supplied to the keyword extraction unit 3, the sentence length acquisition unit 7, and the prosody information generation unit 14.

[0041] Based on the analysis result from the language analysis unit 2, the keyword extraction unit 3 detects (extracts) the keywords contained in the input sentence and outputs them to the field determination unit 6.

The keyword dictionary 4 stores each keyword together with the field in which it is used. The keyword history table 5 stores the history of keywords that the keyword extraction unit 3 has extracted from input sentences (hereinafter referred to as the keyword history).

The field determination unit 6 looks up each keyword supplied from the keyword extraction unit 3 in the keyword dictionary 4 and stores the field associated with that keyword in the keyword history table 5. It then determines the field to which the input sentence belongs, which is one of the characteristics of that sentence. The determination uses two sources: the keyword history stored in the keyword history table 5, and the field contained in the initial information from the initial information setting unit 1. The result is supplied to the emphasized word selection unit 11, the prosody information generation rule selection unit 13, and the prosody information generation unit 14.

[0042] Based on the analysis result of the input sentence from the language analysis unit 2, the sentence length acquisition unit 7 calculates the sentence length, one of the characteristics of the input sentence. It outputs this length to the reference sentence length determination unit 9.

The sentence length history table 8 stores the history of sentence lengths supplied from the sentence length acquisition unit 7 (hereinafter referred to as the sentence length history). The reference sentence length determination unit 9 calculates the reference sentence length from this history. It then outputs sentence length information, indicating whether the reference sentence length is long, short, or intermediate, to the prosody information generation rule selection unit 13 and the prosody information generation unit 14.

The sentence length information need not have exactly these three ranks. It may have two ranks, or four or more.

[0043] The emphasized word dictionary 10 stores, for each field to which an input sentence may belong, the words to be emphasized and their parts of speech. The emphasized word selection unit 11 receives the field of the input sentence from the field determination unit 6. Based on that field, it determines the words and parts of speech to be emphasized and supplies them to the prosody information generation unit 14.

[0044] The parameter dictionary 12 stores parameters for generating prosody information. The parameter selection unit 13 selects and reads out such parameters from the parameter dictionary 12 and supplies them to the prosody information generation unit 14. Its selection is based on three inputs:
- the initial information from the initial information setting unit 1;
- the sentence length information from the reference sentence length determination unit 9;
- the field of the input sentence from the field determination unit 6.

[0045] The prosody information generation unit 14 generates prosody information for the input sentence and supplies it to the acoustic processing unit 15. It bases this on the following inputs:
- the initial information from the initial information setting unit 1;
- the sentence length information from the reference sentence length determination unit 9;
- the field of the input sentence from the field determination unit 6;
- the words and parts of speech to be emphasized, from the emphasized word selection unit 11;
- the parameters from the parameter selection unit 13;
- the analysis result of the input sentence from the language analysis unit 2.

[0046] That is, the prosody information generation unit 14 selects one of the several types of pause setting rules stored in the pause setting rule dictionary 38A (FIG. 11). The selection is based on the initial information from the initial information setting unit 1, the sentence length information from the reference sentence length determination unit 9, and the field of the input sentence from the field determination unit 6. Following the selected rule, and using the parameters from the parameter selection unit 13, the unit sets pause positions in the structure tree from the language analysis unit 2. Pause positions are one kind of prosody information.

The prosody information generation unit 14 also determines the accent magnitude (size and position), another kind of prosody information, for the words and parts of speech to be emphasized that it receives from the emphasized word selection unit 11. Furthermore, it uses the parameters from the parameter selection unit 13 to determine other prosody information, for example the speech rate and the phrase magnitude.

[0047] The acoustic processing unit 15 generates synthesized speech based on the prosody information from the prosody information generation unit 14. It supplies this speech to the speaker 15A for output.

[0048] In the speech synthesizer configured as described above, the user first enters the initial information by operating the initial information setting unit 1. The initial information setting unit 1 has a monitor that displays three lists for setting the initial information, as shown, for example, in FIGS. 2(A) to 2(C):
- a use list, for the intended use of the synthesized speech;
- a field list, for the field of the input sentence;
- a reference sentence length list, for the reference sentence length of the input sentence.

The user enters the initial information by selecting one item from each list.

[0049] For example, suppose the user will use the synthesized speech to proofread a document that concerns politics and has a short reference sentence length. The user then enters the initial information by making three selections:
- "Proofreading" from the use list (FIG. 2(A));
- "Politics" from the field list (FIG. 2(B));
- "Short" from the reference sentence length list (FIG. 2(C)).

[0050] If the use, field, or reference sentence length is not known, the user selects "Unknown" from the corresponding list. Entering initial information is optional. If none is entered, it is treated as though "Unknown" had been selected.
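The three list selections, with "Unknown" as the default for anything not entered, can be represented as a small record. The following is a minimal sketch under stated assumptions. The class name `InitialInfo`, the method `from_selection`, and the lowercase label strings are hypothetical illustrations and are not taken from the patent.

```python
from dataclasses import dataclass
from typing import Mapping

UNKNOWN = "unknown"  # hypothetical stand-in for the "Unknown" list item


@dataclass(frozen=True)
class InitialInfo:
    use: str = UNKNOWN               # e.g. "reading aloud", "proofreading"
    field: str = UNKNOWN             # e.g. "politics", "weather forecast"
    reference_length: str = UNKNOWN  # e.g. "short", "medium", "long"

    @classmethod
    def from_selection(cls, selection: Mapping[str, str]) -> "InitialInfo":
        """Build from whatever the user picked; omitted items default to UNKNOWN."""
        return cls(
            use=selection.get("use", UNKNOWN),
            field=selection.get("field", UNKNOWN),
            reference_length=selection.get("reference_length", UNKNOWN),
        )
```

For the proofreading example above, `InitialInfo.from_selection({"use": "proofreading", "field": "politics", "reference_length": "short"})` fills all three items. An empty selection yields the all-"Unknown" default.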

[0051] The items of initial information go to different units:
- the field is supplied to the field determination unit 6;
- the reference sentence length is supplied to the reference sentence length determination unit 9;
- the use is supplied to the parameter selection unit 13 and the prosody information generation unit 14.

[0052] Meanwhile, when an input sentence to be synthesized is input to the language analysis unit 2, it is analyzed there. FIG. 3 shows an example configuration of the language analysis unit 2. As shown in the figure, the language analysis unit 2 consists of a morphological analysis unit 21, a word dictionary 22, a connection rule dictionary 23, and a structure tree generation unit 24.

[0053] The morphological analysis unit 21 performs morphological analysis of the input sentence by referring to two dictionaries:
- the word dictionary 22, which stores the headword, reading (phonological information), part of speech, and so on of each word (morpheme);
- the connection rule dictionary 23, which describes the rules governing how morphemes may connect.

The analysis divides the input sentence into morphemes.

[0054] For example, suppose the following input sentence is input:

「平成７年７月２５日、参議院選で社会党は大敗を喫し、村山首相の連立与党内での立場はさらに苦しくなった。」
("On July 25, Heisei 7 (1995), the Socialist Party suffered a heavy defeat in the House of Councillors election, and Prime Minister Murayama's position within the ruling coalition became even more difficult.")

The morphological analysis unit 21 then outputs, for example, the following morphological analysis result.

[0055] (平成 へーせー 名詞) (７ しち 数詞) (年 ねん 助数詞) ... (、 null 読点) (参議院選 さんぎいんせん 名詞) ... (た た 助動詞) (。 null 句点) ... (1)

(Glosses: 名詞 = noun, 数詞 = numeral, 助数詞 = counter, 読点 = comma, 助動詞 = auxiliary verb, 句点 = period.)

[0056] Each entry (A, B, C) of the morphological analysis result above has three fields:
- A is the headword of the morpheme;
- B is its phonological information (reading);
- C is its part of speech.

A value of null in the phonological information field means the morpheme has no reading.
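The (A, B, C) triples can be read into structured records. The following is a minimal sketch under stated assumptions. It assumes the three fields of each parenthesized group are separated by whitespace, as in the reconstructed result (1) above; the original spacing was lost in extraction. The names `Morpheme` and `parse_morphemes` are hypothetical.

```python
import re
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class Morpheme:
    headword: str            # A: headword (surface form)
    reading: Optional[str]   # B: reading; None when the result says "null"
    pos: str                 # C: part of speech


_GROUP = re.compile(r"\(([^()]*)\)")  # one flat parenthesized group


def parse_morphemes(text: str) -> List[Morpheme]:
    """Parse '(A B C) (A B C) ...' into Morpheme records (assumed syntax)."""
    result = []
    for group in _GROUP.findall(text):
        fields = group.split()
        if len(fields) != 3:
            continue  # skip anything that is not an (A, B, C) triple, e.g. "(1)"
        a, b, c = fields
        result.append(Morpheme(a, None if b == "null" else b, c))
    return result
```

Mapping "null" to `None` keeps the "no reading" case distinct from an empty string. This is convenient later when counting morae.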

[0057] The morphological analysis result output from the morphological analysis unit 21 is supplied to three units: the structure tree generation unit 24, the keyword extraction unit 3, and the sentence length acquisition unit 7.

[0058] Based on the morphological analysis result from the morphological analysis unit 21, the structure tree generation unit 24 generates a structure tree. The tree represents the grammatical structure of the morphemes that make up the input sentence. More precisely, the tree expresses the relations between the phrases (bunsetsu) of the input sentence, and it is built according to predetermined syntax rules. For example, when morphological analysis result (1) is input to the structure tree generation unit 24, it generates the following structure tree.

[0059] ((副詞句 (平成 へーせー 名詞) (７ しち 数詞) (年 ねん 助数詞) ... (、 null 読点))) ((連用句 (参議院選 さんぎいんせん 名詞) (で で 助詞)) (連用句 (社会党 しゃかいとー 名詞) (は は 助詞)) ...

(Glosses: 副詞句 = adverbial phrase, 連用句 = continuative (ren'yō) phrase, 助詞 = particle.)

[0060] The structure tree generated by the structure tree generation unit 24 is supplied to the prosody information generation unit 14.

[0061] Returning to FIG. 1, the keyword extraction unit 3 extracts the keywords contained in the input sentence. It works from the morphological analysis result supplied by the language analysis unit 2 (specifically, the morphological analysis unit 21). In this embodiment, words (morphemes) whose part of speech is a noun are extracted as keywords. For morphological analysis result (1) above, this gives the following keyword extraction result.

[0062] (平成 へーせー 名詞) (参議院選 さんぎいんせん 名詞) (社会党 しゃかいとー 名詞) ...
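The noun-only keyword rule can be expressed as a one-line filter over the morphemes. The following is a minimal sketch. It assumes morphemes are held as (headword, reading, part of speech) records and that nouns carry the label 名詞, as in result (1). The names `Morpheme` and `extract_keywords` are hypothetical.

```python
from typing import List, NamedTuple, Optional


class Morpheme(NamedTuple):
    headword: str
    reading: Optional[str]  # None for a "null" reading
    pos: str


NOUN = "名詞"  # part-of-speech label for nouns, as in result (1)


def extract_keywords(morphemes: List[Morpheme]) -> List[Morpheme]:
    """Keep only the morphemes whose part of speech is a noun."""
    return [m for m in morphemes if m.pos == NOUN]
```

Applied to the first few morphemes of result (1), the filter keeps 平成, 参議院選, and 社会党. It drops the numeral, the counter, and the punctuation.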

[0063] These keywords are output to the field determination unit 6. By referring to the keyword dictionary 4, the field determination unit 6 determines the field of the input sentence that contains the keywords supplied from the keyword extraction unit 3.

[0064] FIG. 4 shows an example configuration of the keyword dictionary 4. As shown in the figure, the keyword dictionary 4 stores each keyword together with two items:
- the field in which the keyword is used;
- a confidence (likelihood) that the input sentence belongs to that field, given that it contains the keyword.

In this embodiment, a higher confidence means it is more likely that the input sentence belongs to the field associated with the keyword.

[0065] The field determination unit 6 looks up each keyword from the keyword extraction unit 3 in the keyword dictionary 4. It reads out the matching keyword together with its associated field and confidence and stores them in the keyword history table 5.

A keyword may be missing from the keyword dictionary 4, that is, not registered there. In that case the field determination unit 6 stores a message to that effect in the keyword history table 5 (hereinafter referred to as an "estimation impossible" message).

[0066] FIG. 5 shows the contents of the keyword history table 5. The table holds a predetermined number of the most recent entries from the field determination unit 6. Entries are stored in sequence until that number is reached. After that, each time a new entry arrives, the oldest entry is deleted and the new entry is stored.

The "points" in the rightmost column of the keyword history table 5 in FIG. 5 correspond to the field confidence described above.

[0067] Each time the field determination unit 6 receives a keyword from the keyword extraction unit 3, it stores an entry in the keyword history table 5. The entry is either the keyword, field, and points (confidence), or an "estimation impossible" message. These entries are hereinafter collectively referred to as keyword information.

The field determination unit 6 then totals the points stored in the keyword history table 5 for each field. It outputs the field with the largest total, as the field of the input sentence, to the emphasized word selection unit 11 and the parameter selection unit 13. For example, if "Politics" has the largest total, the field "Politics" is output to those two units.

[0068] Suppose keywords just stored in the keyword history table 5 were weighted the same as keywords stored earlier. If the current input sentence belongs to a different field from earlier input sentences, the earlier keywords could then cause its field to be determined (estimated) incorrectly.

To prevent such errors, the field determination unit 6 reduces the points of all keyword information already stored by a fixed ratio each time it stores new keyword information. As a result, the most recent keywords have the greatest influence on the determination of the field of the input sentence.

[0069] The user may enter a field (other than "Unknown") as part of the initial information. When the field determination unit 6 receives such a field from the initial information setting unit 1, it gives that initial information priority. In this case, the field determination unit 6 outputs the field supplied from the initial information setting unit 1 unchanged.

However, a user-supplied field need not override the history. Two alternatives are possible:
- the field determination unit 6 may still determine and output the field from the keyword information in the keyword history table 5;
- it may combine the user-supplied field with that keyword information when determining the field of the input sentence.
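Paragraphs [0065] to [0069] together describe a scored, bounded keyword history. The following minimal sketch puts these pieces in one place:
- a fixed-capacity FIFO table;
- a fixed-ratio decay applied to older entries whenever a new one is stored;
- "estimation impossible" entries for unknown keywords;
- a per-field total with an argmax;
- priority for a user-supplied field.

The capacity, the decay ratio, the class name `FieldDeterminer`, and the English field labels are all assumptions for illustration, not values from the patent.

```python
from collections import deque
from typing import Deque, Dict, List, Optional, Tuple

UNKNOWN = "unknown"


class FieldDeterminer:
    """Keyword-history field estimator (sketch; capacity and decay are assumed)."""

    def __init__(self, keyword_dict: Dict[str, Tuple[str, float]],
                 capacity: int = 20, decay: float = 0.9) -> None:
        self.keyword_dict = keyword_dict  # keyword -> (field, confidence)
        self.decay = decay                # fixed ratio applied to older entries
        # Each entry is [keyword, field or None, points]; a None field marks an
        # "estimation impossible" entry. maxlen drops the oldest entry automatically.
        self.history: Deque[List] = deque(maxlen=capacity)

    def add_keyword(self, keyword: str) -> None:
        for entry in self.history:
            entry[2] *= self.decay        # older keyword information counts for less
        if keyword in self.keyword_dict:
            field, confidence = self.keyword_dict[keyword]
            self.history.append([keyword, field, confidence])
        else:
            self.history.append([keyword, None, 0.0])  # "estimation impossible"

    def field(self, user_field: str = UNKNOWN) -> Optional[str]:
        if user_field != UNKNOWN:
            return user_field             # a user-specified field takes priority
        totals: Dict[str, float] = {}
        for _, field, points in self.history:
            if field is not None:
                totals[field] = totals.get(field, 0.0) + points
        if not totals:
            return None                   # nothing recognizable seen yet
        return max(totals, key=totals.get)
```

The decay makes a run of recent weather keywords outweigh older political ones even before the older entries fall out of the table. This matches the motivation given in [0068].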

[0070] On receiving the field of the input sentence from the field determination unit 6, the emphasized word selection unit 11 refers to the emphasized word dictionary 10. It finds the words (emphasized words) and parts of speech that should be emphasized in that field.

As shown, for example, in FIG. 6, the emphasized word dictionary 10 stores, for each field, the words and parts of speech that are preferably emphasized in that field. The emphasized word selection unit 11 retrieves the entries associated with the field of the input sentence and outputs them to the prosody information generation unit 14.

For example, suppose the emphasized word dictionary 10 is configured as shown in FIG. 6 and the field determination unit 6 supplies the field "Weather forecast". The emphasized word selection unit 11 then determines:
- "fine", "cloudy", and "rain" as emphasized words;
- numerals, numeral particles, and proper nouns (prefecture names) as parts of speech to be emphasized.

It outputs these to the prosody information generation unit 14.

[0071] In addition to the field-specific emphasized words and parts of speech shown in FIG. 6, the emphasized word dictionary 10 may store a thesaurus that organizes words hierarchically by meaning. After obtaining the emphasized words as described above, the emphasized word selection unit 11 can look up their synonyms in the thesaurus. The resulting words can then be treated as emphasized words as well.
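The selection in [0070] is a lookup keyed by field, and the optional thesaurus step in [0071] widens the result with synonyms. The following is a minimal sketch. The dictionary contents mirror the weather forecast example only. The thesaurus entry (雨 → 降雨), the English part-of-speech labels, and the function name `select_emphasis` are hypothetical.

```python
from typing import Dict, List, Set, Tuple

# Hypothetical excerpt of the emphasized word dictionary 10 (cf. FIG. 6):
# field -> (emphasized words, parts of speech to emphasize)
EMPHASIS_DICT: Dict[str, Tuple[List[str], List[str]]] = {
    "weather forecast": (["晴", "曇", "雨"],
                         ["numeral", "numeral particle", "proper noun (prefecture)"]),
}

# Hypothetical thesaurus: word -> synonyms
THESAURUS: Dict[str, List[str]] = {"雨": ["降雨"]}


def select_emphasis(field: str, use_thesaurus: bool = False) -> Tuple[Set[str], Set[str]]:
    """Return (emphasized words, emphasized parts of speech) for a field."""
    words, parts_of_speech = EMPHASIS_DICT.get(field, ([], []))
    selected = set(words)
    if use_thesaurus:
        for word in words:
            selected.update(THESAURUS.get(word, []))  # add synonyms as emphasized words
    return selected, set(parts_of_speech)
```

An unknown field yields two empty sets, so the prosody step simply applies no field-specific emphasis.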

[0072] Meanwhile, the sentence length acquisition unit 7 calculates the number of morae in each morpheme supplied from the language analysis unit 2. It sums these counts over the input sentence (one sentence) to obtain the sentence length. This sentence length is supplied to the reference sentence length determination unit 9, which outputs it to the sentence length history table 8 for storage.

FIG. 7 shows the contents of the sentence length history table 8. Like the keyword history table 5 described above, this table stores the sentence lengths coming from the reference sentence length determination unit 9. In the embodiment of FIG. 7, it can hold the lengths of the input sentences supplied from the sentence length acquisition unit 7, from the latest input sentence back to the sentence N sentences earlier. This is a sentence length history of N sentences.
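Counting morae from a kana reading follows the standard Japanese rule:
- every kana counts as one mora, except small ゃ, ゅ, ょ, ぁ, etc., which merge with the preceding kana (きゃ is one mora);
- the geminate っ, the moraic ん, and the long-vowel mark ー each count as one mora.

The following is a minimal sketch under stated assumptions. It assumes readings are given in kana, as in result (1), and that a "null" reading contributes no morae. The function names are hypothetical.

```python
from typing import Iterable, Optional

# Small kana merge with the preceding kana into one mora; っ/ッ, ん/ン and the
# long-vowel mark ー are deliberately absent, so each of them counts as a mora.
_SMALL_KANA = set("ぁぃぅぇぉゃゅょゎァィゥェォャュョヮ")


def count_morae(reading: Optional[str]) -> int:
    """Number of morae in one morpheme's kana reading."""
    if reading is None:  # "null" reading (e.g. punctuation) has no morae
        return 0
    return sum(1 for ch in reading if ch not in _SMALL_KANA)


def sentence_length(readings: Iterable[Optional[str]]) -> int:
    """Sentence length = total morae over all morphemes of the sentence."""
    return sum(count_morae(r) for r in readings)
```

For example, へーせー counts as four morae (he-e-se-e), and しゃかいとー counts as five (sha-ka-i-to-o).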

[0073] The reference sentence length determination unit 9 calculates the reference sentence length as the average of the N sentence lengths stored in the sentence length history table 8. It then judges whether the reference sentence length is long, short, or intermediate. According to the result, it supplies "Long", "Short", or "Medium" as the sentence length information to the parameter selection unit 13 and the prosody information generation unit 14.

[0074] The mora counts used as thresholds for judging whether the reference sentence length is long, short, or intermediate are calculated in advance, for example by statistical methods. They are then set in the reference sentence length determination unit 9.

[0075] The reference sentence length determination unit 9 could instead derive the sentence length information from the latest input sentence alone, rather than from N sentences. However, sentence lengths usually vary from one sentence to the next. Sentence length information derived from a single sentence would therefore change frequently.

That instability would propagate. The parameters selected by the parameter selection unit 13, described later, would change frequently. So would the rules used by the prosody information generation unit 14 to generate prosody information (pause setting rules in this embodiment). This is undesirable. To avoid such frequent changes, it is preferable to derive the sentence length information from the lengths of N sentences, as described above.

[0076] Furthermore, until N sentence lengths have been stored in the sentence length history table 8, the latest input sentence can still make the sentence length information change frequently. It is therefore desirable to preload the sentence length history table 8 with initial values, for example the statistically determined average number of morae per sentence.

[0077] The user may enter reference sentence length information (other than "Unknown") as part of the initial information. When the reference sentence length determination unit 9 receives such information from the initial information setting unit 1, it gives that information priority. In this case, it outputs the supplied reference sentence length information as the sentence length information.

However, user-supplied information need not override the history. Two alternatives are possible:
- the unit may still derive and output the sentence length information from the sentence lengths in the sentence length history table 8;
- it may combine the user-supplied information with those sentence lengths when deriving the sentence length information.
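Paragraphs [0073] to [0077] combine several mechanisms:
- an N-sentence moving average;
- preset thresholds that map the average to short, medium, or long;
- a history preloaded with a statistical per-sentence average;
- priority for a user-supplied label.

The following is a minimal sketch of these mechanisms. The values of N, the initial average, and the two thresholds are placeholders, not figures from the patent. The class name `ReferenceLengthJudge` is hypothetical.

```python
from collections import deque
from statistics import mean

UNKNOWN = "unknown"


class ReferenceLengthJudge:
    """Reference sentence length from the last N sentences (sketch; values assumed)."""

    def __init__(self, n: int = 10, initial_average: float = 40.0,
                 short_max: float = 30.0, long_min: float = 60.0) -> None:
        # Preload the history with a statistical per-sentence average so the
        # label does not swing while fewer than N real sentences have been seen.
        self.history = deque([initial_average] * n, maxlen=n)
        self.short_max = short_max  # averages at or below this are "short"
        self.long_min = long_min    # averages at or above this are "long"

    def add_sentence(self, morae: int) -> None:
        self.history.append(morae)  # the oldest length drops out automatically

    def reference_length(self) -> float:
        return mean(self.history)

    def label(self, user_label: str = UNKNOWN) -> str:
        if user_label != UNKNOWN:
            return user_label       # user-entered initial information takes priority
        ref = self.reference_length()
        if ref <= self.short_max:
            return "short"
        if ref >= self.long_min:
            return "long"
        return "medium"
```

With a history of four sentences, a single very short sentence only nudges the average, so the label stays "medium". Two long sentences in a row are needed to move it to "long". This is the stability that [0075] and [0076] argue for.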

[0078] The parameter selection unit 13 selects, from the parameter dictionary 12, the parameters for generating prosody information, for example pause setting parameters and prosody parameters. It supplies them to the prosody information generation unit 14. The selection is based on three inputs:
- the information about the use, contained in the initial information from the initial information setting unit 1;
- the sentence length information from the reference sentence length determination unit 9;
- the field of the input sentence from the field determination unit 6.

[0079] FIG. 8 shows an example configuration of the parameter dictionary 12 and the parameter selection unit 13.

The parameter dictionary 12 consists of two dictionaries:
- a pause setting parameter dictionary 34, which stores the pause setting parameters used to decide where pauses are inserted into the synthesized speech;
- a prosody parameter dictionary 35, which stores parameters that determine the other prosody information of the synthesized speech (hereinafter referred to as prosody parameters). Examples are parameters for the speech rate, the pause length, the phrase command magnitude, and the accent command magnitude.

The parameter selection unit 13 likewise consists of two units:
- a pause setting parameter selection unit 36, which selects pause setting parameters from the pause setting parameter dictionary 34;
- a prosody parameter selection unit 37, which selects prosody parameters from the prosody parameter dictionary 35.

[0080] The pause setting parameter selection unit 36 selects pause setting parameters from the pause setting parameter dictionary 34, based on the information about the use, the sentence length information, and the field of the input sentence.

As shown, for example, in FIG. 9, the pause setting parameter dictionary 34 stores each set of pause setting parameters together with the use, field, and reference sentence length of the input sentences for which that set is suitable. The pause setting parameter selection unit 36 searches for the row whose "Use", "Field", and "Reference sentence length" entries all match its inputs:
- the information about the use from the initial information setting unit 1;
- the field of the input sentence from the field determination unit 6;
- the sentence length information from the reference sentence length determination unit 9.

It then reads out the pause setting parameters in that row. In the embodiment of FIG. 9, these are the three parameters, pause setting parameters 1 to 3.

[0081] For example, suppose the usage information from the initial information setting unit 1 is "reading aloud", the field from the field determination unit 6 is "politics", and the sentence length information from the reference sentence length determination unit 9 is "long". In that case, P1-1, P2-1, and P3-1, given in the first row of the pause setting parameter dictionary 34, are read out as pause setting parameters 1 to 3.

[0082] In this embodiment, pause setting parameters 1 to 3 are expressed, for example, as numbers of morae. In FIG. 9, the fifth row has "general", "general", and "medium" in its use, field, and reference sentence length columns. That row gives 11, 26, and 46 as the actual values of pause setting parameters 1 to 3, and these values are mora counts.

[0083] As noted above, the usage information supplied from the initial information setting unit 1 may be "unknown". In that case, the pause setting parameter selection unit 36 treats the use as, for example, "general", which stands for general-purpose use.
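The row lookup described in [0080] to [0083] can be sketched as a dictionary keyed on the (use, field, reference sentence length) triple. This is a minimal sketch under stated assumptions, not the patent's implementation:
- The English key strings and the function name `select_pause_params` are hypothetical.
- Only the fifth row of FIG. 9 is filled in, because the text gives numeric values for no other row.
- The text does not say what happens when no row matches, so the sketch simply returns `None`.

```python
from typing import Dict, Optional, Tuple

# Key: (use, field, reference sentence length) -> pause setting parameters 1-3.
# Only row 5 of FIG. 9 is reproduced; the key strings are hypothetical.
PAUSE_PARAM_DICT: Dict[Tuple[str, str, str], Tuple[int, int, int]] = {
    ("general", "general", "medium"): (11, 26, 46),  # values are mora counts
}

def select_pause_params(use: Optional[str], field: str,
                        length: str) -> Optional[Tuple[int, int, int]]:
    """Return pause setting parameters 1-3 of the matching row, or None."""
    # An "unknown" (or missing) use is treated as "general", as in [0083].
    if use is None or use == "unknown":
        use = "general"
    return PAUSE_PARAM_DICT.get((use, field, length))
```

For example, `select_pause_params("unknown", "general", "medium")` falls back to the "general" use and returns `(11, 26, 46)`.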

[0084] In this embodiment, the use of the input sentence is taken only from the usage information supplied by the initial information setting unit 1. The use could instead be estimated by some other method.

[0085] The prosody parameter selection unit 37, in turn, selects prosody parameters from the prosody parameter dictionary 35 on the basis of the usage information, the sentence length information, and the field of the input sentence. As shown for example in FIG. 10(A), the prosody parameter dictionary 35 stores concrete values for each prosody parameter. In this embodiment, as described above, the prosody parameters are:
- the speech rate;
- the pause length;
- the phrase command magnitude (phrase symbol);
- the accent command magnitude (accent symbol).

Each set of values is stored together with the use, field, and reference sentence length of the input sentences for which it is best suited. The prosody parameter selection unit 37 searches for the row whose "use", "field", and "reference sentence length" entries match, respectively, the usage information from the initial information setting unit 1, the field from the field determination unit 6, and the sentence length information from the reference sentence length determination unit 9. It then reads out the prosody parameters in that row.

[0086] For example, suppose the usage information, the field of the input sentence, and the sentence length information are "reading aloud", "politics", and "long", respectively. These come from the initial information setting unit 1, the field determination unit 6, and the reference sentence length determination unit 9. In that case, the following values are read from the first row of the prosody parameter dictionary 35:
- the speech rate M;
- the pause lengths (set of pause lengths) S1, S2, S3, S4, S5;
- the phrase command magnitudes (set of phrase command magnitudes) P0, P1, P2, P3, P4;
- the accent command magnitudes (set of accent command magnitudes) A0, A1, A2, A3, A4.

[0087] In this embodiment, the pause length, phrase command magnitude, and accent command magnitude actually applied to the synthesized speech are each chosen from the corresponding set read from the prosody parameter dictionary 35: the pause length set S1 to S5, the phrase command magnitude set P0 to P4, and the accent command magnitude set A0 to A4. As described later, this choice is made in the acoustic processing unit 15.

[0088] FIG. 10(B) shows concrete values of the speech rate, pause length set, phrase command magnitude set, and accent command magnitude set read from the prosody parameter dictionary 35. When these prosody parameters are read out:
- the speech rate is set to 7 morae per second;
- the pause length is chosen from 1500, 800, 400, 200, and 80 ms;
- the phrase command magnitude is chosen from 0.5, 0.35, 0.25, 0.15, and 0.1;
- the accent command magnitude is chosen from 0.7, 0.5, 0.35, 0.2, and 0.1.

[0089] In this way, the pause setting parameters and prosody parameters that the parameter selection unit 13 (FIG. 1) reads from the parameter dictionary 12 are supplied to the prosody information generation unit 14.

[0090] Next, FIG. 11 shows a configuration example of the prosody information generation unit 14. A pause setting rule dictionary 38A stores several types of pause setting rules (prosody information generation rules). These rules determine (generate) prosody information of the synthesized speech, for example the positions of pauses. A pause setting unit 38 selects one of these rules and uses it to decide where pauses are placed in the synthesized speech. The decided pause positions are reflected in the structure tree supplied from the language analysis unit 2, and that tree is then passed to a prosody tree generation unit 39.

[0091] The prosody tree generation unit 39 takes the structure tree from the pause setting unit 38 and generates a prosody tree, which represents the prosody information of the synthesized speech as a tree structure. It supplies this tree to an accent processing unit 40. The accent processing unit 40 converts the prosody tree into one that takes accent combination (accent fusion) into account and outputs it to a prosody information output generation unit 41. The prosody information output generation unit 41 converts the prosody tree into a format that the acoustic processing unit 15 can process and outputs it to the acoustic processing unit 15.

【００９２】以上のように構成される韻律情報生成部１
４では、ポーズ設定部３８に対し、言語解析部２から構
造木が供給され、また、パラメータ選択部１３からポー
ズ設定パラメータが供給されるようになされている。さ
らに、ポーズ設定部３８には、初期情報設定部１、分野
判定部６、または基準文長判定部９から、入力文の用途
についての情報（以下、適宜、用途情報という）、入力
文の分野、または文長情報もそれぞれ供給されるように
なされている。Prosody information generator 1 configured as described above
In 4, the language analysis unit 2 supplies the structure tree to the pose setting unit 38, and the parameter selecting unit 13 supplies the pose setting parameters. Further, in the pause setting unit 38, information about the use of the input sentence (hereinafter referred to as use information) from the initial information setting unit 1, the field determining unit 6, or the reference sentence length determining unit 9 and the field of the input sentence. , Or sentence length information is also supplied.

[0093] When the pause setting unit 38 receives the usage information, the field of the input sentence, and the sentence length information, it selects one of the pause setting rules in the pause setting rule dictionary 38A on the basis of that information. The dictionary 38A stores each pause setting rule together with the usage information, input sentence field, and sentence length information for which it is appropriate. The pause setting unit 38 reads out the rule associated with the information it received.

[0094] Next, the pause setting unit 38 receives the structure tree from the language analysis unit 2 and the pause setting parameters from the parameter selection unit 13. It applies the selected pause setting rule to the structure tree, taking the pause setting parameters into account. This determines the positions of the pauses to be inserted into the synthesized speech.

[0095] Because the pause setting unit 38 places pauses at positions suited to the characteristics of the input sentence, the user receives synthesized speech that is easy to follow.

[0096] In this embodiment, the pause setting unit 38 determines pause positions using, for example, the method described in Kagami, Asano, and Akabane, "A pause derivation method based on modification relations between morphemes," Proceedings of the Acoustical Society of Japan meeting, 1-P-32, pp. 401, October-November 1994.

This method assumes that a grammatical unit, such as a compound word, a bunsetsu (文節), a phrase, or a clause, is characterized by its first and last morphemes. The priority for placing a pause between a grammatical unit A and the following unit B is determined by the relation between the last morpheme of A and the first morpheme of B. The candidate positions are then narrowed down using these priorities together with constraints such as the number of morae between pauses. This yields the final pause positions.

[0097] This method uses two kinds of rules to determine pause positions: priority rules, which rank the candidate positions for pauses, and narrowing rules, which narrow down the positions where pauses are actually inserted. Accordingly, the pause setting rules stored in the pause setting rule dictionary 38A also consist of priority rules and narrowing rules.

[0098] FIG. 12 shows an example of the priority rules and narrowing rules. FIG. 12(A) shows the priority rules and FIG. 12(B) the narrowing rules.

[0099] When the pause setting rules of FIG. 12 are selected, the pause setting unit 38 first applies the priority rules of FIG. 12(A) to the structure tree, as follows:
- Priority 1 (highest): immediately before a full stop (句点).
- Priority 2: between specific independent words or particles and a comma (読点).
- Priority 3: between specific particles (other than those covered by priority 2) and a comma.
- Priority 4: between specific conjunctive particles, case particles, binding particles, or parallel particles and a non-predicate independent word.
- Priority 5: between an attributive form of an inflected word, an adnominal, or the case particle "no" (の) and a noun.
- Priority 6: any remaining portion that has a non-modifying relation and has not already received one of priorities 1 to 5.
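The rule cascade above can be sketched as a function of the two morphemes that meet at a boundary. This is a rough sketch under stated assumptions, not FIG. 12(A) itself:
- All tag strings are hypothetical.
- FIG. 12(A) refers to "specific" words and particles whose lists are not given in the text, so the sets below are placeholders.
- "noun" stands in for any non-predicate independent word.
- The boolean `related` stands in for the condition that earns priority 6.

```python
from typing import Optional

# Hypothetical coarse tags for the morphemes on either side of a boundary.
P2_LEFT = {"conjunction", "particle:wa"}          # placeholder for the real list
P4_LEFT = {"particle:conjunctive", "particle:case",
           "particle:binding", "particle:parallel"}
P5_LEFT = {"verb:attributive", "adnominal", "particle:no"}  # の is tagged apart

def boundary_priority(left: str, right: str, related: bool) -> Optional[int]:
    """Pause priority between a unit ending in `left` and the next unit
    starting with `right` (1 = highest); None means no pause candidate."""
    if right == "period":
        return 1
    if right == "comma":
        if left in P2_LEFT:
            return 2
        if left.startswith("particle:"):
            return 3
    # "noun" stands in for any non-predicate independent word.
    if left in P4_LEFT and right == "noun":
        return 4
    if left in P5_LEFT and right == "noun":
        return 5
    return 6 if related else None
```

The checks run from highest to lowest priority, so a boundary always receives the first (strongest) rule that matches it.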

[0100] Once the priorities have been assigned, the pause setting unit 38 makes the final decision on pause positions according to the narrowing rules of FIG. 12(B). First, under narrowing rule 1, it places a pause at every portion with priority 1 or 2:
- priority 1 portions get the pause length that corresponds to a clause boundary;
- priority 2 portions get the pause length that corresponds to a phrase boundary.

[0101] The clause-boundary and phrase-boundary pause lengths are, respectively, the longest and the second-longest values in the pause length set selected by the parameter selection unit 13. So if the unit selects the set (1500, 800, 400, 200, 80) shown in FIG. 10(B), pauses of 1500 ms are inserted at priority 1 portions and pauses of 800 ms at priority 2 portions.

[0102] Next, the pause setting unit 38 applies the remaining narrowing rules:
- Narrowing rule 2: it inserts pauses at the portions with priorities 3 and 4, in descending order of priority (in this embodiment, priority 3 and then priority 4). The goal is to keep the number of morae between pauses at least <parameter 1> and less than <parameter 2>.
- Narrowing rules 3 and 4: some sections may still fail the condition of rule 2 and contain <parameter 3> or more morae between pauses. In such sections, it places pauses at the portions that have priorities, again in descending order of priority.

[0103] In the pause setting unit 38, <parameter 1>, <parameter 2>, and <parameter 3> in the narrowing rules are replaced by pause setting parameters 1 to 3 (FIG. 9) from the parameter selection unit 13. Suppose the unit supplies 11, 26, and 46, from the fifth row of the pause setting parameter dictionary 34 in FIG. 9. Then <parameter 1>, <parameter 2>, and <parameter 3> become 11, 26, and 46, respectively.
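Narrowing rules 1 to 4 can be sketched over a flat sequence of grammatical units. Boundary `b` lies between unit `b` and unit `b + 1`. FIG. 12(B) is not reproduced in the text, so rules 2 to 4 follow one plausible reading of [0102], stated in the comments. The function name is hypothetical, and the sketch returns only pause positions, not pause lengths. `p1`, `p2`, and `p3` take the role of <parameter 1> to <parameter 3>, for example 11, 26, and 46 from FIG. 9.

```python
from typing import List, Optional, Set, Tuple

def place_pauses(morae: List[int], prio: List[Optional[int]],
                 p1: int, p2: int, p3: int) -> List[int]:
    """Return the sorted boundary indices that receive a pause.

    morae[i] is the mora count of unit i; prio[b] is the priority (1-6, or
    None) of the boundary between unit b and unit b + 1.
    """
    n = len(morae)
    # Narrowing rule 1: every priority-1 and priority-2 boundary gets a pause.
    pauses: Set[int] = {b for b, p in enumerate(prio) if p in (1, 2)}

    def length(start: int, end: int) -> int:
        return sum(morae[start:end])

    def segments() -> List[Tuple[int, int]]:
        # Unit ranges [start, end) delimited by the current pauses.
        cuts = [0] + [b + 1 for b in sorted(pauses)] + [n]
        return list(zip(cuts, cuts[1:]))

    def containing(b: int) -> Tuple[int, int]:
        return next((s, e) for s, e in segments() if s <= b < e - 1)

    # Narrowing rule 2 (one reading): priority 3 first, then 4; split a segment
    # of p2 or more morae only if both halves keep at least p1 morae.
    for level in (3, 4):
        for b in range(n - 1):
            if prio[b] != level or b in pauses:
                continue
            s, e = containing(b)
            if (length(s, e) >= p2 and length(s, b + 1) >= p1
                    and length(b + 1, e) >= p1):
                pauses.add(b)

    # Narrowing rules 3 and 4 (one reading): while a segment still has p3 or
    # more morae, pause at its highest-priority remaining boundary.
    while True:
        for s, e in segments():
            cands = [b for b in range(s, e - 1) if prio[b] is not None]
            if length(s, e) >= p3 and cands:
                pauses.add(min(cands, key=lambda b: (prio[b], b)))
                break
        else:
            return sorted(pauses)
```

For example, consider six 6-mora units with a priority-4 boundary at index 1 and a priority-3 boundary at index 2. `place_pauses([6] * 6, [None, 4, 3, None, None], 11, 26, 46)` splits the 36-mora run at boundary 2 only, because after that split the 18-mora left part no longer reaches 26 morae.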

[0104] In the narrowing rules disclosed in the above document, the values of <parameter 1>, <parameter 2>, and <parameter 3> were obtained mainly by statistical analysis of newspaper articles. It is therefore unclear whether those rules suit every kind of input sentence. In the speech synthesizer of FIG. 1, by contrast, the parameter selection unit 13 supplies values (pause setting parameters 1 to 3) that match the characteristics of the input sentence. In this embodiment those characteristics are its field, use, and reference sentence length. As a result, narrowing rules suited to each kind of input sentence can be applied.

[0105] Pause positions are determined by applying the narrowing rules in this way. The pause setting unit 38 then outputs a structure tree that reflects those positions, for example the following:

[0106] ((AdverbialPhrase (平成 へーせー Noun) (7 しち Numeral) (年 ねん Counter) ... (、 null Comma)) PAU2) ((RenyoPhrase (参議院選 さんぎいんせん Noun) (で で Particle)) (RenyoPhrase (社会党 しゃかいとー Noun) (は は Particle)) ... PAU1)

Each item gives the surface form, its reading, and its part of speech. RenyoPhrase (連用句) is a phrase that modifies a predicate. The fragment reads "平成7年、参議院選で社会党は...", that is, "In Heisei 7 (1995), in the House of Councillors election, the Socialist Party ...".

[0107] In the structure tree above, PAU#i (here PAU1 and PAU2) marks the position of a pause, and the number #i after PAU indicates the pause length. Suppose the parameter selection unit 13 supplies the pause length set (1500, 800, 400, 200, 80) shown in FIG. 10(B). The acoustic processing unit 15 then sets the pause length as follows:

| #i | Pause length |
|----|--------------|
| 1 | 1500 ms |
| 2 | 800 ms |
| 3 | 400 ms |
| 4 | 200 ms |
| 5 | 80 ms |

[0108] Where necessary, the pause setting unit 38 also uses known techniques for further processing, such as dividing the structure tree into prosodic words.

[0109] The pause setting method is not limited to the method disclosed in the above document.

[0110] The structure tree reflecting the pause positions is supplied from the pause setting unit 38 to the prosody tree generation unit 39, which converts it into a prosody tree.

[0111] For example, given the structure tree shown above from the pause setting unit 38, the prosody tree generation unit 39 generates a prosody tree such as the following:

[0112] (The prosody tree is not reproduced in this text.)

[0113] Within each parenthesized item, the first part gives the reading (phonological information) of the morpheme, in principle in capital romaji. The last part gives the morpheme's part of speech. The phonological information uses these symbols:

| Symbol | Meaning |
|--------|---------|
| @SHI | the devoiced "shi" (SHI) |
| QI | the nasalized "gi" (鼻濁音, written "んぎ" with a small "ん") of "gi" (GI) |
| - | a long vowel |
| $ | the geminate consonant (small "っ") |
| X | the moraic nasal "ん" |
| ' | the accent position |

[0114] The prosody tree is supplied from the prosody tree generation unit 39 to the accent processing unit 40. The accent processing unit 40 also receives, from the emphasized word selection unit 11, the words and parts of speech to be emphasized (more precisely, words belonging to the parts of speech to be emphasized). Both are hereinafter called emphasized words.

The accent processing unit 40 then does two things:
- It reflects in the prosody tree the accent changes caused by accent combination (accent fusion).
- It searches the prosody tree for morphemes identical to the emphasized words and sets an accent on those morphemes.

[0115] As described above, emphasized words are words that are better emphasized in the field of the input sentence. The user therefore hears synthesized speech in which such words are emphasized, which lets the speech convey information accurately.

[0116] One usable accent setting method is that disclosed in Kawai, Hirose, and Fujisaki, "Prosodic rules for the synthesis of Japanese text speech," Journal of the Acoustical Society of Japan, Vol. 50, No. 6, pp. 433-442. However, the accent setting method is not limited to this.

[0117] For example, the accent processing unit 40 converts the prosody tree shown above as follows:

[0118] (The converted prosody tree is not reproduced in this text.)

[0119] The prosody tree processed by the accent processing unit 40 is supplied to the prosody information output generation unit 41. That unit converts it into a format the acoustic processing unit 15 can process. For example, the prosody tree above is converted as follows:

[0120] "(HE-SE-)(@SHICHINE'X)...PAU2(SAXQIIXSEXDE)(SHAKAITO-WA)...(KI$SHI)PAU3...PAU1"

[0121] Because the information converted in this way contains both phonological information and prosody information, it is hereinafter called phonological prosody information.

[0122] Besides the prosody tree, the prosody information output generation unit 41 also receives the prosody parameters from the parameter selection unit 13. After converting the prosody tree into phonological prosody information, it outputs that information to the acoustic processing unit 15 together with the prosody parameters.

[0123] The acoustic processing unit 15 stores a prosody control model. This model controls prosodic features such as the fundamental frequency (pitch frequency) of the synthesized speech, phoneme durations, and power (for example, power per phoneme). The unit drives the model using the prosody parameters from the prosody information output generation unit 41.

For example, when the so-called Fujisaki model is stored as the prosody control model, the acoustic processing unit 15 chooses the values that drive it, such as the phrase command and accent command magnitudes. These are taken from the prosody parameters supplied by the prosody information output generation unit 41, namely the phrase command magnitude set and the accent command magnitude set. Driving the Fujisaki model with them produces prosody data such as the fundamental frequency of the synthesized speech and the power and duration of each phoneme.
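For reference, the standard Fujisaki model writes log F0 as a baseline plus two kinds of components. Phrase components are responses to impulse-like phrase commands. Accent components are responses to step-like accent commands, which switch on at onset T1 and off at offset T2. The sketch below implements that standard formulation under stated assumptions:
- The time constants α = 3.0/s and β = 20.0/s, the ceiling γ = 0.9, and the baseline Fb = 100 Hz are commonly used illustrative values, not values from this document.
- The command timings in the usage line are hypothetical.
- Only the magnitudes 0.5 and 0.7 come from the FIG. 10(B) sets.

```python
import math
from typing import List, Tuple

def phrase_response(t: float, alpha: float = 3.0) -> float:
    """Gp(t) = alpha^2 * t * exp(-alpha * t) for t >= 0, else 0."""
    return alpha * alpha * t * math.exp(-alpha * t) if t >= 0 else 0.0

def accent_response(t: float, beta: float = 20.0, gamma: float = 0.9) -> float:
    """Ga(t) = min(1 - (1 + beta * t) * exp(-beta * t), gamma) for t >= 0, else 0."""
    if t < 0:
        return 0.0
    return min(1.0 - (1.0 + beta * t) * math.exp(-beta * t), gamma)

def fujisaki_f0(t: float, fb: float,
                phrases: List[Tuple[float, float]],
                accents: List[Tuple[float, float, float]]) -> float:
    """F0 in Hz at time t (s).

    phrases: (onset T0, magnitude Ap) for each phrase command.
    accents: (onset T1, offset T2, amplitude Aa) for each accent command.
    """
    ln_f0 = math.log(fb)
    for t0, ap in phrases:
        ln_f0 += ap * phrase_response(t - t0)
    for t1, t2, aa in accents:
        ln_f0 += aa * (accent_response(t - t1) - accent_response(t - t2))
    return math.exp(ln_f0)

# A 1-second contour sampled every 10 ms, with hypothetical command timings.
contour = [fujisaki_f0(k * 0.01, 100.0, [(0.0, 0.5)], [(0.2, 0.6, 0.7)])
           for k in range(100)]
```

Before the first command the contour equals the baseline Fb. Once both accent responses have saturated, the accent component returns to zero and only the decaying phrase component remains.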

[0124] Specifically, the generated prosody data include, for example:
- a fundamental frequency value for each phrase of the synthesized speech;
- a number of frames giving the duration of each phoneme;
- a number of frames giving the power value of each syllable.

[0125] The acoustic processing unit 15 also contains a synthesizer that uses a conventional technique, for example waveform segment concatenation. It stores the speech segment data needed for rule-based speech synthesis (for example, digitized waveform data) in units such as CV and CVC/VCV. The unit then builds the output as follows:
1. It concatenates the necessary speech segment data (phoneme segment data) into a continuous speech waveform. This step uses the phonological information in the phonological prosody information and the prosody data described above, such as concrete values for fundamental frequency, duration, and power.
2. It inserts pauses into the speech waveform according to the phonological prosody information. The pause lengths come from the pause length set among the prosody parameters, which are also supplied by the prosody information output generation unit 41.
3. It applies necessary processing such as D/A conversion to the resulting waveform and supplies it to a speaker 15A for output.

Synthesized speech corresponding to the input sentence is thus output.
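Pause insertion during concatenation can be illustrated with a deliberately simplified sketch. It joins already-prepared segment waveforms end to end and replaces each `PAU<i>` marker with silence of the i-th length in the pause set. The sketch assumes a 16 kHz sample rate, float samples, and a hypothetical function name. It performs none of the pitch, duration, or power modification or the segment smoothing that a real waveform-concatenation synthesizer applies.

```python
from typing import List, Sequence, Tuple, Union

Item = Union[List[float], str]  # a segment waveform, or a "PAU<i>" marker

def concatenate(items: Sequence[Item], pause_set_ms: Tuple[int, ...],
                sample_rate: int = 16000) -> List[float]:
    """Join segment waveforms, replacing each PAU<i> marker with silence."""
    out: List[float] = []
    for item in items:
        if isinstance(item, str):
            i = int(item[len("PAU"):])  # 1-based index into the pause set
            out.extend([0.0] * (pause_set_ms[i - 1] * sample_rate // 1000))
        else:
            out.extend(item)
    return out
```

With the FIG. 10(B) set, a `"PAU5"` marker becomes 80 ms of silence, which is 1280 zero samples at 16 kHz.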

[0126] As described above, the prosody parameters and pause setting rules used for speech synthesis change according to the initial information entered through the initial information setting unit 1. Because synthesis uses prosody parameters and pause setting rules that match the initial information, the user can be given appropriate synthesized speech. Furthermore, because these parameters and rules also change with the characteristics of the input sentence, the synthesized speech can convey information to the user accurately.

[0127] In this embodiment, the parameter selection unit 13 uses sets of five values, such as five pause lengths. A pause length set may also hold fewer or more than five pause lengths. However, if a set held only one pause length, every pause in the synthesized speech would be the same length, which makes natural-sounding speech hard to obtain. Sets such as the pause length set therefore preferably hold several values.

[0128] In this embodiment, several types of pause setting rules are prepared, and the one to use is selected from the initial information and the characteristics of the input sentence. Likewise, several types of rules for generating other prosody information may be prepared, and the one to use selected from among them.

【０１２９】[0129]

[Effects of the Invention]

According to the speech synthesizer of claim 1, the rule storage means stores a plurality of types of prosody information generation rules for generating prosody information of synthesized speech. When the selection information input means inputs selection information for selecting one of the prosody information generation rules stored in the rule storage means, the prosody information generation means selects one of those rules based on the selection information and uses it to generate prosody information from the analysis result of the analysis means. It is therefore possible to provide synthesized speech with the prosody the user desires.

【０１３０】[0130]

According to the speech synthesizer of claim 7, the rule storage means stores a plurality of types of prosody information generation rules for generating prosody information of synthesized speech. Meanwhile, the feature detection means detects features of the input sentence, and the prosody information generation means selects one of the prosody information generation rules stored in the rule storage means based on the features detected by the feature detection means. The prosody information generation means then uses the selected rule to generate prosody information from the analysis result of the analysis means. It is therefore possible to provide synthesized speech whose prosody corresponds to the characteristics of the input sentence.
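One feature the embodiment detects is the field of the input sentence (see the keyword dictionary 4, keyword history table 5, and field determination unit 6 in the list of symbols). A minimal sketch of field detection driving rule selection, assuming a keyword-to-field dictionary and a running count of keyword hits; the keywords, fields, rule names, and whitespace tokenization are all hypothetical simplifications.

```python
from collections import Counter
from typing import Dict, Optional

# Hypothetical keyword dictionary (cf. FIG. 4): keyword -> field.
KEYWORD_DICT: Dict[str, str] = {
    "pitcher": "sports", "goal": "sports",
    "stocks": "economy", "yen": "economy",
}

# Hypothetical mapping from detected field to a stored pause setting rule name.
RULE_BY_FIELD: Dict[str, str] = {"sports": "sparse", "economy": "dense"}
DEFAULT_RULE = "sparse"


class FieldDetector:
    """Accumulates keyword hits per field, loosely mirroring a keyword history table."""

    def __init__(self, keyword_dict: Dict[str, str]) -> None:
        self.keyword_dict = keyword_dict
        self.history: Counter = Counter()  # field -> number of keyword hits so far

    def update(self, sentence: str) -> Optional[str]:
        # Whitespace tokenization is a simplification; Japanese input would need
        # the morphological analysis of a language analysis stage.
        for token in sentence.lower().split():
            field = self.keyword_dict.get(token.strip(".,!?"))
            if field is not None:
                self.history[field] += 1
        if not self.history:
            return None  # no keyword seen yet
        return self.history.most_common(1)[0][0]


def select_rule(detector: FieldDetector, sentence: str) -> str:
    field = detector.update(sentence)
    if field is None:
        return DEFAULT_RULE
    return RULE_BY_FIELD.get(field, DEFAULT_RULE)
```

Because the counts accumulate across sentences, the detected field reflects the text read so far rather than a single sentence.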

【０１３１】[0131]

According to the speech synthesis method of claim 13, the input sentence is analyzed, and one of the plurality of types of prosody information generation rules stored in the rule storage means is selected. Prosody information is then generated from the analysis result of the input sentence using that rule, and synthesized speech is generated based on the prosody information. It is therefore possible to provide the user with synthesized speech having an appropriate prosody.

【０１３２】[0132]

According to the speech synthesizer of claim 14 and the speech synthesis method of claim 19, when selection information for selecting one of the plurality of prosody parameters stored in the parameter storage means is input, one of those prosody parameters is selected based on the selection information, and synthesized speech with the prosody corresponding to that prosody parameter is generated. It is therefore possible to provide synthesized speech with the prosody the user desires.
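The prosody parameters named in the claims (speech rate, phrase size, and accent size) can be stored as small records and selected by the selection information. A minimal sketch, assuming the selection information is the use of the synthesized speech; the parameter values, the fallback entry, and the duration scaling are hypothetical.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass(frozen=True)
class ProsodyParams:
    speech_rate: float  # relative rate; 1.0 = normal
    phrase_size: float  # relative size of phrase components
    accent_size: float  # relative size of accent components


# Hypothetical parameter storage keyed by the use of the synthesized speech.
PARAMETER_DICT: Dict[str, ProsodyParams] = {
    "reading": ProsodyParams(1.0, 1.0, 1.0),
    "proofreading": ProsodyParams(0.8, 0.9, 1.2),
}


def select_params(use: str, default: str = "reading") -> ProsodyParams:
    """Pick the stored parameters for `use`, falling back to a default entry."""
    return PARAMETER_DICT.get(use, PARAMETER_DICT[default])


def scale_durations(durations_ms: List[float], params: ProsodyParams) -> List[float]:
    # A lower speech rate stretches each segment duration proportionally.
    return [d / params.speech_rate for d in durations_ms]
```

Keeping each set as an immutable record makes a selection a plain dictionary lookup; only the generation stage needs to know how each field is applied.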

【０１３３】[0133]

According to the speech synthesizer of claim 20 and the speech synthesis method of claim 24, features of the input sentence are detected, and one of the plurality of prosody parameters stored in the parameter storage means is selected based on those features. Synthesized speech with the prosody corresponding to the selected prosody parameter is then generated. It is therefore possible to provide synthesized speech whose prosody matches the characteristics of the input sentence.
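Sentence length is another input feature the embodiment tracks (see the sentence length history table 8 and reference sentence length determination unit 9 in the list of symbols). A minimal sketch of a length feature selecting a speech rate, assuming lengths are counted in characters and averaged over a short window; the window size, thresholds, and rates are hypothetical.

```python
from collections import deque
from statistics import mean
from typing import Deque, Dict

# Hypothetical speech rates per reference sentence length category.
RATE_BY_LENGTH: Dict[str, float] = {"short": 1.1, "medium": 1.0, "long": 0.9}


class SentenceLengthFeature:
    """Classifies the running average sentence length of the input text."""

    def __init__(self, window: int = 5, short_max: float = 20, long_min: float = 50) -> None:
        self.history: Deque[int] = deque(maxlen=window)  # recent lengths in characters
        self.short_max = short_max
        self.long_min = long_min

    def observe(self, sentence: str) -> str:
        self.history.append(len(sentence))
        avg = mean(self.history)
        if avg <= self.short_max:
            return "short"
        if avg >= self.long_min:
            return "long"
        return "medium"


def speech_rate_for(feature: SentenceLengthFeature, sentence: str) -> float:
    return RATE_BY_LENGTH[feature.observe(sentence)]
```

Averaging over a window keeps one unusually long or short sentence from abruptly switching the selected parameters.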

[Brief description of drawings]

FIG. 1 is a block diagram showing the configuration of an embodiment of a speech synthesizer to which the present invention is applied.

FIG. 2 is a diagram showing a list of the initial information (use of the synthesized speech, field of the input sentence, and reference sentence length of the input sentence) that can be set with the initial information setting unit 1 of FIG. 1.

FIG. 3 is a block diagram showing a configuration example of the language analysis unit 2 of FIG. 1.

FIG. 4 is a diagram showing a configuration example of the keyword dictionary 4 of FIG. 1.

FIG. 5 is a diagram showing a configuration example of the keyword history table 5 of FIG. 1.

FIG. 6 is a diagram showing a configuration example of the emphasized word dictionary 10 of FIG. 1.

FIG. 7 is a diagram showing a configuration example of the sentence length history table 8 of FIG. 1.

FIG. 8 is a diagram showing a configuration example of the parameter selection unit 13 of FIG. 1.

FIG. 9 is a diagram showing a configuration example of the pause setting parameter dictionary 34 of FIG. 8.

FIG. 10 is a diagram showing a configuration example of the prosody parameter dictionary 35 of FIG. 8.

FIG. 11 is a block diagram showing a configuration example of the prosody information generation unit 14 of FIG. 1.

FIG. 12 is a diagram showing an example of the pause setting rules stored in the pause setting rule dictionary 38A of FIG. 11.

[Description of Symbols]
1 Initial information setting unit
2 Language analysis unit
3 Keyword degree extraction unit
4 Keyword dictionary
5 Keyword history table
6 Field determination unit
7 Sentence length acquisition unit
8 Sentence length history table
9 Reference sentence length determination unit
10 Emphasized word dictionary
11 Emphasized word selection unit
12 Parameter dictionary
13 Parameter selection unit
14 Prosody information generation unit
15 Acoustic processing unit
34 Pause setting parameter dictionary
35 Prosody parameter dictionary
36 Pause setting parameter selection unit
37 Prosody parameter selection unit
38 Pause setting unit
39 Pause setting rule dictionary
40 Prosody tree generation unit
41 Accent processing unit
42 Prosody information output generation unit

Continuation of front page: (72) Inventor: Koji Asano, c/o Sony Corporation, 6-7-35 Kitashinagawa, Shinagawa-ku, Tokyo

Claims

1. A speech synthesizer for generating synthesized speech corresponding to an input sentence, comprising: rule storage means for storing a plurality of types of prosody information generation rules for generating prosody information of the synthesized speech; selection information input means for inputting selection information for selecting one of the plurality of types of prosody information generation rules stored in the rule storage means; analysis means for analyzing the input sentence; prosody information generation means for selecting one of the plurality of types of prosody information generation rules stored in the rule storage means based on the selection information input from the selection information input means, and for generating the prosody information from the analysis result of the analysis means using the selected prosody information generation rule; and synthesized speech generation means for generating the synthesized speech based on the prosody information generated by the prosody information generation means.

2. The voice synthesizing apparatus according to claim 1, wherein the selection information is information about an application of the synthetic sound.

3. The speech synthesizing apparatus according to claim 1, wherein the selection information is information about a field to which the input sentence belongs.

4. The speech synthesizer according to claim 3, further comprising emphasized word determination means for determining a word to be emphasized based on the selection information, which is information about the field to which the input sentence belongs, wherein the prosody information generation means generates the prosody information so that the word determined by the emphasized word determination means is emphasized.

5. The speech synthesizer according to claim 1, wherein the selection information is information about a sentence length of the input sentence.

6. The speech synthesizer according to claim 1, wherein the prosody information is information about the position or length of a pause.

7. A speech synthesizer for generating synthesized speech corresponding to an input sentence, comprising: rule storage means for storing a plurality of types of prosody information generation rules for generating prosody information of the synthesized speech; feature detection means for detecting a feature of the input sentence; analysis means for analyzing the input sentence; prosody information generation means for selecting one of the plurality of types of prosody information generation rules stored in the rule storage means based on the feature of the input sentence detected by the feature detection means, and for generating the prosody information from the analysis result of the analysis means using the selected prosody information generation rule; and synthesized speech generation means for generating the synthesized speech based on the prosody information generated by the prosody information generation means.

8. The speech synthesizer according to claim 7, wherein the feature detecting means detects a field to which the input sentence belongs.

9. The speech synthesizer according to claim 8, further comprising emphasized word determination means for determining a word to be emphasized based on the field, detected by the feature detection means, to which the input sentence belongs, wherein the prosody information generation means generates the prosody information so that the word determined by the emphasized word determination means is emphasized.

10. The speech synthesizer according to claim 8, further comprising keyword detection means for detecting a predetermined keyword in the input sentence, wherein the feature detection means detects the field to which the input sentence belongs based on the keyword detected by the keyword detection means.

11. The speech synthesis apparatus according to claim 7, wherein the feature detecting means detects a sentence length of the input sentence.

12. The speech synthesizer according to claim 7, wherein the prosody information is information about the position or length of a pause.

13. A speech synthesis method for a speech synthesizer that comprises rule storage means for storing a plurality of types of prosody information generation rules for generating prosody information of synthesized speech corresponding to an input sentence, the method comprising: analyzing the input sentence; selecting one of the plurality of types of prosody information generation rules stored in the rule storage means; generating the prosody information from the analysis result of the input sentence using the selected prosody information generation rule; and generating the synthesized speech based on the prosody information.

14. A speech synthesizer for generating synthesized speech corresponding to an input sentence, comprising: parameter storage means for storing a plurality of prosody parameters for controlling the prosody of the synthesized speech; selection information input means for inputting selection information for selecting one of the plurality of prosody parameters stored in the parameter storage means; analysis means for analyzing the input sentence; selection means for selecting one of the plurality of prosody parameters stored in the parameter storage means based on the selection information input from the selection information input means; and synthesized speech generation means for generating, based on the analysis result of the analysis means, the synthesized speech with the prosody corresponding to the prosody parameter selected by the selection means.

15. The voice synthesizing apparatus according to claim 14, wherein the selection information is information about a use of the synthetic sound.

16. The selection information is information about a field to which the input sentence belongs.
The speech synthesizer according to.

17. The speech synthesizer according to claim 14, wherein the selection information is information about a sentence length of the input sentence.

18. The speech synthesis apparatus according to claim 14, wherein the prosody parameter represents a speech rate, a phrase size, or an accent size.

19. A speech synthesis method for a speech synthesizer that comprises parameter storage means for storing a plurality of prosody parameters for controlling the prosody of synthesized speech corresponding to an input sentence, the method comprising: when selection information for selecting one of the plurality of prosody parameters stored in the parameter storage means is input, selecting one of the plurality of prosody parameters stored in the parameter storage means based on the selection information; and generating the synthesized speech with the prosody corresponding to the selected prosody parameter.

20. A speech synthesizer for generating synthesized speech corresponding to an input sentence, comprising: parameter storage means for storing a plurality of prosody parameters for controlling the prosody of the synthesized speech; feature detection means for detecting a feature of the input sentence; analysis means for analyzing the input sentence; selection means for selecting one of the plurality of prosody parameters stored in the parameter storage means based on the feature of the input sentence detected by the feature detection means; and synthesized speech generation means for generating, based on the analysis result of the analysis means, the synthesized speech with the prosody corresponding to the prosody parameter selected by the selection means.

21. The selection information is information about a field to which the input sentence belongs.
The speech synthesizer according to.

22. The speech synthesizer according to claim 20, wherein the selection information is information about a sentence length of the input sentence.

23. The speech synthesis apparatus according to claim 20, wherein the prosody parameter represents a speech rate, a phrase size, or an accent size.

24. A speech synthesis method for a speech synthesizer that comprises parameter storage means for storing a plurality of prosody parameters for controlling the prosody of synthesized speech corresponding to an input sentence, the method comprising: detecting a feature of the input sentence; selecting one of the plurality of prosody parameters stored in the parameter storage means based on the feature of the input sentence; and generating the synthesized speech with the prosody corresponding to the selected prosody parameter.
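Claims 4 and 9 tie emphasis to the field of the input sentence. A minimal sketch of that idea, assuming an emphasized word dictionary keyed by field (cf. FIG. 6) and assuming emphasis is realized by raising an accent-size multiplier; the dictionary entries and the 1.5 boost factor are hypothetical.

```python
from typing import Dict, List, Set, Tuple

# Hypothetical emphasized word dictionary (cf. FIG. 6): field -> words to emphasize.
EMPHASIS_DICT: Dict[str, Set[str]] = {
    "weather": {"typhoon", "warning"},
    "economy": {"yen", "stocks"},
}


def mark_emphasis(words: List[str], field: str, boost: float = 1.5) -> List[Tuple[str, float]]:
    """Pair each word with an accent-size multiplier; emphasized words get `boost`."""
    emphasized = EMPHASIS_DICT.get(field, set())  # unknown field: nothing emphasized
    return [(word, boost if word in emphasized else 1.0) for word in words]
```

A later prosody stage could multiply each word's accent size by the returned factor, so only the field-specific words stand out.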