JP2009204795A

JP2009204795A - Fundamental frequency estimation device, fundamental frequency estimation method, fundamental frequency estimation program, and storage medium

Info

Publication number: JP2009204795A
Application number: JP2008045929A
Authority: JP
Inventors: Hideji Nakajima; 秀治中嶋
Original assignee: Nippon Telegraph and Telephone Corp
Current assignee: Nippon Telegraph and Telephone Corp
Priority date: 2008-02-27
Filing date: 2008-02-27
Publication date: 2009-09-10
Anticipated expiration: 2028-02-27
Also published as: JP4829912B2

Abstract

PROBLEM TO BE SOLVED: To provide a device, a method, and a program capable of estimating a fundamental frequency closer to that of interactive speech, and a storage medium for the program. SOLUTION: The fundamental frequency estimation device comprises a morpheme analysis dictionary, a category name dictionary, a text analysis part, an information re-forming part, a fundamental frequency summary value estimation part, and a detail fundamental frequency estimation part. The text analysis part analyzes an input sentence and outputs, as linguistic information, items such as a category name different from the part of speech of the main word of an accent phrase and the relative pitch of the accent phrase. The information re-forming part shapes the linguistic information into a predetermined form. The fundamental frequency summary value estimation part obtains a fundamental frequency summary value for every accent phrase using a regression model. The detail fundamental frequency estimation part adapts the temporal change in the fundamental frequency obtained from the linguistic information to the fundamental frequency summary value and outputs the fundamental frequency. COPYRIGHT: (C)2009, JPO&INPIT

Description

The present invention relates to a fundamental frequency estimation device, a fundamental frequency estimation method, a fundamental frequency estimation program, and a storage medium that, when speech is synthesized from text information, estimate a fundamental frequency for each accent phrase from the text information to be synthesized.

As in Non-Patent Documents 1 and 2, conventional fundamental frequency estimation methods for speech synthesis work in units of accent phrases. They estimate summary values of the fundamental frequency of an accent phrase, such as its average value, maximum value, start-point height, and end-point height, from the accent type, length, position, and part-of-speech information of that accent phrase and of the preceding and following accent phrases, and then estimate the detailed fundamental frequency using those summary values as a reference.
Non-Patent Document 1: Masanobu Abe and Hirokazu Sato, "Two-stage control method of fundamental frequency based on a speech segmentation model" (in Japanese), Journal of the Acoustical Society of Japan, Vol. 49, No. 10, pp. 682-690, 1993.
Non-Patent Document 2: Yoshinori Sagisaka, "On the prediction of global F0 shape for Japanese text-to-speech", Proceedings of ICASSP, pp. 325-328, 1990.

The methods described in Non-Patent Documents 1 and 2 make it possible to synthesize flat, matter-of-fact read-aloud speech for material such as newspaper articles. However, in conversational speech addressed to a listener, such as storytelling of children's tales, product advertising, and conversation, the fundamental frequency differs greatly from that of the newspaper reading handled by the prior art. The information used for estimation in the prior art is therefore insufficient for estimating the fundamental frequency of conversational speech. For example, when an adverb is followed by an adjective, the conventional methods supply only the part of speech, so the same fundamental frequency control is applied to every word sequence involving an adjective. In conversational speech addressed to a listener, however, the height of the fundamental frequency varies greatly according to categories other than the grammatical function of the adjective's part of speech, for example whether the meaning is positive or negative and how strong that meaning is, and the conventional methods cannot reproduce this difference correctly except by chance. The present invention was made in view of the above problem, and its object is to provide a device, a method, a program, and a storage medium for the program that can estimate a fundamental frequency closer to that of conversational speech.

The fundamental frequency estimation device of the present invention comprises a morphological analysis dictionary, a category name dictionary, a text analysis unit, an information shaping unit, a fundamental frequency summary value estimation unit, and a detailed fundamental frequency estimation unit. The morphological analysis dictionary records the part of speech, reading, and accent type of each word. The category name dictionary records, for each word, a category name different from its part of speech. One example is a category name that subclassifies the part of speech for words whose fundamental frequency summary values are known from prior analysis to differ greatly (for example, a category name defined by the meaning of the word and the strength of that meaning).

The text analysis unit analyzes the input sentence and outputs the following as linguistic information: the parts of speech of the first and last words of each accent phrase, the accent type of the accent phrase, the reading of the accent phrase, the length of the accent phrase, a category name different from the part of speech for the main word of the accent phrase, the tone coupling type between accent phrases, and the relative height of the accent phrase. However, the linguistic information may include only one of the two items "category name different from the part of speech for the main word of the accent phrase" and "relative height of the accent phrase". If the linguistic information does not include the category name, the category name dictionary need not be provided.

The information shaping unit shapes the linguistic information output from the text analysis unit into a predetermined format. The fundamental frequency summary value estimation unit uses a regression model to obtain a fundamental frequency summary value for each accent phrase from the shaped linguistic information. Here, the regression model is a model that takes the information extracted by the information shaping unit as input and estimates a summary value of the fundamental frequency. The detailed fundamental frequency estimation unit fits the temporal change of the fundamental frequency obtained from the linguistic information to the fundamental frequency summary value and outputs the fundamental frequency.

The processing in the text analysis unit is now described in detail. The text analysis unit has morphological analysis means, accent phrase determination means, accent phrase reading estimation means, accent phrase accent type estimation means, inter-accent-phrase tone coupling estimation means, relative height calculation means, and category name assignment means. The morphological analysis means splits the input sentence into words to generate a word string and, referring to the morphological analysis dictionary, obtains the part of speech, the reading, and the accent type of each word in isolation. The accent phrase determination means groups the word string into accent phrases, each consisting of one or more words and serving as a unit of accent. The accent phrase reading estimation means estimates the reading and the length of each accent phrase. The accent phrase accent type estimation means estimates the accent type of each accent phrase. The inter-accent-phrase tone coupling estimation means estimates the tone coupling type between adjacent accent phrases. The relative height calculation means calculates the relative height of each accent phrase using the tone coupling type information.

The category name assignment means assigns, for each accent phrase, a category name to the main word in the accent phrase. The category name is obtained by searching the category name dictionary using as a key either the surface form of the word alone or the pair of the surface form and the part of speech. However, when a category name different from the part of speech is not used as linguistic information, the category name assignment means and the category name dictionary are unnecessary. Likewise, when the relative height of the accent phrase is not used as linguistic information, the relative height calculation means is unnecessary.

Because the estimation introduces a category name different from the part of speech of a word, the summary value of the fundamental frequency of each accent phrase can be estimated more accurately than before. In addition, because relative height information is used in place of position information, the variation of the reproduced fundamental frequency summary values correlates more strongly with the variation of the actual fundamental frequency summary values. As a result, a fundamental frequency closer to that of conversational speech can be estimated than with the prior art.

Embodiments of the present invention are described below.

FIG. 1 is a diagram illustrating a configuration example of the fundamental frequency estimation device of the first embodiment. FIG. 2 is a diagram illustrating an example of the processing flow of the fundamental frequency estimation method of the first embodiment. The fundamental frequency estimation device 10 comprises a morphological analysis dictionary 110, a category name dictionary 120, a text analysis unit 100, an information shaping unit 200, a fundamental frequency summary value estimation unit 400, and a detailed fundamental frequency estimation unit 500. The morphological analysis dictionary 110 and the category name dictionary 120 may be provided outside the fundamental frequency estimation device 10.

The morphological analysis dictionary 110 stores the part of speech, reading, and accent type of each word. FIG. 3 is a diagram illustrating example entries of the category name dictionary of the first embodiment. The category name dictionary 120 stores the surface form of a word, which serves as the search key, together with a category name different from the part of speech of the word (for example, a category name defined by the meaning of the word). The surface form of a word means its written form. The surface form and the part of speech may be stored together as the search key.
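The lookup described above can be sketched as follows. This is a minimal illustration, not the patent's implementation: the two tables, the function name `lookup_category`, the part-of-speech label `"名詞"`, and the fallback order (surface plus part of speech first, then surface alone) are assumptions. The example words and the labels "P", "N", and "*" are the ones the description uses later for the category name assignment means 107.

```python
from typing import Dict, Optional, Tuple

# Hypothetical dictionary contents; the actual entries of FIG. 3 are not reproduced here.
CATEGORY_BY_SURFACE: Dict[str, str] = {
    "楽しい": "P",
    "明るい": "P",
    "悲しい": "N",
    "つらい": "N",
}
# Optional table keyed by (surface form, part of speech).
CATEGORY_BY_SURFACE_POS: Dict[Tuple[str, str], str] = {
    ("希少", "名詞"): "P",
    ("悲惨", "名詞"): "N",
}


def lookup_category(surface: str, pos: Optional[str] = None) -> str:
    """Return the category name for a main word, or "*" if it has none."""
    # Assumed order: try the more specific (surface, part of speech) key first.
    if pos is not None and (surface, pos) in CATEGORY_BY_SURFACE_POS:
        return CATEGORY_BY_SURFACE_POS[(surface, pos)]
    # Fall back to the surface form alone; "*" marks "neither positive nor negative".
    return CATEGORY_BY_SURFACE.get(surface, "*")
```

Keeping the two key types in separate tables is only one possible design; a single table keyed by the pair, with a wildcard part of speech, would serve equally well.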

The text analysis unit 100 splits the sentence to be synthesized into a word string and, referring to the morphological analysis dictionary 110, assigns part-of-speech, reading, and accent type information to every word. It groups the word string into accent phrases and determines the reading and accent type of each accent phrase. It determines the tone coupling type between accent phrases and calculates the relative height of each accent phrase. Furthermore, it searches the category name dictionary 120, using as a key either the surface form of the main word in the accent phrase or its surface form and part of speech, and assigns the resulting category name to that word (S100).

FIG. 4 is a diagram illustrating a configuration example of the text analysis unit of the first embodiment. FIG. 5 is a diagram illustrating an example of the processing flow of the text analysis step of the first embodiment. The text analysis unit 100 comprises morphological analysis means 101, accent phrase determination means 102, accent phrase reading estimation means 103, accent phrase accent type estimation means 104, accent phrase tone coupling estimation means 105, relative height calculation means 106, and category name assignment means 107.

The morphological analysis means 101 applies morphological analysis to the sentence to be synthesized, splits it into its constituent words, and, referring to the morphological analysis dictionary 110, assigns a part of speech, a reading, and an accent type to each word (S101). The conjugation type and conjugated form of each word may be assigned at the same time and used in later parts of the device.

The accent phrase determination means 102 groups the word string output by the morphological analysis means 101 into accent phrases. Once the accent phrases are determined, the part of speech of the first word and of the last word of each accent phrase are also determined (S102). The accent phrase reading estimation means 103 determines the reading and the length of each accent phrase from the word readings output by the morphological analysis means 101 and the accent phrase information output by the accent phrase determination means 102 (S103). Word accent type information may also be used in this determination. The accent phrase accent type estimation means 104 determines the accent type of each accent phrase from the word accent types output by the morphological analysis means 101 and the accent phrase information output by the accent phrase determination means 102 (S104). Word reading information may also be used in this determination.

The accent phrase tone coupling estimation means 105 determines the tone coupling type with the preceding accent phrase (S105). The tone coupling type with the following accent phrase may instead be determined on its own, or together with the tone coupling type with the preceding accent phrase. In that case, the fundamental frequency summary value estimation unit 400 may estimate the summary values using the tone coupling type with the following accent phrase, either alone or together with the tone coupling type with the preceding accent phrase.

The relative height calculation means 106 calculates the relative height of each accent phrase using the tone coupling type with the preceding accent phrase (S106). Among the tone coupling types, weak coupling is regarded as a rise and strong coupling as a fall. The height of the first accent phrase is set to 0; if the coupling to the next accent phrase is weak, 1 is added, and if it is strong, -1 is added. The accumulated value expresses the relative height of the accent phrase. The height is calculated within a range of consecutive accent phrases delimited by any two of the sentence beginning, a pause, and the sentence end. Alternatively, the span from the sentence beginning up to the start of the next weak coupling may be taken as one range, with ranges set in the same way repeatedly up to the sentence end, and the relative height calculated within each range.

The category name assignment means 107 searches the category name dictionary 120 for each accent phrase, using the surface form of the main word of the accent phrase as a key, and assigns a category name defined by the meaning of the word (S107). Adjectives such as 楽しい (fun) and 明るい (bright), and nouns derived from adjectives or adjectival verbs such as 希少 (rare), are given "P" (Positive) as a category name expressing positiveness. Adjectives such as 悲しい (sad) and つらい (painful), and nouns derived from adjectives or adjectival verbs such as 悲惨 (miserable), are given "N" (Negative) as a category name expressing negativeness. Words that fall under neither are given "*". In this way, two or more category names are used. The category name dictionary may record, as the category name, a category name different from the part of speech of the word, a category name that subclassifies the part of speech of the word, or a category name defined by the meaning of the word and its strength. Accordingly, the three classes "P", "N", and "*" of this embodiment may be subdivided into finer classes in the category name dictionary, with the words belonging to each class associated with it.
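The relative height rule of step S106 can be sketched as a running sum. This is a minimal sketch under stated assumptions: the function name `relative_heights` and the string labels `"weak"` and `"strong"` are hypothetical, and `None` is used here to mark an accent phrase that starts a new range (sentence beginning or after a pause).

```python
from typing import List, Optional


def relative_heights(joins: List[Optional[str]]) -> List[int]:
    """Compute the relative height of each accent phrase.

    joins[i] is the tone coupling type between accent phrase i-1 and
    accent phrase i: "weak" (treated as a rise), "strong" (treated as a
    fall), or None when phrase i starts a new range.
    """
    heights: List[int] = []
    current = 0
    for join in joins:
        if join is None:
            current = 0          # first phrase of a range has height 0
        elif join == "weak":
            current += 1         # weak coupling: add 1
        elif join == "strong":
            current -= 1         # strong coupling: add -1
        else:
            raise ValueError(f"unknown tone coupling type: {join!r}")
        heights.append(current)
    return heights
```

For a sentence of six accent phrases with a pause before the fifth, `relative_heights([None, "weak", "strong", "strong", None, "weak"])` returns `[0, 1, 0, -1, 0, 1]`.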

The information shaping unit 200 shapes the information to be given to the explanatory variables of the regression model 300 into a predetermined format (S200). Specifically, from the result of the text analysis unit 100, the information shaping unit 200 takes the accent phrase in question, one or more preceding accent phrases, and one or more following accent phrases, and extracts for each of them the part-of-speech information of its first and last words, its accent type, its length, and the category name different from the part of speech for its main word. It also extracts the tone coupling type from the preceding accent phrase to the accent phrase in question and the relative height of the accent phrase in question calculated from the tone coupling types. For example, when information is extracted for the accent phrase in question, one preceding accent phrase, and one following accent phrase, the unit outputs
(part-of-speech information of the first word of the preceding accent phrase,
part-of-speech information of the last word of the preceding accent phrase,
accent type information of the preceding accent phrase,
length information of the preceding accent phrase,
category name different from the part of speech for the main word of the preceding accent phrase,
part-of-speech information of the first word of the accent phrase in question,
part-of-speech information of the last word of the accent phrase in question,
accent type information of the accent phrase in question,
length information of the accent phrase in question,
category name different from the part of speech for the main word of the accent phrase in question,
part-of-speech information of the first word of the following accent phrase,
part-of-speech information of the last word of the following accent phrase,
accent type information of the following accent phrase,
length information of the following accent phrase,
category name different from the part of speech for the main word of the following accent phrase,
tone coupling type from the preceding accent phrase to the accent phrase in question,
relative height of the accent phrase in question calculated from the tone coupling types)
as the shaping result and passes it to the fundamental frequency summary value estimation unit 400. The accent type of an accent phrase is an integer indicating the position of the accent nucleus, and the length of an accent phrase is also an integer. The position of the accent nucleus and the length may be measured in moras, phonemes, or syllables, provided that the same unit is always used. FIG. 6 is a diagram illustrating an example of the shaped linguistic information output from the information shaping unit 200 of the first embodiment. All of the information for each accent phrase may be used, or only part of it.
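The 17-item tuple listed above could be assembled as in the following sketch. The `AccentPhrase` record, its field names, and the function `shape_features` are hypothetical. The description does not say what is output when there is no preceding or following accent phrase, so the padding values in `PAD` and the `"NONE"` coupling label are assumptions made only to keep the tuple length fixed.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass
class AccentPhrase:
    head_pos: str                  # part of speech of the first word
    tail_pos: str                  # part of speech of the last word
    accent_type: int               # position of the accent nucleus
    length: int                    # length in the same unit as accent_type
    category: str                  # category name of the main word, e.g. "P", "N", "*"
    join_from_prev: Optional[str]  # tone coupling type from the preceding phrase
    relative_height: int           # relative height computed from the coupling types


# Assumed placeholder for a missing neighbour (not specified in the description).
PAD: Tuple[str, str, int, int, str] = ("NONE", "NONE", 0, 0, "*")


def _local(p: AccentPhrase) -> Tuple[str, str, int, int, str]:
    """The five per-phrase items, in the order used by the description."""
    return (p.head_pos, p.tail_pos, p.accent_type, p.length, p.category)


def shape_features(phrases: List[AccentPhrase], i: int) -> tuple:
    """Build the explanatory variables for accent phrase i (one neighbour each side)."""
    prev = _local(phrases[i - 1]) if i > 0 else PAD
    cur = _local(phrases[i])
    nxt = _local(phrases[i + 1]) if i + 1 < len(phrases) else PAD
    join = phrases[i].join_from_prev or "NONE"
    return prev + cur + nxt + (join, phrases[i].relative_height)
```

Dropping an item from the tuple, for example the category name or the relative height, gives the reduced inputs that the description allows when only part of the information is used.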

The fundamental frequency summary value estimation unit 400 sets the shaped linguistic information as the explanatory variables of the regression model 300 and passes the value of the dependent variable estimated by the regression model 300 to the detailed fundamental frequency estimation unit 500 as the summary value of the fundamental frequency for each accent phrase (S400). The regression model 300 may be any model that can take categorical information as explanatory variables and estimate a real value as the dependent variable, such as a regression tree or Quantification Theory Type I. The coefficients and structure of the regression model 300 are computed from training data in which the linguistic information shaped by the information shaping unit 200 is paired with the summary value of the fundamental frequency of the corresponding accent phrase. A summary value is, for example, an average value, a median value, a dynamic range, or a maximum value. The regression model 300 obtains one or more of the average value, the median value, and the dynamic range as summary values. The fundamental frequency summary value estimation unit 400 builds a dedicated regression model 300 for each summary value and estimates each one separately. The processing of the information shaping unit 200 and the fundamental frequency summary value estimation unit 400 is repeated accent phrase by accent phrase until all accent phrases have been processed (S210).
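As one illustration of a model that maps categorical explanatory variables to a real-valued summary value, the following sketch fits a small regression tree with equality splits. It is not the patent's training procedure: the function names `fit_tree` and `predict`, the depth limit, and the squared-error split criterion are assumptions, and any training values passed to it would have to come from real paired data.

```python
from statistics import mean
from typing import List, Sequence


def sse(ys: List[float]) -> float:
    """Sum of squared errors around the mean (0.0 for an empty list)."""
    if not ys:
        return 0.0
    m = mean(ys)
    return sum((y - m) ** 2 for y in ys)


def fit_tree(rows: List[Sequence[object]], ys: List[float], depth: int = 2):
    """Return a float leaf or a node (feature_index, value, yes_subtree, no_subtree)."""
    best = None
    if depth > 0:
        for j in range(len(rows[0])):
            # Try each "feature j equals value v" test; sort for deterministic ties.
            for v in sorted({r[j] for r in rows}, key=repr):
                yes = [k for k, r in enumerate(rows) if r[j] == v]
                no = [k for k, r in enumerate(rows) if r[j] != v]
                if not yes or not no:
                    continue
                cost = sse([ys[k] for k in yes]) + sse([ys[k] for k in no])
                if best is None or cost < best[0]:
                    best = (cost, j, v, yes, no)
    # Stop when no split exists or no split reduces the squared error.
    if best is None or best[0] >= sse(ys):
        return mean(ys)
    _, j, v, yes, no = best
    return (
        j,
        v,
        fit_tree([rows[k] for k in yes], [ys[k] for k in yes], depth - 1),
        fit_tree([rows[k] for k in no], [ys[k] for k in no], depth - 1),
    )


def predict(tree, row: Sequence[object]) -> float:
    """Walk the tree until a leaf (a float) is reached."""
    while isinstance(tree, tuple):
        j, v, yes, no = tree
        tree = yes if row[j] == v else no
    return tree
```

Following the description, one such model would be trained per summary value (for example, one for the average and one for the dynamic range), each on the same shaped linguistic information.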

The detailed fundamental frequency estimation unit 500 calculates, from the accent type of the accent phrase in question, the syllable position at which the rise of the fundamental frequency ends and the syllable position at which the fall begins, and takes the sequence of straight line segments connecting these positions as the temporal change of the fundamental frequency (S500). This is the simplest example; the temporal change of the fundamental frequency may instead be estimated using other information or other representations (for example, a sequence of connected nonlinear segments). After the temporal change of the fundamental frequency has been estimated, it is shifted up or down and stretched or compressed so that its summary value matches the summary value output from the fundamental frequency summary value estimation unit 400.
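The description does not give the rule that maps an accent type to the rise-end and fall-start positions. The sketch below assumes the commonly described Tokyo-dialect pattern (type 0: low first mora, then high; type 1: high first mora, then low; type n of 2 or more: rise by the second mora, fall after mora n), counts positions in moras, and uses normalised levels 0.0 (low) and 1.0 (high). The function names are hypothetical.

```python
from typing import List, Tuple


def contour_breakpoints(accent_type: int, length: int) -> List[Tuple[int, float]]:
    """Return (mora_index, level) breakpoints of a piecewise-linear contour."""
    if length < 2 or not 0 <= accent_type <= length:
        raise ValueError("need length >= 2 and 0 <= accent_type <= length")
    rise_end = 0 if accent_type == 1 else 1          # mora where the rise ends
    fall_start = None if accent_type == 0 else accent_type - 1  # last high mora
    points = [(0, 1.0 if rise_end == 0 else 0.0)]
    if rise_end == 1:
        points.append((1, 1.0))
    if fall_start is not None and fall_start + 1 < length:
        if fall_start > points[-1][0]:
            points.append((fall_start, 1.0))         # hold high up to the nucleus
        points.append((fall_start + 1, 0.0))         # fall after the nucleus
    if points[-1][0] < length - 1:
        points.append((length - 1, points[-1][1]))   # hold the level to the end
    return points


def sample_contour(points: List[Tuple[int, float]], length: int) -> List[float]:
    """Linearly interpolate the breakpoints at every mora position."""
    values: List[float] = []
    for t in range(length):
        for (t0, v0), (t1, v1) in zip(points, points[1:]):
            if t0 <= t <= t1:
                values.append(v0 + (v1 - v0) * (t - t0) / (t1 - t0))
                break
    return values
```

The normalised contour only fixes the shape; its absolute height and span are set afterwards when it is fitted to the estimated summary values.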

FIG. 7 shows an example in which the detailed fundamental frequency estimation unit of the first embodiment estimates the detailed fundamental frequency based on the average value. When the summary value is the average or median of the fundamental frequency in the accent phrase, the temporal change is translated up or down so that its average or median matches the average or median estimated by the fundamental frequency summary value estimation unit 400. The values of the temporal change after translation are taken as the detailed fundamental frequency. FIG. 8 shows an example in which the detailed fundamental frequency estimation unit of the first embodiment estimates the detailed fundamental frequency based on the dynamic range. When the summary value is the dynamic range of the fundamental frequency in the accent phrase, the temporal change is scaled up or down so that its minimum and maximum match the minimum and maximum estimated by the fundamental frequency summary value estimation unit 400. The values of the temporal change after scaling are taken as the detailed fundamental frequency.
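The two fitting operations can be sketched as follows, assuming the temporal change is held as a list of sampled values. The function names `shift_to_mean` and `scale_to_range` are hypothetical. In symbols, the translation adds the constant offset (target mean minus current mean) to every sample, and the scaling maps the current minimum and maximum linearly onto the target minimum and maximum.

```python
from typing import List


def shift_to_mean(contour: List[float], target_mean: float) -> List[float]:
    """Translate the contour up or down so that its mean equals target_mean."""
    offset = target_mean - sum(contour) / len(contour)
    return [v + offset for v in contour]


def scale_to_range(contour: List[float], target_min: float, target_max: float) -> List[float]:
    """Scale the contour so that its minimum and maximum match the targets."""
    lo, hi = min(contour), max(contour)
    if hi == lo:
        raise ValueError("a flat contour has no range to scale")
    gain = (target_max - target_min) / (hi - lo)
    return [target_min + (v - lo) * gain for v in contour]
```

For the normalised contour `[0.0, 1.0, 1.0, 0.0]`, `shift_to_mean(..., 200.0)` gives `[199.5, 200.5, 200.5, 199.5]` and `scale_to_range(..., 150.0, 250.0)` gives `[150.0, 250.0, 250.0, 150.0]`.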

As described above, the fundamental frequency estimation device of the first embodiment introduces a category name different from the part of speech of a word and uses relative height information, and can thereby estimate a fundamental frequency closer to that of conversational speech.

[Modification]
In the first embodiment, the text analysis unit 100 comprises the morphological analysis means 101, accent phrase determination means 102, accent phrase reading estimation means 103, accent phrase accent type estimation means 104, accent phrase tone coupling estimation means 105, relative height calculation means 106, and category name assignment means 107. However, the relative height calculation means 106 may be omitted. Only the differences from the first embodiment are described below.

In this case, neither the linguistic information output from the text analysis unit nor the information shaped by the information shaping unit 200 includes relative height information. The fundamental frequency summary value estimation unit 400 sets the shaped linguistic information, which does not include relative height information, as the explanatory variables of the regression model 300 and passes the value of the dependent variable estimated by the regression model 300 to the detailed fundamental frequency estimation unit 500 as the summary value of the fundamental frequency for each accent phrase (S400).

As described above, the fundamental frequency estimation device of the modification also introduces a category name different from the part of speech of a word, and can thereby estimate a fundamental frequency closer to that of conversational speech.

FIG. 9 is a diagram illustrating a configuration example of the text analysis unit of the second embodiment, in which the category name assignment means and the category name dictionary are not provided. FIG. 10 is a diagram illustrating an example of the processing flow of the text analysis step of the second embodiment, in which the category name assignment means and the category name dictionary are not provided. Only the differences from the first embodiment are described below. FIG. 1 is used as the diagram showing the configuration example of the fundamental frequency estimation device of the second embodiment, and FIG. 2 as the diagram showing the example processing flow of its fundamental frequency estimation method.

The fundamental frequency estimation device 10 comprises a morphological analysis dictionary 110, a text analysis unit 100', an information shaping unit 200, a fundamental frequency summary value estimation unit 400, and a detailed fundamental frequency estimation unit 500. In the second embodiment, the category name dictionary 120 of FIG. 1 is not provided. The text analysis unit 100' divides the sentence to be synthesized into a word string and, referring to the morphological analysis dictionary 110, assigns part-of-speech, reading, and accent type information to every word. It groups the word string into accent phrases and determines the reading and accent type of each accent phrase. It then determines the tone combination type between accent phrases and calculates the relative height of each accent phrase (S100'). The text analysis unit 100' comprises morphological analysis means 101, accent phrase determination means 102, accent phrase reading estimation means 103, accent phrase accent type estimation means 104, inter-accent-phrase tone combination type estimation means 105, and relative height calculation means 106. The category name assigning means 107 is not provided. Neither the language information output from the text analysis unit nor the information shaped by the information shaping unit 200 includes category name information other than the part of speech of the main word. The fundamental frequency summary value estimation unit 400 sets the shaped language information, which lacks category name information other than the part of speech of the main word, as the explanatory variables of the regression model 300, and passes the value of the dependent variable estimated by the regression model 300 to the detailed fundamental frequency estimation unit 500 as the summary value of the fundamental frequency for each accent phrase (S400).
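The difference between the configurations comes down to which fields the shaped language information carries. The sketch below illustrates this with a hypothetical shaping function: the field names and the two switches are assumptions, while the use of "-" as the not-applicable symbol follows the description of FIG. 6.

```python
from typing import Dict, List, Optional

NOT_APPLICABLE = "-"  # symbol used where a value cannot be defined

# Hypothetical field names for the information common to all configurations.
BASE_FIELDS: List[str] = [
    "head_pos", "tail_pos", "accent_type",
    "reading", "length", "tone_combination",
]


def shape_phrase(info: Dict[str, Optional[str]],
                 use_category: bool = True,
                 use_relative_height: bool = True) -> Dict[str, str]:
    """Shape one accent phrase record into a fixed set of string fields.

    use_category=False corresponds to the second embodiment (no category
    name), use_relative_height=False to the modified example (no relative
    height). Missing or undefined values become the not-applicable symbol.
    """
    fields = list(BASE_FIELDS)
    if use_category:
        fields.append("category")
    if use_relative_height:
        fields.append("relative_height")
    shaped = {}
    for name in fields:
        value = info.get(name)
        shaped[name] = value if value is not None else NOT_APPLICABLE
    return shaped
```

For example, `shape_phrase(info, use_category=False)` produces a record without a `category` field, which is the input the regression model would see in the second embodiment.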

As described above, the fundamental frequency estimation device of the second embodiment can also estimate a fundamental frequency closer to that of dialogue speech by using the relative height information.

FIG. 1 is a diagram showing a configuration example of the fundamental frequency estimation device of the first embodiment.
FIG. 2 is a diagram showing an example of the processing flow of the fundamental frequency estimation method of the first embodiment.
FIG. 3 is a diagram showing example entries of the category name dictionary of the first embodiment.
FIG. 4 is a diagram showing a configuration example of the text analysis unit of the first embodiment.
FIG. 5 is a diagram showing an example of the processing flow of the text analysis step of the first embodiment.
FIG. 6 is a diagram showing an example of the shaped language information output from the information shaping unit of the first embodiment. At the sentence start (SS in the figure), the sentence end, and pauses (PP in the figure), where the accent type and the accent phrase length cannot be defined, "-" is given as the not-applicable symbol.
FIG. 7 shows an example in which the detailed fundamental frequency estimation unit of the first embodiment estimates the detailed fundamental frequency based on the mean value.
FIG. 8 shows an example in which the detailed fundamental frequency estimation unit of the first embodiment estimates the detailed fundamental frequency based on the dynamic range.
FIG. 9 is a diagram showing a configuration example of the text analysis unit of the second embodiment, in which the category name assigning means and the category name dictionary are not provided.
FIG. 10 is a diagram showing an example of the processing flow of the text analysis step of the second embodiment, in which the category name assigning means and the category name dictionary are not provided.
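The drawings for the detailed fundamental frequency estimation unit mention two kinds of summary value, the mean and the dynamic range. As a rough illustration of what adapting a temporal change of the fundamental frequency to a summary value could look like, the sketch below shifts a contour to a target mean, or scales it to a target range. The specific arithmetic (an additive shift, and linear scaling about the minimum) is an assumption; the source text here does not state the formulas, and the function names are hypothetical.

```python
from typing import List


def adapt_to_mean(contour: List[float], target_mean: float) -> List[float]:
    """Shift every point by the same offset so the mean equals target_mean."""
    offset = target_mean - sum(contour) / len(contour)
    return [value + offset for value in contour]


def adapt_to_dynamic_range(contour: List[float],
                           target_range: float) -> List[float]:
    """Scale the contour about its minimum so max - min equals target_range.

    A flat contour has no shape to scale and is returned unchanged.
    """
    low, high = min(contour), max(contour)
    if high == low:
        return list(contour)
    scale = target_range / (high - low)
    return [low + (value - low) * scale for value in contour]
```

In both cases the shape of the contour within the accent phrase is preserved, and only its level or its excursion is matched to the value estimated by the regression model.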

Explanation of symbols

10 Fundamental frequency estimation device
100 Text analysis unit; S100 Text analysis step
110 Morphological analysis dictionary
120 Category name dictionary
200 Information shaping unit; S200 Information shaping step
300 Regression model
400 Fundamental frequency summary value estimation unit; S400 Fundamental frequency summary estimation step
500 Detailed fundamental frequency estimation unit; S500 Detailed fundamental frequency estimation step
101 Morphological analysis means; S101 Morphological analysis sub-step
102 Accent phrase determination means; S102 Accent phrase determination sub-step
103 Accent phrase reading estimation means; S103 Accent phrase reading estimation sub-step
104 Accent phrase accent type estimation means; S104 Accent phrase accent type estimation sub-step
105 Inter-accent-phrase tone combination type estimation means; S105 Inter-accent-phrase tone combination type estimation sub-step
106 Relative height calculation means; S106 Relative height calculation sub-step
107 Category name assigning means; S107 Category name assigning sub-step

Claims

A fundamental frequency estimation device comprising:
a morphological analysis dictionary that records the part of speech, reading, and accent type of each word;
a category name dictionary that records, for each word, a category name different from the part of speech;
a text analysis unit that analyzes an input sentence and outputs, as language information, the parts of speech at the beginning and end of each accent phrase, the accent type of the accent phrase, the reading of the accent phrase, the length of the accent phrase, the category name, different from the part of speech, of the main word of the accent phrase, and the tone combination type between accent phrases;
an information shaping unit that shapes the language information into a predetermined format;
a fundamental frequency summary value estimation unit that obtains a fundamental frequency summary value for each accent phrase from the shaped language information using a regression model; and
a detailed fundamental frequency estimation unit that adapts a temporal change of the fundamental frequency obtained from the language information to the fundamental frequency summary value and outputs the fundamental frequency,
wherein the text analysis unit comprises:
morphological analysis means for decomposing the input sentence into words to generate a word string and, by referring to the morphological analysis dictionary, obtaining the part of speech, reading, and accent type of each word on its own;
accent phrase determination means for grouping the word string into accent phrases, each composed of one or more words and serving as a unit of accent;
accent phrase reading estimation means for estimating the reading of each accent phrase and estimating the length of the accent phrase;
accent phrase accent type estimation means for estimating the accent type of each accent phrase;
inter-accent-phrase tone combination type estimation means for estimating the tone combination type between adjacent accent phrases; and
category name assigning means for assigning, for each accent phrase, to the main word in the accent phrase a category name obtained by searching the category name dictionary using as a key either the surface form of the word alone or the combination of the surface form and the part of speech of the word.

The fundamental frequency estimation device according to claim 1, wherein
the language information includes information on the relative height of each accent phrase, and
the text analysis unit further comprises
relative height calculation means for calculating the relative height of each accent phrase using the tone combination type information.

A fundamental frequency estimation device comprising:
a morphological analysis dictionary that records the part of speech, reading, and accent type of each word;
a text analysis unit that analyzes an input sentence and outputs, as language information, the parts of speech at the beginning and end of each accent phrase, the accent type of the accent phrase, the reading of the accent phrase, the length of the accent phrase, the tone combination type between accent phrases, and the relative height of each accent phrase;
an information shaping unit that shapes the language information into a predetermined format;
a fundamental frequency summary value estimation unit that obtains a fundamental frequency summary value for each accent phrase from the shaped language information using a regression model; and
a detailed fundamental frequency estimation unit that adapts a temporal change of the fundamental frequency obtained from the language information to the fundamental frequency summary value and outputs the fundamental frequency,
wherein the text analysis unit comprises:
morphological analysis means for decomposing the input sentence into words to generate a word string and, by referring to the morphological analysis dictionary, obtaining the part of speech, reading, and accent type of each word on its own;
accent phrase determination means for grouping the word string into accent phrases, each composed of one or more words and serving as a unit of accent;
accent phrase reading estimation means for estimating the reading of each accent phrase and estimating the length of the accent phrase;
accent phrase accent type estimation means for estimating the accent type of each accent phrase;
inter-accent-phrase tone combination type estimation means for estimating the tone combination type between adjacent accent phrases; and
relative height calculation means for calculating the relative height of each accent phrase using the tone combination type information.

The fundamental frequency estimation device according to claim 1 or 2, wherein
the category name dictionary
records, for words, category names that are subcategories of parts of speech.

The fundamental frequency estimation device according to claim 1 or 2, wherein
the category name dictionary
records category names composed of the meaning and intensity of words.

A fundamental frequency estimation method for estimating a fundamental frequency using
a morphological analysis dictionary that records the part of speech, reading, and accent type of each word, and
a category name dictionary that records, for each word, a category name different from the part of speech,
the method comprising:
a text analysis step in which a text analysis unit analyzes an input sentence and outputs, as language information, the parts of speech at the beginning and end of each accent phrase, the accent type of the accent phrase, the reading of the accent phrase, the length of the accent phrase, the category name, different from the part of speech, of the main word of the accent phrase, and the tone combination type between accent phrases;
an information shaping step in which an information shaping unit shapes the language information into a predetermined format;
a fundamental frequency summary value estimation step in which a fundamental frequency summary value estimation unit obtains a fundamental frequency summary value for each accent phrase from the shaped language information using a regression model; and
a detailed fundamental frequency estimation step in which a detailed fundamental frequency estimation unit adapts a temporal change of the fundamental frequency obtained from the language information to the fundamental frequency summary value and outputs the fundamental frequency,
wherein the text analysis step includes:
a morphological analysis sub-step of decomposing the input sentence into words to generate a word string and, by referring to the morphological analysis dictionary, obtaining the part of speech, reading, and accent type of each word on its own;
an accent phrase determination sub-step of grouping the word string into accent phrases, each composed of one or more words and serving as a unit of accent;
an accent phrase reading estimation sub-step of estimating the reading of each accent phrase and estimating the length of the accent phrase;
an accent phrase accent type estimation sub-step of estimating the accent type of each accent phrase;
an inter-accent-phrase tone combination type estimation sub-step of estimating the tone combination type between adjacent accent phrases; and
a category name assigning sub-step of assigning, for each accent phrase, to the main word in the accent phrase a category name obtained by searching the category name dictionary using as a key either the surface form of the word alone or the combination of the surface form and the part of speech of the word.

The fundamental frequency estimation method according to claim 6, wherein
the language information includes information on the relative height of each accent phrase, and
the text analysis step further includes
a relative height calculation sub-step of calculating the relative height of each accent phrase using the tone combination type information.

A fundamental frequency estimation method for estimating a fundamental frequency using
a morphological analysis dictionary that records the part of speech, reading, and accent type of each word,
the method comprising:
a text analysis step in which a text analysis unit analyzes an input sentence and outputs, as language information, the parts of speech at the beginning and end of each accent phrase, the accent type of the accent phrase, the reading of the accent phrase, the length of the accent phrase, the tone combination type between accent phrases, and the relative height of each accent phrase;
an information shaping step in which an information shaping unit shapes the language information into a predetermined format;
a fundamental frequency summary value estimation step in which a fundamental frequency summary value estimation unit obtains a fundamental frequency summary value for each accent phrase from the shaped language information using a regression model; and
a detailed fundamental frequency estimation step in which a detailed fundamental frequency estimation unit adapts a temporal change of the fundamental frequency obtained from the language information to the fundamental frequency summary value and outputs the fundamental frequency,
wherein the text analysis step includes:
a morphological analysis sub-step of decomposing the input sentence into words to generate a word string and, by referring to the morphological analysis dictionary, obtaining the part of speech, reading, and accent type of each word on its own;
an accent phrase determination sub-step of grouping the word string into accent phrases, each composed of one or more words and serving as a unit of accent;
an accent phrase reading estimation sub-step of estimating the reading of each accent phrase and estimating the length of the accent phrase;
an accent phrase accent type estimation sub-step of estimating the accent type of each accent phrase;
an inter-accent-phrase tone combination type estimation sub-step of estimating the tone combination type between adjacent accent phrases; and
a relative height calculation sub-step of calculating the relative height of each accent phrase using the tone combination type information.

The fundamental frequency estimation method according to claim 6 or 7, wherein
the category name dictionary
records, for words, category names that are subcategories of parts of speech.

The fundamental frequency estimation method according to claim 6 or 7, wherein
the category name dictionary
records category names composed of the meaning and intensity of words.

6. A fundamental frequency estimation program for causing a computer to operate as the fundamental frequency estimation apparatus according to claim 1.

A computer-readable storage medium storing the fundamental frequency estimation program according to claim 11.