JP4787686B2

JP4787686B2 - TEXT SELECTION DEVICE, ITS METHOD, ITS PROGRAM, AND RECORDING MEDIUM

Info

Publication number: JP4787686B2
Application number: JP2006169352A
Authority: JP
Inventors: Hideyuki Mizuno (水野秀之)
Original assignee: Nippon Telegraph and Telephone Corp
Current assignee: Nippon Telegraph and Telephone Corp
Priority date: 2006-06-19
Filing date: 2006-06-19
Publication date: 2011-10-05
Anticipated expiration: 2026-06-19
Also published as: JP2007334264A

Abstract

PROBLEM TO BE SOLVED: To select candidate texts to be additionally recorded into an existing speech synthesis database storage unit so as to improve the quality of synthesized speech.
SOLUTION: Candidate texts containing keywords from a keyword list storage unit 2 are retrieved from a candidate text database storage unit 4. The database storage unit 4 stores a large amount of text in advance, and the storage unit 2 stores important keywords in advance. Sets of components are generated from the retrieved candidate texts, a hierarchical coverage is computed for each component set by referring to a frequency distribution table, and importances are derived from the hierarchical coverages. A combination of candidate texts that has the highest importance, includes all the keywords, and has the smallest amount of text data is then selected by a greedy algorithm, dynamic programming, or a similar method.
COPYRIGHT: (C)2008,JPO&INPIT

Description

The present invention relates to a text selection device, method, program, and recording medium for use when a speech database storage unit required for speech synthesis in Japanese or other languages already exists and additional speech data is to be recorded and stored in it. The device selects the candidate texts that the speaker reads aloud during recording, and at the same time selects a set of candidate texts to be recorded from a text database storage unit that stores texts to be converted into speech.

In conventional speech synthesis, falling storage costs and increasing computing power have in recent years made it possible to store tens of minutes to several hours of recorded speech as-is on a large-capacity storage device. Patent Document 1, Non-Patent Document 1, and others describe synthesis methods that, according to the input text to be synthesized and its prosody information, select appropriate speech units from the stored speech data and concatenate them either unchanged or after modifying them to match the prosody information, thereby synthesizing high-quality speech.
However, even though tens of hours of speech data can now be stored on large-capacity devices, building the speech database storage unit that serves as the source for synthesis still requires recording the speech and segmenting it so that it can be used as speech units. In practice, the time and cost of this work limit how much speech can realistically be collected, so building a speech synthesis database storage unit capable of high-quality synthesized speech quickly and cheaply has been a major challenge. Techniques have therefore been proposed to automate segmentation, a key step in building such a database automatically; Non-Patent Document 2 and others describe how, given only the speech data for synthesis, a speech synthesis database storage unit can be constructed efficiently.

As methods for collecting speech data, Non-Patent Document 3 and others describe designing the speech synthesis database storage unit so as to maximize, in acoustic terms, the probability that the speech units needed to synthesize an input text have been recorded. Non-Patent Document 4 and others describe prosodically multiplexing utterances of the same content, that is, recording the same content with multiple prosodic variations, to avoid degradation caused by synthesis processing.
As a method for creating recording texts, Non-Patent Documents 5 and 6 and others describe the following: for a sentence corpus (linguistic data in which language expressions that have actually been used are accumulated and organized), create a table of occurrence counts or occurrence rates of the units it contains (for example, phoneme chains); use the cumulative occurrence count or rate of the units in each sentence as a selection score; and generate the set of recording texts by successively selecting high-scoring sentences from the corpus.
Patent Document 1: Japanese Patent No. 2761552
Non-Patent Document 1: "Choose the best to modify the least: A new generation concatenative synthesis", Proc. Eurospeech '99, 1999, pp. 2291-2294
Non-Patent Document 2: "Phoneme labeling workbench using HMM expert system techniques" (in Japanese), IEICE Technical Report, SP92-132, 1993
Non-Patent Document 3: M. Chu, H. Yang, and E. Chang, "Selecting Non-Uniform Units from a Very Large Corpus for Concatenative Speech Synthesizer", Proc. ICASSP 2001, Vol. 2, SPEECH-L2.2, 2001
Non-Patent Document 4: Masuda et al., "Design and evaluation of a prosodically multiplexed database" (in Japanese), Proc. Acoustical Society of Japan meeting, pp. 289-290, March 2002
Non-Patent Document 5: Jan P. H. van Santen, "Diagnostic perceptual experiments for text-to-speech system evaluation", Proc. ICSLP 92, pp. 555-558, 1992
Non-Patent Document 6: M. Isogai, H. Mizuno, K. Mano, "Recording Script Design for Corpus-Based TTS System Based on Coverage of Various Phonetic Elements", Proc. ICASSP 2005, Vol. 1, pp. 301-304, 2005

There are cases in which one wishes to improve the quality of synthesized speech by recording additional speech for an existing speech synthesis database storage unit: for example, when synthesizing a document that contains important keywords yields poor-quality speech, or when particularly high synthesis quality is required for text in a specific field.
However, in the conventional technology, recording texts are selected based only on fixed, predetermined units such as morphemes or three-phoneme chains. These methods do not consider whether keywords, such as words and phrases, that caused quality problems when synthesized with the existing synthesis database storage unit will be included in the recording, so there is no guarantee that keywords which would improve the quality of the synthesized speech are recorded. Consequently, even when additional recording was performed, it was not guaranteed to improve the quality of speech synthesis.

The invention comprises a candidate text database storage unit that stores a large amount of candidate text as digital data, and a keyword list storage unit that stores in advance keywords important for speech synthesis. Candidate texts containing keywords from the keyword list storage unit are retrieved from the candidate text database storage unit, and the number of keywords contained in each candidate text is counted. From the retrieved candidate texts, a combination of candidate texts that together contain all the keywords in the keyword list storage unit is selected, and the selected combination is output.

With this configuration, when one wishes to improve the quality of synthesized speech in a specific field by recording additional speech for an existing synthesized-speech database storage unit, a keyword list storage unit storing in advance the keywords important to that field is provided, and a combination of texts that together contain all of its keywords can be selected as the candidate texts to be recorded into the database storage unit. As a result, no important keyword is left unrecorded, and the quality of the synthesized speech can be assured.

Example 1
FIG. 1 shows the functional configuration of Embodiment 1 of the present invention, and FIG. 2 shows its processing flow.
The keyword list storage unit 2 stores in advance keywords that are important for speech synthesis. The candidate text database storage unit 4 stores, for example, a large amount of Japanese text as digital data. It may, for example, store texts that are likely to require speech output, such as manuscripts to be read aloud or texts related to the tasks for which speech synthesis is intended; alternatively, a large number of candidate texts each containing at least one keyword stored in the keyword list storage unit 2 may be collected and stored in the candidate text database storage unit 4.
The candidate text database storage unit 4 may also be a training database storage unit.
The keyword counting unit 6 searches the candidate text database storage unit 4 for candidate texts containing the keywords stored in the keyword list storage unit 2 (step S101). The keyword counting unit 6 also counts the number of keywords in each retrieved candidate text (step S102). Duplicates are not counted: a keyword that occurs several times in one text is counted once.

FIG. 3 shows a specific example of the contents of the keyword list storage unit 2. Here, the keyword list storage unit 2 stores "日本電信電話" (Nippon Telegraph and Telephone), "株式会社" (Co., Ltd.), "佐藤花子" (Hanako Sato), "山田太郎" (Taro Yamada), "営業" (sales), and so on. Suppose one of the candidate texts is "日本電信電話株式会社の営業担当山田太郎より営業窓口までお問い合わせがありました。" ("Taro Yamada, a sales representative of Nippon Telegraph and Telephone Corporation, made an inquiry to the sales desk."). Five keyword occurrences appear in it: 日本電信電話, 株式会社, 営業, 山田太郎, and 営業. Because the third and fifth are the same keyword 営業, the number of keywords in this candidate text is counted as four.
The text numbers of the candidate texts retrieved by the keyword counting unit 6 and their keyword counts are stored in the keyword count storage unit 5.
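The count in step S102 can be sketched minimally in Python as follows. It assumes plain substring matching, which the description does not specify (a real system might match only on morpheme boundaries). The function name `count_keywords` and the `distinct` flag are illustrative. With `distinct=True`, it reproduces the count of four for the FIG. 3 example; with `distinct=False`, it counts repeated occurrences instead.

```python
def count_keywords(text: str, keywords: list[str], distinct: bool = True) -> int:
    """Count keyword hits in `text` using plain substring matching (an assumption).

    distinct=True counts each keyword at most once, as in step S102;
    distinct=False counts every non-overlapping occurrence instead.
    """
    unique = set(keywords)  # ignore accidental duplicates in the keyword list
    if distinct:
        return sum(1 for kw in unique if kw in text)
    return sum(text.count(kw) for kw in unique)


keywords = ["日本電信電話", "株式会社", "佐藤花子", "山田太郎", "営業"]
text = "日本電信電話株式会社の営業担当山田太郎より営業窓口までお問い合わせがありました。"
print(count_keywords(text, keywords))                  # 4
print(count_keywords(text, keywords, distinct=False))  # 5
```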
The text selection unit 8 selects, from the retrieved candidate texts, a combination of candidate texts that together contain all the keywords in the keyword list storage unit. An example of how the text selection unit 8 selects this combination is described below.

First, the text selection unit 8 selects the text number of the candidate text with the largest number of keywords in the keyword count storage unit 5 and fetches the corresponding candidate text from the candidate text database storage unit 4 (step S103). The fetched candidate text is stored in the candidate text storage unit 7 in the text selection unit 8 (step S104). Alternatively, the keyword count storage unit 5 may store the retrieved candidate texts themselves together with their keyword counts; in that case, the text selection unit 8 does not need to fetch them from the candidate text database storage unit 4. The same applies to the embodiments described below.
The text selection unit 8 then removes the keywords contained in the selected candidate text from the keyword list storage unit 2 (step S106). The control unit 9 in the text selection unit 8 repeats keyword removal, candidate text search, keyword counting, selection of the retrieved candidate text with the most keywords, and storage of that text in the candidate text storage unit 7 until the keyword list storage unit 2 is empty (step S108).
When the keyword list storage unit 2 becomes empty, the candidate texts in the candidate text storage unit 7 are output from the output unit 10 as the combination of candidate texts (step S10). Selecting the set of candidate texts by this iterative control reduces the amount of data in the selected combination. The processing method of the text selection unit 8 is not limited to this. In the description above, candidate texts containing important keywords were retrieved first; alternatively, the number of important keywords contained in each candidate text may be counted directly. Furthermore, although multiple occurrences of the same important keyword in one candidate text were counted as one, they may instead be counted separately.
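The iterative selection of steps S101 to S108 is, in effect, a greedy set-cover heuristic. Below is a minimal Python sketch under the same substring-matching assumption. The function `select_texts` and the sample texts are hypothetical. Ties go to the text listed first. The function raises an error if some keyword occurs in no candidate, a case the description does not address.

```python
def select_texts(candidates: dict[str, str], keywords: list[str]) -> list[str]:
    """Greedily pick candidate texts until every keyword is covered."""
    remaining = set(keywords)  # plays the role of keyword list storage unit 2
    selected: list[str] = []   # plays the role of candidate text storage unit 7
    while remaining:  # S108: repeat until the keyword list is empty
        best_id, best_hits = None, set()
        for text_id, text in candidates.items():
            # S101/S102: still-uncovered keywords that this text contains
            hits = {kw for kw in remaining if kw in text}
            if len(hits) > len(best_hits):
                best_id, best_hits = text_id, hits
        if best_id is None:
            raise ValueError(f"no candidate contains: {sorted(remaining)}")
        selected.append(best_id)  # S103/S104: keep the text with the most hits
        remaining -= best_hits    # S106: drop the keywords it covers
    return selected


candidates = {
    "t1": "日本電信電話株式会社の営業担当山田太郎より営業窓口までお問い合わせがありました。",
    "t2": "佐藤花子さんから電話がありました。",
    "t3": "株式会社の営業です。",
}
keywords = ["日本電信電話", "株式会社", "佐藤花子", "山田太郎", "営業"]
print(select_texts(candidates, keywords))  # ['t1', 't2']
```

Greedy selection keeps the amount of selected text small, but it does not guarantee the smallest possible combination. The abstract names dynamic programming as another option.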

Example 2
Next, FIG. 4 shows the functional configuration of Embodiment 2, and FIG. 5 shows its processing flow. In Embodiment 2, the text selection unit 8 comprises, in addition to the candidate text storage unit 7 and the control unit 9 described in Embodiment 1, an importance generation unit 12, an importance storage unit 14, a maximum importance set selection unit 16, and a component generation unit 18. Parts that perform the same processing carry the same reference numerals; in Embodiments 2 to 5 below, identical functional components are given identical reference numerals and their description is not repeated.
The keyword counting unit 6 retrieves candidate texts containing keywords from the candidate text database storage unit 4, and the text numbers of the retrieved candidate texts are stored in the keyword count storage unit 5. The component generation unit 18 in the text selection unit 8 takes the text numbers one by one from the keyword count storage unit 5 and fetches the corresponding candidate texts from the candidate text database storage unit 4 (step S2).
The component generation unit 18 performs, for example, morphological analysis and reading assignment, which are known techniques (step S4), to determine word boundaries, parts of speech, and word readings. It then converts each word reading into the corresponding syllable and phoneme sequence (step S6). Using the analyzed morphemes, syllables, and phonemes, the component generation unit 18 generates, for each retrieved candidate text, one or more sets of components by analysis based on at least one layer of the acoustic and/or linguistic hierarchical structure of spoken language (step S8).

Specifically, suppose one candidate text in the candidate text database storage unit 4 is the sentence "これはきれいな花です" ("This is a beautiful flower"). For the linguistic hierarchy, morphemes are first generated by the morphological analysis described above; here they are これ (kore), は (wa), きれいな (kireina), 花 (hana), and です (desu). Defining a combination of two adjacent morphemes as one chain morpheme, the chain morphemes are これは, はきれいな, きれいな花, and 花です. Defining a combination of three adjacent morphemes as one three-chain morpheme, the three-chain morphemes are これはきれいな, はきれいな花, and きれいな花です. A subject and predicate are also derived from the morphemes: これは and きれいな花です.
For the acoustic hierarchy, the phoneme sequence conversion described above yields the phonemes K, O, R, E, W, A, K, I, R, E, I, N, A, H, A, N, A, D, E, S, U, followed by the syllables コ, レ, ワ, キ, レ, イ, ナ, ハ, ナ, デ, ス (ko, re, wa, ki, re, i, na, ha, na, de, su). Treating two adjacent syllables as a chain syllable gives コレ, レワ, ワキ, キレ, レイ, イナ, ナハ, ハナ, ナデ, and デス; treating three adjacent syllables as a three-chain syllable gives コレワ, レワキ, ワキレ, キレイ, レイナ, イナハ, ナハナ, ハナデ, and ナデス. Each of these is stored in the component storage unit 20 as a hierarchical structure such as that shown in FIG. 6. As FIG. 6 shows, each item generated by the component generation unit 18 is called a component; each of the morpheme, chain morpheme, three-chain morpheme, subject/predicate, phoneme, syllable, chain syllable, and three-chain syllable levels is called a layer; and the set of components in one layer is called a component set. These layers are listed only as examples and are not limiting. It suffices to consider at least one of the linguistic and acoustic hierarchies, and within them at least one layer. The hierarchical structure can also be generated manually.
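Chain and three-chain components are simply runs of adjacent units. Every layer in the example except subject/predicate can therefore be derived from the morpheme, syllable, and phoneme sequences. The sketch below assumes those sequences are already available. It does not reproduce the morphological analysis and reading assignment of steps S4 and S6. The function names and layer labels are illustrative. Subject/predicate extraction requires a parser and is omitted.

```python
def chains(units: list[str], n: int) -> list[str]:
    """Join every run of n adjacent units (n=2: chain, n=3: three-chain)."""
    return ["".join(units[i:i + n]) for i in range(len(units) - n + 1)]


def build_layers(morphemes: list[str], syllables: list[str],
                 phonemes: list[str]) -> dict[str, list[str]]:
    """Component set per layer, in the spirit of FIG. 6 (subject/predicate omitted)."""
    return {
        "morpheme": morphemes,
        "chain morpheme": chains(morphemes, 2),
        "three-chain morpheme": chains(morphemes, 3),
        "phoneme": phonemes,
        "syllable": syllables,
        "chain syllable": chains(syllables, 2),
        "three-chain syllable": chains(syllables, 3),
    }


layers = build_layers(
    morphemes=["これ", "は", "きれいな", "花", "です"],
    syllables=list("コレワキレイナハナデス"),  # one kana per syllable in this sentence
    phonemes=list("KOREWAKIREINAHANADESU"),
)
print(layers["chain morpheme"])             # ['これは', 'はきれいな', 'きれいな花', '花です']
print(len(layers["three-chain syllable"]))  # 9
```

Splitting the kana string with `list()` works here only because every syllable in this sentence is a single kana. Contracted syllables such as キョ would need a proper syllabifier.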

Returning to FIG. 4, the generated component sets are temporarily stored in the component storage unit 20 and then read out and input to the importance generation unit 12. The importance generation unit 12 generates an importance for each retrieved candidate text (step S12).
Here, importance is defined as a value indicating how important a candidate text is for selection, that is, how important it is to record that candidate text into the existing speech synthesis database storage unit (not shown). Selecting candidate texts with high importance, recording their speech, and adding it to the existing speech synthesis database storage unit therefore further improves the quality of the synthesized speech.
This embodiment relies on the observation that, in ordinary sentences, proper nouns are often highly important and particles are often of low importance. Accordingly, components that are proper nouns, such as 山田太郎 (Taro Yamada) and 東京 (Tokyo), are given a high importance value, for example 10; components that are particles, such as を (wo) and は (wa), are given a low value, for example 1; and components of all other parts of speech are given, for example, 5. Each value is multiplied by the number of components with that part of speech, and the sum of these products is taken as the importance of the candidate text.

For example, if the candidate text fetched by the component generation unit 18 is "私は東京と大阪へ行く" ("I go to Tokyo and Osaka"), the components (morphemes) 私, は, 東京, と, 大阪, へ, and 行く are generated (step S8). These components are stored as one component set in the component storage unit 20 and then input to the importance generation unit 12, which treats 東京 and 大阪 as two proper nouns and は, と, and へ as three particles when generating the importance of the candidate text. Counting only these, the importance is 23 (= 2 × 10 + 3 × 1); if the remaining morphemes 私 and 行く are also scored at 5 each, as the rule above specifies, it becomes 33 (= 2 × 10 + 3 × 1 + 2 × 5). The notion of importance is not limited to this.
An importance is generated for each retrieved candidate text (step S12), and the text number of the candidate text and its importance are stored as a pair in the importance storage unit 14. Instead of the text number, the candidate text itself may be stored together with its importance in the importance storage unit 14.
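The part-of-speech scoring of Embodiment 2 can be sketched as follows, using the example weights 10, 1, and 5 given in the text. The POS labels are hypothetical placeholders for whatever tag set a morphological analyzer produces, and `pos_importance` is an illustrative name. Scoring every morpheme of "私は東京と大阪へ行く" gives 33. Dropping the two morphemes that are neither proper nouns nor particles reproduces the figure of 23.

```python
WEIGHTS = {"proper_noun": 10, "particle": 1}  # example values from the text
DEFAULT_WEIGHT = 5                            # every other part of speech


def pos_importance(tagged: list[tuple[str, str]]) -> int:
    """Sum of per-morpheme weights, i.e. the sum over POS of weight x count."""
    return sum(WEIGHTS.get(pos, DEFAULT_WEIGHT) for _, pos in tagged)


tagged = [("私", "pronoun"), ("は", "particle"), ("東京", "proper_noun"),
          ("と", "particle"), ("大阪", "proper_noun"), ("へ", "particle"),
          ("行く", "verb")]
print(pos_importance(tagged))                                  # 33
print(pos_importance([m for m in tagged if m[1] in WEIGHTS]))  # 23
```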
The maximum importance set selection unit 16 selects the combination of retrieved candidate texts whose total importance is the largest. An example of how the maximum importance set selection unit 16 selects this combination follows.
First, the text numbers or candidate texts are sorted in descending order of importance, and candidate texts are selected from the top until the selection contains all the keywords in the keyword list storage unit. The selected combination is temporarily stored in the candidate text storage unit 7 and output by the output unit 10. The selection method used by the maximum importance set selection unit 16 is not limited to this.
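The ranking-based selection might look like the sketch below. It follows the literal reading of the text: texts are taken from the top of the ranking until every keyword is covered, including texts that add no new keyword. Skipping such texts would be a natural variant, but the description does not state it. Names and sample values are hypothetical, and substring matching is again assumed.

```python
def select_by_importance(candidates: dict[str, str],
                         scores: dict[str, float],
                         keywords: list[str]) -> list[str]:
    """Take texts in descending order of importance until all keywords are covered."""
    remaining = set(keywords)
    selected: list[str] = []
    # sorted() is stable (also with reverse=True), so ties keep input order
    for text_id in sorted(candidates, key=lambda t: scores[t], reverse=True):
        if not remaining:
            break
        selected.append(text_id)
        remaining -= {kw for kw in remaining if kw in candidates[text_id]}
    if remaining:
        raise ValueError(f"keywords never covered: {sorted(remaining)}")
    return selected


candidates = {"a": "東京と大阪", "b": "山田太郎です", "c": "営業窓口"}
scores = {"a": 33, "b": 10, "c": 20}
print(select_by_importance(candidates, scores, ["山田太郎", "営業"]))  # ['a', 'c', 'b']
```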

Example 3
In this embodiment, a candidate text is considered highly important if it contains components that are absent, or appear only rarely, in the database storage unit of utterance texts corresponding to the speech stored in the existing synthesized-speech database storage unit. In general, when speech synthesized with an existing speech synthesis database storage unit is of poor quality, recording candidate texts made up of components that appear only rarely in the corresponding utterance text database storage unit often improves the quality of the synthesized speech.
In this embodiment, a hierarchical coverage calculation unit 22, a hierarchical coverage storage unit 24, and a frequency distribution storage unit 30 are added to the text selection unit 8 described in Embodiments 1 and 2. For convenience of explanation, an utterance text database storage unit 26 and a component list storage unit 28 are also added.
The utterance text database storage unit 26 stores, as digital data, the texts corresponding to the speech stored in the existing speech synthesis database storage unit. The component list storage unit 28 stores component lists, for example hierarchical structures like that of FIG. 6, generated by the component generation unit 18 described above or manually for all utterance texts in the utterance text database storage unit 26.

This embodiment also uses the concept of component coverage. The coverage of a component is defined as the proportion that this component accounts for among all components contained in the utterance text database storage unit 26. A component with low coverage is therefore relatively scarce in the utterance text database storage unit 26, and recording the speech of candidate texts that contain low-coverage components into the existing synthesized-speech database storage unit makes it possible to synthesize higher-quality speech. The layers of components generated by the component generation unit 18 are the same as the layers of components contained in the component list storage unit.
For example, for the phoneme layer, let R be the total number of occurrences of all phonemes in all texts stored in the utterance text database storage unit 26, and r the total number of occurrences of the target phoneme x; the coverage of phoneme x is then r/R. The coverage is computed for every component, and frequency distribution tables are created for all layers of the texts in the utterance text database storage unit 26 and stored in the frequency distribution storage unit 30.
FIG. 7 shows a specific example of the frequency distribution table for the phoneme layer. The preceding phoneme environment of a phoneme is the phoneme immediately before it, and the following phoneme environment is the phoneme immediately after it; "#" denotes silence. The entries are sorted, for example, in descending order of coverage.
FIG. 7 contains two entries for the phoneme "A", order 1 and order 2. Their preceding phoneme environments are the same, but their following phoneme environments differ ("#" for the order-1 "A" and "S" for the order-2 "A"), so they are treated as distinct components.
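A minimal sketch of the phoneme-layer frequency distribution table follows. Each component is a phoneme together with its preceding and following phoneme environments, so two "A" entries with different following environments, as in FIG. 7, become different keys. The sketch assumes each utterance is bounded by silence ("#") on both sides. The function names are illustrative, and the toy utterances are invented for the example; they are not taken from FIG. 7.

```python
from collections import Counter

SILENCE = "#"


def phoneme_components(phonemes: list[str]) -> list[tuple[str, str, str]]:
    """(preceding, phoneme, following) triples, with silence at utterance edges."""
    padded = [SILENCE, *phonemes, SILENCE]
    return [(padded[i - 1], padded[i], padded[i + 1]) for i in range(1, len(padded) - 1)]


def coverage_table(utterances: list[list[str]]) -> dict[tuple[str, str, str], float]:
    """Coverage r / R of every phoneme-in-context in the utterance texts."""
    counts = Counter()
    for phonemes in utterances:
        counts.update(phoneme_components(phonemes))
    total = sum(counts.values())  # R: all phoneme occurrences
    return {comp: r / total for comp, r in counts.items()}  # r / R per component


table = coverage_table([["A", "S", "A"], ["S", "A"]])
# Display in descending order of coverage, as in FIG. 7
for (prev, ph, nxt), cov in sorted(table.items(), key=lambda kv: kv[1], reverse=True):
    print(f"{prev} {ph} {nxt}  {cov:.2f}")
```

When a candidate text is scored, a component missing from the table can be looked up with `table.get(component, 0.0)`. Treating unseen components as having zero coverage is an assumption of this sketch.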

The acoustic hierarchy here consists of the "phoneme", "syllable", "chained syllable", and "three-chained syllable" layers. The environment of each element in these layers need not be the preceding and following phoneme, however; other acoustic units may be used. For example, the preceding and following environments of a three-chained syllable may each be a phoneme, a syllable, a chained syllable, or a three-chained syllable. The frequency distribution table for each acoustic layer may also take only the preceding environment, only the following environment, or neither into account. The frequency distribution tables for the layers of the linguistic hierarchy take neither environment into account.
Returning to FIG. 4, the component generation unit 18 generates a set of components from each candidate text. This set is temporarily stored in the component storage unit 20 and then input to the hierarchical coverage calculation unit 22. For each component set of each retrieved candidate text, the hierarchical coverage calculation unit 22 obtains the coverage of each component by referring to the frequency distribution storage unit 30, and then computes the sum of these coverages.
Take the above candidate text 「これはきれいな花です」 ("kore wa kirei na hana desu", "This is a beautiful flower") as an example. The hierarchical coverage calculation unit 22 first obtains coverages by referring to the frequency distribution table of FIG. 7. The first "K" has silence as its preceding phoneme environment and "O" as its following phoneme environment. It therefore matches the "K" of order 3, and its coverage is a_3.
Coverages are then obtained for all remaining phonemes and for the components of the other layers. These coverages are used to compute the hierarchical coverage of each layer (step S10 in FIG. 5).
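The phoneme-layer lookup can be sketched in Python as follows. The sketch assumes a coverage table keyed by (preceding, phoneme, following) triples and a simplified romanized phoneme sequence for 「これはきれいな花です」. The coverage values are hypothetical, as is the choice to treat contexts missing from the table as coverage 0. The returned sum is the hierarchical coverage of the phoneme layer.

```python
SILENCE = "#"


def phoneme_contexts(phonemes):
    """Yield (preceding, phoneme, following) triples, padding with silence."""
    padded = [SILENCE, *phonemes, SILENCE]
    for i in range(1, len(padded) - 1):
        yield padded[i - 1], padded[i], padded[i + 1]


def phoneme_layer_coverage(phonemes, coverage_table):
    """Look up each phoneme in context and return (per-phoneme list, sum).

    Assumption: contexts missing from the table get coverage 0.0.
    """
    per_phoneme = [(ctx, coverage_table.get(ctx, 0.0))
                   for ctx in phoneme_contexts(phonemes)]
    return per_phoneme, sum(c for _, c in per_phoneme)


# "kore wa kirei na hana desu" as a simplified phoneme sequence
text = "K O R E W A K I R E I N A H A N A D E S U".split()
# Hypothetical coverage values; the first key plays the role of a_3 in FIG. 7.
table = {("#", "K", "O"): 0.002, ("A", "K", "I"): 0.001, ("E", "S", "U"): 0.004}
per_phoneme, total = phoneme_layer_coverage(text, table)
print(per_phoneme[0])  # (('#', 'K', 'O'), 0.002)
print(total)
```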

For each layer, the hierarchical coverage is the sum of the coverages of the components contained in the target candidate text. Suppose layer J of candidate text I contains N(IJ) components, and let the coverage of component k be $C_{IJk}$. The hierarchical coverage $C_{IJ}$ is then given by equation (1):
$$C_{IJ} = \sum_{k=1}^{N(IJ)} C_{IJk} \qquad (1)$$
For example, the morpheme layer of the above candidate text has five components: 「これ」 (kore, "this"), 「は」 (wa, topic particle), 「きれいな」 (kireina, "beautiful"), 「花」 (hana, "flower"), and 「です」 (desu, copula). If the coverages of these components are as shown in FIG. 8, the hierarchical coverage of this candidate text's morpheme layer is their sum. FIG. 8 shows, for example, that the coverage of the morpheme 「これ」 is 7.25 × 10⁻⁵.
The hierarchical coverage of the morpheme layer is therefore 7.25 × 10⁻⁵ + 2.83 × 10⁻⁴ + 5.84 × 10⁻⁶ + 1.43 × 10⁻⁵ + 6.93 × 10⁻⁴ ≈ 1.07 × 10⁻³.
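Equation (1) is a plain sum, so the morpheme-layer figure can be checked directly. This short Python check uses the five coverage values from the worked sum. Only the value for 「これ」 is attributed to a specific morpheme in the text, so the list is left unlabeled.

```python
import math

# The five morpheme coverages from the worked sum.
morpheme_coverages = [7.25e-5, 2.83e-4, 5.84e-6, 1.43e-5, 6.93e-4]


def hierarchical_coverage(component_coverages):
    """Equation (1): C_IJ is the sum of C_IJk over the N(IJ) components of layer J."""
    return math.fsum(component_coverages)


c_morpheme = hierarchical_coverage(morpheme_coverages)
print(f"{c_morpheme:.2e}")  # 1.07e-03
```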
Next, the importance generation unit 12 computes the sum of the hierarchical coverages for each retrieved candidate text. It then takes as the importance a value that becomes smaller as this sum becomes larger. Here, the weighted sum of the hierarchical coverages is computed and its reciprocal is taken.
That is, let L be the number of layers and $W_J$ the weighting coefficient of layer J. The importance $S_I$ of candidate text I is then given by equation (2):
$$S_I = \frac{1}{\sum_{J=1}^{L} W_J \, C_{IJ}} \qquad (2)$$
For example, FIG. 9 shows the hierarchical coverages of the above candidate text for eight layers: subject/predicate (b), three-chained morpheme (c), chained morpheme (d), morpheme (e), three-chained syllable (f), chained syllable (g), syllable (h), and phoneme (a). Let the corresponding weighting coefficients be $W_b, W_c, W_d, W_e, W_f, W_g, W_h, W_a$. The importance of this candidate text is then:
$$\frac{1}{b W_b + c W_c + d W_d + e W_e + f W_f + g W_g + h W_h + a W_a}$$
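Equation (2) can be sketched in Python as a function of per-layer hierarchical coverages and weights. The layer names and numbers below are hypothetical. A missing weight defaults to 1, matching the unweighted case mentioned in the text. Returning infinity when the weighted sum is zero is an added assumption, because equation (2) is undefined there.

```python
import math


def importance(layer_coverages, weights):
    """Equation (2): S_I = 1 / sum_J W_J * C_IJ.

    layer_coverages: {layer name: hierarchical coverage C_IJ}
    weights: {layer name: W_J}; layers without a weight use 1.0.
    """
    weighted = sum(weights.get(layer, 1.0) * c
                   for layer, c in layer_coverages.items())
    if weighted == 0.0:
        return math.inf  # assumption: no known components -> maximally important
    return 1.0 / weighted


# Hypothetical coverages for two of the eight layers in FIG. 9.
coverages = {"morpheme": 1.07e-3, "phoneme": 2.0e-2}
print(importance(coverages, {}))                 # unweighted
print(importance(coverages, {"morpheme": 3.0}))  # morpheme layer emphasized
```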

The importance is defined as a value that decreases as the (weighted) sum of hierarchical coverages increases, for the following reason. This sum indicates how far the candidate text can be reconstructed from the components in the utterance text database storage unit 26, which corresponds to the existing speech synthesis database storage unit. A low sum means the candidate text consists of components that appear only rarely in the utterance text database storage unit 26. Suppose candidate texts with high importance (the reciprocal of the weighted sum) are recorded and added to the speech synthesis database storage unit created from the utterance text database storage unit 26. This can improve the quality of the synthesized speech. As noted above, recording candidate texts that contain components rarely present in the existing speech synthesis database storage unit, and adding them to it, often improves synthesis quality.
The weighting coefficients can be chosen according to the intended use. Suppose the candidate texts retrieved by the keyword counting unit 6 are intended for an application that depends on a particular domain. In that case, the weighting coefficients for the hierarchical coverages of the linguistic hierarchy should be made larger than those of the acoustic hierarchy. The reason is that morphemes generated from such candidate texts often include words specific to that domain. Conversely, for applications relatively independent of any domain, the linguistic coefficients should be made smaller than the acoustic ones. If no weighting is desired, all weighting coefficients may be set to 1.
In the above description, the importance is the reciprocal of the weighted sum. Other definitions are possible; for example, the negated weighted sum may be used as the importance.
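The weighting policy and the negated-sum alternative can be illustrated together in Python. In this hypothetical sketch, domain-specific use gives the linguistic layers weight 2 and the acoustic layers weight 1, and general use does the reverse. The values 2 and 1 are assumptions. For a positive weighted sum, both the reciprocal and the negation decrease strictly as the sum grows, so they produce the same ranking. Only the weights change which text comes first.

```python
LINGUISTIC = ("subject_predicate", "morpheme_3chain", "morpheme_2chain", "morpheme")
ACOUSTIC = ("syllable_3chain", "syllable_2chain", "syllable", "phoneme")


def make_weights(domain_specific, high=2.0, low=1.0):
    """Heavier linguistic weights for domain-specific use, lighter otherwise."""
    ling, acou = (high, low) if domain_specific else (low, high)
    weights = {layer: ling for layer in LINGUISTIC}
    weights.update({layer: acou for layer in ACOUSTIC})
    return weights


def weighted_sum(coverages, weights):
    return sum(weights[layer] * c for layer, c in coverages.items())


def rank(candidates, weights, score):
    """Order candidate ids by importance = score(weighted sum), highest first."""
    return sorted(candidates,
                  key=lambda tid: score(weighted_sum(candidates[tid], weights)),
                  reverse=True)


def reciprocal(s):
    return 1.0 / s


def negated(s):
    return -s


# Hypothetical: t1 has a rare morpheme, t2 has a rare phoneme context.
candidates = {
    "t1": {"morpheme": 0.001, "phoneme": 0.004},
    "t2": {"morpheme": 0.004, "phoneme": 0.001},
}
print(rank(candidates, make_weights(True), reciprocal))   # ['t1', 't2']
print(rank(candidates, make_weights(False), reciprocal))  # ['t2', 't1']
print(rank(candidates, make_weights(True), negated))      # ['t1', 't2']
```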

Example 4
This example describes a specific processing flow of the maximum importance set selection unit 16 described in Examples 1 to 3. The processing described below uses the well-known greedy algorithm (step S14 in FIG. 5). The maximum importance set selection unit 16 consists of most important text selection means 32, keyword removal means 34, and repetition control means 36. FIG. 10 shows the processing flow of the greedy algorithm.
The importance storage unit 14 may store its data, for example, in the format shown in FIG. 11. Consider, for example, the case where the component generation unit 18 generates components for the chained morpheme, chained syllable, and phoneme layers. For each candidate text containing a keyword, the unit stores the following items:
- a candidate text number (for example, "7");
- the number of keywords (for example, $K_1$);
- the name of each layer;
- the component set of each layer;
- the hierarchical coverage of each layer;
- the importance.
The same storage format is used in Example 5 below.
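One way to hold a FIG. 11-style entry in Python is a pair of dataclasses. The field names are hypothetical, and the component strings use an assumed "preceding-phoneme+following" notation. `data_amount` stays empty until the data amount measurement of Example 5 fills it in.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class LayerEntry:
    components: List[str]          # component set generated for this layer
    hierarchical_coverage: float   # C_IJ for this layer


@dataclass
class ImportanceRecord:
    text_id: int                   # candidate text number, e.g. 7
    keyword_count: int             # number of keywords in the text (K_1 in FIG. 11)
    layers: Dict[str, LayerEntry] = field(default_factory=dict)
    importance: float = 0.0
    data_amount: Optional[int] = None  # added in Example 5 (broken line in FIG. 11)


record = ImportanceRecord(
    text_id=7,
    keyword_count=2,  # hypothetical value standing in for K_1
    layers={"phoneme": LayerEntry(["#-K+O", "K-O+R"], 0.0135)},
)
record.importance = 1.0 / record.layers["phoneme"].hierarchical_coverage
print(record.text_id, record.data_amount)  # 7 None
```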

The most important text selection means 32 selects from the candidate text database storage unit 4 the candidate text whose importance, as generated by the importance generation unit 12, is highest. It then stores this text in the candidate text storage unit 7 (step S16). The keyword removal means 34 removes all keywords contained in the selected candidate text from the keyword list storage unit 2 (step S18).
The repetition control means 36 then repeats the processing of the following units described above (step S20): the keyword counting unit 6, the component generation unit 18, the importance generation unit 12, the most important text selection means 32, and the keyword removal means 34. The repetition stops once the keywords contained in the set of all candidate texts selected by the most important text selection means 32 match all of the keywords originally stored in the keyword list storage unit.
In this greedy algorithm, the maximum importance set selection unit 16 need not obtain the importance by the method of Example 3; it may also use the method of Example 2.
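A minimal Python sketch of the loop in steps S16 to S20 follows. It assumes that each candidate's importance has already been computed and stays fixed between iterations. The described loop re-runs importance generation on each pass, which this simplification skips. The names are hypothetical: `candidates` maps a text id to its importance and the set of keywords it contains.

```python
def greedy_select(candidates, keywords):
    """Greedy selection by importance until every keyword is covered.

    candidates: {text_id: (importance, set_of_keywords_in_text)}
    Returns (selected text ids in order, keywords that could not be covered).
    """
    remaining = set(keywords)  # plays the role of the keyword list storage unit
    selected = []
    while remaining:
        # Keyword counting: unselected texts that still contain a remaining keyword.
        eligible = {tid: imp for tid, (imp, kws) in candidates.items()
                    if tid not in selected and kws & remaining}
        if not eligible:
            break  # some keywords appear in no candidate text
        best = max(eligible, key=eligible.get)  # most important text (step S16)
        selected.append(best)
        remaining -= candidates[best][1]        # keyword removal (step S18)
    return selected, remaining


cands = {1: (0.5, {"a", "b"}), 2: (0.9, {"b"}), 3: (0.2, {"c"})}
print(greedy_select(cands, {"a", "b", "c"}))  # ([2, 1, 3], set())
```

Restricting each pass to texts that still contain a remaining keyword guarantees that every iteration removes at least one keyword, so the loop always terminates.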

Example 5
In this example, the maximum importance set selection unit 16 performs its processing using dynamic programming, a well-known technique (step S22 in FIG. 5). Dynamic programming is an algorithm that selects the solution maximizing or minimizing a given value while satisfying a given condition. Details of dynamic programming are described in R. Bellman, "Dynamic Programming", Princeton University Press, 1957.
In this case, the maximum importance set selection unit 16 consists of data amount measurement means 38 and dynamic programming execution means 40.
First, the data amount measurement means 38 measures the data amount of every candidate text stored in the importance storage unit 14. As indicated by the broken line in FIG. 11, the data amount of each candidate text is added to its entry in the importance storage unit 14. The dynamic programming execution means 40 then selects candidate texts as follows.
Using dynamic programming, the dynamic programming execution means 40 selects a combination of the retrieved candidate texts that meets three conditions: it maximizes the total importance, it collectively contains all the keywords in the keyword list storage unit 2, and it minimizes the total data amount.
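The text states three goals at once without saying how they trade off. Maximizing total importance alone would favor selecting every text. The Python sketch below follows one possible reading: a dynamic program over keyword-coverage bitmasks that keeps all keywords covered, minimizes total data amount, and uses larger total importance only to break ties. This lexicographic ordering and all names are assumptions. The state space grows as 2^K in the number of keywords K, so the sketch suits only small keyword lists.

```python
def dp_select(candidates, keywords):
    """Bitmask dynamic programming over keyword coverage.

    candidates: list of (text_id, importance, data_amount, set_of_keywords).
    Returns (chosen ids, total data amount, total importance) or None.
    """
    index = {kw: i for i, kw in enumerate(sorted(keywords))}
    full = (1 << len(index)) - 1
    # best[mask] = (total data, -total importance, chosen ids) for texts whose
    # keywords together cover exactly `mask`; smaller tuples are better.
    best = {0: (0, 0.0, ())}
    for text_id, imp, size, kws in candidates:
        mask = 0
        for kw in kws:
            if kw in index:
                mask |= 1 << index[kw]
        # Iterate over a snapshot so each text is used at most once.
        for covered, (data, neg_imp, chosen) in list(best.items()):
            new_mask = covered | mask
            entry = (data + size, neg_imp - imp, chosen + (text_id,))
            if new_mask not in best or entry[:2] < best[new_mask][:2]:
                best[new_mask] = entry
    if full not in best:
        return None
    data, neg_imp, chosen = best[full]
    return list(chosen), data, -neg_imp


cands = [(1, 0.5, 30, {"a", "b"}), (2, 0.9, 10, {"a"}),
         (3, 0.4, 10, {"b"}), (4, 0.1, 25, {"c"})]
ids, data, total_importance = dp_select(cands, {"a", "b", "c"})
print(ids, data)  # [2, 3, 4] 45
```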

To determine whether all keywords in the keyword list storage unit 2 are included, check whether two numbers are equal: the number of keywords in the keyword list storage unit 2, and the number of keywords contained in the selected candidate texts. In this count, a keyword that appears in more than one candidate text is counted only once. If the two numbers are equal, all keywords in the keyword list storage unit 2 are included. Another option is to set a threshold on the total data amount and execute dynamic programming, then repeat it while gradually lowering the threshold until the conditions can no longer be satisfied.
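The distinct-count check and the threshold-lowering idea can be sketched together in Python. In this hypothetical sketch, an exhaustive search over combinations stands in for the dynamic program and is practical only for a handful of candidates. Each candidate is a (text_id, data_amount, keywords) tuple, and the starting threshold and step are assumed parameters.

```python
from itertools import combinations


def covers_all(selected_keyword_sets, keyword_list):
    """Compare distinct keyword counts; a repeated keyword is counted once."""
    wanted = set(keyword_list)
    found = set().union(*selected_keyword_sets) & wanted
    return len(found) == len(wanted)


def feasible_under(candidates, keyword_list, budget):
    """Return some combination within `budget` that covers all keywords, else None."""
    for r in range(1, len(candidates) + 1):
        for combo in combinations(candidates, r):
            if (sum(size for _, size, _ in combo) <= budget
                    and covers_all([kws for _, _, kws in combo], keyword_list)):
                return combo
    return None


def lower_threshold(candidates, keyword_list, start, step):
    """Lower the data threshold until no covering combination fits; keep the last fit."""
    if step <= 0:
        raise ValueError("step must be positive")
    last_fit, budget = None, start
    while budget >= 0:
        combo = feasible_under(candidates, keyword_list, budget)
        if combo is None:
            break
        last_fit, budget = combo, budget - step
    return last_fit


cands = [(1, 30, {"a", "b"}), (2, 10, {"a"}), (3, 10, {"b"})]
print(covers_all([{"a"}, {"a", "b"}], ["a", "b"]))  # True
print([tid for tid, _, _ in lower_threshold(cands, ["a", "b"], 40, 5)])  # [2, 3]
```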
The text selection device and method of the present invention are not limited to the embodiments described above and may be modified as appropriate without departing from the spirit of the invention. The processing described for the text selection device and method need not be executed in time series in the order described. It may also be executed in parallel or individually, according to the processing capability of the executing device or as required.
When the processing functions of the text selection device are realized by a computer, the processing content of the functions the device should have is described by a program. Executing this program on a computer realizes the processing functions of the text selection device on that computer.

The program describing this processing can be recorded on a computer-readable recording medium. Any computer-readable recording medium may be used, such as a magnetic recording device, an optical disc, a magneto-optical recording medium, or a semiconductor memory. Specific examples for each type are:
- magnetic recording device: a hard disk drive, a flexible disk, or magnetic tape;
- optical disc: a DVD (Digital Versatile Disc), DVD-RAM (Random Access Memory), CD-ROM (Compact Disc Read Only Memory), or CD-R (Recordable)/RW (ReWritable);
- magneto-optical recording medium: an MO (Magneto-Optical disc);
- semiconductor memory: an EEP-ROM (Electronically Erasable and Programmable Read Only Memory).
The program is distributed, for example, by selling, transferring, or lending a portable recording medium such as a DVD or CD-ROM on which it is recorded. The program may also be distributed by storing it in a storage device of a server computer and transferring it from the server computer to other computers over a network.

A computer that executes such a program first stores it in its own storage device; the program may, for example, have been recorded on a portable recording medium or transferred from a server computer. When executing the processing, the computer reads the program stored in its own recording medium and executes processing according to it. Other execution forms are also possible:
- The computer may read the program directly from the portable recording medium and execute processing according to it.
- The computer may execute processing according to the received program each time a program is transferred to it from the server computer.
- The processing may be executed by a so-called ASP (Application Service Provider) service. In this case, the server computer does not transfer the program to this computer, and the processing functions are realized solely through execution instructions and result acquisition.
The program in this embodiment includes information that is provided for processing by an electronic computer and is equivalent to a program. An example is data that is not a direct command to the computer but has the property of defining the computer's processing.
In this embodiment, the text selection device is configured by executing a predetermined program on a computer. At least part of the processing, however, may be realized in hardware.

FIG. 1: Block diagram showing a specific configuration example of Example 1 of the present invention.
FIG. 2: Flowchart showing the main processing flow of Example 1 of the present invention.
FIG. 3: Specific example of the keyword list storage unit 2.
FIG. 4: Block diagram showing a specific configuration example of Examples 2 to 6 of the present invention.
FIG. 5: Flowchart showing the main processing flow of Examples 2 to 6 of the present invention.
FIG. 6: Diagram showing the hierarchical structure of the components generated by the component generation unit 18.
FIG. 7: Specific example of the frequency distribution table for the phoneme layer in the frequency distribution storage unit 30.
FIG. 8: Specific example showing the coverage of each component in the morpheme layer.
FIG. 9: Diagram showing the hierarchical coverage of each layer: subject/predicate, three-chained morpheme, chained morpheme, and morpheme in the linguistic hierarchy; three-chained syllable, chained syllable, syllable, and phoneme in the acoustic hierarchy.
FIG. 10: Flowchart showing the processing flow when the maximum importance set selection unit 16 uses a greedy algorithm.
FIG. 11: Diagram showing the specific storage format of the importance storage unit 14 in Examples 4 to 6.

Claims

A candidate text database storage unit in which a large amount of candidate text is stored as digital data;
a keyword list storage unit in which keywords important for speech synthesis are stored in advance;
a keyword counting unit that counts, for each of the plurality of candidate texts stored in the candidate text database storage unit, the number of the important keywords it contains; and
a text selection unit that selects a combination of candidate texts from among the counted candidate texts,
wherein
the text selection unit includes:
text selection means for selecting, from among the counted candidate texts, the candidate text containing the largest number of the important keywords, and storing the selected candidate text in a candidate text storage unit;
keyword removal means for removing the keywords contained in the candidate text selected by the text selection means from the keyword list storage unit; and
repetition control means for causing the keyword counting unit, the text selection means, and the keyword removal means to operate repeatedly in sequence until no keywords remain in the keyword list storage unit.
A text selection device comprising the above.

A candidate text database storage unit in which a large amount of candidate text is stored as digital data;
a keyword list storage unit in which keywords important for speech synthesis are stored in advance;
a keyword counting unit that searches the candidate text database storage unit for candidate texts containing the important keywords; and
a text selection unit that selects a combination of candidate texts from among the retrieved candidate texts,
wherein
the text selection unit includes:
a component generation unit that generates, for each of the retrieved candidate texts, a set of components at a predetermined layer of the acoustic and/or linguistic hierarchical structure of spoken language;
an importance generation unit that, for each of the retrieved candidate texts, calculates the sum of the importances assigned to the components included in its component set and uses the sum as the importance of that candidate text;
most important text selection means for selecting, from among the retrieved candidate texts not yet stored in a candidate text storage unit, the candidate text with the highest importance, and storing the selected candidate text in the candidate text storage unit;
keyword removal means for removing the keywords contained in the candidate text selected by the most important text selection means from the keyword list storage unit; and
repetition control means for causing the most important text selection means and the keyword removal means to operate repeatedly in sequence until the keywords contained in the set of candidate texts stored in the candidate text storage unit match all of the keywords originally in the keyword list storage unit.
A text selection device comprising the above.

A candidate text database storage unit in which a large amount of candidate text is stored as digital data;
a keyword list storage unit in which keywords important for speech synthesis are stored in advance;
a keyword counting unit that searches the candidate text database storage unit for candidate texts containing the important keywords; and
a text selection unit that selects a combination of candidate texts from among the retrieved candidate texts,
wherein
the text selection unit includes:
a component generation unit that generates, for each of the retrieved candidate texts, a set of one or more components by analysis based on at least one layer of the acoustic and/or linguistic hierarchical structure of spoken language;
a frequency distribution storage unit that stores, for at least one component set to be generated, a coverage for each component in that set, the coverage indicating the proportion that the component accounts for in the utterance texts corresponding to the speech stored in a speech synthesis database storage unit;
a hierarchical coverage calculation unit that, for each component set of each retrieved candidate text, obtains the coverage of each component by referring to the frequency distribution storage unit and takes the sum of these coverages as the hierarchical coverage;
an importance generation unit that obtains the sum of the hierarchical coverages for each retrieved candidate text and determines as the importance a value that becomes smaller as the sum becomes larger;
most important text selection means for selecting, from among the retrieved candidate texts not yet stored in a candidate text storage unit, the candidate text with the highest importance, and storing the selected candidate text in the candidate text storage unit;
keyword removal means for removing the keywords contained in the candidate text selected by the most important text selection means from the keyword list storage unit; and
repetition control means for causing the most important text selection means and the keyword removal means to operate repeatedly in sequence until the keywords contained in the set of candidate texts stored in the candidate text storage unit match all of the keywords originally in the keyword list storage unit.
A text selection device comprising the above.

A candidate text database storage unit in which a large amount of candidate text is stored as digital data;
a keyword list storage unit in which keywords important for speech synthesis are stored in advance;
a keyword counting unit that searches the candidate text database storage unit for candidate texts containing the important keywords; and
a text selection unit that, by preferentially selecting from among the retrieved candidate texts those with a high importance calculated for each candidate text, selects a combination of candidate texts that together contain all the keywords in the keyword list storage unit, and stores the selected combination in a candidate text storage unit,
wherein
the text selection unit includes:
a component generation unit that generates, for each of the retrieved candidate texts, a set of one or more components by analysis based on at least one layer of the acoustic and/or linguistic hierarchical structure of spoken language;
a frequency distribution storage unit that stores, for at least one component set to be generated, a coverage for each component in that set, the coverage indicating the proportion that the component accounts for in the utterance texts corresponding to the speech stored in a speech synthesis database storage unit;
a hierarchical coverage calculation unit that, for each component set of each retrieved candidate text, obtains the coverage of each component by referring to the frequency distribution storage unit and takes the sum of these coverages as the hierarchical coverage; and
an importance generation unit that obtains the sum of the hierarchical coverages for each retrieved candidate text and determines as the importance a value that becomes smaller as the sum becomes larger.
A text selection device comprising the above.

Using a candidate text database storage unit in which a large amount of candidate text is stored as digital data, and
a keyword list storage unit in which keywords important for speech synthesis are stored in advance,
the following steps are performed:
a keyword counting step in which keyword counting means counts, for each of the plurality of candidate texts stored in the candidate text database storage unit, the number of the important keywords it contains; and
a text selection step in which text selection means selects a combination of candidate texts from among the counted candidate texts,
wherein
the text selection step includes:
a candidate text selection step of selecting, from among the counted candidate texts, the candidate text containing the largest number of the important keywords, and storing the selected candidate text in a candidate text storage unit;
a keyword removal step of removing the keywords contained in the candidate text selected by the text selection means from the keyword list storage unit; and
a repetition control step of causing the keyword counting means, the text selection means, and the keyword removal means to operate repeatedly in sequence until no keywords remain in the keyword list storage unit.
A method for selecting text comprising the above steps.

Using a candidate text database storage unit in which a large amount of candidate text is stored as digital data, and
a keyword list storage unit in which keywords important for speech synthesis are stored in advance,
the following steps are performed:
a keyword counting step in which keyword counting means searches the candidate text database storage unit for candidate texts containing the important keywords; and
a text selection step in which text selection means selects a combination of candidate texts from among the retrieved candidate texts,
wherein
the text selection step includes:
a component generation step in which component generation means generates, for each of the retrieved candidate texts, a set of components at a predetermined layer of the acoustic and/or linguistic hierarchical structure of spoken language;
an importance generation step in which importance generation means calculates, for each of the retrieved candidate texts, the sum of the importances assigned to the components included in its component set and uses the sum as the importance of that candidate text;
a most important text selection step of selecting, from among the retrieved candidate texts not yet stored in a candidate text storage unit, the candidate text with the highest importance, and storing the selected candidate text in the candidate text storage unit;
a keyword removal step of removing the keywords contained in the candidate text selected by the most important text selection means from the keyword list storage unit; and
a repetition control step of causing the most important text selection means and the keyword removal means to operate repeatedly in sequence until the keywords contained in the set of candidate texts stored in the candidate text storage unit match all of the keywords originally in the keyword list storage unit.
A method for selecting text comprising the above steps.

A text selection method using:
a candidate text database storage unit in which a large amount of candidate text is stored as digital data; and
a keyword list storage unit in which keywords important for speech synthesis are stored in advance;
the method comprising:
a keyword counting process in which keyword counting means searches the candidate text database storage unit for candidate texts containing the important keywords; and
a text selection process in which text selection means selects a combination of candidate texts from the retrieved candidate texts;
wherein the text selection process comprises:
a component generation process of generating, for each retrieved candidate text, a set of one or more components by analysis based on at least one layer of the acoustic and/or linguistic hierarchical structure of spoken language;
a frequency distribution storage process of storing in a frequency distribution storage unit, for at least one of the component sets to be generated and with respect to the utterance texts corresponding to the speech stored in a speech synthesis database storage unit, a coverage ratio indicating the proportion that each component accounts for within that component set;
a hierarchical coverage calculation process of obtaining, for each retrieved candidate text, the coverage of each of its components by referring to the frequency distribution storage unit, and taking the sum of these coverages as the hierarchical coverage;
an importance generation process of obtaining, for each retrieved candidate text, the sum of its hierarchical coverages and generating as its importance a value that becomes smaller as that sum becomes larger;
a most important text selection process of selecting, from the retrieved candidate texts not yet stored in a candidate text storage unit, the candidate text having the highest importance, and storing the selected candidate text in the candidate text storage unit;
a keyword removal process of removing the keywords contained in the candidate text selected in the most important text selection process from the keyword list storage unit; and
a repetition control process of repeating the most important text selection process and the keyword removal process in sequence until the keywords contained in the set of candidate texts stored in the candidate text storage unit include all of the keywords originally stored in the keyword list storage unit.
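The keyword counting process and the loop formed by the most important text selection, keyword removal, and repetition control processes can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the patented implementation: the function names `retrieve` and `select_texts` are hypothetical, keyword search is assumed to be plain substring matching, importance values are taken as already computed, and the loop is assumed to stop early if no unselected candidates remain (the claim itself presumes every keyword can be covered).

```python
def retrieve(texts, keywords):
    """Keyword counting step: keep each text that contains at least one keyword.

    Plain substring matching is an assumption; a real system would more likely
    match morphemes or words. Returns {text: frozenset of keywords it contains}.
    """
    hits = {}
    for text in texts:
        found = frozenset(k for k in keywords if k in text)
        if found:
            hits[text] = found
    return hits


def select_texts(hits, importance, keyword_list):
    """Greedy selection loop (hypothetical helper).

    hits:         {candidate text: keywords it contains}, e.g. from retrieve()
    importance:   {candidate text: importance}, higher is preferred
    keyword_list: the original keyword list

    Returns (selected texts in selection order, keywords left uncovered).
    """
    remaining = set(keyword_list)  # working copy of the keyword list
    selected = []                  # stands in for the candidate text storage unit
    pool = dict(hits)              # retrieved candidates not yet selected
    while remaining and pool:      # stop when all keywords are covered (or pool is empty)
        # Most important text selection: highest importance among unselected texts.
        best = max(pool, key=lambda t: importance[t])
        selected.append(best)
        # Keyword removal: drop the keywords this text contains.
        remaining -= pool.pop(best)
    return selected, remaining
```

Because the sketch follows the claim literally, it keeps taking the highest-importance unselected text even when that text adds no new keyword. A non-empty `remaining` set signals keywords that no retrieved candidate contains.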

A text selection method using:
a candidate text database storage unit in which a large amount of candidate text is stored as digital data; and
a keyword list storage unit in which keywords important for speech synthesis are stored in advance;
the method comprising:
a keyword counting process of searching the candidate text database storage unit for candidate texts containing the important keywords; and
a text selection process of selecting, from the retrieved candidate texts, a combination of candidate texts that includes all keywords in the keyword list storage unit by preferentially selecting candidate texts having a high importance calculated for each candidate text, and storing the selected combination of candidate texts in a candidate text storage unit;
wherein the text selection process comprises:
a component generation process of generating, for each retrieved candidate text, a set of one or more components by analysis based on at least one layer of the acoustic and/or linguistic hierarchical structure of spoken language;
a frequency distribution storage process of storing in a frequency distribution storage unit, for at least one of the component sets to be generated and with respect to the utterance texts corresponding to the speech stored in a speech synthesis database storage unit, a coverage ratio indicating the proportion that each component accounts for within that component set;
a hierarchical coverage calculation process of obtaining, for each retrieved candidate text, the coverage of each of its components by referring to the frequency distribution storage unit, and taking the sum of these coverages as the hierarchical coverage; and
an importance generation process of obtaining, for each retrieved candidate text, the sum of its hierarchical coverages and generating as its importance a value that becomes smaller as that sum becomes larger.
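The component generation, frequency distribution, hierarchical coverage, and importance generation processes can likewise be sketched. This is a toy illustration under stated assumptions: the two layers `"char"` and `"bigram"` over a romanized string are hypothetical stand-ins for phoneme-level and diphone-level analysis; the coverage ratio is assumed to be a component's share of all component occurrences in the existing utterance texts; and importance is computed as `1 / (1 + s)`, which is only one possible choice of a value that shrinks as the coverage sum `s` grows.

```python
from collections import Counter


def units(text, layer):
    """Hypothetical analysis of one text at one layer (spaces ignored)."""
    chars = [c for c in text if c != " "]
    if layer == "char":      # stand-in for a phoneme-like layer
        return chars
    if layer == "bigram":    # stand-in for a diphone-like layer
        return [a + b for a, b in zip(chars, chars[1:])]
    raise ValueError(f"unknown layer: {layer}")


def frequency_table(db_texts, layer):
    """Coverage ratio of each component in the existing utterance texts."""
    counts = Counter()
    for text in db_texts:
        counts.update(units(text, layer))
    total = sum(counts.values()) or 1  # avoid dividing by zero for an empty database
    return {comp: n / total for comp, n in counts.items()}


def hierarchical_coverage(text, table, layer):
    """Sum of coverages over the candidate's component set at one layer.

    Components absent from the existing database contribute 0.
    """
    return sum(table.get(comp, 0.0) for comp in set(units(text, layer)))


def importance(text, tables):
    """Importance that decreases as the summed hierarchical coverage increases."""
    s = sum(hierarchical_coverage(text, table, layer) for layer, table in tables.items())
    return 1.0 / (1.0 + s)
```

With `tables = {layer: frequency_table(db_texts, layer) for layer in ("char", "bigram")}`, a candidate made of components already frequent in the existing speech database receives low importance. It is therefore less likely to be chosen when selection prefers high-importance texts.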

A text selection program for causing a computer to execute each process of the text selection method according to any one of claims 5 to 8.

A computer-readable recording medium on which the text selection program according to claim 9 is recorded.