JP5191470B2

JP5191470B2 - Reading text set creation method, mass Japanese text database repair method, apparatus, and program

Info

Publication number: JP5191470B2
Application number: JP2009258928A
Authority: JP
Inventors: 公人田中; 秀之水野
Original assignee: Nippon Telegraph and Telephone Corp
Current assignee: Nippon Telegraph and Telephone Corp
Priority date: 2009-11-12
Filing date: 2009-11-12
Publication date: 2013-05-08
Anticipated expiration: 2029-11-12
Also published as: JP2011107183A

Abstract

PROBLEM TO BE SOLVED: To create a reading text set that minimizes the frequency of retakes even when an amateur reads it.

SOLUTION: A method for creating a reading text set includes a reading text set generation process and a reading error pattern deletion process. In the generation process, a reading text set generation part takes as input a plurality of phoneme sequences to be recorded and extracts, from a large-scale Japanese text database, a plurality of reading text set candidates each containing one of those phoneme sequences. In the deletion process, a reading error pattern deletion part takes the reading text set candidates as input and determines, by referring to a reading error pattern database, whether each candidate contains a reading error pattern. When a candidate contains one, the part searches the large-scale Japanese text database again for another candidate containing the same phoneme sequence; when it does not, the part outputs the candidate as part of the reading text set.

COPYRIGHT: (C)2011, JPO&INPIT

Description

The present invention relates to a method for creating a reading text set, a method for repairing a large-scale Japanese text database, apparatuses therefor, and a program, all of which are used when building a database of speech segments (the units from which speech is generated) in speech synthesis technology that converts text into speech.

In conventional speech synthesis systems, the text to be read aloud when building a speech segment database has been created using, for example, the following criteria disclosed in Non-Patent Document 1.

(1) All speech segments that are minimally required to cover any Japanese text are included.
(2) The database can be built from as little text as possible.
(3) The database contains continuous phoneme strings that are as long as possible.
(4) As many pitch (voice height) variations as possible are included.
(5) As many phoneme duration variations as possible are included.

Criteria (3) to (5) are intended to avoid degrading the quality of the synthesized speech.

Patent Document 1 discloses a text selection device 600 that, in order to improve the quality of synthesized speech in a specific domain, selects a combination of texts containing keywords as candidate texts. FIG. 6 shows the functional configuration of the text selection device 600; its operation is briefly described below.

The text selection device 600 includes a keyword list storage unit 2, a keyword counting unit 6, a keyword count storage unit 5, a candidate text database storage unit 4, a text selection unit 8, and an output unit 10. The keyword list storage unit 2 stores in advance the keywords that are important for speech synthesis. The candidate text database storage unit 4 stores a large volume of Japanese text as digital data. The keyword counting unit 6 searches the candidate text database storage unit 4 for candidate texts containing keywords stored in the keyword list storage unit 2 and counts the number of keywords each candidate text contains. The keyword count storage unit 5 stores the text number and keyword count of each retrieved candidate text.

From the retrieved candidate texts, the text selection unit 8 selects a combination of candidate texts that together contain all the keywords stored in the keyword list storage unit 2. The output unit 10 outputs the selected candidate texts.

Non-Patent Document 1: Isogai, Mizuno, Mano, "Examination of Database Design Method for Speech Synthesis", Proceedings of the Acoustical Society of Japan, 3-2-10, pp. 335-336 (September 2004)

Patent Document 1: JP 2007-334264 A

A speech segment database is built by first creating a Japanese reading text set that contains the phoneme strings of the desired speech segments, using for example the method shown in Patent Document 1, and then having a person read it aloud. Conventional texts for building speech segment databases assumed that professionals such as announcers and voice actors would read them. Conventional practice also assumed that, after recording, someone would listen to check whether the speech intended by the database builder had been captured and, if there were misreadings or similar errors, re-record as needed (hereinafter, a "retake"). As a result, building a single speech segment database was very expensive.

In recent years, meanwhile, demand has grown for speech synthesis in the voices of many different people. Such diversified, individual speech segment databases need to be inexpensive to build. Texts for building speech segment databases created by conventional methods, however, cannot meet that requirement.

In addition, when a speech segment database is built from the voices of many different people, that is, amateurs who are not professional announcers or voice actors, conventional texts for building speech segment databases tend to cause misreadings, and those misreadings directly degrade the quality of the synthesized speech.

The present invention has been made in view of these problems. Its object is to provide a reading text set creation method, a large-scale Japanese text database repair method, apparatuses therefor, and a program that create text for building speech segment databases such that diversified, individual speech segment databases can be built at low cost and the quality of the synthesized speech does not degrade even when an amateur reads the text.

The reading text set creation method of the present invention includes a reading text set generation process and a reading error pattern deletion process. In the reading text set generation process, a reading text set generation unit takes as input a plurality of phoneme strings to be recorded and extracts, from a large-scale Japanese text database, a plurality of reading text set candidates each containing one of the phoneme strings. In the reading error pattern deletion process, a reading error pattern deletion unit takes the plurality of reading text set candidates as input and determines, by referring to a reading error pattern database, whether each reading text candidate contains a reading error pattern (a pattern that amateurs tend to misread). When a candidate contains a reading error pattern, the unit re-searches the large-scale Japanese text database for another text candidate containing the same phoneme string; a text candidate that contains no reading error pattern is output as one text.

According to the reading text set creation method of the present invention, the reading error pattern deletion process determines whether each reading text candidate contains a reading error pattern and deletes texts containing patterns that are easily misread. The resulting reading text set contains no sentences that are easily misread, so the frequency of retakes can be minimized even when an amateur reads it. This in turn makes it possible to create reading text sets suited to building a variety of speech segment databases at low cost.

FIG. 1 is a diagram showing an example functional configuration of the reading text set creation device 100 of the present invention.
FIG. 2 is a diagram showing the operation flow of the reading text set creation device 100.
FIG. 3 is a diagram showing an example functional configuration of the reading error pattern deletion unit 40.
FIG. 4 is a diagram showing the operation flow of the reading error pattern deletion unit 40.
FIG. 5 is a diagram showing the procedure for creating a large-scale Japanese text database that contains no easily misread sentences.
FIG. 6 is a diagram showing the functional configuration of the text selection device 600 disclosed in Patent Document 1.

Embodiments of the present invention are described below with reference to the drawings. Identical elements in different drawings carry the same reference numerals, and their description is not repeated.

FIG. 1 shows an example functional configuration of the reading text set creation device 100 of the present invention, and FIG. 2 shows its operation flow. The reading text set creation device 100 includes a reading text set generation unit 20, a large-scale Japanese text database 30, a reading error pattern deletion unit 40, and a reading error pattern database 50.

The reading text set creation device 100 is realized by loading a predetermined program into a computer made up of, for example, a ROM, a RAM, and a CPU, and having the CPU execute that program.

The reading text set generation unit 20 takes as input a plurality of phoneme strings to be recorded and extracts, from the large-scale Japanese text database 30, a plurality of reading text set candidates each containing one of the phoneme strings (step S20). The large-scale Japanese text database 30 is basically the same as the candidate text database storage unit 4 described in the prior art in that it records the texts from which reading text set candidates are drawn; it holds, for example, several years' worth of newspaper text.

Table 1 shows examples of phoneme strings and reading text set candidates.

Here, a phoneme in parentheses represents the phonemic environment before or after the phoneme string to be recorded. For example, in "(I)HO(n)" the phoneme string to be recorded is "HO", uttered with the preceding environment "I" and the following environment "n". "#" denotes the beginning or end of a sentence, "^*" denotes vowel devoicing, and "/" denotes a pause.
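The notation described above can be made concrete with a small parser. The following is a minimal sketch, not part of the patent: the names `PhonemeTarget` and `parse_target` are hypothetical, and it assumes that a leading or trailing "#" is treated as the preceding or following environment (a sentence boundary) rather than as part of the phoneme string to be recorded.

```python
import re
from typing import NamedTuple, Optional


class PhonemeTarget(NamedTuple):
    pre: Optional[str]   # preceding environment, "#" for a sentence boundary, or None
    target: str          # phoneme string to be recorded
    post: Optional[str]  # following environment, "#" for a sentence boundary, or None


# Optional "(X)" or "#" before the target, the target itself, optional "(Y)" or "#" after.
_SPEC = re.compile(
    r"^(?:\((?P<pre>[^()]+)\)|(?P<bos>#))?"
    r"(?P<target>[^()#]+)"
    r"(?:\((?P<post>[^()]+)\)|(?P<eos>#))?$"
)


def parse_target(spec: str) -> PhonemeTarget:
    """Split a specification such as "(I)HO(n)" into environment and target."""
    match = _SPEC.match(spec)
    if match is None:
        raise ValueError(f"unrecognized phoneme specification: {spec!r}")
    pre = match.group("pre") or match.group("bos")
    post = match.group("post") or match.group("eos")
    return PhonemeTarget(pre, match.group("target"), post)


print(parse_target("(I)HO(n)"))
print(parse_target("#AK(A)"))
```

Keeping the environment separate from the target makes it possible to count targets with and without context, which is the distinction the description draws between roughly 200 and roughly 6,000 phoneme strings.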

To record the phoneme string "#AK(A)", for example, the reading text set generation unit 20 extracts from the large-scale Japanese text database 30 a reading text set candidate such as 「赤色の唐辛子」 ("red chili pepper"). The number of phoneme strings to be recorded is about 200 in the simplest case, where only the syllables "a, i, u, e, ..." are recorded, and about 6,000 when the preceding and following environments of each phoneme are taken into account. One way to raise the quality of synthesized speech is to lengthen each phoneme string, in which case the number of possible phoneme strings becomes practically unlimited. The number of phoneme strings thus varies widely with the speech synthesis system.

The reading text set generation unit 20 generates as many reading text set candidates as there are phoneme strings to be recorded, a number determined by the requirements of the speech synthesis system.
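The extraction performed in step S20 can be sketched as a lookup over the database. This is an illustration under stated assumptions rather than the patent's implementation: it assumes every database entry carries a pre-computed phoneme transcription (the patent does not say how transcriptions are obtained), it ignores the parenthesized environments, and `Entry`, `find_text`, `generate_candidates`, and the romanized toy transcriptions are all hypothetical.

```python
from typing import Dict, List, NamedTuple, Optional


class Entry(NamedTuple):
    text: str      # sentence as shown to the speaker
    phonemes: str  # hypothetical pre-computed phoneme transcription


def find_text(db: List[Entry], target: str, start: int = 0) -> Optional[int]:
    """Return the index of the first entry at or after `start` that contains `target`."""
    for index in range(start, len(db)):
        if target in db[index].phonemes:
            return index
    return None


def generate_candidates(db: List[Entry], targets: List[str]) -> Dict[str, Optional[int]]:
    """Map each phoneme string to the index of its first matching text (None if absent)."""
    return {target: find_text(db, target) for target in targets}


# Toy data for illustration only; the transcriptions are made up.
TOY_DB = [
    Entry("赤色の唐辛子", "#AKAIRONOTOOGARASI#"),
    Entry("日本の山", "#NIHONNOYAMA#"),
]

candidates = generate_candidates(TOY_DB, ["#AK", "IHON", "SES"])
print(candidates)
```

Returning the index rather than the text itself keeps the position in the database available, which a later re-search needs in order to resume from the following entry.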

The reading error pattern deletion unit 40 takes the plurality of reading text set candidates as input and determines, by referring to the reading error pattern database 50, whether each reading text set candidate contains a reading error pattern. When a candidate contains a reading error pattern, the unit re-searches the large-scale Japanese text database 30 for a reading text set candidate containing the same phoneme string; when it does not, the unit outputs the candidate as part of the reading text set (step S40).

Table 2 shows examples of the reading error patterns recorded in the reading error pattern database 50.

The reading error pattern deletion unit 40 excludes from the reading text set any reading text candidate that contains a word or other pattern such as those illustrated in Table 2. Because the reading text set output by the reading error pattern deletion unit 40 therefore contains no reading error patterns, it yields few retakes even when read by an amateur.

Note that the reading text set generation unit 20 has the same function as the text selection unit 8 of the conventional text selection device 600, and the large-scale Japanese text database 30 is, as noted above, the same as the candidate text database storage unit 4 of the prior art. The distinguishing feature of Embodiment 1 is therefore that it includes the reading error pattern deletion unit 40 and the reading error pattern database 50.

FIG. 3 shows a more specific example functional configuration of the reading error pattern deletion unit 40, the distinguishing feature of Embodiment 1, which is described in more detail below. FIG. 4 shows its operation flow. The reading error pattern deletion unit 40 includes a single sentence extraction means 41, a reading error pattern matching means 42, a database re-search means 43, an other-sentence substitution determination means 44, a meaningless word text generation means 45, and a sentence output means 46.

The single sentence extraction means 41 extracts one reading text candidate (hereinafter simply "text candidate") from the reading text set candidates generated by the reading text set generation unit 20 (step S41a).

The extracted text candidate is input to the reading error pattern matching means 42, which determines by referring to the reading error pattern database 50 whether the text candidate contains a reading error pattern (step S42). If the text candidate contains no reading error pattern (NO in step S42b), the reading error pattern matching means 42 has the sentence output means 46 output it externally as one text (step S46). The output may consist of recording the text in memory or printing it on a printer or similar device (not shown).

If the text candidate contains a reading error pattern (YES in step S42b), the reading error pattern matching means 42 outputs to the database re-search means 43 a control signal instructing it to re-search for another text candidate that contains the phoneme string to be recorded in that text candidate. On receiving this instruction (control signal), the database re-search means 43 re-searches the large-scale Japanese text database 30 for another text candidate containing the same phoneme string (step S43). The re-search resumes partway through the large-scale Japanese text database 30: if the target phoneme string was found in, say, the 100th text in the database, the re-search starts from the 101st text.

The re-searched text candidate is input to the reading error pattern matching means 42, which determines whether it contains a reading error pattern. If it does not, it is output from the sentence output means 46 as one text.

If the re-searched text candidate also contains a reading error pattern, the database re-search means 43 searches yet again for another text candidate. This re-search is repeated until all texts recorded in the large-scale Japanese text database 30 have been consulted.

If no text containing the desired phoneme string can be extracted even after all texts recorded in the large-scale Japanese text database 30 have been consulted, the meaningless word text generation means 45 is instructed to generate a meaningless word text (NO in step S44). A meaningless word text is a text containing a meaningless word; for example, when the phoneme string "SES" is searched for and not found, a text such as 「これはＳＥＳＵである。」 ("This is SESU.") is used.

The meaningless word text generation means 45 generates one text, a meaningless word text containing the phoneme string to be recorded, and has the sentence output means 46 output it externally (step S46). The operations of steps S41 to S46 described above are repeated until all of the reading text set candidates have been processed (YES in step S41b).
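The flow of steps S41 to S46 can be summarized in a short sketch. It is a simplification under stated assumptions, not the patent's implementation: reading error patterns are plain substrings of the text, each entry carries a hypothetical pre-computed phoneme transcription, the substitution judgment of step S44 is reduced to a direct fallback, and the meaningless word text is built by dropping the phoneme string into a carrier sentence without adjusting it into a pronounceable word. All names and the toy data, including the example pattern, are hypothetical.

```python
from typing import List, NamedTuple, Optional, Sequence


class Entry(NamedTuple):
    text: str      # sentence as shown to the speaker
    phonemes: str  # hypothetical pre-computed phoneme transcription


def contains_error_pattern(text: str, patterns: Sequence[str]) -> bool:
    """Word-match check against the reading error pattern list (step S42)."""
    return any(pattern in text for pattern in patterns)


def search(db: List[Entry], target: str, start: int) -> Optional[int]:
    """Find the first entry at or after `start` whose transcription contains `target`."""
    for index in range(start, len(db)):
        if target in db[index].phonemes:
            return index
    return None


def select_text(db: List[Entry], target: str, patterns: Sequence[str],
                carrier: str = "これは{word}である。") -> str:
    index = search(db, target, 0)              # initial candidate
    while index is not None:
        candidate = db[index]
        if not contains_error_pattern(candidate.text, patterns):
            return candidate.text              # clean candidate is output (step S46)
        index = search(db, target, index + 1)  # re-search from the next text (step S43)
    return carrier.format(word=target)         # database exhausted: fallback (step S45)


def create_reading_text_set(db: List[Entry], targets: Sequence[str],
                            patterns: Sequence[str]) -> List[str]:
    return [select_text(db, target, patterns) for target in targets]


# Toy data for illustration only.
TOY_DB = [
    Entry("老若男女が集まる", "#ROONYAKUNANNYOGAATSUMARU#"),
    Entry("廊下を走る", "#ROOKAOHASIRU#"),
]
reading_set = create_reading_text_set(TOY_DB, ["ROO", "SES"], ["老若男女"])
print(reading_set)
```

In this toy run the first hit for "ROO" is rejected because it contains the listed pattern, the re-search picks up the second entry, and "SES", which appears nowhere in the database, falls through to the carrier sentence.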

Reading error pattern matching (step S42a) has been described above as detecting matching words, but reading error patterns may instead be detected by applying fixed rules. For example, vowel devoicing is known to occur for "I" and "U" when they are sandwiched between voiceless consonants or a sentence-end symbol. Taking "an I or U sandwiched between voiceless consonants or a sentence-end symbol" as a rule, a text set candidate that satisfies the rule may be treated as a pattern match (YES in step S42b).

Using such rules reduces the amount of data that must be recorded in the reading error pattern database 50 and also improves the accuracy with which reading error patterns are detected. The reading error pattern database 50 may record both words and rules.
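The devoicing rule can be expressed as a small check over a phoneme sequence. The sketch below reads the rule literally: an "I" or "U" is flagged when each neighbour is either a voiceless consonant or the boundary symbol "#". The set of voiceless consonants and the phoneme tokenization are assumptions, since the patent does not list its phoneme inventory, and `devoicing_positions` is a hypothetical name.

```python
from typing import List

# Assumed inventory of voiceless consonants; the patent does not enumerate them.
VOICELESS = {"K", "S", "SH", "T", "CH", "TS", "H", "F", "P"}
BOUNDARY = "#"
DEVOICABLE = {"I", "U"}


def _is_context(phoneme: str) -> bool:
    """True for a voiceless consonant or the sentence boundary symbol."""
    return phoneme in VOICELESS or phoneme == BOUNDARY


def devoicing_positions(phonemes: List[str]) -> List[int]:
    """Return indices of I/U phonemes that satisfy the devoicing rule."""
    hits = []
    for i in range(1, len(phonemes) - 1):
        if (phonemes[i] in DEVOICABLE
                and _is_context(phonemes[i - 1])
                and _is_context(phonemes[i + 1])):
            hits.append(i)
    return hits


print(devoicing_positions(["#", "K", "I", "K", "U", "#"]))
print(devoicing_positions(["#", "K", "A", "G", "I", "#"]))
```

A candidate for which `devoicing_positions` returns a non-empty list would count as a pattern match under this rule; one rule of this kind can stand in for many individual word entries, which is the data reduction the description points to.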

Embodiment 1 described an example in which a plurality of reading text set candidates is first created and texts prone to misreading are then deleted from those candidates. Alternatively, the reading text set may be created using a repaired large-scale Japanese text database 30′ that contains no easily misread texts in the first place.

FIG. 5 shows the procedure for creating the repaired large-scale Japanese text database 30′. A single sentence extraction unit 51 extracts one text containing a desired phoneme string from the large-scale Japanese text database 30 (step S51). A reading error pattern matching unit 53 matches the text against the reading error patterns recorded in the reading error pattern database 50 to check whether it contains a reading error pattern (step S53). If the text matches no reading error pattern (NO in step S54), a text registration unit 55 records it in the repaired large-scale Japanese text database 30′ (step S55). The single sentence extraction unit 51, the reading error pattern matching unit 53, and the text registration unit 55 are not shown in the drawings.

The operations of steps S51 to S55 are repeated until the last sentence of the large-scale Japanese text database 30 has been processed and no sentences remain to be extracted (YES in step S52). Although FIG. 5 labels step S53 a reading error pattern matching process, it is substantially the same as the processing performed by the reading error pattern deletion unit 40 of the reading text set creation device 100.

This procedure yields a repaired large-scale Japanese text database 30′ that contains no easily misread texts. With the repaired large-scale Japanese text database 30′, even the conventional text selection device 600 can create reading texts that minimize the frequency of retakes.
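The repair procedure of FIG. 5 amounts to a one-pass filter over the database. The following sketch assumes, as the description allows, that the reading error pattern database holds both words and rules; here words are substrings of the text and rules are predicates over an entry. The names, the toy data, and the example rule (flagging transcriptions that carry the "^*" devoicing mark) are hypothetical.

```python
from typing import Callable, List, NamedTuple, Sequence


class Entry(NamedTuple):
    text: str      # sentence as shown to the speaker
    phonemes: str  # hypothetical pre-computed phoneme transcription


Rule = Callable[[Entry], bool]


def matches_error_pattern(entry: Entry, words: Sequence[str],
                          rules: Sequence[Rule]) -> bool:
    """Matching step (S53): word match or rule match."""
    return (any(word in entry.text for word in words)
            or any(rule(entry) for rule in rules))


def repair_database(db: List[Entry], words: Sequence[str],
                    rules: Sequence[Rule] = ()) -> List[Entry]:
    """Keep only entries that match no reading error pattern (steps S51 to S55)."""
    return [entry for entry in db if not matches_error_pattern(entry, words, rules)]


def has_devoicing_mark(entry: Entry) -> bool:
    """Hypothetical rule: the transcription contains the "^*" devoicing mark."""
    return "^*" in entry.phonemes


# Toy data for illustration only; the transcriptions are made up.
TOY_DB = [
    Entry("廊下を走る", "#ROOKAOHASIRU#"),
    Entry("老若男女が集まる", "#ROONYAKUNANNYOGAATSUMARU#"),
    Entry("菊の花", "#KI^*KUNOHANA#"),
]
repaired = repair_database(TOY_DB, ["老若男女"], [has_devoicing_mark])
print([entry.text for entry in repaired])
```

Filtering once up front moves the cost of pattern matching out of the per-candidate loop, which is why an unmodified text selection device can then be run against the repaired database.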

According to the reading text set creation method and apparatus of the present invention, reading texts can be created that contain no easily misread sentences. Reading texts created by the method of the present invention minimize the frequency of retakes even for amateurs and contribute to building diverse speech segment databases at low cost.

The processes described for the above methods and apparatuses need not be executed in time series in the order described; they may also be executed in parallel or individually, depending on the processing capability of the apparatus executing them or as needed.

When the processing means of the above apparatuses are realized by a computer, the processing content of the functions each apparatus should have is described by a program. Executing this program on the computer realizes the processing means of each apparatus on the computer.

The program describing this processing content can be recorded on a computer-readable recording medium. The computer-readable recording medium may be of any kind, for example a magnetic recording device, an optical disc, a magneto-optical recording medium, or a semiconductor memory. Specifically, a hard disk device, a flexible disk, a magnetic tape, or the like can be used as the magnetic recording device; a DVD (Digital Versatile Disc), DVD-RAM (Random Access Memory), CD-ROM (Compact Disc Read Only Memory), CD-R (Recordable)/RW (ReWritable), or the like as the optical disc; an MO (Magneto Optical disc) or the like as the magneto-optical recording medium; and an EEP-ROM (Electronically Erasable and Programmable-Read Only Memory) or the like as the semiconductor memory.

The program is distributed by, for example, selling, transferring, or lending a portable recording medium such as a DVD or CD-ROM on which the program is recorded. The program may also be distributed by storing it in a storage device of a server computer and transferring it from the server computer to other computers over a network.

Each means may be configured by executing a predetermined program on a computer, or at least part of the processing content may be realized in hardware.

Claims

A reading text set creation method comprising:
a reading text set generation process in which a reading text set generation unit takes as input a plurality of phoneme strings to be recorded and extracts, from a large-scale Japanese text database, a plurality of reading text set candidates each containing one of the phoneme strings; and
a reading error pattern deletion process in which a reading error pattern deletion unit takes the plurality of reading text set candidates as input, determines by referring to a reading error pattern database whether each reading text candidate contains a reading error pattern, re-searches the large-scale Japanese text database for another text candidate containing the same phoneme string when a reading error pattern is contained, and outputs a text candidate containing no reading error pattern as one text.

The reading text set creation method according to claim 1, wherein the reading error pattern deletion process includes:
a single sentence extraction step in which a single sentence extraction means extracts one text candidate from the plurality of reading text set candidates;
a reading error pattern matching step in which a reading error pattern matching means determines, by referring to the reading error pattern database, whether the one text candidate contains a reading error pattern, causes the one text candidate to be output as one text when it does not, and causes another text candidate containing the same phoneme string to be re-searched from the large-scale Japanese text database when it does;
a database re-search step in which a database re-search means performs the re-search and inputs the re-searched other text candidate to the reading error pattern matching means;
an other-sentence substitution determination step in which an other-sentence substitution determination means, when the target phoneme string cannot be found by the re-search in the database re-search step, determines whether an already output text can substitute for the target phoneme string, treats that output text as the one text corresponding to the target phoneme string when substitution is possible, and instructs a meaningless word text generation means to generate a meaningless word when substitution is not possible;
a meaningless word text generation step in which the meaningless word text generation means generates one text in which the target phoneme string is replaced with a meaningless word; and
a sentence output step in which a sentence output means outputs the one text.

The reading text set creation method according to claim 1 or 2, wherein
the reading error pattern database registers easily misread words and/or reading error rules, and the reading error pattern deletion process determines whether the words constituting the one text set candidate include an easily misread word or satisfy a reading error rule registered in the reading error pattern database, and outputs the one text set candidate as one text when they do not.

A large-scale Japanese text database repair method comprising:
a single sentence extraction process of extracting one text containing a desired phoneme string from a large-scale Japanese text database;
a reading error pattern matching process of matching the one text against the reading error patterns recorded in a reading error pattern database to check whether the one text contains a reading error pattern; and
a text registration process of recording the one text in a repaired large-scale Japanese text database when the one text matches no reading error pattern.

A reading text set creation device comprising:
a reading text set generation unit that takes as input a plurality of phoneme strings to be recorded and extracts, from a large-scale Japanese text database, a plurality of reading text set candidates each containing one of the phoneme strings; and
a reading error pattern deletion unit that takes the plurality of reading text set candidates as input, determines by referring to a reading error pattern database whether each reading text set candidate contains a reading error pattern, re-searches the large-scale Japanese text database for another text candidate containing the same phoneme string when a reading error pattern is contained, and outputs a text candidate containing no reading error pattern as one text.

The reading text set creation device according to claim 5, wherein the reading error pattern deletion unit includes:
a single sentence extraction means that extracts one text candidate from the plurality of reading text set candidates;
a reading error pattern matching means that determines, by referring to the reading error pattern database, whether the one text candidate contains a reading error pattern, causes the one text candidate to be output as one text when it does not, and causes another text candidate containing the same phoneme string to be re-searched from the large-scale Japanese text database when it does;
a database re-search means that performs the re-search and inputs the re-searched other text candidate to the reading error pattern matching means;
an other-sentence substitution determination means that, when the target phoneme string cannot be found by the re-search of the database re-search means, determines whether an already output text can substitute for the target phoneme string, treats that output text as the one text corresponding to the target phoneme string when substitution is possible, and instructs a meaningless word text generation means to generate a meaningless word when substitution is not possible;
the meaningless word text generation means, which generates one text in which the target phoneme string is replaced with a meaningless word; and
a sentence output means that outputs the one text.

The reading text set creation device according to claim 5 or 6, wherein
the reading error pattern database registers easily misread words and/or reading error rules.

A large-scale Japanese text database repair device comprising:
a single sentence extraction unit that extracts one text containing a desired phoneme string from a large-scale Japanese text database;
a reading error pattern matching unit that matches the one text against the reading error patterns recorded in a reading error pattern database to check whether the one text contains a reading error pattern; and
a text registration unit that records the one text in a repaired large-scale Japanese text database when the one text matches no reading error pattern.

A program for causing a computer to execute the reading text set creation method according to any one of claims 1 to 3 or the large-scale Japanese text database repair method according to claim 4.