JP2008203717A

JP2008203717A - Text sentence selecting method for corpus-based speech synthesis, and program thereof and device thereof

Info

Publication number: JP2008203717A
Application number: JP2007041909A
Authority: JP
Inventors: Takashi Miki; 敬三木
Original assignee: Oki Electric Industry Co Ltd
Current assignee: Oki Electric Industry Co Ltd
Priority date: 2007-02-22
Filing date: 2007-02-22
Publication date: 2008-09-04

Abstract

PROBLEM TO BE SOLVED: To obtain a text sentence selection method for corpus-based speech synthesis that evaluates a quality index of a text sentence set while searching for suitable text sentences over an expanded range, and that requires only a small amount of computation.

SOLUTION: The text sentence selection method for corpus-based speech synthesis provides a thesaurus dictionary 110 containing synonyms in the target category of speech synthesis, and includes: a step of considering any one sentence of a text corpus as a candidate and selecting the text sentence (selected sentence) that maximizes the corpus quality when added to a working text sentence set; a step of generating a similar sentence group by substituting synonyms held in the thesaurus dictionary for the words and phrases constituting the selected sentence; a re-determination step of considering any one sentence of the group of sentences similar to the selected sentence and finding the text sentence (best sentence) that maximizes the corpus quality when added to the working text sentence set; and a step of adding the best sentence obtained in the re-determination step to the working text sentence set.

COPYRIGHT: (C)2008, JPO&INPIT

Description

The present invention relates to a method for selecting, from a text corpus, a text sentence set suitable for use in corpus-based speech synthesis, and to a program and a device therefor.

Conventionally, a technique has been proposed whose stated object is to "provide a sentence set automatic generation method, apparatus, program, and storage medium that automatically generate a sentence set from which frequently occurring variable words can be collected efficiently by modifying part of a sentence." The technique is described as "a sentence set automatic generation method that uses a task sentence corpus, stored in a task sentence corpus storage unit 1 and containing text of a specific task that is a candidate for selection, together with a word list 2 of words specific to that task; obtains the frequency with which each word in the word list 2 appears in the task sentence corpus; replaces those word portions in the task sentence corpus with word symbols to obtain a symbol sentence corpus; selects a combination of candidate texts from the symbol sentence corpus as a symbol text set; and fills the word symbol portions of the text set, in the order in which the word symbols appear in the text set, with words in descending order of appearance frequency" (Patent Document 1).
A technique has also been proposed whose stated object is to "efficiently generate a set of sentences (text) to be read aloud, as needed to create a speech database used for speech synthesis." It is described as follows: "For each sentence in the sentence corpus, the sum of the appearance rates of basic segments, whose basic unit is a three-phoneme chain, is calculated as a score (step 1-1). Basic segments already stored in the sentence set are excluded, and duplicates of identical basic segments within the same sentence are not counted in the score. If exactly one sentence has the highest score (step 1-4), it is stored in the sentence set; if several do, then for those sentences the sum of the appearance rates of various extended segments, such as syllables with context and morphemes with context, is calculated as a score in the same manner. The extended segment scores for the same sentence are combined by weighted addition into a composite score, and the sentence with the highest composite score is stored in the sentence set. Task dependency can be increased through the setting of the weighting coefficients of the weighted addition" (Patent Document 2).

Patent Document 1: JP 2004-347955 A (abstract)
Patent Document 2: JP 2004-246140 A (abstract)

The technique described in Patent Document 1 expands the range over which suitable text sentences are searched by replacing parts of the text sentences stored in the task sentence corpus with entries from a word list. However, when expanding the search range, it does not take into account quality indices of the text sentence set, such as the phoneme balance of the set as a whole or the number of phoneme occurrences.
The technique described in Patent Document 2, on the other hand, evaluates a quality index of the text sentence set by score calculation, but does not expand the search range as Patent Document 1 does.

It is therefore conceivable to apply the quality index evaluation of a text sentence set described in Patent Document 2 to the expanded search range described in Patent Document 1.
However, the expanded search range grows with the number of combinations of replaceable words and replaceable word positions. For example, with 10 replacement words and 3 word positions, each text sentence before expansion yields 10 × 10 × 10 = 1000 expanded text sentences. If the text corpus population before expansion contains 100,000 sentences, the expanded set contains 100,000 × 1000 = 100 million sentences.
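The growth described above is a product of the number of candidates available at each replaceable position. The following Python sketch reproduces the per-sentence count for three positions with ten candidates each and enumerates the variants for a small case. The function names and the `{0}`-style template are illustrative assumptions and do not appear in the patent.

```python
from itertools import product


def expansion_count(candidates_per_position):
    """Number of sentences obtained by substituting each position independently."""
    count = 1
    for n in candidates_per_position:
        count *= n
    return count


def expand(template, position_choices):
    """Enumerate every substituted sentence for a template with {0}, {1}, ... slots."""
    return [template.format(*combo) for combo in product(*position_choices)]


# Three replaceable positions with ten candidate words each.
per_sentence = expansion_count([10, 10, 10])

# A small case that can be listed in full: 2 x 3 = 6 variants.
variants = expand("{0} the {1}", [["select", "choose"], ["text", "sentence", "phrase"]])
```

Because the count is multiplicative, adding one more replaceable position with ten candidates multiplies the number of variants by ten.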

Evaluating the quality index for every one of such an enormous number of text sentences and selecting a suitable text sentence set (for example, 1000 sentences) requires a very large amount of computation.
There has therefore been a demand for a text sentence set selection method for corpus-based speech synthesis, and a program and a device therefor, that evaluate the quality index of the text sentence set while searching for suitable text sentences over the expanded search range, with a small amount of computation.

A text sentence set selection method for corpus-based speech synthesis according to the present invention is a method for selecting a text sentence set for corpus-based speech synthesis from a text corpus, in which storage means storing a thesaurus dictionary that holds synonyms in the target category of speech synthesis is provided, the method being characterized by comprising:
an initialization step of first emptying a working text sentence set;
a selection step of considering any one sentence of the text corpus as a candidate and selecting from the text corpus the text sentence (hereinafter, the selected sentence) that maximizes the corpus quality obtained when that sentence is added to the working text sentence set;
a thesaurus replacement step of creating a similar sentence group by replacing the words and phrases constituting the selected sentence obtained in the selection step with synonyms held in the thesaurus dictionary;
a re-determination step of considering any one sentence of the selected sentence and the similar sentence group as a candidate and finding the text sentence (hereinafter, the best sentence) that maximizes the corpus quality obtained when that sentence is added to the working text sentence set;
a sentence addition step of adding the best sentence obtained in the re-determination step to the working text sentence set; and
a design condition determination step of repeating the selection step through the sentence addition step until the corpus quality of the working text sentence set after the addition satisfies a corpus design condition.

With the text sentence set selection method for corpus-based speech synthesis according to the present invention, the quality index of the text sentence set can be evaluated and the amount of computation reduced while the range searched for suitable text sentences is kept wide. This contributes to shorter search times and to a smaller, lower-memory, lower-cost device.

Embodiment 1.
FIG. 1 is a functional block diagram of a text sentence set selection device 100 for corpus-based speech synthesis according to Embodiment 1 of the present invention.
The text sentence set selection device 100 includes a thesaurus dictionary 110, a working text sentence set recording unit 120, an initialization unit 130, a selection unit 140, a thesaurus replacement unit 150, a re-determination unit 160, a sentence addition unit 170, and a design condition determination unit 180.
The thesaurus dictionary 110 holds a list of synonyms in the target category of speech synthesis.
The working text sentence set recording unit 120 stores the working text sentence set at intermediate stages of the text sentence set selection process.
The initialization unit 130 empties the working text sentence set in the working text sentence set recording unit 120 when the text sentence set selection process starts.
The selection unit 140 considers any one sentence of the text corpus as a candidate and selects from the text corpus the text sentence (hereinafter, the selected sentence) that maximizes the corpus quality obtained when that sentence is added to the working text sentence set.
The thesaurus replacement unit 150 creates a similar sentence group by replacing the words and phrases constituting the selected sentence chosen by the selection unit 140 with synonyms held in the thesaurus dictionary 110.
The re-determination unit 160 considers any one sentence of the selected sentence and the similar sentence group as a candidate and finds the text sentence (hereinafter, the best sentence) that maximizes the corpus quality obtained when that sentence is added to the working text sentence set recorded in the working text sentence set recording unit 120.
The sentence addition unit 170 adds the best sentence obtained by the re-determination unit 160 to the working text sentence set recorded in the working text sentence set recording unit 120.
The design condition determination unit 180 causes the selection step through the sentence addition step to be repeated until the corpus quality of the working text sentence set recorded in the working text sentence set recording unit 120 satisfies the corpus design condition. Once the corpus design condition is satisfied, it outputs the working text sentence set recorded in the working text sentence set recording unit 120 as the text sentence set.
The text corpus 200 stores the population of text sentences from which the text sentence set for corpus-based speech synthesis is drawn.
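As an illustration of what the thesaurus replacement unit 150 produces, the following Python sketch builds a similar sentence group by substituting synonyms word by word. It assumes that a sentence has already been split into words and that the thesaurus is an in-memory mapping from a word to its synonyms; the function name and data layout are hypothetical and are not specified in the patent.

```python
from itertools import product


def similar_sentence_group(words, thesaurus):
    """Return every variant of `words` obtained by swapping words for synonyms.

    Each position may keep its original word or take any listed synonym.
    The unmodified sentence itself is excluded from the returned group.
    """
    original = list(words)
    options = [[w] + list(thesaurus.get(w, [])) for w in original]
    variants = [list(combo) for combo in product(*options)]
    return [v for v in variants if v != original]


# Hypothetical thesaurus entries for illustration only.
thesaurus = {"select": ["choose", "pick"], "text": ["sentence"]}
group = similar_sentence_group(["select", "the", "text"], thesaurus)
```

Here "select" has three options (itself and two synonyms), "the" has one, and "text" has two, so six combinations exist and five remain after the original sentence is removed.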

The working text sentence set recording unit 120, the initialization unit 130, the selection unit 140, the thesaurus replacement unit 150, the re-determination unit 160, the sentence addition unit 170, and the design condition determination unit 180 can be implemented in hardware such as circuit devices, or as software executed on a processing device such as a microcomputer or a CPU.
As one configuration example, the thesaurus dictionary 110 can be a file containing the synonym list, stored on a relatively large-capacity storage device such as an HDD (Hard Disk Drive). The thesaurus dictionary 110 may also be provided outside the text sentence set selection device 100.
The text corpus 200 can likewise be configured by storing files containing the text sentences on a relatively large-capacity storage device such as an HDD.

FIG. 2 conceptually illustrates the operation of the text sentence set selection device 100 according to Embodiment 1.
In FIG. 2, the text corpus 200 stores a set of text sentences, each represented by a small circle. The range of text sentences obtained by converting the words and phrases of each text sentence with the synonyms stored in the thesaurus dictionary 110 is represented by a large dashed circle.

As described above, when a text sentence set suitable for corpus-based speech synthesis is selected from the population of text sentences stored in the text corpus 200, extending the search range to every text sentence obtainable by synonym conversion with the thesaurus dictionary 110 can make the amount of computation enormous.
In Embodiment 1, therefore, the most suitable text sentence is first selected before any synonym conversion with the thesaurus dictionary 110, and only the range obtained by synonym conversion of that one text sentence is used as the search range.
This greatly reduces the amount of computation compared with searching the entire range obtained after synonym conversion with the thesaurus dictionary 110. In addition, because the most suitable text sentence is selected before synonym conversion, suitable text sentences can be expected to exist in its neighborhood as well, so a certain level of quality is ensured.
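The size of the saving can be estimated by counting quality evaluations per added sentence. The following Python sketch compares the two strategies under the simplifying assumption that every corpus sentence has the same number of variants and that this number includes the unmodified sentence; the function names are illustrative, and the figures are taken from the 100,000-sentence, 1000-variant example given earlier in the description.

```python
def evaluations_exhaustive(corpus_size, variants_per_sentence):
    """Evaluate every variant of every corpus sentence."""
    return corpus_size * variants_per_sentence


def evaluations_two_stage(corpus_size, variants_per_sentence):
    """Evaluate the corpus once, then only the variants of the selected sentence."""
    return corpus_size + variants_per_sentence


exhaustive = evaluations_exhaustive(100_000, 1000)
two_stage = evaluations_two_stage(100_000, 1000)
ratio = exhaustive / two_stage
```

Under these assumptions the cost per added sentence changes from a product of the two quantities to their sum, which is why the reduction grows with the size of the thesaurus.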

Before the operation flow is described, a supplementary explanation of the quality index of the text sentence set is given here.
Possible quality indices for the text sentence set include acoustic performance measures such as the count and coverage of each phoneme unit contained in the text sentences (for example, the vowels /a/, /i/, /u/, /e/, /o/) and the count and coverage of each chained phoneme unit (for example, /aa/, /ai/).
Furthermore, as shown in Patent Document 1, a coverage based on a complex hierarchical definition may be used, and other suitable acoustic or linguistic performance measures may be used in combination.
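One simple way to compute such a coverage is sketched below in Python. It assumes that each sentence is already available as a list of phoneme symbols (the patent does not say how the transcription is obtained) and treats a chained phoneme unit as an n-gram of adjacent symbols. The function names and the choice of target units are hypothetical.

```python
def units(phonemes, n):
    """All n-grams of phoneme symbols in one sentence (n=1: phonemes, n=2: chains)."""
    return [tuple(phonemes[i:i + n]) for i in range(len(phonemes) - n + 1)]


def coverage(sentence_set, target_units, n):
    """Fraction of `target_units` that occur at least once in the sentence set."""
    if not target_units:
        return 0.0
    seen = set()
    for phonemes in sentence_set:
        seen.update(units(phonemes, n))
    return len(seen & set(target_units)) / len(target_units)


# Two toy sentences given as phoneme sequences.
sentences = [["a", "i"], ["u", "a", "a"]]
vowels = {("a",), ("i",), ("u",), ("e",), ("o",)}
chains = {("a", "a"), ("a", "i")}

vowel_coverage = coverage(sentences, vowels, 1)   # /a/, /i/, /u/ are present
chain_coverage = coverage(sentences, chains, 2)   # /aa/ and /ai/ are present
```

A count-based index could be obtained in the same way by replacing the set with a counter; which index is appropriate depends on the corpus design condition.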

FIG. 3 shows the operation flow of the text sentence set selection device 100 according to Embodiment 1. Each step is described below.

(S330) Initialization step
The initialization unit 130 empties the working text sentence set in the working text sentence set recording unit 120 when the text sentence set selection process starts.
(S340) Selection step
The selection unit 140 considers any one sentence of the text corpus as a candidate and selects from the text corpus the text sentence (hereinafter, the selected sentence) that maximizes the corpus quality obtained when that sentence is added to the working text sentence set.
(S350) Thesaurus replacement step
The thesaurus replacement unit 150 creates a similar sentence group by replacing the words and phrases constituting the selected sentence chosen by the selection unit 140 with synonyms held in the thesaurus dictionary 110.
(S360) Re-determination step
The re-determination unit 160 considers any one sentence of the selected sentence and the similar sentence group as a candidate and finds the text sentence (hereinafter, the best sentence) that maximizes the corpus quality obtained when that sentence is added to the working text sentence set recorded in the working text sentence set recording unit 120.
(S370) Sentence addition step
The sentence addition unit 170 adds the best sentence obtained by the re-determination unit 160 to the working text sentence set recorded in the working text sentence set recording unit 120.
(S380) Design condition determination step
The design condition determination unit 180 causes the selection step through the sentence addition step to be repeated until the corpus quality of the working text sentence set recorded in the working text sentence set recording unit 120 satisfies the corpus design condition. Once the corpus design condition is satisfied, it outputs the working text sentence set recorded in the working text sentence set recording unit 120 as the text sentence set.
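The flow of steps S330 to S380 can be sketched in Python as follows. This is a minimal illustration under several assumptions that are not in the patent: sentences are tuples of words, the corpus quality is a toy measure (the number of distinct adjacent character pairs, standing in for chained phoneme coverage), the design condition is a target value of that measure, and the loop stops early when no candidate improves the quality. All function and parameter names are hypothetical.

```python
from itertools import product


def quality(sentence_set):
    """Toy corpus quality: number of distinct adjacent character pairs covered."""
    pairs = set()
    for sentence in sentence_set:
        text = "".join(sentence)
        pairs.update(zip(text, text[1:]))
    return len(pairs)


def best_addition(candidates, work_set):
    """Candidate whose addition to the working set gives the highest quality."""
    return max(candidates, key=lambda s: quality(work_set + [s]))


def similar_group(sentence, thesaurus):
    """The selected sentence together with all of its synonym-substituted variants."""
    options = [[w] + list(thesaurus.get(w, [])) for w in sentence]
    return [tuple(combo) for combo in product(*options)]


def select_sentence_set(corpus, thesaurus, target_quality, max_sentences=1000):
    work_set = []                                        # S330: initialization
    while (quality(work_set) < target_quality            # S380: design condition
           and len(work_set) < max_sentences):
        selected = best_addition(corpus, work_set)       # S340: selection
        group = similar_group(selected, thesaurus)       # S350: thesaurus replacement
        best = best_addition(group, work_set)            # S360: re-determination
        if quality(work_set + [best]) == quality(work_set):
            break  # assumption: stop when nothing improves the quality
        work_set.append(best)                            # S370: sentence addition
    return work_set


corpus = [("ab", "cd"), ("ab", "ab")]
thesaurus = {"cd": ["ef"]}
result = select_sentence_set(corpus, thesaurus, target_quality=5)
```

In this toy run the third added sentence is the variant ("ab", "ef"), which is not in the corpus itself: it is reached only through the similar sentence group of the selected sentence ("ab", "cd").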

The text sentence set may be output to a storage device provided outside the text sentence set selection device 100, to a network interface, to a separately provided memory area, or the like. The output format may be the text sentence data itself or pointer information only.

As described above, according to Embodiment 1, one suitable text sentence is found before any text sentence is converted with the thesaurus dictionary 110, and synonym conversion with the thesaurus dictionary 110 is then searched for that one sentence only. The amount of computation is therefore greatly reduced compared with searching the entire range obtained after conversion with the thesaurus dictionary 110.

Embodiment 2.
Embodiment 2 of the present invention describes an operation example that further reduces the amount of computation of Embodiment 1. The configuration of the text sentence set selection device 100 is the same as that shown in FIG. 1 and described in Embodiment 1, so its description is omitted.

FIG. 4 conceptually illustrates the operation of the text sentence set selection device 100 according to Embodiment 2. In Embodiment 2, the search range is not the entire range obtained by synonym conversion of a text sentence with the thesaurus dictionary 110, but only part of it.
In FIG. 4, the triangles indicate the text sentences to be searched within the area represented by the large dashed circle. In FIG. 2, described in Embodiment 1, the entire area inside the large dashed circle was the search range; in Embodiment 2 the search range is narrowed further, as shown by the triangles in FIG. 4.

FIG. 5 is a specific example of the conceptual diagram described with reference to FIG. 4.
When a text sentence is converted using the thesaurus dictionary 110, the words and phrases to be converted are first determined from among those constituting the original text sentence. For example, if the original sentence before conversion is "This invention relates to a method for selecting a text sentence.", the main conversion targets are noun phrases such as "text sentence" and verb phrases such as "relates".
In Embodiment 1, all of these are conversion targets. For example, if there are six portions subject to conversion and ten synonyms exist for each of them, as many as 10^6 text sentences can be derived by converting the original sentence.

In Embodiment 2, by contrast, only noun phrases are conversion targets, so correspondingly fewer text sentences are derived. For example, if three of the six portions subject to conversion are noun phrases, the number of derived text sentences falls to less than half (with ten synonyms per portion, from 10^6 to 10^3).
Noun phrases are chosen as conversion targets because they are well suited to creating variations of a text sentence, but the choice need not be limited to them; depending on the text sentences and the contents of the thesaurus dictionary 110, verb phrases may be used as conversion targets instead.
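The restriction to one part of speech can be sketched in Python as follows. The sketch assumes that each word arrives with a part-of-speech tag from some morphological analyzer, which the patent does not specify; the tag names, function name, and thesaurus entries are hypothetical.

```python
from itertools import product


def similar_sentences(tagged_words, thesaurus, allowed_pos=None):
    """Variants of a tagged sentence, substituting only words whose tag is allowed.

    `tagged_words` is a list of (word, pos) pairs. With allowed_pos=None every
    word that has synonyms may be substituted. The returned list includes the
    unmodified sentence.
    """
    options = []
    for word, pos in tagged_words:
        if allowed_pos is None or pos in allowed_pos:
            options.append([word] + list(thesaurus.get(word, [])))
        else:
            options.append([word])  # this position is left unchanged
    return [list(combo) for combo in product(*options)]


tagged = [("method", "noun"), ("selects", "verb"), ("text", "noun")]
thesaurus = {
    "method": ["procedure"],
    "selects": ["chooses", "picks"],
    "text": ["sentence"],
}

all_variants = similar_sentences(tagged, thesaurus)              # 2 * 3 * 2
noun_variants = similar_sentences(tagged, thesaurus, {"noun"})   # 2 * 1 * 2
```

Passing `{"verb"}` instead of `{"noun"}` gives the verb-only alternative mentioned above without any other change.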

The operation flow of the text sentence set selection device 100 in Embodiment 2 is in principle the same as that described with reference to FIG. 3 in Embodiment 1. However, because synonym conversion yields fewer derived sentences, as described with reference to FIGS. 4 and 5, the amount of computation in steps S350 to S360 is reduced.
This contributes to shorter search times and to a smaller, lower-memory, lower-cost text sentence set selection device 100. Moreover, since most of the synonyms stored in the thesaurus dictionary 110 relate to noun phrases, deriving sentences by converting noun phrases still provides sufficient variation among text sentences.

Embodiment 3.
FIG. 6 conceptually illustrates the operation of the text sentence set selection device 100 according to Embodiment 3 of the present invention.
In Embodiment 2, only noun phrases were subject to synonym conversion; in Embodiment 3, the word or phrase with the largest number of characters or syllables is subject to synonym conversion. The longest one may be taken from among all conversion target words and phrases, such as noun phrases and verb phrases, or from among noun phrases only.
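A minimal Python sketch of this longest-phrase rule follows. It measures length in characters, considers only words that have at least one synonym in the thesaurus, and resolves ties in favor of the earliest word; these three choices, along with the function names, are assumptions made for illustration.

```python
def longest_replaceable(words, thesaurus):
    """Index of the longest word (by character count) that has synonyms, or None."""
    candidates = [i for i, w in enumerate(words) if thesaurus.get(w)]
    if not candidates:
        return None
    return max(candidates, key=lambda i: len(words[i]))


def similar_sentences_longest(words, thesaurus):
    """Variants obtained by substituting only the longest replaceable word."""
    i = longest_replaceable(words, thesaurus)
    if i is None:
        return []
    return [words[:i] + [syn] + words[i + 1:] for syn in thesaurus[words[i]]]


# Hypothetical thesaurus entries for illustration only.
thesaurus = {"select": ["choose"], "sentence": ["utterance", "phrase"]}
words = ["select", "the", "sentence"]

target = longest_replaceable(words, thesaurus)
variants = similar_sentences_longest(words, thesaurus)
```

Because only one position is substituted, the number of variants equals the number of synonyms of that single word rather than a product over positions.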

The operation flow of the text sentence set selection device 100 in Embodiment 3 is in principle the same as that described with reference to FIG. 3 in Embodiment 1. However, because synonym conversion yields far fewer derived sentences, as described with reference to FIG. 6, the amount of computation in steps S350 to S360 is greatly reduced.
This contributes to shorter search times and to a smaller, lower-memory, lower-cost text sentence set selection device 100. In addition, because the search range is determined by converting the longest word or phrase, which has a relatively large influence on the phoneme environment and the like, the quality of the search result can be kept at a certain level.

FIG. 1 is a functional block diagram of the text sentence set selection device 100 for corpus-based speech synthesis according to Embodiment 1.
FIG. 2 conceptually illustrates the operation of the text sentence set selection device 100 according to Embodiment 1.
FIG. 3 shows the operation flow of the text sentence set selection device 100 according to Embodiment 1.
FIG. 4 conceptually illustrates the operation of the text sentence set selection device 100 according to Embodiment 2.
FIG. 5 is a specific example of the conceptual diagram described with reference to FIG. 4.
FIG. 6 conceptually illustrates the operation of the text sentence set selection device 100 according to Embodiment 3.

Explanation of symbols

100 text sentence set selection device, 110 thesaurus dictionary, 120 working text sentence set recording unit, 130 initialization unit, 140 selection unit, 150 thesaurus replacement unit, 160 re-determination unit, 170 sentence addition unit, 180 design condition determination unit, 200 text corpus.

Claims

1. A text sentence set selection method for corpus-based speech synthesis, being a method for selecting a text sentence set for corpus-based speech synthesis from a text corpus, wherein storage means storing a thesaurus dictionary that holds synonyms in the target category of speech synthesis is provided, the method comprising:
an initialization step of first emptying a working text sentence set;
a selection step of considering any one sentence of the text corpus as a candidate and selecting from the text corpus the text sentence (hereinafter, the selected sentence) that maximizes the corpus quality obtained when that sentence is added to the working text sentence set;
a thesaurus replacement step of creating a similar sentence group by replacing the words and phrases constituting the selected sentence obtained in the selection step with synonyms held in the thesaurus dictionary;
a re-determination step of considering any one sentence of the selected sentence and the similar sentence group as a candidate and finding the text sentence (hereinafter, the best sentence) that maximizes the corpus quality obtained when that sentence is added to the working text sentence set;
a sentence addition step of adding the best sentence obtained in the re-determination step to the working text sentence set; and
a design condition determination step of repeating the selection step through the sentence addition step until the corpus quality of the working text sentence set after the addition satisfies a corpus design condition.

2. The text sentence set selection method for corpus-based speech synthesis according to claim 1, wherein, in the thesaurus replacement step, only noun phrases among the words and phrases constituting the text sentence selected in the selection step are subject to replacement.

3. The text sentence set selection method for corpus-based speech synthesis according to claim 1, wherein, in the thesaurus replacement step, among the words and phrases constituting the text sentence selected in the selection step, the one with the largest number of characters or syllables is subject to replacement.

4. A text sentence set selection program for corpus-based speech synthesis, which causes a computer to execute the text sentence set selection method for corpus-based speech synthesis according to any one of claims 1 to 3.

5. A text sentence set selection device for corpus-based speech synthesis, being a device for selecting a text sentence set for corpus-based speech synthesis from a text corpus, the device comprising:
storage means storing a thesaurus dictionary that holds synonyms in the target category of speech synthesis;
an initialization unit that first empties a working text sentence set;
a selection unit that considers any one sentence of the text corpus as a candidate and selects the text sentence (hereinafter, the selected sentence) that maximizes the corpus quality obtained when that sentence is added to the working text sentence set;
a thesaurus replacement unit that creates a similar sentence group by replacing the words and phrases constituting the selected sentence with synonyms held in the thesaurus dictionary;
a re-determination unit that considers any one sentence of the selected sentence and the similar sentence group as a candidate and finds the text sentence (hereinafter, the best sentence) that maximizes the corpus quality obtained when that sentence is added to the working text sentence set;
a sentence addition unit that adds the best sentence obtained by the re-determination unit to the working text sentence set; and
a design condition determination unit that causes the processing of the selection unit, the thesaurus replacement unit, the re-determination unit, and the sentence addition unit to be repeated until the corpus quality of the working text sentence set after the addition satisfies a corpus design condition.

6. The text sentence set selection device for corpus-based speech synthesis according to claim 5, wherein the thesaurus replacement unit subjects to replacement only noun phrases among the words and phrases constituting the text sentence selected by the selection unit.

7. The text sentence set selection device for corpus-based speech synthesis according to claim 5, wherein the thesaurus replacement unit subjects to replacement, among the words and phrases constituting the text sentence selected by the selection unit, the one with the largest number of characters or syllables.