JP2004294639A - Text analyzing device for speech synthesis and speech synthesiser

Info

Publication number: JP2004294639A
Application number: JP2003084990A
Authority: JP
Inventors: Kazuto Kojiya (糀谷 和人); Yumi Saito (齊藤 ゆみ); Yuji Hirayama (平山 裕司)
Original assignee: Omron Corp; Omron Tateisi Electronics Co
Current assignee: Omron Corp
Priority date: 2003-03-26
Filing date: 2003-03-26
Publication date: 2004-10-21

Abstract

PROBLEM TO BE SOLVED: To provide a text analyzing device for speech synthesis, and a speech synthesiser, that let even a user without technical knowledge of pronunciation or phonetics easily specify a reading, and that can synthesize speech with correct pronunciation and accent.

SOLUTION: A pronunciation symbol dictionary 36 associates the kana (Japanese syllabary) characters of readings with pronunciation symbols. When a text to be processed is input that includes a character string whose reading is specified in kana, a reading tag analysis part 30 extracts the reading-specified character string and the kana of its reading and passes them to a reading character string-to-pronunciation symbol conversion part 31. The conversion part 31 converts the kana of the reading into pronunciation symbols by referring to the pronunciation symbol dictionary 36. Character strings other than the reading-specified character string undergo morpheme analysis in a morpheme analysis part 32. An output part 33 then combines the conversion result of the conversion part 31 with the analysis result of the morpheme analysis part 32 to generate and output pronunciation symbol data for the whole text.

COPYRIGHT: (C)2005,JPO&NCIPI

Description

【０００１】
【発明の属する技術分野】
本発明は、日本語の漢字かな混じり文字列から合成音声を生成する際に用いられる音声合成用テキスト解析装置および音声合成装置に関する。
【０００２】
【従来の技術】
従来より、日本語の漢字かな混じり文を音声で読み上げる音声合成装置が知られている。このような音声合成装置では、入力された漢字かな混じり文字列を韻律記号付き発音記号文字列に変換するテキスト解析が行われる。
【０００３】
音声合成用のテキスト解析では、まず、漢字かな混じり文字列を形態素に分割し、同時に各形態素の読み、品詞、アクセント型、アクセント結合様式などの同定を行う形態素解析が行われる。形態素とは、文中で意味を担う最小の言語単位をいい、通常、単語と同じかそれよりも小さい単位となる。
【０００４】
形態素解析においては、たとえば「小倉（おぐら）」と「小倉（こくら）」、「最高値（さいこうち）」と「最高値（さいたかね）」のように、同形異音語がある場合などは解析結果にあいまい性が生じるので、複数の候補解が導き出されることがある。一般に音声合成の用途では何らかの評価基準で選択した一つの解のみを出力しなければならない。しかし、その判断は難しく、選択した読みやアクセント型が利用者の意図したものと異なる可能性がある。
【０００５】
そこで、このような場合にでも確実に利用者の意図した読みを出力するための方法が検討されている。たとえば特許文献１では、入力文字列中の特殊単語に対してあらかじめ「読み」を指定しておき、その指定区間の読みとしては、形態素解析の結果同定された読みではなく、指定された読みに置き換えて出力する方法が提案されている。また、市販の音声合成装置でも、同様の方法が多く採用されている。
【０００６】
【特許文献１】
特開平４−３３１９９８号公報
【０００７】
【発明が解決しようとする課題】
従来の音声合成装置においては特殊単語に対する「読み」として、「読み仮名」ではなく、発音やアクセントと一致する読み、すなわち、「発音記号」を指定する必要があった。
【０００８】
発音記号の表記法としては、「日本語テキスト音声合成用記号の規格ＪＥＩＤＡ−６２−２０００」などが知られている。ＪＥＩＤＡ−６２−２０００では、たとえば「小倉（おぐら）」は「オク゜ラ」、「最高値（さいこうち）」は「サイコ’ーチ」のように表記する。ここで、「ー」は長音を表し、「ク゜」は「グ」が鼻濁音化した発音を表している。また、「’」はアクセント位置を表し、直前拍まで高く読み、以降を低く読むことを表す。
【０００９】
これにしたがえば、「京都から小倉まで２９０円です。」という文字列において、「小倉」を「おぐら」と発音させたいときには、「京都から＜ＰＲＯＮＳＹＭ＝“オク゜ラ”＞小倉＜／ＰＲＯＮ＞まで２９０円です。」のように「小倉」の部分に読みを指定することとなる。
【００１０】
このように、従来では、音声合成装置への読み指定として、音響的な特徴を反映した読みを与えなければならなかった。そのため、発音や音声学に関する知識を持たない一般の利用者にとっては読みの指定そのものが難しいという問題があった。また、発音やアクセント位置を正確に指定しなかった場合には、発音や韻律が不自然な合成音声しか得られないということも問題となっていた。
【００１１】
本発明は上記実情に鑑みてなされたものであって、その目的とするところは、発音や音声学の専門知識を持たない利用者でも簡単に読みを指定でき、正しい発音とアクセントによる音声合成を可能とする音声合成用テキスト解析装置および音声合成装置を提供することにある。
【００１２】
【課題を解決するための手段】
上記目的を達成するために本発明にあっては、「読み仮名」による読み指定を可能とする。
【００１３】
詳しくは、本発明は、少なくとも読み仮名と発音記号を対応付ける辞書を有し、この辞書を参照することによって、読み仮名による読み指定がされた読み指定文字列の読み仮名を発音記号に変換する。なお、本発明における「辞書」とは、コンピュータ処理可能な形式で、読み仮名、発音記号等のデータ、および、それらの対応関係等が記録されたデータファイルまたはデータベースをいう。
【００１４】
この構成によれば、文字列に対する「読み」として「読み仮名」を指定すれば足りるので、発音や音声学の専門知識を持たない利用者でも簡単に読み指定を行うことができる。また、指定された読み仮名を発音記号に変換するので、変換結果に基づいて正しい発音とアクセントによる音声合成を行うことが可能となる。
【００１５】
より詳しくは、読み仮名による読み指定がされた読み指定文字列を含む処理対象テキストから読み指定文字列および指定された読み仮名を抽出し、少なくとも読み仮名と発音記号を対応付ける発音記号辞書を参照して読み指定文字列に指定された読み仮名を発音記号に変換し、形態素解析によって、処理対象テキストのうち読み指定がされていない他の文字列を構成する形態素の発音記号を取得し、変換処理の変換結果と形態素解析の解析結果を組み合わせて処理対象テキストの発音記号データを生成する構成が好ましい。
【００１６】
この構成によれば、読み指定文字列については指定された読み仮名に基づいて正確な発音記号が生成され、それ以外の文字列については形態素解析によって適切な発音記号が生成される。したがって、特殊な読みをする文字列に対して「読み仮名」による読み指定をするという簡単な事前処理だけで、処理対象テキスト全体にわたって利用者の意図する正しい発音とアクセントによる音声合成を行うことが可能となる。
【００１７】
また、本発明の他の構成としては、読み仮名による読み指定がされた読み指定文字列を含む処理対象テキストから読み指定文字列および指定された読み仮名を抽出し、少なくとも形態素の表記、読み仮名および発音記号を対応付ける形態素辞書を参照して処理対象テキストを形態素に分割することにより、読み仮名および発音記号を属性として有する形態素候補を生成し、複数の形態素候補のなかから最適な形態素を選定する際に、読み指定文字列に指定された読み仮名と形態素候補の読み仮名とを照合することにより、読みの異なる形態素候補を除外し、選定された形態素に基づいて処理対象テキストの発音記号データを生成する構成も好ましい。
【００１８】
この構成によれば、読み指定文字列に対しても形態素解析が行われるので、複合語、句、節、文などの任意の区間に対して読み指定をすることが可能となる。また、読み指定文字列を形態素解析したうえで、指定された読み仮名に合致する読みを有する形態素候補のなかから最適解を選定することとしたので、利用者の指定した読みに対応した正確な発音記号が生成される。
【００１９】
ここで、読み指定文字列に指定された読み仮名と形態素候補の読み仮名とを照合することにより読みの異なる形態素候補を除外する処理（以下、「読み仮名検査処理」という。）の具体的実施態様は種々のものが考えられる。
【００２０】
たとえば、生成された形態素候補に対してまず読み仮名検査処理を行うことによって読みの異なる形態素候補を除外した後、残った形態素候補のなかから最適解を選定する態様を採用し得る。また、生成された形態素候補のなかから最適解を探索する処理と同時に読み仮名検査処理を行う態様も採用し得る。また、形態素間の接続コストなどに基づいて複数の候補解に絞り込んだ後に、その候補解に対して読み仮名検査処理を行って最適解を得る態様も採用し得る。
【００２１】
なお、読み指定には、本来、仮名文字（平仮名または片仮名）を用いるべきであるところ、利用者によっては、英数記号（英字、数字または記号）を用いてしまう場合がある。そこで、読み指定文字列に指定された読み仮名が英数記号を含む場合の対処として次のような構成を備えるとよい。
【００２２】
たとえば、読み仮名検査処理において、英数記号については形態素候補の表記と照合することが好ましい。そして、英数記号以外の仮名文字の部分については形態素候補の読み仮名と照合するのである。これにより、英数記号で読み指定された区間と形態素候補との対応をとることができるので、それ以外の区間、すなわち、仮名文字で読み指定された区間の読み仮名検査処理を正確に行うことができる。この場合には、英数記号は、通常の形態素解析処理によって、または、所定のルールに基づいて、発音記号に変換すればよい。
【００２３】
あるいは、英数記号を読み仮名（仮名文字）に変換した後に、読み仮名検査処理を行うことも好ましい。このように英数記号を仮名文字に変換する前処理を行うことにより、読み指定文字列に対して画一的に読み仮名検査処理を実行できるため、処理の簡易化を図ることができる。また、形態素解析によって発音記号が得られるので、英数記号についても正確な発音記号を得ることができる。
【００２４】
あるいは、形態素辞書が、英数記号の形態素の読み仮名に英数記号による表記を登録可能に構成されていることも好ましい。たとえば、「１０」という数字の形態素の読み仮名として、「じゅう」という仮名文字だけでなく、「１０」という数字による表記を登録しておくのである。これにより、読み仮名として英数記号が指定されていた場合でも、画一的に読み仮名検査処理を実行できるため、処理の簡易化を図ることができる。また、形態素辞書の発音記号に基づき、英数記号についても正確な発音記号を得ることができる。
【００２５】
なお、本発明は、上記構成の少なくとも一部を有する音声合成用テキスト解析装置もしくは音声合成装置として捉えることができる。また、本発明は、上記手順の少なくとも一部を含む音声合成用テキスト解析方法もしくは音声合成方法、または、かかる方法を実現するためのプログラムとして捉えることもできる。なお、上記構成および手順の各々は可能な限り互いに組み合わせて本発明を構成することができる。
【００２６】
たとえば、本発明の一実施態様としての音声合成用テキスト解析装置は、少なくとも読み仮名と発音記号を対応付ける辞書と、この辞書を参照することによって、読み仮名による読み指定がされた読み指定文字列の読み仮名を発音記号に変換する変換手段と、を有するとよい。
【００２７】
また、本発明の一実施態様としての音声合成用テキスト解析装置は、読み仮名による読み指定がされた読み指定文字列を含む処理対象テキストから読み指定文字列および指定された読み仮名を抽出する読み指定抽出手段と、少なくとも読み仮名と発音記号を対応付ける発音記号辞書と、発音記号辞書を参照して読み指定文字列に指定された読み仮名を発音記号に変換する変換手段と、形態素解析によって、処理対象テキストのうち読み指定がされていない他の文字列を構成する形態素の発音記号を取得する形態素解析手段と、変換手段の変換結果と形態素解析手段の解析結果を組み合わせて処理対象テキストの発音記号データを生成する生成手段と、を有するとよい。
【００２８】
また、本発明の一実施態様としての音声合成用テキスト解析装置は、読み仮名による読み指定がされた読み指定文字列を含む処理対象テキストから読み指定文字列および指定された読み仮名を抽出する読み指定抽出手段と、少なくとも形態素の表記、読み仮名および発音記号を対応付ける形態素辞書と、形態素辞書を参照して処理対象テキストを形態素に分割することにより、読み仮名および発音記号を属性として有する形態素候補を生成する形態素候補生成手段と、複数の形態素候補のなかから最適な形態素を選定する際に、読み指定文字列に指定された読み仮名と形態素候補の読み仮名とを照合することにより、読みの異なる形態素候補を除外する読み仮名検査手段と、選定された形態素に基づいて処理対象テキストの発音記号データを生成する生成手段と、を有するとよい。
【００２９】
また、本発明の一実施態様としての音声合成装置は、上記いずれかの音声合成用テキスト解析装置と、その音声合成用テキスト解析装置から得られる発音記号に基づいて合成音声を生成する音声合成手段と、を有するとよい。
【００３０】
また、本発明の一実施態様としての音声合成用テキスト解析方法は、コンピュータが、読み仮名による読み指定がされた読み指定文字列を含む処理対象テキストから読み指定文字列および指定された読み仮名を抽出し、少なくとも読み仮名と発音記号を対応付ける発音記号辞書を参照して読み指定文字列に指定された読み仮名を発音記号に変換し、形態素解析によって、処理対象テキストのうち読み指定がされていない他の文字列を構成する形態素の発音記号を取得し、変換処理の変換結果と形態素解析の解析結果を組み合わせて処理対象テキストの発音記号データを生成するとよい。
【００３１】
また、本発明の一実施態様としての音声合成用テキスト解析方法は、コンピュータが、読み仮名による読み指定がされた読み指定文字列を含む処理対象テキストから読み指定文字列および指定された読み仮名を抽出し、少なくとも形態素の表記、読み仮名および発音記号を対応付ける形態素辞書を参照して処理対象テキストを形態素に分割することにより、読み仮名および発音記号を属性として有する形態素候補を生成し、複数の形態素候補のなかから最適な形態素を選定する際に、読み指定文字列に指定された読み仮名と形態素候補の読み仮名とを照合することにより、読みの異なる形態素候補を除外し、選定された形態素に基づいて処理対象テキストの発音記号データを生成するとよい。
【００３２】
また、本発明の一実施態様としての音声合成方法は、上記いずれかの音声合成用テキスト解析方法を含み、さらに、コンピュータが、その方法で生成した発音記号データに基づいて合成音声を生成するとよい。
【００３３】
また、本発明の一実施態様としての音声合成用テキスト解析プログラムは、コンピュータに、読み仮名による読み指定がされた読み指定文字列を含む処理対象テキストから読み指定文字列および指定された読み仮名を抽出する処理と、少なくとも読み仮名と発音記号を対応付ける発音記号辞書を参照して読み指定文字列に指定された読み仮名を発音記号に変換する処理と、形態素解析によって、処理対象テキストのうち読み指定がされていない他の文字列を構成する形態素の発音記号を取得する処理と、変換処理の変換結果と形態素解析の解析結果を組み合わせて処理対象テキストの発音記号データを生成する処理と、を実行させるとよい。
【００３４】
また、本発明の一実施態様としての音声合成用テキスト解析プログラムは、コンピュータに、読み仮名による読み指定がされた読み指定文字列を含む処理対象テキストから読み指定文字列および指定された読み仮名を抽出する処理と、少なくとも形態素の表記、読み仮名および発音記号を対応付ける形態素辞書を参照して処理対象テキストを形態素に分割することにより、読み仮名および発音記号を属性として有する形態素候補を生成する処理と、複数の形態素候補のなかから最適な形態素を選定する際に、読み指定文字列に指定された読み仮名と形態素候補の読み仮名とを照合することにより、読みの異なる形態素候補を除外する処理と、選定された形態素に基づいて処理対象テキストの発音記号データを生成する処理と、を実行させるとよい。
【００３５】
また、本発明の一実施態様としての音声合成プログラムは、上記いずれかの音声合成用テキスト解析プログラムを含み、さらに、コンピュータにそのプログラムで生成した発音記号データに基づいて合成音声を生成する処理を実行させるとよい。
【００３６】
【発明の実施の形態】
以下に図面を参照して、この発明の好適な実施の形態を例示的に詳しく説明する。
【００３７】
（第１の実施形態）
＜音声合成装置の構成＞
図１は、本発明の第１の実施形態に係る音声合成装置の機能構成を示すブロック図である。
【００３８】
音声合成装置１は、ＣＰＵ（中央演算処理装置）、メモリ、ハードディスクなどを基本ハードウエアとして具備する汎用的なコンピュータにより構成可能である。
【００３９】
音声合成装置１は、音声合成処理に係る機能として、テキスト入力部２、テキスト解析部３、構文解析部４、アクセント句処理部５、音声合成部６および音声出力部７を有する。これらの機能は、ハードディスクなどの記憶装置に記憶されたプログラムがメモリに読み込まれ、ＣＰＵにより実行されることで実現されるものである。
【００４０】
テキスト入力部２は、キーボードなどの入力装置１４もしくはハードディスクやメモリカードなどの記憶媒体１５から音声合成処理の対象となる処理対象テキストを読み込む機能を有する。読み込んだ処理対象テキストは、入力データとしてテキスト解析部３に引き渡される。
【００４１】
本実施形態の音声合成装置１では、日本語の漢字かな混じり文を処理対象テキストとして入力可能であり、この処理対象テキストには予め「読み仮名」による読み指定を含めることができる。
【００４２】
読み指定の方法はどのようなものを用いても構わないが、本実施形態では、読み指定をする文字列に読み指定タグを付する方法を採用する。具体的には、「京都から小倉まで２９０円です。」という処理対象テキストのうち、「小倉」の文字列に「おぐら」という読み指定を行う場合には、
「京都から＜ＰＲＯＮＳＹＭ＝“おぐら”＞小倉＜／ＰＲＯＮ＞まで２９０円です。」
のように記述する。
【００４３】
記号「＜」から記号「＞」の部分がタグであり、「＜ＰＲＯＮ・・・＞」が読み指定の開始タグ、「＜／ＰＲＯＮ＞」が読み指定の終了タグである。開始タグと終了タグで挟まれた文字列「小倉」が読み指定文字列となる。そして、開始タグ中の「ＳＹＭ＝“おぐら”」の部分が、読み指定文字列「小倉」に指定された読み仮名「おぐら」を表している。
【００４４】
「小倉」という地名には、京都の「おぐら」の他にも九州の「こくら」などがあるため、自動によるテキスト解析ではいずれの読みを出力すべきかの判断が難しい場合もある。そこで、そのような同形異音語に対しては、利用者が事前に読み指定を行っておくことにより、いわゆる読み間違いを未然に防ぐことができる。
【００４５】
テキスト解析部３は、入力データをテキスト解析することにより、処理対象テキストの発音記号データを生成する機能を有する。テキスト解析部３は、ハードディスクに記憶された辞書１０を参照することによって、形態素および形態素の属性の同定を行う。なお、テキスト解析部３は、読み指定文字列に関しては特別な処理を行うが、その構成および処理内容の詳細については後述する。
【００４６】
テキスト解析部３から出力される発音記号データは、処理対象テキストを構成する形態素およびその形態素の属性を表す構造化データである。各形態素は、属性として、発音記号、品詞、アクセント型、アクセント結合様式などを有している。
【００４７】
構文解析部４は、発音記号データを構文解析することによって、形態素間の係り受け関係を同定する機能を有する。このとき、構文解析部４は、言語知識として文法１１を参照する。
【００４８】
アクセント句処理部５は、構文解析の結果に基づいて、複数の形態素をアクセント句と呼ばれる単位にまとめ、そのアクセント句のアクセント位置を同定する機能を有する。アクセント句とは、文を読み上げる際の韻律のまとまりの単位である。アクセント句処理を行う際には、連続する複数の形態素が一つのアクセント句を構成するかどうか、結合する場合にはどのようなアクセント型になるかを記述したアクセント結合規則１２を参照する。
【００４９】
音声合成部６は、上述した処理によって得られた発音記号に基づいて合成音声を生成する音声合成手段として機能する。音声合成部６は、音声データ１３を用いて合成音声を生成する。
【００５０】
音声出力部７は、スピーカ１６などの音声出力装置を制御して、合成音声を出力する機能を有している。
【００５１】
＜テキスト解析部の構成＞
次に、図２を参照して、テキスト解析部３の詳細な構成について説明する。
【００５２】
図２は、本発明の第１の実施形態に係る音声合成用テキスト解析装置としてのテキスト解析部の構成を示すブロック図である。
【００５３】
テキスト解析部３は、概略、読み指定タグ解析部３０、読み文字列−発音記号文字列変換部（以下、単に「変換部」という。）３１、形態素解析部３２および出力部３３を有して構成される。また、テキスト解析部３は、図１における辞書１０として、変換部３１が参照する発音記号辞書３６と、形態素解析部３２が参照する形態素辞書３７とを有している。
【００５４】
読み指定タグ解析部３０は、処理対象テキストから読み指定文字列および指定された読み仮名を抽出する読み指定抽出手段として機能する部分である。読み指定タグ解析部３０は、入力された入力データを先頭から順に調べていき、読み指定タグに基づいて処理対象テキストを読み指定文字列と読み指定がされていない他の文字列（以下、「通常文字列」という。）とに切り分ける。読み指定文字列に関しては同時にその読み仮名も抽出する。そして、読み指定文字列およびその読み仮名は変換部３１に引き渡され、通常文字列は形態素解析部３２に引き渡される。
【００５５】
変換部３１は、発音記号辞書３６を参照して読み指定文字列に指定された読み仮名を発音記号に変換する変換手段として機能する部分である。発音記号辞書３６は、図３に示すように、登録語の「表記（見出し）」、「読み仮名」および「発音記号」が対応付けられた辞書である。ここで、「表記」は漢字かな混じり文字列で、「読み仮名」は平仮名で、「発音記号」は韻律記号付き発音記号文字列で、それぞれ登録されている。かかる構造の辞書を参照することによって、たとえば、「あいきょう」という読み仮名を、「アイキョ’ー」という韻律記号付きの発音記号に変換することができる。
【００５６】
形態素解析部３２は、形態素解析によって、通常文字列を構成する形態素の発音記号を取得する形態素解析手段として機能する部分である。形態素解析部３２は、辞書検索部３４、最適解探索部３５および探索バッファ３８を有して構成される。
【００５７】
辞書検索部３４は、形態素辞書３７を参照して通常文字列を形態素に分割する機能を有する。形態素辞書３７は、形態素の「表記（見出し）」、「品詞」、「発音記号」、「アクセント型」、「アクセント結合様式」などが対応付けられた辞書である。
【００５８】
辞書検索部３４は、通常文字列を先頭または末尾から順に調べていき、形態素辞書３７に登録された形態素の表記にマッチする文字列が見つかれば、その文字列を形態素候補とする。ここでは、とりうるすべての組み合わせの形態素候補が抽出される。すなわち、「京都」という文字列に対しては、「京都」の他にも「京」＋「都」という２つの形態素からなる候補も抽出される。形態素候補は、品詞、発音記号、アクセント型、アクセント結合様式などの属性とともに探索バッファ３８に格納される。
【００５９】
最適解探索部３５は、探索バッファ３８に格納された複数の形態素候補のなかから最適な形態素を選定する機能を有する。最適解探索部３５は、個々の形態素に与えられている形態素コスト、および、形態素同士の接続のしやすさを示す接続コストに基づいて、最適な形態素の組み合わせ（最適解）を求める。通常文字列を構成する形態素の組み合わせが決定すれば、通常文字列の発音記号も定まる。
【００６０】
出力部３３は、変換部３１の変換結果と形態素解析部３２の解析結果を組み合わせて処理対象テキストの発音記号データを生成する生成手段として機能する部分である。生成された発音記号データは、出力データとして出力される。
【００６１】
＜テキスト解析部の処理＞
次に、図４を参照して、上記構成のテキスト解析部３の処理フローについて説明する。図４は、本実施形態におけるテキスト解析処理の流れを示すフローチャートである。
【００６２】
ここでは、図５（ａ）に示すように、
「京都から＜ＰＲＯＮＳＹＭ＝“おぐら”＞小倉＜／ＰＲＯＮ＞まで２９０円です。」
という入力データが入力された場合を例に挙げて、テキスト解析処理の具体的な流れを説明する。
【００６３】
まず、テキスト入力部２から、読み指定タグ付きの漢字かな混じり文字列「京都から＜ＰＲＯＮＳＹＭ＝“おぐら”＞小倉＜／ＰＲＯＮ＞まで２９０円です。」が入力される（ステップＳ１００）。
【００６４】
次に、読み指定タグ解析部３０が、読み指定タグを調べて、処理対象テキストを読み指定文字列と通常文字列とに切り分ける（ステップＳ１０１）。ここでは、図５（ｂ）に示すように、通常文字列「京都から」、読み指定文字列「小倉」、通常文字列「まで２９０円です。」の３つの区間に分けられる。読み指定文字列「小倉」に関しては、指定された読み仮名「おぐら」も一緒に抽出される。その後、最初の文字列から順に次の処理に引き渡される。
【００６５】
通常文字列「京都から」は、読み指定のある区間ではないため、形態素解析部３２に引き渡される（ステップＳ１０２）。形態素解析部３２では、通常文字列「京都から」を形態素解析することによって、形態素およびその属性を同定する（ステップＳ１０３）。文字列「京都から」は、「京都」と「から」の２つの形態素に分割され、形態素「京都」の属性として発音記号「キョ’ート」、品詞「地名」などが得られ、形態素「から」の属性として発音記号「カ’ラ」、品詞「助詞」などが得られる。これらの解析結果は出力部３３に引き渡される。
【００６６】
続いて、次の文字列「小倉」の処理に移る（ステップＳ１０４）。この文字列は読み指定のある区間であるため、変換部３１に引き渡される（ステップＳ１０２）。変換部３１では、発音記号辞書３６を参照して、文字列「小倉」に指定された読み仮名「おぐら」を発音記号に変換する（ステップＳ１０５）。図３に示すように、読み仮名「おぐら」は、韻律記号付き発音記号文字列「オク゜ラ」に変換される。この変換結果は出力部３３に引き渡される。
【００６７】
続いて、次の文字列「まで２９０円です。」の処理に移る（ステップＳ１０４）。この文字列は読み指定のある区間ではないため、形態素解析部３２に引き渡され（ステップＳ１０２）、上記と同様にして形態素解析が行われる（ステップＳ１０３）。この解析結果も出力部３３に引き渡される。
【００６８】
これですべての文字列に対して処理を行ったため、出力部３３の処理に移る（ステップＳ１０４）。出力部３３は、形態素解析部３２による文字列「京都から」、「まで２９０円です。」の解析結果と、変換部３１による文字列「小倉」の変換結果とを組み合わせて、図５（ｃ）に示すような発音記号データを生成し、出力する（ステップＳ１０６）。なお、ここで出力されたデータは、以降の構文解析処理に引き渡される。
【００６９】
以上述べたように、本実施形態の構成によれば、文字列「小倉」に対する読みとして、「オク゜ラ」のような発音記号ではなく、「おぐら」のような仮名文字による読み仮名を指定すれば足りるので、発音や音声学の専門知識を持たない利用者でも簡単に読み指定を行うことができる。
【００７０】
そして、発音記号辞書３６を参照することによって、指定された読み仮名「おぐら」を発音記号「オク゜ラ」に変換するので、変換結果に基づいて正しい発音とアクセントによる音声合成を行うことが可能となる。
【００７１】
また、読み指定のされていない通常文字列「京都から」、「まで２９０円です。」については、形態素解析によって適切な発音記号が生成される。
【００７２】
したがって、特殊な読みをする文字列に対して「読み仮名」による読み指定をするという簡単な事前処理だけで、処理対象テキスト全体にわたって利用者の意図する正しい発音とアクセントによる音声合成を行うことが可能となる。
【００７３】
＜変形例１＞
本実施形態では、発音記号辞書３６と形態素辞書３７とを別構成としたが、形態素辞書中に形態素の読み仮名を登録することによって、その形態素辞書を発音記号辞書として用いることもできる。この場合のテキスト解析部の構成例を図６に示す。すなわち、変換部３１は、読み仮名を発音記号に変換する際に、形態素辞書３７を参照するのである。この構成によれば、読み指定文字列についても、品詞、アクセント型、アクセント結合様式などの属性を得ることができる。これらの属性が構文解析やアクセント句処理にて活用されることによって、より自然な合成音声の生成が可能となる。
【００７４】
＜変形例２＞
本実施形態では、指定された読み仮名が発音記号辞書３６に登録されていない単語、すなわち、辞書未登録語であった場合には、発音記号に変換することができない場合がある。そこで、変換部３１に、読み仮名、読み指定文字列の表記、またはこれらの組み合わせ等に基づいて、発音記号を推定する推定手段としての機能を持たせ、辞書未登録語については読み仮名等から発音記号を推定することが好ましい。これにより、辞書未登録語の読み仮名が指定された場合であっても、その読み仮名を発音記号に変換することが可能となる。
【００７５】
＜変形例３＞
読み仮名による読み指定と発音記号による読み指定の両方を利用可能とし、これらの読み指定を切り替え可能とする構成も好ましい。読み指定方法の切り替えは次のように行えばよい。
【００７６】
（１）設定ファイルで指定する方法
システム設定ファイルやレジストリなどの設定ファイルに、「ＯＲＴＨＯＧＲＡＰＨＹ＝“読み仮名”」または「ＯＲＴＨＯＧＲＡＰＨＹ＝“発音記号”」のように読み指定方法を記述しておく。テキスト解析部３は、起動時などに設定ファイルの設定を参照し、前者の場合には、読み指定文字列を読み仮名指定として扱い、後者の場合には、発音記号指定として扱うように処理を切り替える。
【００７７】
（２）入力データ中で宣言する方法
入力データの先頭などに「＜ＯＲＴＨＯＧＲＡＰＨＹ＝“読み仮名”＞」または「＜ＯＲＴＨＯＧＲＡＰＨＹ＝“発音記号”＞」のように読み指定方法のタグを記述する。テキスト解析部３は、入力データごとに、読み仮名指定として扱うか発音記号指定として扱うか処理を切り替える。
【００７８】
（３）読み指定に識別符号を付加することにより切り替える方法
読み指定タグに、読み仮名による指定か発音記号による指定かを指示する識別符号を記述しておく。テキスト解析部３は、テキスト解析処理を行う際に識別符号に基づいて読み指定文字列の処理を切り替える。たとえば、識別符号として「ＯＲＴＨ」を用いた場合には、「＜ＰＲＯＮＯＲＴＨ＝“発音記号” ＳＹＭ＝“キョ’ート”＞京都＜／ＰＲＯＮ＞から＜ＰＲＯＮＯＲＴＨ＝“読み仮名” ＳＹＭ＝“おぐら”＞小倉＜／ＰＲＯＮ＞まで２９０円です。」という入力データに対し、「京都」の部分は発音記号指定として扱い、「小倉」の部分は読み仮名指定として扱うように処理を切り替える。
【００７９】
（４）使用する文字セットで判別する方法
読み指定が「平仮名」でされたときには読み仮名指定として扱い、「片仮名」でされたときには発音記号指定として扱うように処理を切り替える。たとえば、「＜ＰＲＯＮＳＹＭ＝“キョ’ート”＞京都＜／ＰＲＯＮ＞から＜ＰＲＯＮＳＹＭ＝“おぐら”＞小倉＜／ＰＲＯＮ＞まで２９０円です。」という入力データに対し、「京都」の読み指定は片仮名なので発音記号指定として扱い、「小倉」の読み指定は平仮名なので読み仮名指定として扱う。
【００８０】
このように複数の読み指定方法を利用可能とし、適宜切り替え可能とすることによって、装置の利便性・柔軟性が向上する。
【００８１】
（第２の実施形態）
次に、本発明の第２の実施形態について説明する。
【００８２】
第１の実施形態と本実施形態とはテキスト解析部の構成が異なるのみであり、その他の構成および作用については同一である。
【００８３】
＜テキスト解析部の構成＞
図７は、本実施形態に係る音声合成用テキスト解析装置としてのテキスト解析部の構成を示すブロック図である。
【００８４】
テキスト解析部８は、概略、読み指定タグ解析部８０、形態素解析部８１および発音記号文字列出力部８２を有して構成される。ここで、形態素解析部８１は、辞書検索部８３、読み仮名検査部８４、最適解探索部８５、形態素辞書８６および探索バッファ８７を有している。
【００８５】
読み指定タグ解析部８０は、処理対象テキストから読み指定文字列および指定された読み仮名を抽出する読み指定抽出手段として機能する部分である。読み指定タグ解析部８０は、入力された入力データを先頭から順に調べていき、読み指定タグに基づいて処理対象テキストを読み指定文字列と通常文字列とに切り分ける。読み指定文字列に関しては同時にその読み仮名も抽出する。そして、読み指定文字列およびその読み仮名を読み仮名検査部８４に引き渡し、読み指定文字列および通常文字列を辞書検索部８３に引き渡す。
【００８６】
辞書検索部８３は、形態素辞書８６を参照して処理対象テキスト（つまり、読み指定文字列および通常文字列）を形態素に分割することにより形態素候補を生成する形態素候補生成手段として機能する。本実施形態では、図８に示すように、形態素辞書８６として、形態素の「表記（見出し）」、「読み仮名」、「品詞」、「発音記号」、「アクセント型」、「アクセント結合様式」などが対応付けられた辞書を用いる。
【００８７】
辞書検索部８３は、処理対象テキストを先頭または末尾から順に調べていき、形態素辞書８６に登録された形態素の表記にマッチする文字列が見つかれば、その文字列を形態素候補とする。ここでは、とりうるすべての組み合わせの形態素候補が抽出される。形態素候補は、読み仮名、品詞、発音記号、アクセント型、アクセント結合様式などの属性とともに探索バッファ８７に格納される。
【００８８】
読み仮名検査部８４は、読み指定文字列に指定された読み仮名と形態素候補の読み仮名とを照合することにより、読みの異なる形態素候補を除外する読み仮名検査手段として機能する。すなわち、探索バッファ８７に格納された形態素候補のうち、読み指定文字列に指定された読み仮名と一致しないものを、探索バッファ８７から削除するのである。
【００８９】
最適解探索部８５は、探索バッファ８７に格納された複数の形態素候補のなかから最適な形態素を選定する最適解探索手段として機能する。最適解探索部８５は、個々の形態素に与えられている形態素コスト、および、形態素同士の接続のしやすさを示す接続コストに基づいて、探索バッファ８７に残っている複数の形態素候補のなかから最適な形態素の組み合わせ（最適解）を求める。
【００９０】
発音記号文字列出力部８２は、最適解探索部８５で選定された形態素に基づいて処理対象テキストの発音記号データを生成する生成手段として機能する部分である。ここで生成される発音記号データは、第１の実施形態におけるものと同じである。発音記号データは、出力データとして出力される。
【００９１】
＜テキスト解析部の処理＞
次に、図９を参照して、上記構成のテキスト解析部８の処理フローについて説明する。図９は、本実施形態におけるテキスト解析処理の流れを示すフローチャートである。
【００９２】
ここでは、図１０（ａ）に示すように、
「＜ＰＲＯＮＳＹＭ＝“きょうとからおぐら”＞京都から小倉＜／ＰＲＯＮ＞まで＜ＰＲＯＮＳＹＭ＝“２９０えん”＞２９０円＜／ＰＲＯＮ＞です。」
という入力データが入力された場合を例に挙げて、テキスト解析処理の具体的な流れを説明する。
【００９３】
まず、テキスト入力部２から、入力データが入力される（ステップＳ２００）。
【００９４】
次に、読み指定タグ解析部８０が、読み指定タグを調べて、処理対象テキストを読み指定文字列と通常文字列とに切り分ける（ステップＳ２０１）。ここでは、図１０（ｂ）に示すように、読み指定文字列「京都から小倉」、通常文字列「まで」、読み指定文字列「２９０円」、通常文字列「です。」の４つの区間に分けられる。読み指定文字列「京都から小倉」および「２９０円」に関しては、指定された読み仮名「きょうとからおぐら」および「２９０えん」も一緒に抽出される。
【００９５】
そして、処理対象文字列「京都から小倉まで２９０円です。」が辞書検索部８３に引き渡され、読み指定文字列および読み仮名が読み仮名検査部８４に引き渡される。
【００９６】
次に、辞書検索部８３が、処理対象文字列を形態素に分割し、得られた形態素候補を探索バッファ８７に格納する（ステップＳ２０２）。このときの探索バッファ８７の内容を図１１（ａ）に示す。辞書検索部８３は考えられ得るすべての組み合わせの形態素候補を抽出するため、「京（きょう）」、「都（みやこ）」、「小倉（こくら）」のように、利用者が本来意図した読みとは異なる形態素候補も抽出されることとなる。
【００９７】
続いて、読み仮名検査部８４が、読み仮名検査処理を実行する（ステップＳ２０３）。
【００９８】
まず、読み指定文字列「京都から小倉」を構成する形態素候補の組み合わせを抽出する。ここでは、
（１）「きょう−みやこ−か−ら−こくら」
（２）「きょう−みやこ−か−ら−おぐら」
（３）「きょう−みやこ−から−こくら」
（４）「きょう−みやこ−から−おぐら」
（５）「きょうと−か−ら−こくら」
（６）「きょうと−か−ら−おぐら」
（７）「きょうと−から−こくら」
（８）「きょうと−から−おぐら」
のように、８種類の組み合わせが得られる。
【００９９】
そして、指定された読み仮名「きょうとからおぐら」と読みが合致する組み合わせ（６）および（８）を構成する形態素候補のみ残し、他の形態素候補を探索バッファ８７から削除する。これにより、「京（きょう）」、「都（みやこ）」、「小倉（こくら）」の３つの形態素候補が削除される。
【０１００】
次に、読み指定文字列「２９０円」についても読み仮名検査処理を行うが、ここでは次のような問題が生ずる。「２９０円」の読み指定には、本来、「にひゃくきゅうじゅうえん」のように仮名文字を用いるべきであるところ、本例では「２９０えん」のように数字にて読み指定がなされているため、上記と同様の処理では読み仮名検査処理を行うことができないのである。
【０１０１】
そこで、読み仮名検査部８４は、英数記号によって読み指定がされている場合には、例外的に、形態素候補の読み仮名ではなく表記と照合を行い、英数記号と表記とが一致していたらその形態素候補を探索バッファ８７に残すようにする。本例では、読み指定の「２９０」の部分と形態素候補の表記「２９０」とが照合され、この形態素は探索バッファ８７に残される。そして、それ以外の部分「えん」については、上記と同様にして読み仮名検査を実行する。
【０１０２】
読み仮名検査後の探索バッファ８７の内容を図１１（ｂ）に示す。残された形態素候補であれば、どのように組み合わせても正しい読みが得られることがわかる。
【０１０３】
最適解探索部８５は、探索バッファ８７に残された形態素候補の各々の組み合わせについて、形態素コストや接続コストを計算し、最適な組み合わせを選定する（ステップＳ２０４）。
【０１０４】
そして、発音記号文字列出力部８２が、選定された形態素候補に基づいて、図１０（ｃ）に示すような発音記号データを生成し、出力する（ステップＳ２０５）。なお、ここで出力されたデータは、以降の構文解析処理に引き渡される。
【０１０５】
以上述べたように、本実施形態の構成によれば、文字列「京都から小倉」に対する読みとして、「キョ’ートカラ／オク゜ラ」のような発音記号ではなく、「きょうとからおぐら」のような仮名文字による読み仮名を指定すれば足りるので、発音や音声学の専門知識を持たない利用者でも簡単に読み指定を行うことができる。
【０１０６】
また、読み指定文字列に対しても形態素解析が行われるので、複合語、句、節、文などの任意の区間に対して読み指定をすることが可能となる。
【０１０７】
そして、読み指定文字列を形態素解析したうえで、指定された読み仮名とは異なる形態素候補を除外し、読みが合致する形態素候補のなかから最適解を選定することとしたので、利用者の指定した読みに対応した正確な発音記号が生成される。
【０１０８】
さらに、「２９０」のように英数記号によって読み指定がされている場合には、形態素候補の表記と照合を行うようにしたので、英数記号で読み指定された区間と形態素候補との対応をとることができ、それ以外の区間、すなわち、仮名文字で読み指定された区間「えん」の読み仮名検査処理を正確に行うことができる。
【０１０９】
したがって、本実施形態の構成によっても、特殊な読みをする文字列に対して「読み仮名」による読み指定をするという簡単な事前処理だけで、処理対象テキスト全体にわたって利用者の意図する正しい発音とアクセントによる音声合成を行うことが可能となる。
【０１１０】
＜変形例１＞
本実施形態では、生成された形態素候補に対してまず読み仮名検査処理を行うことによって読みの異なる形態素候補を除外した後、残った形態素候補のなかから最適解を選定することとしたが、読み仮名検査処理の実施態様はこれに限られない。
【０１１１】
たとえば、辞書検索部８３で生成されたすべての形態素候補を対象として最適解の探索を行うこととし、その探索処理と同時に読み仮名検査処理を実行する態様も好ましい。
【０１１２】
あるいは、辞書検索部８３で生成されたすべての形態素候補を対象として、形態素コストや接続コストなどに基づいて複数の候補解に絞り込んだ後に、その複数の候補解に対して読み仮名検査処理を行うことにより最適解を得る態様も好ましい。
【０１１３】
＜変形例２＞
読み指定文字列を１つの形態素とみなして解析するか、複数の形態素である可能性を考慮して解析するかを切り替え可能とする構成も好ましい。この切り替えは次のように行えばよい。
【０１１４】
（１）設定ファイルで指定する方法
システム設定ファイルやレジストリなどの設定ファイルに、「ＷＯＲＤＮＵＭ＝“単一”」または「ＷＯＲＤＮＵＭ＝“複数”」のように解析方法を記述しておく。テキスト解析部８は、起動時などに設定ファイルの設定を参照し、前者の場合には、読み指定文字列を単一の形態素として扱い、後者の場合には、本実施形態のように考えられ得るすべての形態素に分割する処理を行う。
【０１１５】
（２）入力データ中で宣言する方法
入力データの先頭などに「＜ＷＯＲＤＮＵＭ＝“単一”＞」または「＜ＷＯＲＤＮＵＭ＝“複数”＞」のように解析方法のタグを記述する。テキスト解析部８は、入力データごとに、読み指定文字列を単一の形態素として扱うかどうか判断する。
【０１１６】
（３）読み指定に識別符号を付加することにより切り替える方法
読み指定タグに、単一の形態素として扱うか否かを指示する識別符号を記述しておく。テキスト解析部８は、テキスト解析処理を行う際に識別符号に基づいて読み指定文字列の解析方法を切り替える。たとえば、識別符号として「ＷＯＲＤＮＵＭ」を用いた場合には、「＜ＰＲＯＮＳＹＭ＝“きょう” ＷＯＲＤＮＵＭ＝“単一”＞今日＜／ＰＲＯＮ＞の＜ＰＲＯＮＳＹＭ＝“とうきょうがいこくかわせしじょう” ＷＯＲＤＮＵＭ＝“複数”＞東京外国為替市場＜／ＰＲＯＮ＞の円相場をお伝えします。」という入力データに対し、「今日」の部分は単一の形態素とみなし、「東京外国為替市場」の部分は複数の形態素に分割する。
【０１１７】
このように読み指定文字列の解析方法を指定可能とすることによって、装置の利便性・柔軟性が向上する。
【０１１８】
＜変形例３＞
本実施形態では、英数記号によって読み指定された文字列「２９０」の読みとして、形態素解析により得られた発音記号「ニヒャクキュ’ージュー」を出力することとしたが、それ以外にも次のような構成を採ることができる。
【０１１９】
たとえば、英数記号を所定のルールに基づいて発音記号に変換する手段を設けることによって、英数記号の発音記号を生成してもよい。
【０１２０】
ただし、ルールに基づく変換では単語固有の発音に対応できない場合もある。そこで、英数記号から読み仮名へ変換する手段を設けることによって、この読み仮名を使用して読み仮名検査処理を行うようにすることも好ましい。このように英数記号を仮名文字に変換する前処理を行うことにより、読み指定文字列に対して画一的に読み仮名検査処理を実行できるため、処理の簡易化を図ることができる。また、形態素解析によって発音記号が得られるので、英数記号についても正確な発音記号を得ることができる。
【０１２１】
また、形態素辞書が、英数記号の形態素の読み仮名に英数記号による表記を登録可能に構成されていることも好ましい。たとえば、「１０」という数字の形態素の読み仮名として、「じゅう」という仮名文字だけでなく、「１０」という数字による表記を登録しておくのである。これにより、読み仮名として英数記号が指定されていた場合でも、画一的に読み仮名検査処理を実行できるため、処理の簡易化を図ることができる。また、形態素辞書の発音記号に基づき、英数記号についても正確な発音記号を得ることができる。
【０１２２】
【発明の効果】
以上説明したように、本発明によれば、発音や音声学の専門知識を持たない利用者でも簡単に読みを指定することができ、また、正しい発音とアクセントによる音声合成が可能となる。
【図面の簡単な説明】
【図１】本発明の第１の実施形態に係る音声合成装置の機能構成を示すブロック図である。
【図２】本発明の第１の実施形態に係るテキスト解析部の構成を示すブロック図である。
【図３】本発明の第１の実施形態に係る発音記号辞書の構造を示す図である。
【図４】本発明の第１の実施形態に係るテキスト解析処理の流れを示すフローチャートである。
【図５】（ａ）は入力データの一例を示す図であり、（ｂ）は読み指定タグ解析処理の解析結果の一例を示す図であり、（ｃ）は発音記号データの一例を示す図である。
【図６】本発明の第１の実施形態に係るテキスト解析部の変形例１の構成を示すブロック図である。
【図７】本発明の第２の実施形態に係るテキスト解析部の構成を示すブロック図である。
【図８】本発明の第２の実施形態に係る形態素辞書の構造を示す図である。
【図９】本発明の第２の実施形態に係るテキスト解析処理の流れを示すフローチャートである。
【図１０】（ａ）は入力データの一例を示す図であり、（ｂ）は読み指定タグ解析処理の解析結果の一例を示す図であり、（ｃ）は発音記号データの一例を示す図である。
【図１１】（ａ）は読み仮名検査処理前の探索バッファの内容を示す図であり、（ｂ）は読み仮名検査処理後の探索バッファの内容を示す図である。
【符号の説明】
１音声合成装置
２テキスト入力部
３テキスト解析部
４構文解析部
５アクセント句処理部
６音声合成部
７音声出力部
８テキスト解析部
１０辞書
１１文法
１２アクセント結合規則
１３音声データ
１４入力装置
１５記憶媒体
１６スピーカ
３０読み指定タグ解析部
３１変換部
３２形態素解析部
３３出力部
３４辞書検索部
３５最適解探索部
３６発音記号辞書
３７形態素辞書
３８探索バッファ
８０読み指定タグ解析部
８１形態素解析部
８２発音記号文字列出力部
８３辞書検索部
８４読み仮名検査部
８５最適解探索部
８６形態素辞書
８７探索バッファ

[0001]
[Technical field of the invention]
The present invention relates to a text analysis device for speech synthesis, and to a speech synthesis device, used when generating synthetic speech from a Japanese character string containing both kanji and kana.
[0002]
[Prior art]
Speech synthesis devices that read aloud Japanese sentences written in mixed kanji and kana are conventionally known. Such a device performs text analysis that converts the input kanji-kana character string into a phonetic symbol string with prosodic symbols.
[0003]
Text analysis for speech synthesis begins with morphological analysis, which divides the kanji-kana character string into morphemes and at the same time identifies each morpheme's reading, part of speech, accent type, accent combination pattern, and so on. A morpheme is the smallest linguistic unit that carries meaning in a sentence, and is usually the same as or smaller than a word.
[0004]
In morphological analysis, homographs with different pronunciations, such as 小倉 read as "Ogura" (おぐら) or "Kokura" (こくら), or 最高値 read as "saikōchi" (さいこうち) or "saitakane" (さいたかね), make the analysis result ambiguous, so several candidate solutions may be derived. For speech synthesis, in general, only one solution, selected by some evaluation criterion, must be output. That decision is difficult, however, and the selected reading or accent type may differ from what the user intended.
[0005]
Methods for reliably outputting the reading the user intends even in such cases have therefore been studied. For example, Patent Document 1 proposes specifying a "reading" in advance for special words in the input character string and, for the specified section, outputting the specified reading in place of the reading identified by morphological analysis. Many commercially available speech synthesis devices use a similar method.
[0006]
[Patent Document 1]
JP-A-4-331998
[0007]
[Problems to be solved by the invention]
In conventional speech synthesis devices, the "reading" of a special word had to be given not as a kana reading but as a reading that matches the pronunciation and accent, that is, as "phonetic symbols".
[0008]
One known notation for phonetic symbols is the standard for Japanese text-to-speech synthesis symbols, JEIDA-62-2000. In JEIDA-62-2000, for example, 小倉 (おぐら, "Ogura") is written "オク゜ラ", and 最高値 (さいこうち, "saikōchi") is written "サイコ’ーチ". Here "ー" denotes a long vowel, and "ク゜" denotes "グ" pronounced as a nasalized voiced sound. The mark "’" indicates the accent position: the text is read high up to the mora immediately before the mark and low from there on.
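The three conventions just described can be illustrated with a small parser. This is a minimal Python sketch, not part of JEIDA-62-2000 or of the patent: the function name `split_morae`, the set of small kana, and the convention of returning 0 when no accent mark is present are all assumptions made for illustration.

```python
# Hypothetical helper: split a JEIDA-style symbol string into morae and
# report how many morae precede the accent mark.
SMALL_KANA = set("ァィゥェォャュョヮ")   # assumed to attach to the preceding kana
NASAL_MARKS = {"\u309c", "\u309a"}       # the semi-voiced mark, as in "ク゜"
ACCENT_MARKS = {"\u2019", "'"}           # the accent mark "’" (or an ASCII apostrophe)


def split_morae(symbols):
    """Return (morae, accent), where accent is the number of morae read
    high before the accent mark, or 0 if the string has no accent mark
    (the 0 convention is an assumption of this sketch)."""
    morae = []
    accent = 0
    for ch in symbols:
        if ch in ACCENT_MARKS:
            accent = len(morae)          # mark follows the last high mora
        elif (ch in SMALL_KANA or ch in NASAL_MARKS) and morae:
            morae[-1] += ch              # part of the previous mora
        else:
            morae.append(ch)             # "ー" counts as a mora of its own
    return morae, accent
```

For "サイコ’ーチ" this yields five morae with the accent after the third, and for "オク゜ラ" three morae with no accent mark.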
[0009]
Under this notation, to have 小倉 pronounced "Ogura" in the character string 京都から小倉まで290円です。 ("It is 290 yen from Kyoto to Ogura."), the reading must be specified on the 小倉 portion as follows: 京都から<PRON SYM="オク゜ラ">小倉</PRON>まで290円です。
[0010]
Thus, conventionally, the reading specification given to a speech synthesis device had to reflect acoustic features. Specifying a reading was therefore difficult in itself for ordinary users without knowledge of pronunciation or phonetics. Moreover, when the pronunciation or accent position was not specified accurately, only synthetic speech with unnatural pronunciation and prosody could be obtained.
[0011]
The present invention has been made in view of the above circumstances. Its object is to provide a text analysis device for speech synthesis, and a speech synthesis device, that let even users without specialized knowledge of pronunciation or phonetics specify a reading easily, and that enable speech synthesis with correct pronunciation and accent.
[0012]
[Means for Solving the Problems]
To achieve the above object, the present invention makes it possible to specify a reading by means of a kana reading.
[0013]
More specifically, the present invention has a dictionary that associates at least kana readings with phonetic symbols and, by referring to this dictionary, converts the kana reading of a reading-specified character string (a character string whose reading has been specified in kana) into phonetic symbols. A "dictionary" in the present invention means a data file or database in which data such as kana readings and phonetic symbols, together with their correspondences, are recorded in a computer-processable format.
[0014]
With this configuration, it suffices to specify a kana reading as the "reading" of a character string, so even users without specialized knowledge of pronunciation or phonetics can specify readings easily. And because the specified kana reading is converted into phonetic symbols, speech can be synthesized with correct pronunciation and accent based on the conversion result.
[0015]
More specifically, the following configuration is preferable: the reading-specified character string and its specified kana reading are extracted from a text to be processed that includes a reading-specified character string; the specified kana reading is converted into phonetic symbols by referring to a phonetic symbol dictionary that associates at least kana readings with phonetic symbols; the phonetic symbols of the morphemes making up the other character strings in the text, for which no reading is specified, are obtained by morphological analysis; and the conversion result and the morphological analysis result are combined to generate phonetic symbol data for the text to be processed.
[0016]
With this configuration, accurate phonetic symbols are generated for a reading-specified character string from the specified kana reading, and appropriate phonetic symbols are generated for the other character strings by morphological analysis. Consequently, the simple preparatory step of specifying kana readings for character strings with unusual readings is enough to synthesize speech with the correct pronunciation and accent that the user intends across the entire text to be processed.
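The flow of this first configuration (extract, convert, analyze, combine) can be sketched in Python as follows. This is an illustrative sketch under stated assumptions, not the patented implementation: the ASCII tag syntax `<PRON SYM="...">`, the names `split_segments`, `analyze`, and `toy_morph_analyze`, and the tiny lookup tables are hypothetical. The dictionary entries reuse the example values given in the description; symbols the description does not give are left as `None`.

```python
import re

# Phonetic symbol dictionary: kana reading -> phonetic symbols.
PHONETIC_DICT = {"おぐら": "オク゜ラ", "あいきょう": "アイキョ’ー"}

# Assumed ASCII form of the reading tag.
TAG = re.compile(r'<PRON SYM="([^"]*)">(.*?)</PRON>')

# Stand-in for a real morphological analyzer (hypothetical lookup table).
TOY_ANALYSES = {"京都から": [("京都", "キョ’ート"), ("から", "カ’ラ")]}


def toy_morph_analyze(surface):
    """Return (morpheme, symbols) pairs; symbols are None when unknown."""
    return TOY_ANALYSES.get(surface, [(surface, None)])


def split_segments(text):
    """Split text into (surface, kana) pairs; kana is None for ordinary
    character strings that carry no reading specification."""
    segments, pos = [], 0
    for m in TAG.finditer(text):
        if m.start() > pos:
            segments.append((text[pos:m.start()], None))
        segments.append((m.group(2), m.group(1)))
        pos = m.end()
    if pos < len(text):
        segments.append((text[pos:], None))
    return segments


def analyze(text, morph_analyze=toy_morph_analyze):
    """Combine dictionary conversion and morphological analysis, in order."""
    output = []
    for surface, kana in split_segments(text):
        if kana is None:
            output.extend(morph_analyze(surface))
        else:
            # Raises KeyError for a reading missing from the dictionary.
            output.append((surface, PHONETIC_DICT[kana]))
    return output
```

Calling `analyze('京都から<PRON SYM="おぐら">小倉</PRON>まで290円です。')` returns the morphemes of 京都から, then 小倉 with the symbols looked up for おぐら, then the untagged remainder.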
[0017]
As another configuration of the present invention, the following is also preferable: the reading-specified character string and its specified kana reading are extracted from a text to be processed that includes a reading-specified character string; the text is divided into morphemes by referring to a morpheme dictionary that associates at least the notation, kana reading, and phonetic symbols of each morpheme, thereby generating morpheme candidates that have a kana reading and phonetic symbols as attributes; when the optimal morphemes are selected from among the morpheme candidates, the kana reading specified for the reading-specified character string is compared with the kana readings of the morpheme candidates and candidates whose readings differ are excluded; and phonetic symbol data for the text is generated from the selected morphemes.
[0018]
With this configuration, morphological analysis is also applied to the reading-specified character string, so a reading can be specified for a span of any size, such as a compound word, phrase, clause, or sentence. In addition, because the reading-specified character string is analyzed morphologically and the optimal solution is then selected from among the morpheme candidates whose readings match the specified kana reading, accurate phonetic symbols corresponding to the reading specified by the user are generated.
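A small Python sketch can make this second configuration concrete, using the example from the second embodiment, 京都から小倉 with the specified reading きょうとからおぐら. The toy dictionary contains only the morphemes named in that example. The names `MORPHEMES`, `segmentations`, and `kana_check` are hypothetical, and a real analyzer would use a lattice rather than this exhaustive recursion.

```python
# Toy morpheme dictionary: notation -> possible kana readings.
MORPHEMES = {
    "京": ["きょう"],
    "都": ["みやこ"],
    "京都": ["きょうと"],
    "か": ["か"],
    "ら": ["ら"],
    "から": ["から"],
    "小倉": ["こくら", "おぐら"],
}


def segmentations(text):
    """Enumerate every way to cover text with dictionary morphemes.
    Each result is a list of (notation, reading) pairs."""
    if not text:
        return [[]]
    results = []
    for end in range(1, len(text) + 1):
        surface = text[:end]
        for reading in MORPHEMES.get(surface, []):
            for rest in segmentations(text[end:]):
                results.append([(surface, reading)] + rest)
    return results


def kana_check(paths, kana):
    """Keep only the paths whose concatenated reading equals kana."""
    return [p for p in paths if "".join(r for _, r in p) == kana]


paths = segmentations("京都から小倉")
survivors = kana_check(paths, "きょうとからおぐら")
```

Here `paths` holds the eight combinations listed in the embodiment and `survivors` holds the two that read きょうとからおぐら, so the candidates 京 (きょう), 都 (みやこ), and 小倉 (こくら) no longer appear in any surviving path.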
[0019]
Various specific implementations are conceivable for the process of comparing the kana reading specified for the reading-specified character string with the kana readings of the morpheme candidates and excluding candidates whose readings differ (hereinafter, the "kana reading check").
[0020]
For example, the kana reading check may first be applied to the generated morpheme candidates to exclude those with different readings, after which the optimal solution is selected from the remaining candidates. Alternatively, the kana reading check may be performed at the same time as the search for the optimal solution among the generated morpheme candidates. As a further alternative, the candidates may first be narrowed down to several candidate solutions on the basis of, for example, connection costs between morphemes, and the kana reading check then applied to those candidate solutions to obtain the optimal solution.
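Two of these orderings can be contrasted in a short Python sketch. The cost values are invented purely for illustration, connection costs are omitted, and the function names are hypothetical; the patent does not specify how many candidate solutions the narrowing step keeps.

```python
# Each path is a list of (notation, reading, cost) morphemes.
# The costs are made-up toy values; lower is better.
PATHS = [
    [("小倉", "こくら", 1)],
    [("小倉", "おぐら", 2)],
]


def reading_of(path):
    return "".join(reading for _, reading, _ in path)


def cost_of(path):
    # Sum of morpheme costs only; connection costs are left out here.
    return sum(cost for _, _, cost in path)


def pick_filter_first(paths, kana):
    """Run the kana reading check first, then take the cheapest survivor."""
    matching = [p for p in paths if reading_of(p) == kana]
    return min(matching, key=cost_of, default=None)


def pick_from_n_best(paths, kana, n):
    """Narrow to the n cheapest paths first, then run the kana reading check."""
    for path in sorted(paths, key=cost_of)[:n]:
        if reading_of(path) == kana:
            return path
    return None
```

With these toy costs, `pick_filter_first(PATHS, "おぐら")` finds the おぐら path, while `pick_from_n_best(PATHS, "おぐら", 1)` returns `None` because the only path kept is こくら. The narrowing-first ordering therefore needs enough candidate solutions to include one with the specified reading.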
[0021]
Kana characters (hiragana or katakana) should properly be used to specify a reading, but some users may use alphanumeric symbols (letters, digits, or symbols) instead. The following configurations are therefore advisable for handling the case where the kana reading specified for a reading-specified character string contains alphanumeric symbols.
[0022]
For example, in the kana reading check, alphanumeric symbols are preferably compared with the notation of the morpheme candidates, while the kana portions other than the alphanumeric symbols are compared with the kana readings of the morpheme candidates. This establishes the correspondence between the span specified with alphanumeric symbols and the morpheme candidates, so the kana reading check can be performed accurately on the remaining spans, that is, the spans whose reading is specified in kana. In this case, the alphanumeric symbols may be converted into phonetic symbols by ordinary morphological analysis or according to predetermined rules.
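The mixed comparison can be sketched as follows, using the 290円 example from the second embodiment, where the user writes ２９０えん instead of にひゃくきゅうじゅうえん. The function names are hypothetical, and classifying a character as an "alphanumeric symbol" by its Unicode category is an assumption of this sketch, not something the patent prescribes.

```python
import unicodedata


def is_alnum_symbol(ch):
    """Treat digits, punctuation, symbols, and cased letters as
    alphanumeric symbols. Kanji and kana are category Lo and are excluded."""
    category = unicodedata.category(ch)
    return category[0] in "NPS" or category in ("Lu", "Ll")


def passes_check(path, spec):
    """Walk a path of (notation, reading) morphemes against the specified
    reading. Each morpheme must match either by kana reading or, when its
    notation consists only of alphanumeric symbols, by notation."""
    pos = 0
    for surface, reading in path:
        rest = spec[pos:]
        if rest.startswith(reading):
            pos += len(reading)
        elif all(is_alnum_symbol(c) for c in surface) and rest.startswith(surface):
            pos += len(surface)
        else:
            return False
    return pos == len(spec)


path = [("２９０", "にひゃくきゅうじゅう"), ("円", "えん")]
```

Both `passes_check(path, "２９０えん")` and `passes_check(path, "にひゃくきゅうじゅうえん")` succeed, whereas a specification of ２９０円 fails because 円 is matched only against its kana reading.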
[0023]
Alternatively, the kana reading check is preferably performed after the alphanumeric symbols have been converted into a kana reading (kana characters). This preprocessing allows the kana reading check to be applied uniformly to the reading-specified character string, which simplifies processing. And because the phonetic symbols are obtained through morphological analysis, accurate phonetic symbols are obtained for the alphanumeric symbols as well.
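As one possible form of this preprocessing, the sketch below converts runs of digits in a specified reading into kana before the check. It is a deliberately limited illustration: it handles only integers from 0 to 9999, the choice of よん, なな, きゅう, and ぜろ as digit readings is an assumption, and the names `number_to_kana` and `normalize_reading` are hypothetical. Letters and other symbols are left untouched.

```python
import re

DIGITS = ["", "いち", "に", "さん", "よん", "ご", "ろく", "なな", "はち", "きゅう"]

# (place value, unit reading, irregular digit+unit readings)
UNITS = [
    (1000, "せん", {3: "さんぜん", 8: "はっせん"}),
    (100, "ひゃく", {3: "さんびゃく", 6: "ろっぴゃく", 8: "はっぴゃく"}),
    (10, "じゅう", {}),
]


def number_to_kana(n):
    """Return a kana reading for an integer in the range 0..9999."""
    if not 0 <= n <= 9999:
        raise ValueError("this sketch only covers 0..9999")
    if n == 0:
        return "ぜろ"
    parts = []
    for value, unit, irregular in UNITS:
        digit, n = divmod(n, value)
        if digit == 0:
            continue
        if digit in irregular:
            parts.append(irregular[digit])
        elif digit == 1:
            parts.append(unit)            # 10 -> じゅう, not いちじゅう
        else:
            parts.append(DIGITS[digit] + unit)
    parts.append(DIGITS[n])               # remaining ones digit, may be ""
    return "".join(parts)


def normalize_reading(spec):
    """Replace each run of decimal digits (ASCII or full-width) with kana."""
    return re.sub(r"\d+", lambda m: number_to_kana(int(m.group())), spec)
```

With this preprocessing, `normalize_reading("２９０えん")` gives にひゃくきゅうじゅうえん, which can then go through the ordinary kana reading check.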
[0024]
Alternatively, it is also preferable that the morphological dictionary is configured to be able to register the notation by the alphanumeric symbol in the reading kana of the morpheme of the alphanumeric symbol. For example, not only the kana character “ju” but also the notation “10” is registered as the reading kana of the morpheme of the number “10”. Thereby, even when an alphanumeric symbol is specified as a reading kana, the reading kana inspection process can be uniformly performed, and the process can be simplified. In addition, accurate phonetic symbols can be obtained for alphanumeric symbols based on phonetic symbols in the morphological dictionary.
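As a rough illustration of this last configuration, the following Python sketch registers both a kana reading and the alphanumeric notation itself as acceptable reading kana for a numeric morpheme. The table layout, the entry contents, and the function name `reading_matches` are assumptions made for this example only and are not taken from the embodiment.

```python
# Hypothetical morpheme dictionary fragment: notation -> set of accepted reading kana.
# For alphanumeric morphemes the notation itself is registered as an extra "reading",
# so a designation written as "10" passes the same check as one written in kana.
MORPHEME_READINGS = {
    "10": {"じゅう", "10"},
    "円": {"えん"},
}


def reading_matches(notation: str, specified: str) -> bool:
    """Return True if the specified reading is registered for the morpheme."""
    return specified in MORPHEME_READINGS.get(notation, set())


print(reading_matches("10", "じゅう"))  # designation written in kana
print(reading_matches("10", "10"))      # designation written in numerals
print(reading_matches("10", "とお"))    # reading not registered in this toy table
```

With such a table, the check itself needs no special branch for alphanumeric designations, which is the simplification the paragraph above describes.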
[0025]
The present invention can be regarded as a text-to-speech text analysis device or a text-to-speech synthesis device having at least a part of the above configuration. Further, the present invention can also be regarded as a text analysis method for speech synthesis or a speech synthesis method including at least a part of the above procedure, or a program for realizing such a method. In addition, each of the above configurations and procedures can be combined with each other as much as possible to constitute the present invention.
[0026]
For example, a text analysis device for speech synthesis according to an embodiment of the present invention includes a dictionary that associates at least reading kana with phonetic symbols, and conversion means that, by referring to the dictionary, converts the reading kana specified for a reading designation character string into phonetic symbols.
[0027]
In addition, the text analysis device for speech synthesis according to one embodiment of the present invention includes: reading designation extraction means for extracting a reading designation character string and the specified reading kana from a processing target text that includes a character string whose reading is designated by reading kana; a phonetic symbol dictionary that associates at least reading kana with phonetic symbols; conversion means for converting the reading kana specified for the reading designation character string into phonetic symbols by referring to the phonetic symbol dictionary; morphological analysis means for obtaining, by morphological analysis, the phonetic symbols of the morphemes constituting the other character strings in the processing target text for which no reading is designated; and generating means for generating phonetic symbol data of the processing target text by combining the conversion result of the conversion means and the analysis result of the morphological analysis means.
[0028]
In addition, the text analysis device for speech synthesis according to one embodiment of the present invention may include: reading designation extraction means for extracting a reading designation character string and the specified reading kana from a processing target text that includes a character string whose reading is designated by reading kana; a morpheme dictionary that associates at least the notation, the reading kana, and the phonetic symbols of each morpheme; morpheme candidate generating means for generating morpheme candidates having reading kana and phonetic symbols as attributes by dividing the processing target text into morphemes with reference to the morpheme dictionary; reading kana checking means for excluding, when the optimal morphemes are selected from among a plurality of morpheme candidates, morpheme candidates with different readings by comparing the reading kana specified for the reading designation character string with the reading kana of the morpheme candidates; and generating means for generating phonetic symbol data of the processing target text based on the selected morphemes.
[0029]
According to another embodiment of the present invention, a speech synthesis apparatus preferably includes any of the above-described text analysis devices for speech synthesis, and speech synthesis means for generating synthesized speech based on the phonetic symbols obtained from the text analysis device.
[0030]
Further, in the text analysis method for speech synthesis as one embodiment of the present invention, it is preferable that a computer: extracts a reading designation character string and the specified reading kana from a processing target text that includes a character string whose reading is designated by reading kana; converts the reading kana specified for the reading designation character string into phonetic symbols by referring to a phonetic symbol dictionary that associates at least reading kana with phonetic symbols; obtains, by morphological analysis, the phonetic symbols of the morphemes constituting the other character strings in the processing target text for which no reading is designated; and generates phonetic symbol data of the processing target text by combining the conversion result of the conversion process and the analysis result of the morphological analysis.
[0031]
Further, in the text analysis method for speech synthesis as one embodiment of the present invention, it is preferable that a computer: extracts a reading designation character string and the specified reading kana from a processing target text that includes a character string whose reading is designated by reading kana; generates morpheme candidates having reading kana and phonetic symbols as attributes by dividing the processing target text into morphemes with reference to a morpheme dictionary that associates at least the notation, the reading kana, and the phonetic symbols of each morpheme; excludes, when selecting the optimal morphemes from among a plurality of morpheme candidates, morpheme candidates with different readings by comparing the reading kana specified for the reading designation character string with the reading kana of the morpheme candidates; and generates phonetic symbol data of the processing target text based on the selected morphemes.
[0032]
Further, a speech synthesis method as one embodiment of the present invention preferably includes any of the text analysis methods for speech synthesis described above, with the computer generating synthesized speech based on the phonetic symbol data generated by that method.
[0033]
Further, the text analysis program for speech synthesis as one embodiment of the present invention preferably causes a computer to execute: a process of extracting a reading designation character string and the specified reading kana from a processing target text that includes a character string whose reading is designated by reading kana; a process of converting the reading kana specified for the reading designation character string into phonetic symbols by referring to a phonetic symbol dictionary that associates at least reading kana with phonetic symbols; a process of obtaining, by morphological analysis, the phonetic symbols of the morphemes constituting the other character strings in the processing target text for which no reading is designated; and a process of generating phonetic symbol data of the processing target text by combining the conversion result of the conversion process and the analysis result of the morphological analysis.
[0034]
Further, the text analysis program for speech synthesis as one embodiment of the present invention preferably causes a computer to execute: a process of extracting a reading designation character string and the specified reading kana from a processing target text that includes a character string whose reading is designated by reading kana; a process of generating morpheme candidates having reading kana and phonetic symbols as attributes by dividing the processing target text into morphemes with reference to a morpheme dictionary that associates at least the notation, the reading kana, and the phonetic symbols of each morpheme; a process of excluding, when the optimal morphemes are selected from among a plurality of morpheme candidates, morpheme candidates with different readings by comparing the reading kana specified for the reading designation character string with the reading kana of the morpheme candidates; and a process of generating phonetic symbol data of the processing target text based on the selected morphemes.
[0035]
Further, a speech synthesis program as one embodiment of the present invention preferably includes any of the text analysis programs for speech synthesis described above, and further causes the computer to execute a process of generating synthesized speech based on the phonetic symbol data generated by that program.
[0036]
BEST MODE FOR CARRYING OUT THE INVENTION
Preferred embodiments of the present invention will be illustratively described in detail below with reference to the drawings.
[0037]
(1st Embodiment)
<Configuration of speech synthesizer>
FIG. 1 is a block diagram showing a functional configuration of the speech synthesizer according to the first embodiment of the present invention.
[0038]
The speech synthesizer 1 can be configured by a general-purpose computer including a CPU (central processing unit), a memory, a hard disk, and the like as basic hardware.
[0039]
The speech synthesizer 1 includes a text input unit 2, a text analysis unit 3, a syntax analysis unit 4, an accent phrase processing unit 5, a speech synthesis unit 6, and a speech output unit 7 as functions related to speech synthesis processing. These functions are realized by a program stored in a storage device such as a hard disk being read into a memory and executed by a CPU.
[0040]
The text input unit 2 has a function of reading a processing target text to be subjected to speech synthesis processing from an input device 14 such as a keyboard or a storage medium 15 such as a hard disk or a memory card. The read processing target text is delivered to the text analysis unit 3 as input data.
[0041]
In the speech synthesis device 1 of the present embodiment, Japanese text in which kanji and kana are mixed can be input as the text to be processed, and a reading designation by reading kana can be included in the text to be processed in advance.
[0042]
Although any method can be used for the reading designation, this embodiment adopts a method of attaching a reading designation tag to the character string whose reading is to be designated. More specifically, the text to be processed, "It's 290 yen from Kyoto to Kokura.", is written as follows:
"It's 290 yen from Kyoto to <PRON SYM="Ogura">Kokura</PRON>."
[0043]
The portion from the symbol "<" to the symbol ">" is a tag: "<PRON ...>" is the start tag of the reading designation, and "</PRON>" is the end tag of the reading designation. The character string "Kokura" enclosed between the start tag and the end tag is the reading designation character string. The part SYM="Ogura" in the start tag represents the reading kana "Ogura" specified for the reading designation character string "Kokura".
[0044]
Because the place name written as "Kokura" has two readings, "Ogura" in Kyoto and "Kokura" in Kyushu, it may be difficult for automatic text analysis to determine which reading to output. Therefore, if the user designates the reading of such a homograph in advance, misreadings can be prevented.
[0045]
The text analysis unit 3 has a function of generating phonetic symbol data of the text to be processed by performing text analysis on the input data. The text analysis unit 3 identifies the morpheme and the attribute of the morpheme by referring to the dictionary 10 stored in the hard disk. Note that the text analysis unit 3 performs a special process on the designated reading character string, and the configuration and the details of the process will be described later.
[0046]
The phonetic symbol data output from the text analysis unit 3 is structured data representing morphemes constituting the text to be processed and attributes of the morphemes. Each morpheme has, as attributes, phonetic symbols, parts of speech, accent types, accent combining styles, and the like.
[0047]
The syntax analysis unit 4 has a function of parsing the phonetic symbol data to identify dependency relationships between morphemes. At this time, the syntax analysis unit 4 refers to the grammar 11 as linguistic knowledge.
[0048]
The accent phrase processing unit 5 has a function of collecting a plurality of morphemes into a unit called an accent phrase based on the result of the syntax analysis and identifying an accent position of the accent phrase. An accent phrase is a unit of prosody when reading out a sentence. When performing the accent phrase processing, reference is made to an accent combination rule 12 that describes whether a plurality of consecutive morphemes constitute one accent phrase and, when combined, what accent type is to be formed.
[0049]
The speech synthesis unit 6 functions as speech synthesis means for generating synthesized speech based on the phonetic symbols obtained by the above-described processing. The speech synthesis unit 6 generates the synthesized speech using the voice data 13.
[0050]
The voice output unit 7 has a function of controlling a voice output device such as the speaker 16 and outputting a synthesized voice.
[0051]
<Configuration of text analysis unit>
Next, a detailed configuration of the text analysis unit 3 will be described with reference to FIG. 2.
[0052]
FIG. 2 is a block diagram showing a configuration of a text analysis unit as a text analysis device for speech synthesis according to the first embodiment of the present invention.
[0053]
The text analysis unit 3 is generally composed of a reading designation tag analysis unit 30, a reading character string-phonetic symbol character string conversion unit (hereinafter simply referred to as the "conversion unit") 31, a morphological analysis unit 32, and an output unit 33. Further, the text analysis unit 3 has, as the dictionary 10 in FIG. 1, a phonetic symbol dictionary 36 referred to by the conversion unit 31 and a morpheme dictionary 37 referred to by the morphological analysis unit 32.
[0054]
The reading designation tag analysis unit 30 functions as reading designation extraction means for extracting a reading designation character string and the specified reading kana from the processing target text. The reading designation tag analysis unit 30 examines the input data in order from the beginning and, based on the reading designation tags, separates the processing target text into reading designation character strings and other character strings for which no reading is designated (hereinafter, "normal character strings"). For each reading designation character string, the reading kana is extracted at the same time. The reading designation character string and its reading kana are then passed to the conversion unit 31, and the normal character strings are passed to the morphological analysis unit 32.
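The separation performed by the reading designation tag analysis unit 30 can be sketched in Python roughly as follows. This is a minimal illustration, not the implementation of the embodiment: the regular expression, the `Segment` type, the function name, and the Japanese sample string are all assumptions introduced here, and nested or malformed tags are not handled.

```python
import re
from typing import List, NamedTuple, Optional


class Segment(NamedTuple):
    text: str                # character string as written
    reading: Optional[str]   # specified reading kana, or None for a normal string


# Assumed tag syntax, modeled on the <PRON SYM="...">...</PRON> example.
_PRON_TAG = re.compile(r'<PRON\s+SYM="([^"]*)">(.*?)</PRON>', re.DOTALL)


def split_by_reading_tags(source: str) -> List[Segment]:
    """Split input into normal strings and reading designation strings, in order."""
    segments: List[Segment] = []
    pos = 0
    for match in _PRON_TAG.finditer(source):
        if match.start() > pos:  # normal character string before the tag
            segments.append(Segment(source[pos:match.start()], None))
        segments.append(Segment(match.group(2), match.group(1)))
        pos = match.end()
    if pos < len(source):        # trailing normal character string
        segments.append(Segment(source[pos:], None))
    return segments


segments = split_by_reading_tags('京都から<PRON SYM="おぐら">小倉</PRON>までは290円です。')
for seg in segments:
    print(seg)
```

Segments whose `reading` is `None` would go to the morphological analysis unit, and the others to the conversion unit.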
[0055]
The conversion unit 31 functions as conversion means for converting the reading kana specified for the reading designation character string into phonetic symbols by referring to the phonetic symbol dictionary 36. As shown in FIG. 3, the phonetic symbol dictionary 36 is a dictionary in which the "notation (heading)", "reading kana", and "phonetic symbols" of each registered word are associated with one another. Here, the "notation" is registered as a kanji-kana mixed character string, the "reading kana" as hiragana, and the "phonetic symbols" as a phonetic symbol character string with prosodic symbols. By referring to a dictionary having such a structure, it is possible to convert, for example, the reading kana “Aikyo” into the phonetic symbols with prosodic symbols “Aikyo'-”.
[0056]
The morphological analysis unit 32 is a part that functions as a morphological analysis unit that obtains phonetic symbols of morphemes forming a normal character string by morphological analysis. The morphological analysis unit 32 includes a dictionary search unit 34, an optimal solution search unit 35, and a search buffer 38.
[0057]
The dictionary search unit 34 has a function of dividing a normal character string into morphemes with reference to the morphological dictionary 37. The morpheme dictionary 37 is a dictionary in which “notations (headings)”, “part of speech”, “phonetic symbols”, “accent type”, “accent combination style”, and the like of morphemes are associated.
[0058]
The dictionary search unit 34 examines the normal character string in order from the beginning or end, and if a character string that matches the notation of the morpheme registered in the morphological dictionary 37 is found, the character string is set as a morpheme candidate. Here, morpheme candidates of all possible combinations are extracted. That is, for the character string "Kyoto", a candidate consisting of two morphemes "Kyo" + "To" in addition to "Kyoto" is also extracted. The morpheme candidates are stored in the search buffer 38 together with attributes such as part of speech, phonetic symbols, accent type, and accent combination style.
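The exhaustive extraction of morpheme candidates can be illustrated with the following Python sketch. For clarity it enumerates whole segmentations recursively rather than filling a search buffer, and the tiny lexicon is an assumption made up for this example; it holds notations only and omits attributes such as part of speech or accent type.

```python
from typing import List, Set


def enumerate_segmentations(text: str, lexicon: Set[str]) -> List[List[str]]:
    """Return every way of dividing text into notations found in the lexicon."""
    if not text:
        return [[]]  # one way to segment the empty string: no morphemes
    results: List[List[str]] = []
    for end in range(1, len(text) + 1):
        head = text[:end]
        if head in lexicon:
            for rest in enumerate_segmentations(text[end:], lexicon):
                results.append([head] + rest)
    return results


# Hypothetical lexicon entries for the string "京都から".
lexicon = {"京都", "京", "都", "から", "か", "ら"}
candidates = enumerate_segmentations("京都から", lexicon)
for candidate in candidates:
    print(" + ".join(candidate))
```

Because the number of segmentations can grow quickly with the length of the input, a practical analyzer would more likely store the candidates in a lattice, as the search buffer 38 suggests, and leave the choice among them to the optimal solution search.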
[0059]
The optimum solution search unit 35 has a function of selecting an optimum morpheme from a plurality of morpheme candidates stored in the search buffer 38. The optimal solution search unit 35 determines an optimal combination of morphemes (optimal solution) based on the morpheme cost given to each morpheme and the connection cost indicating the ease of connection between morphemes. If the combination of the morphemes forming the normal character string is determined, the phonetic symbols of the normal character string are also determined.
[0060]
The output unit 33 is a part that functions as a generation unit that generates phonetic symbol data of the text to be processed by combining the conversion result of the conversion unit 31 and the analysis result of the morphological analysis unit 32. The generated phonetic symbol data is output as output data.
[0061]
<Process of text analysis unit>
Next, a processing flow of the text analysis unit 3 having the above configuration will be described with reference to FIG. 4. FIG. 4 is a flowchart illustrating the flow of the text analysis process according to the present embodiment.
[0062]
Here, a specific flow of the text analysis process will be described, taking as an example the case where the following input data is input, as shown in FIG. 5:
"It's 290 yen from Kyoto to <PRON SYM="Ogura">Kokura</PRON>."
[0063]
First, a kanji-kana mixed character string with a reading designation tag, "It's 290 yen from Kyoto to <PRON SYM="Ogura">Kokura</PRON>.", is input from the text input unit 2 (step S100).
[0064]
Next, the reading designation tag analysis unit 30 examines the reading designation tag, and separates the processing target text into reading designation character strings and normal character strings (step S101). Here, as shown in FIG. 5(b), the text is divided into three sections: a normal character string "from Kyoto", a reading designation character string "Kokura", and a normal character string "up to 290 yen." For the reading designation character string "Kokura", the specified reading kana "Ogura" is also extracted. After that, the character strings are passed to the subsequent processing in order, starting from the first.
[0065]
Since the normal character string "from Kyoto" is not a section with a designated reading, it is passed to the morphological analysis unit 32 (step S102). The morphological analysis unit 32 identifies the morpheme and its attribute by morphologically analyzing the normal character string “from Kyoto” (step S103). The character string "Kyoto-kara" is divided into two morphemes, "Kyoto" and "Kara," and the phonetic symbol "Kyoto" and the part-of-speech "Place name" are obtained as attributes of the morpheme "Kyoto." As the attributes of "kara", phonetic symbols "ka'la", parts of speech "particles", and the like are obtained. These analysis results are delivered to the output unit 33.
[0066]
Then, the process proceeds to the next character string "Kokura" (step S104). Since the reading of this character string is designated, it is passed to the conversion unit 31 (step S102). The conversion unit 31 refers to the phonetic symbol dictionary 36, and converts the reading kana "Ogura" specified for the character string "Kokura" into phonetic symbols (step S105). As shown in FIG. 3, the reading kana “Ogura” is converted into the phonetic symbol character string with prosodic symbols “okudura”. This conversion result is delivered to the output unit 33.
[0067]
Then, the process proceeds to the next character string “up to 290 yen” (step S104). Since this character string is not a section with a designated reading, it is transferred to the morphological analysis unit 32 (step S102), and morphological analysis is performed in the same manner as described above (step S103). This analysis result is also delivered to the output unit 33.
[0068]
Since all the character strings have now been processed, the process moves on to the output unit 33 (step S104). The output unit 33 combines the analysis results of the character strings “from Kyoto” and “up to 290 yen” obtained by the morphological analysis unit 32 with the conversion result of the character string “Kokura” obtained by the conversion unit 31, and generates and outputs the phonetic symbol data shown in FIG. 5 (step S106). The data output here is passed to the subsequent syntax analysis processing.
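The flow of steps S102 to S106 can be summarized in the following Python sketch, which starts from text already separated into sections. It is only an outline under stated assumptions: both analysis functions are stubs, the dictionary content and the symbol string "OGURA" are placeholders rather than real phonetic notation, and the record layout is invented for the example.

```python
from typing import Dict, List, Optional, Tuple

# Placeholder dictionary: reading kana -> phonetic symbols (invented notation).
PHONETIC_SYMBOL_DICT: Dict[str, str] = {"おぐら": "OGURA"}


def analyze_morphemes(text: str) -> List[dict]:
    """Stub for the morphological analysis unit: returns one pseudo-morpheme."""
    return [{"notation": text, "phonetic": "?", "via": "morphological analysis"}]


def convert_reading(notation: str, reading: str) -> List[dict]:
    """Stub for the conversion unit: dictionary lookup on the reading kana."""
    return [{"notation": notation,
             "phonetic": PHONETIC_SYMBOL_DICT[reading],
             "via": "reading kana"}]


def build_phonetic_data(segments: List[Tuple[str, Optional[str]]]) -> List[dict]:
    """Dispatch each section and combine the results in their original order."""
    result: List[dict] = []
    for text, reading in segments:        # S104: take the sections in order
        if reading is None:               # S102: no reading designated
            result += analyze_morphemes(text)           # S103
        else:
            result += convert_reading(text, reading)    # S105
    return result                         # S106: combined phonetic symbol data


segments = [("京都から", None), ("小倉", "おぐら"), ("までは290円です。", None)]
data = build_phonetic_data(segments)
for record in data:
    print(record)
```

Keeping the sections in input order is what allows the output to be treated as phonetic symbol data for the whole text.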
[0069]
As described above, according to the configuration of the present embodiment, the reading of the character string “Kokura” need only be specified as a reading kana in kana characters, such as “Ogura”, instead of as phonetic symbols such as “Okuura”. Therefore, even a user who does not have specialized knowledge of pronunciation or phonetics can easily specify a reading.
[0070]
Then, by referring to the phonetic symbol dictionary 36, the specified reading kana “Ogura” is converted into the phonetic symbols “okudura”, so that speech synthesis with correct pronunciation and accent can be performed based on the conversion result.
[0071]
In addition, for the normal character strings “from Kyoto” and “up to 290 yen”, for which no reading is specified, appropriate phonetic symbols are generated by morphological analysis.
[0072]
Therefore, with only the simple pre-processing of specifying, by reading kana, the reading of a character string that is to be read in a particular way, speech synthesis with the correct pronunciation and accent intended by the user can be performed over the entire text to be processed.
[0073]
<Modification 1>
In the present embodiment, the phonetic symbol dictionary 36 and the morphological dictionary 37 are configured separately, but the morpheme dictionary can be used as a phonetic symbol dictionary by registering the reading kana of the morpheme in the morphological dictionary. FIG. 6 shows a configuration example of the text analysis unit in this case. That is, the conversion unit 31 refers to the morphological dictionary 37 when converting the reading kana into phonetic symbols. According to this configuration, attributes such as the part of speech, the accent type, and the accent combination style can be obtained for the designated reading character string. By utilizing these attributes in syntactic analysis and accent phrase processing, it is possible to generate more natural synthesized speech.
[0074]
<Modification 2>
In the present embodiment, if the specified reading kana is a word that is not registered in the phonetic symbol dictionary 36, that is, an unregistered word, it may not be possible to convert it into phonetic symbols. Therefore, it is preferable to provide the conversion unit 31 with a function as estimating means for estimating phonetic symbols based on the reading kana, the reading designation character string, or a combination thereof, so that phonetic symbols are estimated for unregistered words. This makes it possible to convert the reading kana into phonetic symbols even when the reading kana of a word not registered in the dictionary is specified.
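A fallback of this kind might be sketched in Python as follows. The estimation rule used here, a plain hiragana-to-katakana transliteration with no accent information, is only a stand-in chosen for the example; the embodiment does not specify how the estimation is performed. The dictionary entry and its symbol string are likewise placeholders.

```python
from typing import Dict, Tuple

# Placeholder dictionary: reading kana -> phonetic symbols (invented notation).
PHONETIC_SYMBOL_DICT: Dict[str, str] = {"あいきょう": "アイキョ'ー"}


def estimate_phonetic_symbols(reading: str) -> str:
    """Crude stand-in estimator: map hiragana to katakana, without prosody."""
    return "".join(
        chr(ord(ch) + 0x60) if "\u3041" <= ch <= "\u3096" else ch
        for ch in reading
    )


def to_phonetic_symbols(reading: str) -> Tuple[str, bool]:
    """Return (phonetic symbols, estimated?) for a specified reading kana."""
    if reading in PHONETIC_SYMBOL_DICT:
        return PHONETIC_SYMBOL_DICT[reading], False
    return estimate_phonetic_symbols(reading), True  # unregistered word


print(to_phonetic_symbols("あいきょう"))  # registered: taken from the dictionary
print(to_phonetic_symbols("おぐら"))      # unregistered: estimated
```

Returning a flag alongside the symbols lets later stages know that accent information may be missing for the estimated result.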
[0075]
<Modification 3>
It is also preferable that both the reading specification using the reading kana and the reading specification using the phonetic symbols can be used, and the reading specification can be switched. Switching of the reading designation method may be performed as follows.
[0076]
(1) How to specify in the configuration file
In a setting file such as a system setting file or a registry, the reading designation method is described as, for example, ORTHOGRAPHY="reading kana" or ORTHOGRAPHY="phonetic symbol". The text analysis unit 3 refers to the setting in the setting file at startup or the like, and switches its processing so that the reading designation is treated as a reading kana designation in the former case and as a phonetic symbol designation in the latter case.
[0077]
(2) Declaration method in input data
At the top of the input data or the like, a tag declaring the reading designation method is described, such as <ORTHOGRAPHY="reading kana"> or <ORTHOGRAPHY="phonetic symbol">. The text analysis unit 3 switches, for each piece of input data, whether to treat the reading designations as reading kana designations or as phonetic symbol designations.
[0078]
(3) Switching by adding an identification code to the reading specification
In the reading designation tag, an identification code indicating whether the designation is made by reading kana or by phonetic symbols is described. The text analysis unit 3 switches the processing of the reading designation character string based on the identification code when performing the text analysis processing. For example, when "ORTH" is used as the identification code, for the input data "From <PRON ORTH="phonetic symbol" SYM="Kyoto">Kyoto</PRON> to <PRON ORTH="reading kana" SYM="Ogura">Kokura</PRON> is 290 yen.", the processing is switched so that the designation for "Kyoto" is treated as a phonetic symbol designation and the designation "Ogura" is treated as a reading kana designation.
[0079]
(4) Method of determining by character set to be used
The processing is switched so that a reading designated in hiragana is treated as a reading kana designation, and a reading designated in katakana is treated as a phonetic symbol designation. For example, for the input data "From <PRON SYM="Kyoto">Kyoto</PRON> to <PRON SYM="Ogura">Kokura</PRON> is 290 yen.", the reading designation for "Kyoto" is written in katakana and is therefore treated as a phonetic symbol designation, whereas the reading designation "Ogura" is written in hiragana and is therefore treated as a reading kana designation.
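Method (4) can be illustrated with the following Python sketch, which classifies a designation by the Unicode ranges of its characters. The function name, the returned labels, and the set of extra marks tolerated in either script (the long-vowel mark and an apostrophe used as an accent mark) are assumptions made for this example.

```python
EXTRA_MARKS = "ー'"  # assumed marks that may accompany either script


def _is_hiragana(ch: str) -> bool:
    return "\u3041" <= ch <= "\u3096"


def _is_katakana(ch: str) -> bool:
    return "\u30a1" <= ch <= "\u30fa"


def designation_kind(sym: str) -> str:
    """Classify a SYM value as 'reading_kana', 'phonetic_symbol', or 'unknown'."""
    letters = [ch for ch in sym if ch not in EXTRA_MARKS]
    if letters and all(_is_hiragana(ch) for ch in letters):
        return "reading_kana"
    if letters and all(_is_katakana(ch) for ch in letters):
        return "phonetic_symbol"
    return "unknown"  # mixed scripts, alphanumerics, or empty designation


print(designation_kind("おぐら"))     # hiragana
print(designation_kind("キョ'ート"))  # katakana with assumed prosodic marks
print(designation_kind("290えん"))    # contains numerals
```

A designation that mixes scripts or contains alphanumeric symbols falls into neither class here and would need one of the other switching methods or separate handling.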
[0080]
As described above, by making a plurality of reading designation methods available and switching them appropriately, the convenience and flexibility of the apparatus are improved.
[0081]
(Second embodiment)
Next, a second embodiment of the present invention will be described.
[0082]
The present embodiment differs from the first embodiment only in the configuration of the text analysis unit; the other configurations and operations are the same.
[0083]
<Configuration of text analysis unit>
FIG. 7 is a block diagram illustrating a configuration of a text analysis unit as the text analysis device for speech synthesis according to the present embodiment.
[0084]
The text analysis unit 8 generally includes a reading designation tag analysis unit 80, a morphological analysis unit 81, and a phonetic symbol character string output unit 82. Here, the morphological analysis unit 81 includes a dictionary search unit 83, a reading kana checking unit 84, an optimal solution search unit 85, a morpheme dictionary 86, and a search buffer 87.
[0085]
The reading designation tag analysis unit 80 functions as reading designation extraction means for extracting a reading designation character string and the specified reading kana from the text to be processed. The reading designation tag analysis unit 80 examines the input data in order from the beginning, and separates the processing target text into reading designation character strings and normal character strings based on the reading designation tags. For each reading designation character string, the reading kana is extracted at the same time. It then passes the reading designation character string and its reading kana to the reading kana checking unit 84, and passes the reading designation character strings and the normal character strings to the dictionary search unit 83.
[0086]
The dictionary search unit 83 functions as morpheme candidate generating means for generating morpheme candidates by dividing the text to be processed (that is, the reading designation character strings and the normal character strings) into morphemes with reference to the morpheme dictionary 86. In the present embodiment, as shown in FIG. 8, the morpheme dictionary 86 is a dictionary in which information such as the “notation (heading)”, “reading kana”, “part of speech”, “phonetic symbols”, “accent type”, and “accent combination style” of each morpheme is associated.
[0087]
The dictionary search unit 83 examines the text to be processed in order from the beginning or end, and if a character string that matches the notation of the morpheme registered in the morphological dictionary 86 is found, the character string is set as a morpheme candidate. Here, morpheme candidates of all possible combinations are extracted. The morpheme candidates are stored in the search buffer 87 together with attributes such as a reading kana, a part of speech, a phonetic symbol, an accent type, and an accent combination style.
[0088]
The reading kana checking unit 84 functions as a reading kana checking unit that excludes morpheme candidates having different readings by comparing the reading kana specified in the reading specifying character string with the reading kana of the morpheme candidate. In other words, of the morpheme candidates stored in the search buffer 87, those that do not match the reading kana specified in the reading specification character string are deleted from the search buffer 87.
[0089]
The optimal solution search unit 85 functions as optimal solution search means for selecting the optimal morphemes from among the plurality of morpheme candidates stored in the search buffer 87. For the morpheme candidates remaining in the search buffer 87, the optimal solution search unit 85 obtains the optimal combination of morphemes (optimal solution) based on the morpheme cost given to each morpheme and the connection cost indicating the ease of connection between morphemes.
[0090]
The phonetic symbol character string output unit 82 is a unit that functions as a generating unit that generates phonetic symbol data of the text to be processed based on the morpheme selected by the optimal solution searching unit 85. The phonetic symbol data generated here is the same as that in the first embodiment. The phonetic symbol data is output as output data.
[0091]
<Process of text analysis unit>
Next, a processing flow of the text analysis unit 8 having the above configuration will be described with reference to FIG. 9. FIG. 9 is a flowchart illustrating the flow of the text analysis process according to the present embodiment.
[0092]
Here, a specific flow of the text analysis process will be described, taking as an example the case where the following input data is input, as shown in FIG. 10:
"<PRON SYM="Kyoto to Ogura">From Kyoto to Kokura</PRON><PRON SYM="290 en">290 yen</PRON>."
[0093]
First, input data is input from the text input unit 2 (step S200).
[0094]
Next, the reading designation tag analysis unit 80 checks the reading designation tags, and separates the processing target text into reading designation character strings and normal character strings (step S201). Here, as shown in FIG. 10(b), the text is divided into four sections: a reading designation character string "from Kyoto to Kokura", a normal character string "to", a reading designation character string "290 yen", and a normal character string "is." For the reading designation character strings “from Kyoto to Kokura” and “290 yen”, the specified reading kana “Kyoto to Ogura” and “290 en” are also extracted.
[0095]
Then, the processing target character string "It is 290 yen from Kyoto to Kokura." is passed to the dictionary search unit 83, and the reading designation character strings and their reading kana are passed to the reading kana checking unit 84.
[0096]
Next, the dictionary search unit 83 divides the processing target character string into morphemes and stores the obtained morpheme candidates in the search buffer 87 (step S202). FIG. 11(a) shows the contents of the search buffer 87 at this time. Because the dictionary search unit 83 extracts morpheme candidates for all possible combinations, morpheme candidates whose readings differ from the reading the user intended, such as "Kyo", "Miyako", and "Kokura", are also extracted.
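A minimal sketch of this exhaustive candidate generation is shown below. The tiny dictionary, its part-of-speech labels, and the function name are hypothetical; readings are romanized, and the notation 京都から小倉 is an assumed reconstruction of the example string.

```python
# Hypothetical morphological dictionary: notation -> [(reading, part of speech)]
MORPHEME_DICT = {
    "京": [("kyo", "noun")],
    "都": [("miyako", "noun")],
    "京都": [("kyoto", "noun")],
    "か": [("ka", "particle")],
    "ら": [("ra", "suffix")],
    "から": [("kara", "particle")],
    "小倉": [("kokura", "noun"), ("ogura", "noun")],
}

def generate_candidates(text, dictionary):
    """List every dictionary entry that matches at every position of the text."""
    max_len = max(len(key) for key in dictionary)
    candidates = []
    for start in range(len(text)):
        for end in range(start + 1, min(len(text), start + max_len) + 1):
            surface = text[start:end]
            for reading, pos in dictionary.get(surface, []):
                candidates.append((start, end, surface, reading, pos))
    return candidates

search_buffer = generate_candidates("京都から小倉", MORPHEME_DICT)
for cand in search_buffer:
    print(cand)
```

Because every matching entry is kept, the buffer also holds candidates such as "kyo", "miyako", and "kokura" that do not correspond to the reading the user designated.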
[0097]
Subsequently, the reading kana checking unit 84 executes the reading kana check process (step S203).
[0098]
First, the combinations of morpheme candidates that can constitute the reading designation character string "From Kyoto to Kokura" are extracted. Here, the following eight combinations are obtained:
(1) "Kyo - Miyako - ka - ra - Kokura"
(2) "Kyo - Miyako - ka - ra - Ogura"
(3) "Kyo - Miyako - kara - Kokura"
(4) "Kyo - Miyako - kara - Ogura"
(5) "Kyoto - ka - ra - Kokura"
(6) "Kyoto - ka - ra - Ogura"
(7) "Kyoto - kara - Kokura"
(8) "Kyoto - kara - Ogura"
[0099]
Then, only the morpheme candidates constituting combinations (6) and (8), whose readings match the designated reading kana "Kyotokaraogura", are kept, and the other morpheme candidates are deleted from the search buffer 87. As a result, the three morpheme candidates "Kyo", "Miyako", and "Kokura" are deleted.
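The check described above can be sketched as a walk over all combinations of candidates within the designated span, keeping only the candidates that lie on a combination whose concatenated reading equals the designated reading kana. The tuple layout, the function name, and the early pruning are assumptions made for illustration; readings are romanized, and the notations are an assumed reconstruction of the example.

```python
def reading_kana_check(candidates, span_start, span_end, designated):
    """Keep only candidates lying on a path whose readings spell `designated`.

    candidates: tuples (start, end, notation, reading) inside the span.
    """
    by_start = {}
    for cand in candidates:
        by_start.setdefault(cand[0], []).append(cand)

    kept = set()

    def walk(pos, path, reading_so_far):
        if not designated.startswith(reading_so_far):
            return  # this combination can no longer match
        if pos == span_end:
            if reading_so_far == designated:
                kept.update(path)
            return
        for cand in by_start.get(pos, []):
            if cand[1] <= span_end:
                walk(cand[1], path + [cand], reading_so_far + cand[3])

    walk(span_start, [], "")
    return [cand for cand in candidates if cand in kept]

buffer_before = [
    (0, 1, "京", "kyo"), (0, 2, "京都", "kyoto"), (1, 2, "都", "miyako"),
    (2, 3, "か", "ka"), (2, 4, "から", "kara"), (3, 4, "ら", "ra"),
    (4, 6, "小倉", "kokura"), (4, 6, "小倉", "ogura"),
]
buffer_after = reading_kana_check(buffer_before, 0, 6, "kyotokaraogura")
removed = [c[3] for c in buffer_before if c not in buffer_after]
print(removed)
```

In this example only the two matching combinations survive, so the candidates read "kyo", "miyako", and "kokura" are removed, as in the description.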
[0100]
Next, the reading kana check is also performed on the reading designation character string "290 yen", but here the following problem arises. The reading of "290 yen" should properly be specified in kana characters, as in "Nihyakukyujuen", but in this example the reading is specified using numerals, as in "290 en". The reading kana check therefore cannot be performed by the same processing as described above.
[0101]
Therefore, when the reading is specified with alphanumeric symbols, the reading kana checking unit 84 exceptionally compares the alphanumeric symbols with the notation of the morpheme candidate rather than with its reading kana, and leaves the morpheme candidate in the search buffer 87 if they match. In this example, the part "290" of the reading designation is compared with the notation "290" of the morpheme candidate, and this morpheme is left in the search buffer 87. For the remaining part "en", the reading kana check is executed in the same manner as described above.
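The exceptional handling can be sketched as a per-morpheme matching step that compares against the notation when the remaining designated reading begins with alphanumeric symbols, and against the reading kana otherwise. This sketch uses actual kana rather than romanization, since romanized kana would be indistinguishable from alphanumeric symbols. The function names, the ASCII-only definition of "alphanumeric", and the alternative reading まる are assumptions.

```python
import re

# Assumption: "alphanumeric symbols" means ASCII digits and letters only.
ALNUM_PREFIX = re.compile(r"[0-9A-Za-z]+")

def consume(designated, notation, reading):
    """Match one morpheme against the front of `designated`.

    Returns the unmatched remainder, or None if the morpheme does not match.
    """
    if ALNUM_PREFIX.match(designated):
        # Reading specified with alphanumeric symbols: compare with the notation.
        return designated[len(notation):] if designated.startswith(notation) else None
    # Reading specified in kana: compare with the reading kana as usual.
    return designated[len(reading):] if designated.startswith(reading) else None

def matches(designated, morphemes):
    """True if the (notation, reading) sequence spells the designated reading."""
    rest = designated
    for notation, reading in morphemes:
        rest = consume(rest, notation, reading)
        if rest is None:
            return False
    return rest == ""

print(matches("290えん", [("290", "にひゃくきゅうじゅう"), ("円", "えん")]))
print(matches("290えん", [("290", "にひゃくきゅうじゅう"), ("円", "まる")]))
```

The same function also accepts a designation written entirely in kana, such as にひゃくきゅうじゅうえん, because that string does not begin with an alphanumeric symbol.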
[0102]
FIG. 11B shows the contents of the search buffer 87 after the reading kana check. It can be seen that correct reading can be obtained with any combination of the remaining morpheme candidates.
[0103]
The optimal solution search unit 85 calculates a morpheme cost and a connection cost for each combination of the morpheme candidates left in the search buffer 87, and selects an optimal combination (Step S204).
[0104]
Then, the phonetic symbol character string output unit 82 generates and outputs phonetic symbol data as shown in FIG. 10C based on the selected morpheme candidate (step S205). The data output here is passed to the subsequent syntax analysis processing.
[0105]
As described above, according to the configuration of the present embodiment, the reading of the character string "From Kyoto to Kokura" need not be specified with phonetic symbols such as "Kyotokara / okudura"; it is sufficient to specify the reading kana "Kyotokaraogura" using kana characters. Therefore, even a user who has no specialized knowledge of pronunciation or phonetics can easily specify a reading.
[0106]
Also, since morphological analysis is performed on the reading designation character string, a reading can be designated for an arbitrary section such as a compound word, phrase, clause, or sentence.
[0107]
Moreover, after the reading designation character string is morphologically analyzed, morpheme candidates whose readings differ from the designated reading kana are excluded, and the optimal solution is selected from among the morpheme candidates whose readings match. Accurate phonetic symbols corresponding to the reading designated by the user are therefore generated.
[0108]
Further, when part of the reading is specified with alphanumeric symbols such as "290", that part is compared with the notation of the morpheme candidates. As a result, the reading kana check can be performed accurately on the remaining part, that is, the part "en" whose reading is specified in kana characters.
[0109]
Therefore, according to the configuration of the present embodiment, the simple pre-processing of specifying, in reading kana, the reading of a character string that is to be read in a special way is sufficient to obtain the pronunciation intended by the user over the entire processing target text, and speech synthesis with correct pronunciation and accent can be performed.
[0110]
<Modification 1>
In the present embodiment, the generated morpheme candidates are first subjected to the reading kana check process to exclude morpheme candidates having different readings, and the optimal solution is then selected from the remaining morpheme candidates. However, the manner in which the reading kana check process is carried out is not limited to this.
[0111]
For example, it is preferable that the search for the optimal solution is performed for all the morpheme candidates generated by the dictionary search unit 83, and the reading kana check process is executed simultaneously with the search process.
[0112]
Alternatively, a mode is also preferable in which all the morpheme candidates generated by the dictionary search unit 83 are first narrowed down to a plurality of candidate solutions based on the morpheme cost, the connection cost, and the like, and the reading kana check process is then performed on those candidate solutions to obtain the optimal solution.
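The second variant, in which cost-based narrowing comes first and the reading kana check second, might be sketched as follows. The candidate solutions, their total costs, and the function name are hypothetical; readings are romanized, and the notations are an assumed reconstruction of the example.

```python
def pick_by_reading(candidate_solutions, designated):
    """Return the cheapest candidate solution whose reading matches.

    candidate_solutions: list of (total_cost, [(notation, reading), ...]),
    assumed to have been narrowed down by cost beforehand.
    """
    for _cost, morphemes in sorted(candidate_solutions, key=lambda s: s[0]):
        if "".join(reading for _, reading in morphemes) == designated:
            return morphemes
    return None  # no candidate solution has the designated reading

solutions = [
    (27, [("京都", "kyoto"), ("から", "kara"), ("小倉", "kokura")]),
    (29, [("京都", "kyoto"), ("から", "kara"), ("小倉", "ogura")]),
    (51, [("京都", "kyoto"), ("か", "ka"), ("ら", "ra"), ("小倉", "ogura")]),
]
chosen = pick_by_reading(solutions, "kyotokaraogura")
print(chosen)
```

One consequence of this ordering is that the check can fail if none of the retained candidate solutions has the designated reading, so the number of candidate solutions kept matters.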
[0113]
<Modification 2>
It is also preferable to make it possible to switch between analyzing the reading designation character string as a single morpheme and analyzing it in consideration of the possibility that it consists of a plurality of morphemes. This switching may be performed as follows.
[0114]
(1) Specification in a setting file
The analysis method is written in a setting file, such as a system setting file or the registry, in a form such as WORDNUM="single" or WORDNUM="plural". The text analysis unit 8 refers to this setting at startup or the like. In the former case, the reading designation character string is treated as a single morpheme; in the latter case, it is divided into all conceivable morphemes as in the present embodiment.
[0115]
(2) Declaration method in input data
A tag specifying the analysis method, such as <WORDNUM="single"> or <WORDNUM="plural">, is written at the beginning of the input data. The text analysis unit 8 determines, for each set of input data, whether to treat the reading designation character strings as single morphemes.
[0116]
(3) Switching by adding an identification code to the reading specification
An identification code indicating whether the character string is to be treated as a single morpheme is written in the reading designation tag. When performing the text analysis process, the text analysis unit 8 switches the analysis method for each reading designation character string based on the identification code. For example, when "WORDNUM" is used as the identification code and the input is "I will tell you the yen exchange rate of <PRON SYM="Today" WORDNUM="single">today</PRON>'s <PRON SYM="…" WORDNUM="plural">Tokyo foreign exchange market</PRON>.", the part "today" is treated as a single morpheme, and the part "Tokyo foreign exchange market" is divided into morphemes.
[0117]
By making it possible to specify the method of analyzing the specified reading character string, the convenience and flexibility of the apparatus are improved.
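As one possible illustration of method (3), the attributes of a reading designation tag can be parsed into a mapping and the analysis mode read from it. The attribute syntax, the English values "single" and "plural", the romanized reading, and the fallback to a default mode are all assumptions; the description does not state how the three methods would be prioritized if combined.

```python
import re

# Assumed attribute syntax inside the tag: NAME="value"
ATTR = re.compile(r'(\w+)\s*=\s*"([^"]*)"')

def parse_attrs(attr_text):
    """Turn 'SYM="..." WORDNUM="..."' into a dict of attribute values."""
    return dict(ATTR.findall(attr_text))

def analysis_mode(attrs, default="plural"):
    """Return 'single' or 'plural' for one reading designation tag."""
    value = attrs.get("WORDNUM", default)
    if value not in ("single", "plural"):
        raise ValueError(f"unknown WORDNUM value: {value!r}")
    return value

attrs = parse_attrs('SYM="honjitsu" WORDNUM="single"')
print(attrs, analysis_mode(attrs))
print(analysis_mode(parse_attrs('SYM="kyotokaraogura"')))
```

A tag without the identification code falls back to the default, which in this sketch is the multi-morpheme analysis of the present embodiment.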
[0118]
<Modification 3>
In the present embodiment, the phonetic symbols "Nihyakukyuju" obtained by morphological analysis are output as the reading of the character string "290", whose reading is designated with alphanumeric symbols; however, other configurations can also be adopted.
[0119]
For example, by providing means for converting alphanumeric symbols into phonetic symbols based on a predetermined rule, phonetic symbols of alphanumeric symbols may be generated.
[0120]
However, rule-based conversion may not be able to cope with pronunciations unique to particular words. Therefore, it is also preferable to provide means for converting alphanumeric symbols into reading kana so that the reading kana check process is performed using that reading kana. By performing pre-processing that converts alphanumeric symbols into kana characters in this way, the reading kana check process can be applied uniformly to the reading designation character string, which simplifies the processing. Moreover, since the phonetic symbols are obtained by morphological analysis, accurate phonetic symbols can be obtained even for alphanumeric symbols.
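A pre-processing step of this kind could, for plain integers, look like the sketch below, which converts a number from 1 to 9999 into reading kana using the standard Japanese numeral readings, including the usual sound changes (for example さんびゃく for 300). The function name and the supported range are assumptions, and counter-specific readings are deliberately not handled, which is exactly the limitation of rule-based conversion noted above.

```python
DIGITS = ["", "いち", "に", "さん", "よん", "ご", "ろく", "なな", "はち", "きゅう"]
UNITS = [(1000, "せん"), (100, "ひゃく"), (10, "じゅう")]
# Standard irregular readings caused by sound changes.
IRREGULAR = {
    (3, 100): "さんびゃく", (6, 100): "ろっぴゃく", (8, 100): "はっぴゃく",
    (3, 1000): "さんぜん", (8, 1000): "はっせん",
}

def number_to_kana(n):
    """Convert an integer from 1 to 9999 into reading kana."""
    if not 1 <= n <= 9999:
        raise ValueError("only 1..9999 is supported in this sketch")
    parts = []
    for unit, name in UNITS:
        digit, n = divmod(n, unit)
        if digit == 0:
            continue
        if (digit, unit) in IRREGULAR:
            parts.append(IRREGULAR[(digit, unit)])
        elif digit == 1:
            parts.append(name)  # 10 is じゅう, not いちじゅう
        else:
            parts.append(DIGITS[digit] + name)
    parts.append(DIGITS[n])     # remaining ones digit (empty for 0)
    return "".join(parts)

print(number_to_kana(290))
```

After such a conversion, a designation like "290 en" becomes an all-kana string, so the ordinary reading kana comparison can be used without the notation-based exception.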
[0121]
It is also preferable that the morphological dictionary is configured to be able to register a notation by an alphanumeric symbol in a reading kana of a morpheme of an alphanumeric symbol. For example, not only the kana character “ju” but also the notation “10” is registered as the reading kana of the morpheme of the number “10”. Thereby, even when an alphanumeric symbol is specified as a reading kana, the reading kana inspection process can be uniformly performed, and the process can be simplified. In addition, accurate phonetic symbols can be obtained for alphanumeric symbols based on phonetic symbols in the morphological dictionary.
[0122]
[Effect of the invention]
As described above, according to the present invention, even a user who has no specialized knowledge of pronunciation or phonetics can easily designate a reading, and speech synthesis with correct pronunciation and accent can be performed.
[Brief description of the drawings]
FIG. 1 is a block diagram illustrating a functional configuration of a speech synthesis device according to a first embodiment of the present invention.
FIG. 2 is a block diagram illustrating a configuration of a text analysis unit according to the first embodiment of the present invention.
FIG. 3 is a diagram illustrating a structure of a phonetic symbol dictionary according to the first embodiment of the present invention.
FIG. 4 is a flowchart illustrating a flow of a text analysis process according to the first embodiment of the present invention.
FIG. 5A is a diagram illustrating an example of input data, FIG. 5B is a diagram illustrating an example of an analysis result of a reading designation tag analysis process, and FIG. 5C is a diagram illustrating an example of phonetic symbol data.
FIG. 6 is a block diagram showing a configuration of a first modification of the text analysis unit according to the first embodiment of the present invention.
FIG. 7 is a block diagram illustrating a configuration of a text analysis unit according to a second embodiment of the present invention.
FIG. 8 is a diagram showing a structure of a morphological dictionary according to a second embodiment of the present invention.
FIG. 9 is a flowchart illustrating a flow of a text analysis process according to the second embodiment of the present invention.
FIG. 10A is a diagram illustrating an example of input data, FIG. 10B is a diagram illustrating an example of an analysis result of a reading designation tag analysis process, and FIG. 10C is a diagram illustrating an example of phonetic symbol data.
FIG. 11A is a diagram showing the contents of the search buffer before the reading kana check process, and FIG. 11B is a diagram showing the contents of the search buffer after the reading kana check process.
[Explanation of symbols]
1 Speech synthesis device
2 Text input unit
3 Text analysis unit
4 Syntax analysis unit
5 Accent phrase processing unit
6 Speech synthesis unit
7 Audio output unit
8 Text analysis unit
10 Dictionary
11 Grammar
12 Accent binding rules
13 Speech data
14 Input device
15 Storage medium
16 Speaker
30 Reading designation tag analysis unit
31 Conversion unit
32 Morphological analysis unit
33 Output unit
34 Dictionary search unit
35 Optimal solution search unit
36 Phonetic symbol dictionary
37 Morphological dictionary
38 Search buffer
80 Reading designation tag analysis unit
81 Morphological analysis unit
82 Phonetic symbol character string output unit
83 Dictionary search unit
84 Reading kana checking unit
85 Optimal solution search unit
86 Morphological dictionary
87 Search buffer

Claims

A text analysis device for speech synthesis, comprising:
a dictionary that associates at least reading kana with phonetic symbols; and
conversion means for converting, with reference to the dictionary, the reading kana of a reading designation character string whose reading is designated in reading kana into phonetic symbols.

A text analysis device for speech synthesis, comprising:
reading designation extraction means for extracting, from a processing target text that includes a reading designation character string whose reading is designated in reading kana, the reading designation character string and the designated reading kana;
a phonetic symbol dictionary that associates at least reading kana with phonetic symbols;
conversion means for converting the reading kana designated for the reading designation character string into phonetic symbols with reference to the phonetic symbol dictionary;
morphological analysis means for obtaining, by morphological analysis, phonetic symbols of the morphemes constituting the other character strings in the processing target text for which no reading is designated; and
generating means for generating phonetic symbol data of the processing target text by combining the conversion result of the conversion means with the analysis result of the morphological analysis means.

A text analysis device for speech synthesis, comprising:
reading designation extraction means for extracting, from a processing target text that includes a reading designation character string whose reading is designated in reading kana, the reading designation character string and the designated reading kana;
a morphological dictionary that associates at least the notation, reading kana, and phonetic symbols of morphemes;
morpheme candidate generation means for generating morpheme candidates having reading kana and phonetic symbols as attributes by dividing the processing target text into morphemes with reference to the morphological dictionary;
reading kana checking means for excluding, when the optimal morphemes are selected from among a plurality of morpheme candidates, morpheme candidates having different readings by collating the reading kana designated for the reading designation character string with the reading kana of the morpheme candidates; and
generating means for generating phonetic symbol data of the processing target text based on the selected morphemes.

The text analysis device for speech synthesis according to claim 3, wherein, when the reading kana designated for the reading designation character string contains alphanumeric symbols, the reading kana checking means collates the alphanumeric symbols with the notation of the morpheme candidates.

The text analysis device for speech synthesis according to claim 3, wherein the morphological dictionary is configured to be able to register a notation by an alphanumeric symbol in a reading kana of a morpheme of an alphanumeric symbol.

A text analysis method for speech synthesis, in which a computer:
extracts, from a processing target text that includes a reading designation character string whose reading is designated in reading kana, the reading designation character string and the designated reading kana;
converts the reading kana designated for the reading designation character string into phonetic symbols with reference to a phonetic symbol dictionary that associates at least reading kana with phonetic symbols;
obtains, by morphological analysis, phonetic symbols of the morphemes constituting the other character strings in the processing target text for which no reading is designated; and
generates phonetic symbol data of the processing target text by combining the conversion result of the conversion process with the analysis result of the morphological analysis.

A text analysis method for speech synthesis, in which a computer:
extracts, from a processing target text that includes a reading designation character string whose reading is designated in reading kana, the reading designation character string and the designated reading kana;
generates morpheme candidates having reading kana and phonetic symbols as attributes by dividing the processing target text into morphemes with reference to a morphological dictionary that associates at least the notation, reading kana, and phonetic symbols of morphemes;
excludes, when selecting the optimal morphemes from among a plurality of morpheme candidates, morpheme candidates having different readings by collating the reading kana designated for the reading designation character string with the reading kana of the morpheme candidates; and
generates phonetic symbol data of the processing target text based on the selected morphemes.

The text analysis method for speech synthesis according to claim 7, wherein, when the reading kana designated for the reading designation character string contains alphanumeric symbols, the alphanumeric symbols are collated with the notation of the morpheme candidates.

A text analysis program for speech synthesis that causes a computer to execute:
a process of extracting, from a processing target text that includes a reading designation character string whose reading is designated in reading kana, the reading designation character string and the designated reading kana;
a process of converting the reading kana designated for the reading designation character string into phonetic symbols with reference to a phonetic symbol dictionary that associates at least reading kana with phonetic symbols;
a process of obtaining, by morphological analysis, phonetic symbols of the morphemes constituting the other character strings in the processing target text for which no reading is designated; and
a process of generating phonetic symbol data of the processing target text by combining the conversion result of the conversion process with the analysis result of the morphological analysis.

A text analysis program for speech synthesis that causes a computer to execute:
a process of extracting, from a processing target text that includes a reading designation character string whose reading is designated in reading kana, the reading designation character string and the designated reading kana;
a process of generating morpheme candidates having reading kana and phonetic symbols as attributes by dividing the processing target text into morphemes with reference to a morphological dictionary that associates at least the notation, reading kana, and phonetic symbols of morphemes;
a process of excluding, when the optimal morphemes are selected from among a plurality of morpheme candidates, morpheme candidates having different readings by collating the reading kana designated for the reading designation character string with the reading kana of the morpheme candidates; and
a process of generating phonetic symbol data of the processing target text based on the selected morphemes.

The text analysis program for speech synthesis according to claim 10, wherein, when the reading kana designated for the reading designation character string contains alphanumeric symbols, a process of collating the alphanumeric symbols with the notation of the morpheme candidates is executed.

A speech synthesizer comprising:
the text analysis device for speech synthesis according to any one of claims 1 to 5; and
speech synthesis means for generating synthesized speech based on the phonetic symbols obtained from the text analysis device for speech synthesis.