JP3573889B2

JP3573889B2 - Audio output device

Info

Publication number: JP3573889B2
Application number: JP31876696A
Authority: JP
Inventors: 奈穂子佐藤
Original assignee: Ricoh Co Ltd
Current assignee: Ricoh Co Ltd
Priority date: 1996-11-14
Filing date: 1996-11-14
Publication date: 2004-10-06
Anticipated expiration: 2016-11-14
Also published as: JPH10143186A

Description

【０００１】
【発明の属する技術分野】
本発明は、日本語テキストを音声に変換して出力する音声出力装置に関する。
【０００２】
【従来の技術】
音声出力装置の一例としてテキスト音声合成システムが知られている。この種のテキスト音声合成システムは、一般に、自然言語で書かれたテキストが入力されるとき、入力されたテキストの文字列に対し、形態素辞書などを参照して形態素候補を生成し、次に一定のアルゴリズムにより形態素候補中から選択した最適解に対して読みを含む音韻を設定し、さらに一定の規則（ルール）に従ってアクセント位置，ポーズ位置を設定して制御記号（発音記号列）に変換し、この制御記号を音声合成器に与えて、制御記号に応じた音声を出力するようになっている。
【０００３】
ここで、音韻設定には、通常、形態素辞書に記載された読み情報を用いることが多いが、形態素辞書に未登録の語が入力テキスト中に存在する場合、未登録の語については、適切な読みを付与できず、適切な音韻を設定できないことがあった。
【０００４】
このような未登録な語に対して読みを付与するために、従来では、例えば、特開平５−２０３０２号，特開平２−２２５１７４号に示されているように、形態素辞書（日本語辞書）を参照して、漢字複合語の中で読みを付与できなかった文字列（未登録文字列）について、その漢字一文字ずつ漢字辞書を用いて読みを付与する方法が提案されている。また、特開平６−２８２２９０号には、入力された文字テキストについて字種判定部にて字種を判定し、辞書検索部にて全ての音韻（読み）や単語の区切りなどの仮設候補を列挙し、仮設候補選択部にて最も相応しい候補を選択するときに、仮設候補選択部にて不明語とされた漢字を含む文字について、単独漢字辞書を参照して、各単独漢字ごとに、所定の規則に従い、その音韻を決定する（漢字には複数の読みがあることを踏まえ、単漢字辞書中の読みに優先順位を設けて条件に応じた音韻を与える）方法が提案されている。
【０００５】
【発明が解決しようとする課題】
このように、未登録文字列について読みを付与する技術が種々提案されているが、従来では、未登録文字列について適切な読みを付与するには限界があった。すなわち、上述の従来技術によっては、例えば、未登録文字列が漢字かな交じりの文字列である場合に、適切な読みを付与することができないという問題があった。
【０００６】
本発明は、未登録文字列について、より適切な尤もらしい読み，音韻を付与し、より適切な音声を出力することの可能な音声出力装置を提供することを目的としている。
【０００７】
【課題を解決するための手段】
上記目的を達成するために、請求項１記載の発明は、入力された日本語のテキストを音声に変換して出力する音声出力装置において、入力された日本語のテキストに対して形態素辞書を用いて形態素候補を生成して言語解析する言語解析部と、言語解析部により生成された形態素候補に対して読みを含む音韻を設定する音韻設定部とを有しており、前記言語解析部は、前記テキストから前記形態素辞書を用いて形態素候補を生成する際、形態素辞書に登録されておらず形態素候補が生成できなかった一まとまりの文字列範囲を未登録文字列として前記テキストから抽出し、また、前記音韻設定部は、漢字単独の読み情報が記憶されている単漢字辞書と、漢字以外の文字と発音記号との対応が記憶されている音変換テーブルとを備え、前記言語解析部によって未登録文字列が抽出されたとき、前記音韻設定部は、前記単漢字辞書および音変換テーブルを用いて未登録文字列の読みを推定するようになっており、前記単漢字辞書には、読みの音訓の属性情報とともに、さらに、単独で出現した漢字に付与する単独読みの属性情報が記憶されており、少なくとも対象文字の前および／または後の文字種により、読みの音訓の属性情報あるいは単独読みの属性情報を用いて、未登録文字列の読みを決定するようになっていることを特徴としている。
【０００９】
また、請求項２記載の発明は、請求項１記載の音声出力装置において、前記単漢字辞書には、読みの音訓の属性情報とともに、さらに、特定の品詞を構成する場合に限った特定品詞読みの属性情報が記憶されていることを特徴としている。
【００１０】
また、請求項３記載の発明は、請求項１記載の音声出力装置において、前記言語解析部または音韻設定部は、抽出した未登録文字列に対し、該未登録文字列全体としての品詞属性を推定するようになっていることを特徴としている。
【００１１】
また、請求項４記載の発明は、請求項３記載の音声出力装置において、前記言語解析部または音韻設定部は、未登録文字列の直前，直後の形態素候補を参照して、前記未登録文字列全体としての品詞属性を推定することを特徴としている。
【００１２】
また、請求項５記載の発明は、請求項３記載の音声出力装置において、前記言語解析部または音韻設定部は、前記未登録文字列全体としての品詞属性を予め設定された品詞の範囲内で推定することを特徴としている。
【００１３】
また、請求項６記載の発明は、請求項３記載の音声出力装置において、前記言語解析部または音韻設定部は、未登録文字列に対して未登録文字列全体としての品詞属性を推定したとき、該未登録文字列に対し、前記推定した品詞属性を持つ形態素候補を生成し、生成した該未登録文字列の形態素候補を形態素辞書に登録されている登録文字列の形態素候補と同様に扱うことを特徴としている。
【００１４】
また、請求項７記載の発明は、請求項１記載の音声出力装置において、前記言語解析部または音韻設定部は、未登録文字列が抽出されたときに、該未登録文字列の文字種を判定し、音韻設定部は、未登録文字列中に文字種が漢字の文字列があると判定した場合は、未登録文字列の読みを推定する際、該未登録文字列に対し単漢字辞書を参照して適切な読みを付与し、また、未登録文字列の漢字以外の文字種の文字列については、未登録文字列の読みを推定する際、音変換テーブルを用いて適切な読みを付与するようになっていることを特徴としている。
【００１５】
また、請求項８記載の発明は、請求項７記載の音声出力装置において、前記音韻設定部は、未登録文字列中の漢字の文字列に対して読みを付与する際に、単漢字辞書に記憶されている複数の読み情報から適切な読みを選択することを特徴としている。
【００１６】
また、請求項９記載の発明は、請求項７記載の音声出力装置において、前記音韻設定部は、未登録文字列中の漢字の文字列に対して読みを付与する際に、当該漢字の文字列の直前，直後の文字列の文字種を利用することにより、単漢字辞書に記憶されている複数の読み情報から適切な読みを選択することを特徴としている。
【００１７】
また、請求項１０記載の発明は、請求項７記載の音声出力装置において、前記言語解析部または音韻設定部は、抽出した未登録文字列に対し、該未登録文字列全体としての品詞属性を推定するようになっており、前記音韻設定部は、未登録文字列中の漢字の文字列に対して読みを付与する際に、該未登録文字列全体に対して前記推定された品詞属性を用いて単漢字辞書に記憶されている複数の読み情報から適切な読みを選択することを特徴としている。
【００１８】
【発明の実施の形態】
以下、本発明の実施の形態を図面に基づいて説明する。図１は本発明に係る音声出力装置の構成例を示す図である。図１を参照すると、この音声出力装置は、テキスト入力部１から入力された日本語のテキストに対して、形態素辞書３を用いて形態素候補を生成して言語解析する言語解析部２と、言語解析部２により生成された形態素候補に対して読みを含む音韻を設定し、出力部７に与え、出力部７から音声を出力させる音韻設定部４とを有している。
【００１９】
ここで、言語解析部２は、前記テキストから形態素辞書３を用いて形態素候補を生成する際、形態素辞書３に登録されておらず形態素候補が生成できなかった文字列範囲を未登録文字列としてテキストから抽出するようになっている。
【００２０】
また、音韻設定部４は、漢字単独の読み情報と読みの音訓の属性情報とが記憶されている単漢字辞書５と、漢字以外の文字と発音記号との対応が記憶されている音変換テーブル６とを備え、言語解析部２によって未登録文字列が抽出されるとき、音韻設定部４は、単漢字辞書５および音変換テーブル６を用いて未登録文字列の読みを推定するようになっている。
【００２１】
図２は単漢字辞書５の一例を示す図である。図２を参照すると、単漢字辞書５には、読みの音訓の属性情報（訓読みの属性情報，音読みの属性情報）とともに、さらに、漢字１文字で独立して意味をなす単独読みの属性情報（単独で出現した漢字に付与する単独読みの属性情報）が記憶されている。さらに、図２の例では、単漢字辞書５には、特定の品詞を構成する場合に限った特定品詞読みの属性情報（図２の例では、固有名詞構成する漢字に付与する固有名詞読みの属性情報）が記憶されている。
【００２２】
また、図３は音変換テーブル６の一例を示す図である。図３を参照すると、音変換テーブル６には、漢字以外の文字（平仮名，片仮名など）に対して、発音記号（ａａ，ｉｉなど）が対応させて記憶されている。
【００２３】
より具体的に、図１の音声出力装置において、言語解析部２または音韻設定部４は、入力されたテキストから未登録文字列が抽出されたときに、該未登録文字列の文字種を判定し、音韻設定部４は、未登録文字列中に文字種が漢字の文字列があると判定した場合は、未登録文字列の読みを推定する際、該未登録文字列に対し図２のような単漢字辞書５を参照して適切な読みを付与し、また、未登録文字列の漢字以外の文字種の文字列については、図３のような音変換テーブル６を用いて適切な読みを付与するようになっている。
【００２４】
この場合、音韻設定部４は、未登録文字列中の漢字の文字列に対して読みを付与する際に、単漢字辞書５に記憶されている複数の読み情報から適切な読みを選択することができる。
【００２５】
具体的に、音韻設定部４は、未登録文字列中の漢字の文字列に対して読みを付与する際に、当該漢字の文字列の直前，直後の文字列の文字種を利用することにより、単漢字辞書に記憶されている複数の読み情報から適切な読みを選択することができる。
【００２６】
図４，図５は図１の音声出力装置の処理動作例を示すフローチャートである。なお、図４，図５の処理動作例では、言語解析部２から音韻設定部４に未登録文字列を与え、音韻設定部４において未登録文字列の文字種を判定し、読みを付与するようになっている。
【００２７】
図４，図５を参照すると、図１の音声出力装置では、テキスト入力部１からテキストが入力されると（ステップＳ１）、言語処理部２では、入力されたテキストに対し形態素辞書３を参照して形態素候補（形態素候補列）を生成する（ステップＳ２）。この際、言語処理部２は、形態素辞書３に登録されていない文字列については、未登録文字列として抽出する（ステップＳ３）。
【００２８】
このようにして、言語処理部２で生成された形態素候補，あるいは抽出された未登録文字列は、音韻設定部４へ送られる。音韻設定部４では、言語処理部２から未登録文字列が送られたかを判断する（ステップＳ４）。この結果、未登録文字列ではなく、形態素候補が送られたと判断したときには、音韻設定部４は、形態素候補のうちから、最適候補を選択し、形態素を確定し、これに読みを付与する（ステップＳ５）。そして、アクセント位置，ポーズ位置を確定し（ステップＳ６）、この文字列に対する音韻をアクセント位置，ポーズ位置などの情報とともに制御記号（発音記号列）に変換して出力部７に与え（ステップＳ７）、出力部７から音声として出力させる（ステップＳ８）。
【００２９】
これに対し、ステップＳ４において未登録文字列が送られたと判断した場合には、音韻設定部４は、この未登録文字列の字種を判定する（ステップＳ９）。具体的には、この未登録文字列が漢字か否かを判定する。この判定の結果、字種が漢字であるときには、音韻設定部４は、単漢字辞書５を検索してこの未登録文字列に読みを付与する（ステップＳ１０）。これに対し、ステップＳ９において、字種が漢字でないときには、音韻設定部４は、音変換テーブル６を検索してこの未登録文字列に読みを付与する（ステップＳ１１）。
【００３０】
このようにして、単漢字辞書５あるいは音変換テーブル６を参照して、未登録文字列に対して推定される読みを付与し、未登録文字列の最後尾まで読みを付与し終わったら（ステップＳ１２）、アクセント位置，ポーズ位置を確定し（ステップＳ６）、未登録文字列に対する音韻をアクセント位置，ポーズ位置などの情報とともに制御記号（発音記号列）に変換して出力部７に与え（ステップＳ７）、出力部７から音声として出力させる（ステップＳ８）。
【００３１】
このように、図１の音声出力装置によれば、未登録文字列を一まとまりの形態素候補として抽出することが可能となり、文字列全体としての属性付けが可能になり、属性を用いた高度な読み付与が可能になる。さらに、未登録文字列が漢字かな交じりの文字列であっても単漢字辞書５と音変換テーブル６を併用することで先頭から順番に読み，音韻を得ることが可能になる。
【００３２】
また、単漢字辞書５に、読みの音訓の属性情報（訓読みの属性情報，音読みの属性情報）とともに、さらに、漢字１文字で独立して意味をなす単独読みの属性情報（単独で出現した漢字に付与する単独読みの属性情報）が記憶されていることで、複合語の構成語でもなく、直後に送り仮名がない単独の読みが付与されるべきであるにもかかわらず、不自然な音読み，訓読みが付与されていた漢字表記に対し、自然な読みを付与することができるため、尤もらしい読み上げが可能になる。
【００３３】
また、単漢字辞書５に、特定の品詞を構成する場合に限った特定品詞読みの属性情報（図２の例では、固有名詞構成する漢字に付与する固有名詞読みの属性情報）が記憶されていることで、読みに傾向性のある特定の品詞に属する形態素候補の構成漢字に対して、その傾向を加味した読みを付与することができるため、尤もらしい読み上げが可能になる。
【００３４】
また、図１の音声出力装置において、言語解析部２または音韻設定部４は、未登録文字列を抽出したときに、抽出した未登録文字列に対し、該未登録文字列全体としての品詞属性を推定する機能をさらに具備していても良い。
【００３５】
この場合、言語解析部２または音韻設定部４は、未登録文字列の直前，直後の形態素候補を参照して、前記未登録文字列全体としての品詞属性を推定することができる。
【００３６】
あるいは、言語解析部２または音韻設定部４は、未登録文字列全体としての品詞属性をヒューリスティックに設定した品詞の範囲内で推定することもできる。
【００３７】
このように、未登録文字列全体としての品詞属性を推定する機能を具備している場合、言語解析部２または音韻設定部４は、未登録文字列に対して、推定した品詞属性を持つ形態素候補を生成し、生成した該未登録文字列の形態素候補を形態素辞書３に登録されている登録文字列の形態素候補と同様に扱うことができる。
【００３８】
図６は言語解析部２または音韻設定部４が未登録文字列全体としての品詞属性を推定する機能を有しているとした場合における未登録文字列に対する品詞推定処理，形態素候補生成処理の一例を示すフローチャートである。なお、図６の処理例では、言語解析部２から音韻設定部４に未登録文字列を与え、音韻設定部４において未登録文字列の品詞を推定するようになっている。
【００３９】
図６を参照すると、言語解析部２は、図４の処理例のステップＳ１〜Ｓ８と同様に、入力テキストのうち形態素辞書３に登録されている文字列については、形態素辞書３を参照して形態素候補（形態素候補列）を生成する一方、形態素辞書３に登録されていない未登録文字列については、これを抽出して、音韻設定部４へ送る（ステップＳ２１〜Ｓ２３）。
【００４０】
そして、音韻設定部４では、言語処理部２から未登録文字列が送られたかを判断する（ステップＳ２４）。この結果、未登録文字列ではなく、形態素候補が送られたと判断したときには、音韻設定部４は、形態素候補のうちから、最適候補を選択し、形態素を確定し、これに読みを付与する（ステップＳ２５）。そして、アクセント位置，ポーズ位置を確定し（ステップＳ２６）、この文字列に対する音韻をアクセント位置，ポーズ位置などの情報とともに制御記号（発音記号列）に変換して出力部７に与え（ステップＳ２７）、出力部７から音声として出力させる（ステップＳ２８）。
【００４１】
これに対し、ステップＳ２４において未登録文字列が送られたと判断した場合には、音韻設定部４は、この未登録文字列の品詞を推定して（ステップＳ２９）、未登録文字列について形態素候補を生成する（ステップＳ３０）。具体的には、この場合、音韻設定部４は、未登録文字列の直前，直後の形態素候補を参照して、前記未登録文字列全体としての品詞属性を推定することができる。あるいは、音韻設定部４は、未登録文字列全体としての品詞属性をヒューリスティックに設定した品詞の範囲内で推定することもできる。
【００４２】
このようにして、未登録文字列について未登録文字列全体としての品詞属性を推定し、推定した品詞属性を持つ形態素候補を生成すると、音韻設定部４は、生成した該未登録文字列の形態素候補を形態素辞書に登録されている登録文字列の形態素候補と同様に扱い、処理を行なう。すなわち、音韻設定部４は、生成した未登録文字列の形態素候補のうちから、最適候補を選択し、形態素を確定し、読みを付与する（ステップＳ２５）。そして、アクセント位置，ポーズ位置を確定し（ステップＳ２６）、この文字列に対する音韻をアクセント位置，ポーズ位置などの情報とともに制御記号（発音記号列）に変換して出力部７に与え（ステップＳ２７）、出力部７から音声として出力させる（ステップＳ２８）。
【００４３】
このように、未登録文字列全体としての品詞属性を推定する機能を言語解析部２または音韻設定部４にもたせることにより、抽出した一まとまりの未登録文字列に品詞属性を持たせることができ、高度な読み付与が可能になる。
【００４４】
そして、この場合、未登録文字列の直前，直後の形態素候補を参照して、未登録文字列全体としての品詞属性を推定する場合には、未登録文字列に尤もらしい品詞属性を与えることが可能になる。
【００４５】
また、未登録文字列全体としての品詞属性をヒューリスティックに設定した品詞の範囲内で推定する場合には、あらかじめ用意された品詞属性に限定した形態素候補しか生成されないので、形態素候補の大量生成を防ぐことができ、最適解の絞り込み処理を効率良く行なうことができる。
【００４６】
また、未登録文字列に対して、推定した品詞属性を持つ形態素候補を生成し、該未登録文字列の形態素候補を形態素辞書３に登録されている登録文字列の形態素候補と同様に扱うことができることにより、未登録文字列が含まれたテキストでも、言語解析部２または音韻設定部４において、そこで解析処理が止まることなく、未登録文字列の影響を受けずに最適解を絞り込むことができる。
【００４７】
また、図１の音声出力装置において、言語解析部２または音韻設定部４によって未登録文字列全体としての品詞属性を推定しても、この未登録文字列に対して読みが付与されない場合、音韻設定部４は、該未登録文字列全体に対して前記推定された品詞属性を用いて単漢字辞書５に記憶されている複数の読み情報から適切な読みを選択することで、未登録文字列中の漢字の文字列に対して読みを付与することもできる。
【００４８】
図７，図８はこのような未登録文字列への読み付与処理の一例を示すフローチャートである。なお、図７，図８の処理例において、図６と同じステップには、同じ符号を付している。図７，図８を参照すると、ステップＳ２４において未登録文字列が送られたと判断した場合には、前述したように、音韻設定部４は、この未登録文字列の品詞を推定して（ステップＳ２９）、未登録文字列について形態素候補を生成する（ステップＳ３０）。
【００４９】
このようにして、未登録文字列について未登録文字列全体としての品詞属性を推定し、推定した品詞属性を持つ形態素候補を生成すると、音韻設定部４は、生成した該未登録文字列の形態素候補を形態素辞書に登録されている登録文字列の形態素候補と同様に扱い、処理を行なう。すなわち、音韻設定部４は、生成した未登録文字列の形態素候補のうちから、最適候補を選択し、形態素を確定し、読みを付与する（ステップＳ２５）。
【００５０】
しかしながら、このとき、品詞属性は付与されたものの、読みの付与されない場合がある。このような場合を考慮して、ステップＳ２５で最適候補を選択し形態素を確定したとき、この形態素文字列に読みが付与されているか否かを判別する（ステップＳ４１）。この結果、最適解として品詞属性は付されているものの読みの付与されていないときには、この形態素文字列を先頭文字から順に入力させる（ステップＳ４２）。そして、音韻設定部４は、先頭から１文字ずつ入力される未登録文字列（形態素文字列）について、字種を判定し（ステップＳ４３）、この結果、字種が漢字と判定したときには、単漢字辞書５を検索して読みを付与する（ステップＳ４４）。すなわち、単漢字辞書５には複数の読みが記載されているので、音韻設定部４は、字種が漢字と判定したとき、この対象文字列の前後の文字列の文字種から、または、この対象文字列付帯の品詞属性に応じて音読み，訓読み，単独読み，特定品詞読みのいずれかを選択できる。また、字種が漢字以外と判定したときには、音変換テーブル６を検索して読みを付与する（ステップＳ４５）。このようにして、読みが付与されていない未登録文字列（形態素文字列）について１文字ずつ読み付与を実行し、文字列の終わりまできたら（ステップＳ４６）、読みの付与された文字列全体に対して、アクセント位置，ポーズ位置を確定し（ステップＳ２６）、この文字列に対する音韻をアクセント位置，ポーズ位置などの情報とともに制御記号（発音記号列）に変換して出力部７に与え（ステップＳ２７）、出力部７から音声として出力させる（ステップＳ２８）。
【００５１】
次に、本発明の音声出力装置の具体的な処理例について説明する。いま例えば、日本語のテキストとして、図９に示すようなテキストが入力すると、言語解析部２では、形態素辞書３を参照して、形態素候補を生成する。その際、言語解析部２は、形態素辞書３に未登録の文字列が存在する場合、これを未登録文字列として抽出する。例えば、図９の入力テキストに対し、図１０のような未登録文字列を抽出する。
【００５２】
このように、未登録文字列が抽出された場合、言語解析部２または音韻設定部４では、抽出した未登録文字列の品詞を推定する。この推定処理には、この未登録文字列の前後の形態素候補によって、この未登録文字列の品詞を推定をする方法や、ヒューリスティックを利用する方法を用いることができる。
【００５３】
具体例を挙げると、未登録文字列の前後の形態素候補によって推定する方法では、例えば、直前の品詞が数詞であれば、未登録文字列の品詞を名詞あるいは助数と推定し、また、直後の品詞が固有名詞接尾辞であれば、未登録文字列の品詞を固有名詞と推定するなどとして行なわれる。また、ヒューリスティックに品詞を推定する方法では、例えば、未登録文字列の品詞候補は一般名詞と固有名詞に限るなどのように、予め設定された品詞の範囲内で推定するとして行なわれる。
【００５４】
このようにして、未登録文字列に対して品詞の推定が行なわれる場合、言語解析部２または音韻設定部４は、推定した品詞の数だけ、形態素候補を生成する。図１１には、図１０の未登録文字列に対して品詞の推定を行なって得られた（推定された）形態素候補の一例が示されている。このようにして未登録文字列に対して形態素候補が推定される場合、言語解析部２または音韻設定部４では、未登録文字列に対して推定された形態素候補を、登録文字列についてのその他の形態素候補と同様に扱い、これらを絞り込みの対象とする。
【００５５】
すなわち、言語解析部２または音韻設定部４は、生成した全ての形態素候補から、最長一致法，コスト最小法などによって形態素候補を絞り込み、最適解を得る。図１２には、図９の入力テキストに対して得られた最適解の一例が示されている。
【００５６】
このようにして最適解が得られたとき、音韻設定部４は、最適解のうち、読みが付されていない形態素文字列に対し、読みを付与する。この場合、例えば、形態素文字列を先頭から１文字ずつ入力し、文字種が漢字であると判断されたときには、単漢字辞書５を検索し、また、漢字以外であると判断されたときには、音変換テーブル６を検索し、読みが付されていない形態素に対し、適切な読みを付与する。
【００５７】
ここで、読みを付与する仕方としては、文字列全体としての品詞属性によりどの属性を持つ読みを選択するかを決定する方法と、対象文字の前後の文字種により読みを決定する方法が考えられる。
【００５８】
具体例を挙げると、文字列全体としての品詞属性によりどの属性を持つ読みを選択するかを決定する方法では、文字列全体の品詞属性が固有名詞である場合は固有名詞読みを選択する、文字列全体の品詞属性が一般名詞だったら音読みを選択する、などによって、読みを決定することができる。また、対象文字の前後の文字種により読みを決定する方法では、直後の文字が平仮名だったら訓読みを選択する（「煙い」：ケムい）、直後の文字が句読点や閉じ括弧だったら単独読みを選択する（「煙。」：ケムリ）、などによって、読みを決定することができる。
【００５９】
図１３には、図１２の最適解において、読みが付与されていない形態素文字列である「住専」，「擬える」，「山花」について、読みを付与する例が示されている。図１３の例によれば、例えば、未登録文字列「住専」において、「住」は、文字列全体が一般名詞であり、後続文字が漢字であることから、音読みを選択し、また、「専」は、文字列全体が一般名詞であり、前接文字が漢字であることから、音読みを選択する。また、未登録文字列「擬える」において、「擬」は、文字列全体が動詞であり、後続文字が平仮名であることより、訓読みを選択する。また、未登録文字列「山花」において、「山」は、文字列全体が固有名詞であることより、固有名詞読みを選択し、また、「花」は、文字列全体が固有名詞であることより、固有名詞読みを選択する。
【００６０】
このようにして、未登録文字列について、適切な読みを付与することができ、全ての文字列について読みが付与された段階で、アクセント位置，ポーズ位置を確定して発音記号列に変換して音声出力部７に与え、音声出力部７から適切な音声を出力させることができる。
【００６１】
図１４には、図９の入力テキストの全ての文字列について、形態素，読みが付与された状態が示されている。図１４のような状態で、音韻設定部は、図１５に示すような発音記号列を生成することができる。
【００６２】
図１６は図１の音声出力装置のハードウェア構成例を示す図である。図１６を参照すると、この音声出力装置は、例えばパーソナルコンピュータ等で実現され、全体を制御するＣＰＵ２１と、ＣＰＵ２１の制御プログラム等が記憶されているＲＯＭ２２と、ＣＰＵ２１のワークエリア等として使用されるＲＡＭ２３と、日本語のテキストを入力するテキスト入力部１と、音声を出力する出力部（例えば、スピーカ）７とを有している。
【００６３】
ここで、ＲＡＭ２３には、例えば、図１の形態素辞書３，単漢字辞書５，音変換テーブル６を設定することができる。また、ＣＰＵ２１は、図１の言語解析部２，音韻設定部４の機能を有している。
【００６４】
なお、ＣＰＵ２１におけるこのような言語解析部２，音韻設定部４等としての機能は、例えばソフトウェアパッケージ（具体的には、ＣＤ−ＲＯＭ等の情報記録媒体）の形で提供することができ、このため、図１６の例では、情報記録媒体３０がセットさせるとき、これを駆動する媒体駆動装置３１が設けられている。
【００６５】
換言すれば、本発明の音声出力装置は、テキスト入力部，音声出力部等を備えた汎用の計算機システムにＣＤ−ＲＯＭ等の情報記録媒体に記録されたプログラムを読み込ませて、この汎用計算機システムのマイクロプロセッサに音声出力処理を実行させる装置構成においても実施することが可能である。この場合、本発明の音声出力処理を実行するためのプログラム（すなわち、ハードウェアシステムで用いられるプログラム）は、媒体に記録された状態で提供される。プログラムなどが記録される情報記録媒体としては、ＣＤ−ＲＯＭに限られるものではなく、ＲＯＭ，ＲＡＭ，フレキシブルディスク，メモリカード等が用いられても良い。媒体に記録されたプログラムは、ハードウェアシステムに組み込まれている記憶装置、例えばハードディスク装置やＲＡＭ２３などにインストールされることにより、このプログラムを実行して、本発明の音声出力装置の構築に寄与する。
【００６６】
また、本発明の音声出力処理を実現するためのプログラムは、媒体の形で提供されるのみならず、通信によって（例えばサーバによって）提供されるものであっても良い。
【００６７】
【発明の効果】
以上に説明したように、請求項１記載の発明によれば、入力された日本語のテキストを音声に変換して出力する音声出力装置において、入力された日本語のテキストに対して形態素辞書を用いて形態素候補を生成して言語解析する言語解析部と、言語解析部により生成された形態素候補に対して読みを含む音韻を設定する音韻設定部とを有しており、前記言語解析部は、前記テキストから前記形態素辞書を用いて形態素候補を生成する際、形態素辞書に登録されておらず形態素候補が生成できなかった一まとまりの文字列範囲を未登録文字列として前記テキストから抽出し、また、前記音韻設定部は、漢字単独の読み情報が記憶されている単漢字辞書と、漢字以外の文字と発音記号との対応が記憶されている音変換テーブルとを備え、前記言語解析部によって未登録文字列が抽出されたとき、前記音韻設定部は、前記単漢字辞書および音変換テーブルを用いて未登録文字列の読みを推定するようになっているので、未登録文字列を一まとまりの形態素候補として抽出することが可能となり、文字列全体としての属性付けが可能になり、属性を用いた高度な読み付与が可能になる。さらに、未登録文字列が漢字かな交じりの文字列であっても単漢字辞書と音変換テーブルを併用することで先頭から順番に音韻を得ることが可能になる。従って、未登録文字列について、より適切な尤もらしい読み，音韻を付与し、より適切な音声を出力することができる。
【００６８】
また、請求項１記載の発明によれば、前記単漢字辞書には、読みの音訓の属性情報とともに、さらに、単独で出現した漢字に付与する単独読みの属性情報が記憶されているので、複合語の構成語でもなく、直後に送り仮名がない単独の読みを要するにもかかわらず、不自然な音読み，訓読みが付与されていた漢字表記に対し、自然な読みを付与することができるため、尤もらしい読み上げが可能になる。
【００６９】
また、請求項２記載の発明によれば、請求項１記載の音声出力装置において、前記単漢字辞書には、読みの音訓の属性情報とともに、さらに、特定の品詞を構成する場合に限った特定品詞読みの属性情報が記憶されているので、読みに傾向性のある特定の品詞に属する形態素候補の構成漢字に対して、その傾向を加味した読みを付与することができるため、尤もらしい読み上げが可能になる。
【００７０】
また、請求項３記載の発明によれば、請求項１記載の音声出力装置において、前記言語解析部または音韻設定部は、抽出した未登録文字列に対し、該未登録文字列全体としての品詞属性を推定するようになっており、言語解析部に品詞推定機能を持たせることにより、抽出した一まとまりの未登録文字列に品詞属性を持たせることができ、高度な読み付与が可能になる。
【００７１】
また、請求項４記載の発明によれば、請求項３記載の音声出力装置において、前記言語解析部または音韻設定部は、未登録文字列の直前，直後の形態素候補を参照して、前記未登録文字列全体としての品詞属性を推定するので、直前と直後の形態素候補の品詞との接続関係により、未登録文字列に尤もらしい品詞属性を与えることが可能になる。
【００７２】
また、請求項５記載の発明によれば、請求項３記載の音声出力装置において、前記言語解析部または音韻設定部は、前記未登録文字列全体としての品詞属性を予め設定された品詞の範囲内で推定するようにしており、この場合には、あらかじめ用意された品詞属性に限定した形態素候補しか生成されないので、形態素候補の大量生成を防ぐことができ、最適解の絞り込み処理を効率良く行なうことができる。
【００７３】
また、請求項６記載の発明によれば、請求項３記載の音声出力装置において、前記言語解析部または音韻設定部は、未登録文字列に対して未登録文字列全体としての品詞属性を推定したとき、該未登録文字列に対し、前記推定した品詞属性を持つ形態素候補を生成し、生成した該未登録文字列の形態素候補を形態素辞書に登録されている登録文字列の形態素候補と同様に扱うようにしているので、未登録文字列が含まれたテキストでも、言語解析部または音韻設定部において、そこで解析処理が止まることなく、未登録文字列の影響を受けずに最適解を絞り込むことができる。
【００７４】
また、請求項７記載の発明によれば、請求項１記載の音声出力装置において、前記言語解析部または音韻設定部は、未登録文字列が抽出されたときに、該未登録文字列の文字種を判定し、音韻設定部は、未登録文字列中に文字種が漢字の文字列があると判定した場合は、未登録文字列の読みを推定する際、該未登録文字列に対し単漢字辞書を参照して適切な読みを付与し、また、未登録文字列の漢字以外の文字種の文字列については、未登録文字列の読みを推定する際、音変換テーブルを用いて適切な読みを付与するようになっているので、複雑な処理を経ることなく未登録文字列に読みを付与することが可能になる。
【００７５】
また、請求項８記載の発明によれば、請求項７記載の音声出力装置において、前記音韻設定部は、未登録文字列中の漢字の文字列に対して読みを付与する際に、単漢字辞書に記憶されている複数の読み情報から適切な読みを選択するので、対象文字列によって柔軟に適切な読みを選択することができ、尤もらしい読み上げが可能になる。
【００７６】
また、請求項９記載の発明によれば、請求項７記載の音声出力装置において、前記音韻設定部は、未登録文字列中の漢字の文字列に対して読みを付与する際に、当該漢字の文字列の直前，直後の文字列の文字種を利用することにより、単漢字辞書に記憶されている複数の読み情報から適切な読みを選択するので、尤もらしい読み上げが可能になる。
【００７７】
また、請求項１０記載の発明によれば、請求項７記載の音声出力装置において、前記言語解析部または音韻設定部は、抽出した未登録文字列に対し、該未登録文字列全体としての品詞属性を推定するようになっており、前記音韻設定部は、未登録文字列中の漢字の文字列に対して読みを付与する際に、該未登録文字列全体に対して前記推定された品詞属性を用いて単漢字辞書に記憶されている複数の読み情報から適切な読みを選択するので、尤もらしい読み上げが可能になる。
【図面の簡単な説明】
【図１】本発明に係る音声出力装置の構成例を示す図である。
【図２】単漢字辞書の一例を示す図である。
【図３】音変換テーブルの一例を示す図である。
【図４】図１の音声出力装置の処理動作例を示すフローチャートである。
【図５】図１の音声出力装置の処理動作例を示すフローチャートである。
【図６】未登録文字列に対する品詞推定処理，形態素候補生成処理の一例を示すフローチャートである。
【図７】未登録文字列への読み付与処理の一例を示すフローチャートである。
【図８】未登録文字列への読み付与処理の一例を示すフローチャートである。
【図９】本発明の音声出力装置の具体的な処理例を説明するための図である。
【図１０】本発明の音声出力装置の具体的な処理例を説明するための図である。
【図１１】本発明の音声出力装置の具体的な処理例を説明するための図である。
【図１２】本発明の音声出力装置の具体的な処理例を説明するための図である。
【図１３】本発明の音声出力装置の具体的な処理例を説明するための図である。
【図１４】本発明の音声出力装置の具体的な処理例を説明するための図である。
【図１５】本発明の音声出力装置の具体的な処理例を説明するための図である。
【図１６】本発明に係る音声出力装置のハードウェア構成例を示す図である。
【符号の説明】
１入力部
２言語解析部
３形態素辞書
４音韻設定部
５単漢字辞書
６音変換テーブル
７出力部

[0001]
[Technical field of the invention]
The present invention relates to an audio output device that converts Japanese text into speech and outputs it.
[0002]
[Prior art]
A text-to-speech synthesis system is known as one example of an audio output device. In general, when text written in a natural language is input, a system of this kind generates morpheme candidates for the character string of the input text by referring to a morpheme dictionary or the like. It then selects an optimal solution from the morpheme candidates using a given algorithm and sets phonemes, including readings, for that solution. It further sets accent positions and pause positions according to given rules and converts the result into control symbols (a phonetic symbol string). These control symbols are supplied to a speech synthesizer, which outputs speech corresponding to them.
[0003]
Phoneme setting usually relies on the reading information recorded in the morpheme dictionary. When the input text contains a word that is not registered in the morpheme dictionary, however, an appropriate reading sometimes cannot be assigned to that word, and so an appropriate phoneme cannot be set.
[0004]
To assign readings to such unregistered words, methods have been proposed, for example in JP-A-5-20302 and JP-A-2-225174, in which a morpheme dictionary (Japanese dictionary) is consulted and, for any character string within a kanji compound word to which no reading could be assigned (an unregistered character string), a reading is assigned to each kanji character one at a time using a kanji dictionary. JP-A-6-282290 proposes a method in which a character type determination unit determines the character types of the input text, a dictionary search unit enumerates all tentative candidates such as phonemes (readings) and word boundaries, and a tentative candidate selection unit selects the most suitable candidate. In that method, for characters, including kanji, that the tentative candidate selection unit has judged to be unknown words, a single-kanji dictionary is consulted and the phoneme of each individual kanji is determined according to predetermined rules (given that a kanji has multiple readings, the readings in the single-kanji dictionary are prioritized and a phoneme is assigned according to the conditions).
[0005]
[Problems to be solved by the invention]
As described above, various techniques for assigning a reading to an unregistered character string have been proposed, but conventionally there has been a limit to how appropriate a reading could be assigned. For example, with the conventional techniques described above, an appropriate reading cannot be assigned when the unregistered character string mixes kanji and kana.
[0006]
An object of the present invention is to provide an audio output device capable of assigning a more appropriate, plausible reading and phoneme to an unregistered character string and of outputting more appropriate speech.
[0007]
[Means for Solving the Problems]
To achieve the above object, the invention according to claim 1 is an audio output device that converts input Japanese text into speech and outputs it, having a language analysis unit that generates morpheme candidates for the input Japanese text using a morpheme dictionary and performs language analysis, and a phoneme setting unit that sets phonemes, including readings, for the morpheme candidates generated by the language analysis unit. When generating morpheme candidates from the text using the morpheme dictionary, the language analysis unit extracts from the text, as an unregistered character string, a contiguous range of characters that is not registered in the morpheme dictionary and for which no morpheme candidate could be generated. The phoneme setting unit includes a single-kanji dictionary storing reading information for individual kanji and a sound conversion table storing the correspondence between non-kanji characters and phonetic symbols; when the language analysis unit extracts an unregistered character string, the phoneme setting unit estimates its reading using the single-kanji dictionary and the sound conversion table. The single-kanji dictionary stores, in addition to attribute information indicating whether a reading is an on-reading or a kun-reading (on/kun attribute information), attribute information for a standalone reading that is assigned to a kanji appearing on its own. The reading of the unregistered character string is determined using the on/kun attribute information or the standalone-reading attribute information, according to at least the character type of the character before and/or after the target character.
[0009]
The invention according to claim 2 is the audio output device according to claim 1, wherein the single-kanji dictionary stores, in addition to the on/kun attribute information, attribute information for a part-of-speech-specific reading that applies only when the kanji forms part of a specific part of speech.
[0010]
The invention according to claim 3 is the audio output device according to claim 1, wherein the language analysis unit or the phoneme setting unit estimates, for the extracted unregistered character string, a part-of-speech attribute for the unregistered character string as a whole.
[0011]
The invention according to claim 4 is the audio output device according to claim 3, wherein the language analysis unit or the phoneme setting unit estimates the part-of-speech attribute of the unregistered character string as a whole by referring to the morpheme candidates immediately before and after the unregistered character string.
[0012]
The invention according to claim 5 is the audio output device according to claim 3, wherein the language analysis unit or the phoneme setting unit estimates the part-of-speech attribute of the unregistered character string as a whole within a preset range of parts of speech.
[0013]
The invention according to claim 6 is the audio output device according to claim 3, wherein, when the language analysis unit or the phoneme setting unit has estimated the part-of-speech attribute of an unregistered character string as a whole, it generates for that unregistered character string a morpheme candidate having the estimated part-of-speech attribute, and handles the generated morpheme candidate in the same way as the morpheme candidates of registered character strings that are registered in the morpheme dictionary.
[0014]
The invention according to claim 7 is the audio output device according to claim 1, wherein the language analysis unit or the phoneme setting unit determines the character type of an unregistered character string when it is extracted. When the phoneme setting unit determines that the unregistered character string contains a kanji character string, it assigns an appropriate reading by referring to the single-kanji dictionary when estimating the reading of the unregistered character string; for character strings of character types other than kanji within the unregistered character string, it assigns an appropriate reading using the sound conversion table when estimating the reading.
[0015]
The invention according to claim 8 is the audio output device according to claim 7, wherein the phoneme setting unit, when assigning a reading to a kanji character string in an unregistered character string, selects an appropriate reading from the multiple readings stored in the single-kanji dictionary.
[0016]
The invention according to claim 9 is the audio output device according to claim 7, wherein the phoneme setting unit, when assigning a reading to a kanji character string in an unregistered character string, selects an appropriate reading from the multiple readings stored in the single-kanji dictionary by using the character types of the character strings immediately before and after that kanji character string.
[0017]
The invention according to claim 10 is the audio output device according to claim 7, wherein the language analysis unit or the phoneme setting unit estimates, for the extracted unregistered character string, a part-of-speech attribute for the unregistered character string as a whole, and the phoneme setting unit, when assigning a reading to a kanji character string in the unregistered character string, selects an appropriate reading from the multiple readings stored in the single-kanji dictionary by using the part-of-speech attribute estimated for the unregistered character string as a whole.
[0018]
[Embodiments of the invention]
Embodiments of the present invention are described below with reference to the drawings. FIG. 1 shows a configuration example of an audio output device according to the present invention. Referring to FIG. 1, the audio output device has a language analysis unit 2 and a phoneme setting unit 4. The language analysis unit 2 generates morpheme candidates for Japanese text input from a text input unit 1, using a morpheme dictionary 3, and performs language analysis. The phoneme setting unit 4 sets phonemes, including readings, for the morpheme candidates generated by the language analysis unit 2, supplies them to an output unit 7, and causes the output unit 7 to output speech.
[0019]
When generating morpheme candidates from the text using the morpheme dictionary 3, the language analysis unit 2 extracts from the text, as an unregistered character string, any range of characters that is not registered in the morpheme dictionary 3 and for which no morpheme candidate could be generated.
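The extraction step can be pictured with a small sketch. It assumes a greedy longest-match lookup against a toy morpheme dictionary, since the description does not say how the language analysis unit 2 scans the text at this point. The function name `segment`, the `max_len` parameter, and the sample sentence are hypothetical. Consecutive characters with no dictionary match are merged into one unregistered character string.

```python
def segment(text, morpheme_dict, max_len=8):
    """Split text into ('known', s) and ('unregistered', s) spans.

    Assumption: greedy longest match from the left; the patent text does not
    fix the scanning strategy used when candidates are generated.
    """
    spans = []
    i = 0
    while i < len(text):
        match = None
        # Try the longest substring first, then shorter ones.
        for n in range(min(max_len, len(text) - i), 0, -1):
            if text[i:i + n] in morpheme_dict:
                match = text[i:i + n]
                break
        if match is not None:
            spans.append(("known", match))
            i += len(match)
        else:
            # No entry starts here: grow the current unregistered span.
            if spans and spans[-1][0] == "unregistered":
                spans[-1] = ("unregistered", spans[-1][1] + text[i])
            else:
                spans.append(("unregistered", text[i]))
            i += 1
    return spans


# Hypothetical toy dictionary and sentence; "住専" is not registered.
toy_dict = {"が", "問題", "だ"}
print(segment("住専が問題だ", toy_dict))
```

Merging adjacent unmatched characters is what lets the later stages treat the unregistered range as one unit rather than as isolated characters.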
[0020]
The phoneme setting unit 4 includes a single-kanji dictionary 5, which stores reading information for individual kanji together with on/kun attribute information for each reading, and a sound conversion table 6, which stores the correspondence between non-kanji characters and phonetic symbols. When the language analysis unit 2 extracts an unregistered character string, the phoneme setting unit 4 estimates its reading using the single-kanji dictionary 5 and the sound conversion table 6.
[0021]
FIG. 2 shows an example of the single-kanji dictionary 5. Referring to FIG. 2, the single-kanji dictionary 5 stores, in addition to the on/kun attribute information (kun-reading attribute information and on-reading attribute information), attribute information for a standalone reading, that is, the reading of a single kanji that carries meaning on its own (standalone-reading attribute information assigned to a kanji that appears alone). In the example of FIG. 2, the single-kanji dictionary 5 also stores attribute information for a part-of-speech-specific reading that applies only when the kanji forms part of a specific part of speech (in FIG. 2, proper-noun-reading attribute information assigned to kanji that make up a proper noun).
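One way to picture an entry of the single-kanji dictionary 5 is a record with one optional reading per attribute. The sketch below is an assumption about layout only, since the contents of FIG. 2 are not reproduced here. The names `KanjiEntry`, `SINGLE_KANJI_DICT`, and `readings_of` are hypothetical. The kun and standalone readings of 煙 follow the example given later in the description (煙い: ケムい, 煙。: ケムリ), and エン is its ordinary on-reading.

```python
from dataclasses import asdict, dataclass
from typing import Dict, Optional


@dataclass(frozen=True)
class KanjiEntry:
    """Readings of one kanji, keyed by attribute (hypothetical layout)."""
    on: Optional[str] = None           # on-reading
    kun: Optional[str] = None          # kun-reading
    standalone: Optional[str] = None   # reading when the kanji appears alone
    proper_noun: Optional[str] = None  # part-of-speech-specific reading


# Illustrative entry only; a real dictionary would cover many kanji.
SINGLE_KANJI_DICT: Dict[str, KanjiEntry] = {
    "煙": KanjiEntry(on="エン", kun="ケム", standalone="ケムリ"),
}


def readings_of(kanji: str) -> Dict[str, str]:
    """Return the readings registered for a kanji, keyed by attribute."""
    entry = SINGLE_KANJI_DICT.get(kanji)
    if entry is None:
        return {}
    return {attr: r for attr, r in asdict(entry).items() if r is not None}


print(readings_of("煙"))
```

Keeping the attribute as the key, instead of a flat list of readings, is what allows a later step to ask for "the kun-reading" or "the standalone reading" directly.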
[0022]
FIG. 3 shows an example of the sound conversion table 6. Referring to FIG. 3, the sound conversion table 6 stores phonetic symbols (aa, ii, and so on) in correspondence with non-kanji characters (hiragana, katakana, and so on).
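As a rough illustration, the sound conversion table 6 can be modelled as a plain character-to-symbol mapping. Only the symbols aa and ii are named in the description; pairing them with あ/ア and い/イ is an assumption, and `SOUND_TABLE` and `to_phonetic_symbols` are hypothetical names.

```python
# Assumed pairing of kana with the phonetic symbols named in the text.
SOUND_TABLE = {"あ": "aa", "ア": "aa", "い": "ii", "イ": "ii"}


def to_phonetic_symbols(text):
    """Convert a run of non-kanji characters into phonetic symbols."""
    symbols = []
    for ch in text:
        if ch not in SOUND_TABLE:
            # Characters missing from the toy table are reported, not guessed.
            raise KeyError(f"no phonetic symbol registered for {ch!r}")
        symbols.append(SOUND_TABLE[ch])
    return symbols


print(to_phonetic_symbols("あイ"))
```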
[0023]
More specifically, in the audio output device of FIG. 1, when an unregistered character string is extracted from the input text, the language analysis unit 2 or the phoneme setting unit 4 determines the character type of the unregistered character string. When it is determined that the unregistered character string contains a kanji character string, the phoneme setting unit 4, in estimating the reading of the unregistered character string, assigns an appropriate reading by referring to a single-kanji dictionary 5 such as that of FIG. 2. For character strings of character types other than kanji within the unregistered character string, it assigns an appropriate reading using a sound conversion table 6 such as that of FIG. 3.
[0024]
In this case, when assigning a reading to a kanji character string in the unregistered character string, the phoneme setting unit 4 can select an appropriate reading from the multiple readings stored in the single-kanji dictionary 5.
[0025]
Specifically, when assigning a reading to a kanji character string in the unregistered character string, the phoneme setting unit 4 can select an appropriate reading from the multiple readings stored in the single-kanji dictionary 5 by using the character types of the character strings immediately before and after that kanji character string.
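A minimal sketch of selection by neighbouring character type follows. The rules for a following hiragana (kun-reading) and a following punctuation mark or closing bracket (standalone reading) come from the examples given later in the description. The remaining details are assumptions: classifying characters by Unicode block, treating a neighbouring kanji as a cue for the on-reading, the extra check that the preceding character is not kanji, and the fallback order. `char_type` and `select_reading` are hypothetical names.

```python
CLOSERS = "。、．，）」』】"  # punctuation and closing brackets (assumed set)


def char_type(ch):
    """Classify a character by Unicode block; None means no neighbour."""
    if ch is None:
        return "edge"
    code = ord(ch)
    if 0x4E00 <= code <= 0x9FFF:   # CJK Unified Ideographs
        return "kanji"
    if 0x3040 <= code <= 0x309F:   # Hiragana
        return "hiragana"
    if 0x30A0 <= code <= 0x30FF:   # Katakana
        return "katakana"
    if ch in CLOSERS:
        return "closer"
    return "other"


def select_reading(readings, prev_ch, next_ch):
    """Pick one reading from {'on', 'kun', 'standalone'} by neighbour type."""
    prev_type, next_type = char_type(prev_ch), char_type(next_ch)
    if next_type == "hiragana" and "kun" in readings:
        return readings["kun"]          # okurigana follows
    if (next_type == "closer" and prev_type != "kanji"
            and "standalone" in readings):
        return readings["standalone"]   # kanji stands alone
    if "kanji" in (prev_type, next_type) and "on" in readings:
        return readings["on"]           # part of a kanji compound
    for attr in ("on", "kun", "standalone"):  # assumed fallback order
        if attr in readings:
            return readings[attr]
    return None


smoke = {"on": "エン", "kun": "ケム", "standalone": "ケムリ"}
print(select_reading(smoke, None, "い"))  # 煙い
print(select_reading(smoke, None, "。"))  # 煙。
```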
[0026]
FIGS. 4 and 5 are flowcharts showing an example of the processing operation of the audio output device of FIG. 1. In this example, the language analysis unit 2 passes the unregistered character string to the phoneme setting unit 4, which determines the character type of the unregistered character string and assigns a reading.
[0027]
Referring to FIGS. 4 and 5, in the audio output device of FIG. 1, when text is input from the text input unit 1 (step S1), the language analysis unit 2 generates morpheme candidates (a morpheme candidate sequence) for the input text by referring to the morpheme dictionary 3 (step S2). In doing so, the language analysis unit 2 extracts any character string that is not registered in the morpheme dictionary 3 as an unregistered character string (step S3).
[0028]
The morpheme candidates generated by the language analysis unit 2, or the extracted unregistered character string, are sent to the phoneme setting unit 4. The phoneme setting unit 4 determines whether an unregistered character string has been sent from the language analysis unit 2 (step S4). If it determines that morpheme candidates, not an unregistered character string, have been sent, the phoneme setting unit 4 selects the optimal candidate from among the morpheme candidates, fixes the morpheme, and assigns a reading to it (step S5). It then fixes the accent positions and pause positions (step S6), converts the phonemes for this character string, together with information such as the accent and pause positions, into control symbols (a phonetic symbol string) and supplies them to the output unit 7 (step S7), and the output unit 7 outputs them as speech (step S8).
[0029]
If, on the other hand, it is determined in step S4 that an unregistered character string has been sent, the phoneme setting unit 4 determines the character type of the unregistered character string (step S9); specifically, it determines whether the unregistered character string is kanji. If the character type is kanji, the phoneme setting unit 4 searches the single-kanji dictionary 5 and assigns a reading to the unregistered character string (step S10). If the character type in step S9 is not kanji, the phoneme setting unit 4 searches the sound conversion table 6 and assigns a reading to the unregistered character string (step S11).
[0030]
In this way, estimated readings are assigned to the unregistered character string by referring to the single-kanji dictionary 5 or the sound conversion table 6. Once readings have been assigned through to the end of the unregistered character string (step S12), the accent positions and pause positions are fixed (step S6), the phonemes for the unregistered character string are converted, together with information such as the accent and pause positions, into control symbols (a phonetic symbol string) and supplied to the output unit 7 (step S7), and the output unit 7 outputs them as speech (step S8).
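The loop over steps S9 to S12 can be sketched as below. To keep the sketch short, each kanji has a single reading and selection among multiple readings is omitted. The two toy tables, the Unicode-range kanji test, and the name `assign_readings` are assumptions for illustration.

```python
# Toy tables (hypothetical contents): one reading per kanji for brevity.
KANJI_READINGS = {"煙": "ケム"}
SOUND_TABLE = {"い": "ii"}


def is_kanji(ch):
    """Assumed kanji test: CJK Unified Ideographs block."""
    return 0x4E00 <= ord(ch) <= 0x9FFF


def assign_readings(unregistered):
    """Walk the string from its first character to its last (S12 loop)."""
    result = []
    for ch in unregistered:
        if is_kanji(ch):                                   # S9: kanji?
            reading = KANJI_READINGS.get(ch)               # S10
            result.append((ch, "single-kanji dictionary", reading))
        else:
            reading = SOUND_TABLE.get(ch)                  # S11
            result.append((ch, "sound conversion table", reading))
    return result


for row in assign_readings("煙い"):
    print(row)
```

Characters missing from either table come back with `None` as the reading, so a caller can see exactly where the estimate failed.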
[0031]
As described above, according to the audio output device of FIG. 1, an unregistered character string can be extracted as one unit of morpheme candidate, an attribute can be assigned to the character string as a whole, and advanced reading assignment using that attribute can be performed. Further, even if the unregistered character string mixes kanji and kana, using the single kanji dictionary 5 and the sound conversion table 6 together makes it possible to give readings in order from the beginning and obtain the phonemes.
[0032]
In addition, the single kanji dictionary 5 stores, along with the on/kun attribute information of each reading (on-reading attribute information and kun-reading attribute information), attribute information of a single reading that makes sense by itself as one kanji character (single-reading attribute information given to a kanji that appears alone). Consequently, a natural reading can be given to a kanji notation that is not a constituent of a compound word and has no okurigana immediately after it, and that should therefore receive a single reading but has conventionally been given an unnatural on-reading or kun-reading. Plausible reading aloud thus becomes possible.
[0033]
In addition, the single kanji dictionary 5 stores attribute information of a specific part-of-speech reading that is used only when the kanji forms part of a specific part of speech (in the example of FIG. 2, attribute information of a proper noun reading given to kanji that form proper nouns). For the constituent kanji of morpheme candidates belonging to a part of speech whose readings show a particular tendency, a reading that takes that tendency into account can therefore be given, so that plausible reading aloud is possible.
[0034]
In addition, in the audio output device of FIG. 1, the language analysis unit 2 or the phoneme setting unit 4 may further be provided with a function that, when an unregistered character string is extracted, assigns to the extracted unregistered character string a part-of-speech attribute for the unregistered character string as a whole.
[0035]
In this case, the language analysis unit 2 or the phoneme setting unit 4 can estimate the part of speech attribute of the unregistered character string as a whole by referring to morpheme candidates immediately before and after the unregistered character string.
[0036]
Alternatively, the language analysis unit 2 or the phoneme setting unit 4 can estimate the part-of-speech attribute of the entire unregistered character string within a range of parts of speech set heuristically in advance.
[0037]
When a function for estimating the part-of-speech attribute of the entire unregistered character string is provided in this way, the language analysis unit 2 or the phoneme setting unit 4 generates, for the unregistered character string, a morpheme candidate having the estimated part-of-speech attribute, and can handle the generated morpheme candidate of the unregistered character string in the same manner as a morpheme candidate of a registered character string registered in the morphological dictionary 3.
[0038]
FIG. 6 is a flowchart showing an example of part-of-speech estimation processing and morpheme candidate generation processing for an unregistered character string when the language analysis unit 2 or the phoneme setting unit 4 has a function of estimating the part-of-speech attribute of the entire unregistered character string. In the processing example of FIG. 6, an unregistered character string is given from the language analysis unit 2 to the phoneme setting unit 4, and the phoneme setting unit 4 estimates the part of speech of the unregistered character string.
[0039]
Referring to FIG. 6, as in the corresponding steps of the processing example described earlier, the language analysis unit 2 generates morpheme candidates (a morpheme candidate string) by referring to the morphological dictionary 3 for the character strings in the input text that are registered in the morphological dictionary 3, while it extracts unregistered character strings that are not registered in the morphological dictionary 3 and sends them to the phoneme setting unit 4 (steps S21 to S23).
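The extraction of an unregistered character string as one contiguous unit can be illustrated with the sketch below. It is a simplification under stated assumptions: it uses greedy longest matching against a toy dictionary rather than generating all morpheme candidates, and the function name `segment` and the dictionary contents are hypothetical.

```python
def segment(text: str, dictionary: set) -> list:
    """Split text into (surface, registered) pairs.

    Consecutive characters for which no dictionary entry matches are
    collected into a single unregistered string (registered=False).
    """
    result = []
    pending = ""  # current run of unmatched characters
    max_len = max(map(len, dictionary), default=1)
    i = 0
    while i < len(text):
        match = None
        # Try the longest dictionary entry starting at position i.
        for length in range(min(max_len, len(text) - i), 0, -1):
            if text[i:i + length] in dictionary:
                match = text[i:i + length]
                break
        if match is None:
            pending += text[i]
            i += 1
            continue
        if pending:  # close the unregistered run before the match
            result.append((pending, False))
            pending = ""
        result.append((match, True))
        i += len(match)
    if pending:
        result.append((pending, False))
    return result


toy_dictionary = {"今日", "は", "です"}  # illustrative entries only
print(segment("今日は山花です", toy_dictionary))
```

Grouping the unmatched characters into one unit, instead of emitting them one by one, is what later allows a single part-of-speech attribute to be estimated for the whole unregistered string.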
[0040]
The phoneme setting unit 4 then determines whether an unregistered character string has been sent from the language analysis unit 2 (step S24). If it determines that morpheme candidates, not an unregistered character string, have been sent, the phoneme setting unit 4 selects the optimal candidate from among the morpheme candidates, fixes the morpheme, and gives it a reading (step S25). The accent position and the pause position are then determined (step S26), the phonemes for this character string are converted into control symbols (a phonetic symbol string) together with information on the accent position, the pause position, and so on, and given to the output unit 7 (step S27), and the result is output from the output unit 7 as speech (step S28).
[0041]
On the other hand, if it is determined in step S24 that an unregistered character string has been sent, the phoneme setting unit 4 estimates the part of speech of the unregistered character string (step S29) and generates morpheme candidates for the unregistered character string (step S30). Specifically, the phoneme setting unit 4 can estimate the part-of-speech attribute of the unregistered character string as a whole by referring to the morpheme candidates immediately before and after the unregistered character string. Alternatively, the phoneme setting unit 4 can estimate the part-of-speech attribute of the entire unregistered character string within a range of parts of speech set heuristically in advance.
[0042]
In this way, the part-of-speech attribute of the entire unregistered character string is estimated, and morpheme candidates having the estimated part-of-speech attribute are generated. These candidates are handled and processed in the same manner as the morpheme candidates of registered character strings registered in the morphological dictionary 3. That is, the phoneme setting unit 4 selects the optimal candidate from among the generated morpheme candidates of the unregistered character string, fixes the morpheme, and gives it a reading (step S25). The accent position and the pause position are then determined (step S26), the phonemes for this character string are converted into control symbols (a phonetic symbol string) together with information on the accent position, the pause position, and so on, and given to the output unit 7 (step S27), and the result is output from the output unit 7 as speech (step S28).
[0043]
As described above, by providing the language analysis unit 2 or the phoneme setting unit 4 with the function of estimating the part-of-speech attribute of the entire unregistered character string, a part-of-speech attribute can be given to the extracted unregistered character string as a unit, and advanced reading assignment can be performed.
[0044]
In this case, when the part-of-speech attribute of the entire unregistered character string is estimated by referring to the morpheme candidates immediately before and after the unregistered character string, a plausible part-of-speech attribute can be given to the unregistered character string.
[0045]
Further, when the part-of-speech attribute of the entire unregistered character string is estimated within the range of the part-of-speech that is heuristically set, only morpheme candidates limited to the part-of-speech attribute prepared in advance are generated. Therefore, the process of narrowing down the optimal solution can be performed efficiently.
[0046]
In addition, because a morpheme candidate having the estimated part-of-speech attribute is generated for the unregistered character string and can be handled in the same manner as a morpheme candidate of a registered character string registered in the morphological dictionary 3, the language analysis unit 2 or the phoneme setting unit 4 can narrow down the optimal solution even for a text that includes an unregistered character string, without stopping the analysis processing and without being affected by the unregistered character string.
[0047]
Also, in the audio output device of FIG. 1, when the language analysis unit 2 or the phoneme setting unit 4 has estimated the part-of-speech attribute of the unregistered character string as a whole but no reading has been given to the unregistered character string, the phoneme setting unit 4 can use the estimated part-of-speech attribute of the entire unregistered character string to select an appropriate reading from the plural pieces of reading information stored in the single kanji dictionary 5, and thereby give a reading to the kanji character string in the unregistered character string.
[0048]
FIG. 7 and FIG. 8 are flowcharts showing an example of such processing for giving a reading to an unregistered character string. In FIGS. 7 and 8, the same steps as those in FIG. 6 are denoted by the same reference numerals. Referring to FIGS. 7 and 8, if it is determined in step S24 that an unregistered character string has been sent, the phoneme setting unit 4 estimates the part of speech of the unregistered character string as described above (step S29) and generates morpheme candidates for the unregistered character string (step S30).
[0049]
In this way, the part-of-speech attribute of the entire unregistered character string is estimated, and morpheme candidates having the estimated part-of-speech attribute are generated. These candidates are handled and processed in the same manner as the morpheme candidates of registered character strings registered in the morphological dictionary 3. That is, the phoneme setting unit 4 selects the optimal candidate from among the generated morpheme candidates of the unregistered character string, fixes the morpheme, and gives it a reading (step S25).
[0050]
At this point, however, there are cases in which a part-of-speech attribute has been given but a reading has not. To handle such cases, when the optimal candidate has been selected and the morpheme fixed in step S25, it is determined whether a reading has been given to the morpheme character string (step S41). If the optimal solution has a part-of-speech attribute but no reading, the morpheme character string is input one character at a time from the first character (step S42). The phoneme setting unit 4 then determines the character type of each character of the unregistered character string (morpheme character string) input in this way (step S43). If the character type is determined to be kanji, the single kanji dictionary 5 is searched to give a reading (step S44). Because plural readings are described in the single kanji dictionary 5, when the character type is kanji the phoneme setting unit 4 can select one of the on-reading, the kun-reading, the single reading, and the specific part-of-speech reading according to the character types of the character strings before and after the target character string, or according to the part-of-speech attribute given to the character string. If the character type is determined to be other than kanji, the sound conversion table 6 is searched to give a reading (step S45). Readings are given in this way one character at a time to the unregistered character string (morpheme character string) that had no reading, and when the end of the character string is reached (step S46), the accent position and the pause position are determined for the entire character string to which readings have been given (step S26), the phonemes for this character string are converted into control symbols (a phonetic symbol string) together with information on the accent position, the pause position, and so on, and given to the output unit 7 (step S27), and the result is output from the output unit 7 as speech (step S28).
[0051]
Next, a specific processing example of the audio output device of the present invention will be described. When, for example, a text as shown in FIG. 9 is input as Japanese text, the language analysis unit 2 generates morpheme candidates by referring to the morphological dictionary 3. If there is a character string that is not registered in the morphological dictionary 3, the language analysis unit 2 extracts it as an unregistered character string. For example, unregistered character strings as shown in FIG. 10 are extracted from this input text.
[0052]
When an unregistered character string is extracted in this way, the language analysis unit 2 or the phoneme setting unit 4 estimates the part of speech of the extracted unregistered character string. For this estimation, either a method that estimates the part of speech of the unregistered character string from the morpheme candidates before and after it, or a heuristic method, can be used.
[0053]
To give specific examples, in the method that estimates from the morpheme candidates before and after the unregistered character string, if the part of speech immediately before is a numeral, the part of speech of the unregistered character string is estimated to be a noun or a counter (auxiliary numeral), and if the part of speech immediately after is a proper noun suffix, the part of speech of the unregistered character string is estimated to be a proper noun. In the method that estimates the part of speech heuristically, the estimation is performed within a preset range of parts of speech, for example by limiting the part of speech of an unregistered character string to general noun and proper noun.
[0054]
When parts of speech have been estimated for an unregistered character string in this way, the language analysis unit 2 or the phoneme setting unit 4 generates as many morpheme candidates as there are estimated parts of speech. FIG. 11 shows an example of the morpheme candidates obtained by estimating parts of speech for the extracted unregistered character strings. The language analysis unit 2 or the phoneme setting unit 4 handles the morpheme candidates estimated for the unregistered character strings in the same way as the other morpheme candidates for registered character strings, and makes them subject to narrowing down.
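The estimation rules and the generation of one candidate per estimated part of speech can be sketched as follows. This is a hedged illustration: the part-of-speech labels, the function names, and in particular the choice to fall back to the heuristic range only when neither neighbor rule fires are assumptions of this sketch. The text presents the two estimation methods as alternatives and does not say how they would be combined.

```python
from typing import Optional

# Assumed preset range for the heuristic method (labels are illustrative).
HEURISTIC_POS = ["general noun", "proper noun"]


def estimate_pos(prev_pos: Optional[str], next_pos: Optional[str]) -> list:
    """Estimate parts of speech for a whole unregistered string."""
    estimated = []
    if prev_pos == "numeral":
        # A numeral immediately before suggests a noun or a counter.
        estimated += ["noun", "counter"]
    if next_pos == "proper noun suffix":
        # A proper noun suffix immediately after suggests a proper noun.
        estimated.append("proper noun")
    if not estimated:
        # Assumption: otherwise use the preset heuristic range.
        estimated = list(HEURISTIC_POS)
    return estimated


def generate_candidates(surface: str, prev_pos: Optional[str],
                        next_pos: Optional[str]) -> list:
    """Generate one morpheme candidate per estimated part of speech.
    The reading is left unset; it is assigned in a later step."""
    return [{"surface": surface, "pos": pos, "reading": None}
            for pos in estimate_pos(prev_pos, next_pos)]


for cand in generate_candidates("山花", "particle", "particle"):
    print(cand)
```

Because each candidate has the same shape as a dictionary-derived candidate, it can be passed to the same narrowing-down step without special handling.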
[0055]
That is, the language analysis unit 2 or the phoneme setting unit 4 narrows down all the generated morpheme candidates by the longest match method, the minimum cost method, or the like, and obtains the optimal solution. FIG. 12 shows an example of the optimal solution obtained for the input text.
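As a rough illustration of narrowing down by a minimum cost method, the sketch below picks the cheapest sequence of candidates that covers the whole text by dynamic programming. The cost values, the tuple layout, and the function name are hypothetical, and the sketch uses per-candidate costs only; the specification names the method but does not describe its cost model.

```python
def min_cost_path(text_len: int, candidates: list):
    """Return the cheapest candidate sequence covering [0, text_len).

    Each candidate is (start, end, surface, pos, cost) with end > start.
    Returns None when no sequence of candidates covers the text.
    """
    inf = float("inf")
    best = [inf] * (text_len + 1)   # best[i]: cheapest cost to reach i
    back = [None] * (text_len + 1)  # back[i]: candidate that ends at i
    best[0] = 0.0
    for pos in range(text_len):
        if best[pos] == inf:
            continue
        for cand in candidates:
            start, end, _surface, _pos, cost = cand
            if start == pos and best[pos] + cost < best[end]:
                best[end] = best[pos] + cost
                back[end] = cand
    path = []
    pos = text_len
    while pos > 0:
        cand = back[pos]
        if cand is None:
            return None
        path.append(cand)
        pos = cand[0]
    return list(reversed(path))


# Candidates for a 7-character text; the two estimated candidates for the
# unregistered span (3, 5) compete on equal terms with dictionary entries.
lattice = [
    (0, 2, "今日", "noun", 1.0),
    (2, 3, "は", "particle", 1.0),
    (3, 5, "山花", "general noun", 3.0),
    (3, 5, "山花", "proper noun", 2.0),
    (5, 7, "です", "auxiliary verb", 1.0),
]
print([c[3] for c in min_cost_path(7, lattice)])
```

The point of the example is that candidates estimated for an unregistered string sit in the same lattice as registered ones, so the search does not stop at the unregistered span.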
[0056]
When the optimal solution has been obtained in this way, the phoneme setting unit 4 gives a reading to any morpheme character string in the optimal solution that has no reading. In this case, for example, the morpheme character string is input one character at a time from the beginning. If the character type is determined to be kanji, the single kanji dictionary 5 is searched, and if it is determined not to be kanji, the sound conversion table 6 is searched, so that an appropriate reading is given to the morpheme that had no reading.
[0057]
Two ways of giving the reading can be considered here: a method that decides which attribute's reading to use from the part-of-speech attribute of the entire character string, and a method that decides the reading from the character types before and after the target character.
[0058]
As specific examples, in the method that decides which attribute's reading to use from the part-of-speech attribute of the entire character string, the proper noun reading is selected if the part-of-speech attribute of the entire character string is a proper noun, and the on-reading is selected if it is a general noun. In the method that decides the reading from the character types before and after the target character, the kun-reading is selected if the immediately following character is a hiragana ("smoke": kemu), and the single reading is selected if the immediately following character is a punctuation mark or a closing parenthesis ("smoke.": kemuri).
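The selection rules just described can be sketched as below. Treat this as an illustration under assumptions: the dictionary entries, the attribute keys, the romanized readings, and the set of closing characters are chosen for the example, and the order in which the part-of-speech rule and the character-type rules are tried is not specified in the text and is a choice of this sketch.

```python
from typing import Optional

# Hypothetical single kanji dictionary entries keyed by reading attribute.
SINGLE_KANJI_DICT = {
    "煙": {"on": "en", "kun": "kemu", "single": "kemuri"},
    "山": {"on": "san", "kun": "yama", "proper": "yama"},
}


def char_type(ch: Optional[str]) -> str:
    """Classify a neighboring character (None means no neighbor)."""
    if ch is None:
        return "none"
    if "\u4e00" <= ch <= "\u9fff":
        return "kanji"
    if "\u3041" <= ch <= "\u3096":
        return "hiragana"
    if ch in "。、）」":  # assumed punctuation and closing brackets
        return "closing"
    return "other"


def select_reading(entry: dict, pos_attr: str, next_ch: Optional[str]) -> str:
    """Pick one reading for a kanji from its dictionary entry."""
    following = char_type(next_ch)
    if pos_attr == "proper noun" and "proper" in entry:
        return entry["proper"]   # proper noun: proper noun reading
    if following == "hiragana" and "kun" in entry:
        return entry["kun"]      # hiragana follows: kun-reading
    if following == "closing" and "single" in entry:
        return entry["single"]   # punctuation follows: single reading
    return entry["on"]           # otherwise fall back to the on-reading


smoke = SINGLE_KANJI_DICT["煙"]
print(select_reading(smoke, "verb", "る"))
print(select_reading(smoke, "general noun", "。"))
```

Keeping the readings in one entry, tagged by attribute, is what lets the same lookup serve both selection methods.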
[0059]
FIG. 13 shows an example in which, in the optimal solution of FIG. 12, readings are given to the morpheme character strings that had no reading, such as “Sensen”, “Impersonate”, and “Yamahana”. According to the example of FIG. 13, in the unregistered character string "Sumisen", "Sumi" selects the on-reading because the entire character string is a general noun and the succeeding character is a kanji, and "Dedicated" selects the on-reading because the entire character string is a general noun and the preceding character is a kanji. In the unregistered character string "mimic", "pseudo" selects the kun-reading because the entire character string is a verb and the succeeding character is a hiragana. In the unregistered character string "Yamahana", "Yama" selects the proper noun reading because the entire character string is a proper noun, and "Hana" likewise selects the proper noun reading because the entire character string is a proper noun.
[0060]
In this way, an appropriate reading can be given to an unregistered character string. Once readings have been given to all the character strings, the accent position and the pause position are determined and the result is converted into a phonetic symbol string, which is given to the output unit 7 so that the output unit 7 can output appropriate speech.
[0061]
FIG. 14 shows a state in which morphemes and readings have been given to all the character strings of the input text. From the state shown in FIG. 14, the phoneme setting unit 4 can generate a phonetic symbol string.
[0062]
FIG. 16 is a diagram illustrating a hardware configuration example of the audio output device. Referring to FIG. 16, this audio output device is realized by, for example, a personal computer, and includes a CPU 21 that controls the whole device, a ROM 22 that stores the control program of the CPU 21, a RAM 23 that is used as a work area of the CPU 21, a text input unit 1 for inputting Japanese text, and an output unit (for example, a speaker) 7 for outputting speech.
[0063]
Here, for example, the morphological dictionary 3, the single kanji dictionary 5, and the sound conversion table 6 of FIG. 1 can be set in the RAM 23. The CPU 21 provides the functions of the language analysis unit 2 and the phoneme setting unit 4.
[0064]
The functions of the CPU 21, such as those of the language analysis unit 2 and the phoneme setting unit 4, can be provided, for example, in the form of a software package (specifically, an information recording medium such as a CD-ROM). For this reason, the example of FIG. 16 is provided with a medium driving device 31 that drives an information recording medium 30 when the medium is set in it.
[0065]
In other words, the audio output device of the present invention can also be implemented in an apparatus configuration in which a program recorded on an information recording medium such as a CD-ROM is read into a general-purpose computer system having a text input unit, an audio output unit, and the like, and the microprocessor of that computer system executes the audio output processing. In this case, the program for executing the audio output processing of the present invention (that is, the program used in the hardware system) is provided in a state recorded on a medium. The information recording medium on which the program is recorded is not limited to a CD-ROM, and may be a ROM, a RAM, a flexible disk, a memory card, or the like. The program recorded on the medium is installed in a storage device incorporated in the hardware system, for example a hard disk device or the RAM 23, and is executed from there, thereby contributing to the construction of the audio output device of the present invention.
[0066]
Further, the program for implementing the audio output processing of the present invention may be provided not only in the form of a medium but also by communication (for example, by a server).
[0067]
[Effects of the invention]
As described above, according to the invention of claim 1, an audio output device that converts input Japanese text into speech and outputs the speech has a language analysis unit that generates morpheme candidates for the input Japanese text using a morphological dictionary and performs language analysis, and a phoneme setting unit that sets phonemes, including readings, for the morpheme candidates generated by the language analysis unit. When generating morpheme candidates from the text using the morphological dictionary, the language analysis unit extracts from the text, as an unregistered character string, a contiguous character string range that is not registered in the morphological dictionary and for which no morpheme candidate could be generated. The phoneme setting unit includes a single kanji dictionary storing reading information for individual kanji and a sound conversion table storing correspondences between non-kanji characters and phonetic symbols, and, when an unregistered character string is extracted by the language analysis unit, estimates the reading of the unregistered character string using the single kanji dictionary and the sound conversion table. Consequently, an unregistered character string can be extracted as one unit of morpheme candidate, an attribute can be assigned to the character string as a whole, and advanced reading assignment using that attribute can be performed. Furthermore, even if the unregistered character string mixes kanji and kana, phonemes can be obtained in order from the beginning by using the single kanji dictionary and the sound conversion table together. A more appropriate and plausible reading and phonemes can therefore be given to the unregistered character string, and more appropriate speech can be output.
[0068]
Also, according to the invention of claim 1, the single kanji dictionary stores, together with the on/kun attribute information of the readings, attribute information of a single reading given to a kanji that appears alone. A natural reading can therefore be given to a kanji notation that is not a constituent of a compound word, so that plausible reading aloud is possible.
[0069]
Also, according to the invention of claim 2, in the audio output device of claim 1, the single kanji dictionary stores, together with the on/kun attribute information of the readings, attribute information of a specific part-of-speech reading that is used only when the kanji forms part of a specific part of speech. The constituent kanji of morpheme candidates belonging to a part of speech whose readings show a particular tendency can therefore be given a reading that takes that tendency into account.
[0070]
Also, according to the invention of claim 3, in the audio output device of claim 1, the language analysis unit or the phoneme setting unit estimates, for the extracted unregistered character string, a part-of-speech attribute for the unregistered character string as a whole. Providing this part-of-speech estimation function makes it possible to give a part-of-speech attribute to the extracted unregistered character string as a unit and to perform advanced reading assignment.
[0071]
Also, according to the invention of claim 4, in the audio output device of claim 3, the language analysis unit or the phoneme setting unit estimates the part-of-speech attribute of the entire unregistered character string by referring to the morpheme candidates immediately before and immediately after the unregistered character string. The connection relationship with the parts of speech of the morpheme candidates immediately before and after makes it possible to give a plausible part-of-speech attribute to the unregistered character string.
[0072]
Also, according to the invention of claim 5, in the audio output device of claim 3, the language analysis unit or the phoneme setting unit estimates the part-of-speech attribute of the unregistered character string as a whole within a preset range of parts of speech. In this case, only morpheme candidates limited to the part-of-speech attributes prepared in advance are generated, so the process of narrowing down the optimal solution can be performed efficiently.
[0073]
Also, according to the invention of claim 6, in the audio output device of claim 3, when the language analysis unit or the phoneme setting unit has estimated the part-of-speech attribute of an unregistered character string as a whole, it generates a morpheme candidate having the estimated part-of-speech attribute and handles the generated morpheme candidate of the unregistered character string in the same manner as a morpheme candidate of a registered character string registered in the morphological dictionary. Even for a text that includes an unregistered character string, the language analysis unit or the phoneme setting unit can therefore narrow down the optimal solution without stopping the analysis processing and without being affected by the unregistered character string.
[0074]
Also, according to the invention of claim 7, in the audio output device of claim 1, when an unregistered character string is extracted, the language analysis unit or the phoneme setting unit determines the character types of the unregistered character string. When estimating the reading of the unregistered character string, the phoneme setting unit gives an appropriate reading by referring to the single kanji dictionary for any character string in the unregistered character string whose character type is determined to be kanji, and gives an appropriate reading using the sound conversion table for any character string whose character type is other than kanji. A reading can therefore be given to an unregistered character string without complicated processing.
[0075]
Also, according to the invention of claim 8, in the audio output device of claim 7, when giving a reading to a kanji character string in an unregistered character string, the phoneme setting unit selects an appropriate reading from the plural pieces of reading information stored in the single kanji dictionary. An appropriate reading can therefore be selected flexibly according to the target character string, and plausible reading aloud is possible.
[0076]
Also, according to the invention of claim 9, in the audio output device of claim 7, when giving a reading to a kanji character string in an unregistered character string, the phoneme setting unit uses the character types of the character strings immediately before and after the kanji character string to select an appropriate reading from the plural pieces of reading information stored in the single kanji dictionary, so that plausible reading aloud is possible.
[0077]
Also, according to the invention of claim 10, in the audio output device of claim 7, the language analysis unit or the phoneme setting unit estimates, for the extracted unregistered character string, a part-of-speech attribute for the unregistered character string as a whole, and, when giving a reading to a kanji character string in the unregistered character string, the phoneme setting unit uses the estimated part-of-speech attribute of the entire unregistered character string to select an appropriate reading from the plural pieces of reading information stored in the single kanji dictionary, so that plausible reading aloud is possible.
[Brief description of the drawings]
FIG. 1 is a diagram illustrating a configuration example of an audio output device according to the present invention.
FIG. 2 is a diagram illustrating an example of a single kanji dictionary.
FIG. 3 is a diagram illustrating an example of a sound conversion table.
FIG. 4 is a flowchart illustrating an example of a processing operation of the audio output device in FIG. 1.
FIG. 5 is a flowchart illustrating an example of a processing operation of the audio output device in FIG. 1.
FIG. 6 is a flowchart illustrating an example of a part of speech estimation process and a morpheme candidate generation process for an unregistered character string.
FIG. 7 is a flowchart illustrating an example of a process of giving a reading to an unregistered character string.
FIG. 8 is a flowchart illustrating an example of a process of giving a reading to an unregistered character string.
FIG. 9 is a diagram for explaining a specific processing example of the audio output device of the present invention.
FIG. 10 is a diagram for explaining a specific processing example of the audio output device of the present invention.
FIG. 11 is a diagram for explaining a specific processing example of the audio output device of the present invention.
FIG. 12 is a diagram illustrating a specific processing example of the audio output device of the present invention.
FIG. 13 is a diagram illustrating a specific processing example of the audio output device of the present invention.
FIG. 14 is a diagram for explaining a specific processing example of the audio output device of the present invention.
FIG. 15 is a diagram illustrating a specific processing example of the audio output device of the present invention.
FIG. 16 is a diagram illustrating an example of a hardware configuration of an audio output device according to the present invention.
[Explanation of symbols]
1 Text input unit
2 Language analysis unit
3 Morphological dictionary
4 Phoneme setting unit
5 Single kanji dictionary
6 Sound conversion table
7 Output unit

Claims

1. An audio output device that converts input Japanese text into speech and outputs the speech, comprising: a language analysis unit that generates morpheme candidates for the input Japanese text using a morphological dictionary and performs language analysis; and a phoneme setting unit that sets phonemes, including readings, for the morpheme candidates generated by the language analysis unit, wherein, when generating morpheme candidates from the text using the morphological dictionary, the language analysis unit extracts from the text, as an unregistered character string, a contiguous character string range that is not registered in the morphological dictionary and for which no morpheme candidate could be generated; the phoneme setting unit includes a single kanji dictionary storing reading information for individual kanji, and a sound conversion table storing correspondences between characters other than kanji and phonetic symbols; when an unregistered character string is extracted by the language analysis unit, the phoneme setting unit estimates the reading of the unregistered character string using the single kanji dictionary and the sound conversion table; the single kanji dictionary stores, together with on/kun attribute information of the readings, attribute information of a single reading to be given to a kanji that appears alone; and the reading of the unregistered character string is determined using the on/kun attribute information of the readings or the attribute information of the single reading, depending at least on the character type before and/or after the target character.

2. The audio output device according to claim 1, wherein the single kanji dictionary further stores, together with the on/kun attribute information of the readings, attribute information for a specific part-of-speech reading that is used only when the kanji forms part of a word of that specific part of speech.

3. The audio output device according to claim 1, wherein the language analysis unit or the phoneme setting unit estimates a part-of-speech attribute for the extracted unregistered character string as a whole.

4. The audio output device according to claim 3, wherein the language analysis unit or the phoneme setting unit estimates the part-of-speech attribute of the unregistered character string as a whole by referring to the morpheme candidates immediately before and after the unregistered character string.

5. The audio output device according to claim 3, wherein the language analysis unit or the phoneme setting unit estimates the part-of-speech attribute of the unregistered character string as a whole from within a predetermined range of parts of speech.

6. The audio output device according to claim 3, wherein the language analysis unit or the phoneme setting unit estimates the part-of-speech attribute of the unregistered character string as a whole, generates for the unregistered character string a morpheme candidate having the estimated part-of-speech attribute, and treats the generated morpheme candidate of the unregistered character string in the same manner as a morpheme candidate of a registered character string that is registered in the morphological dictionary.

7. The audio output device according to claim 1, wherein the language analysis unit or the phoneme setting unit determines the character types in the unregistered character string when the unregistered character string is extracted, and, when estimating the reading of the unregistered character string, the phoneme setting unit assigns an appropriate reading to any character string in the unregistered character string whose character type is determined to be kanji by referring to the single kanji dictionary, and assigns an appropriate reading to any character string in the unregistered character string whose character type is other than kanji by using the sound conversion table.

8. The audio output device according to claim 7, wherein the phoneme setting unit, when assigning a reading to a kanji character string in the unregistered character string, selects an appropriate reading from among the plurality of pieces of reading information stored in the single kanji dictionary.

9. The audio output device according to claim 7, wherein the phoneme setting unit, when assigning a reading to a kanji character string in the unregistered character string, selects an appropriate reading from among the plurality of pieces of reading information stored in the single kanji dictionary by using the character types of the character strings immediately before and after the kanji character string.

10. The audio output device according to claim 7, wherein the language analysis unit or the phoneme setting unit estimates a part-of-speech attribute for the extracted unregistered character string as a whole, and the phoneme setting unit, when assigning a reading to a kanji character string in the unregistered character string, selects an appropriate reading from among the plurality of pieces of reading information stored in the single kanji dictionary by using the part-of-speech attribute estimated for the unregistered character string as a whole.
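
For illustration only, the reading estimation recited in the claims above (a single kanji dictionary with on, kun, and standalone reading attributes, a sound conversion table for non-kanji characters, and selection by the character type of neighbouring characters) might be sketched as follows. This is a minimal sketch and not the claimed implementation: the dictionary entries, the names `SINGLE_KANJI_DICT`, `SOUND_CONVERSION_TABLE`, `select_attribute`, and `estimate_reading`, and in particular the concrete selection rule (kanji neighbour gives the on-reading, following hiragana gives the kun-reading, otherwise the standalone reading) are all assumptions introduced here.

```python
# Hypothetical single kanji dictionary: each reading carries an attribute
# ("on", "kun", or "single" for the standalone reading). Entries are examples.
SINGLE_KANJI_DICT = {
    "林": {"on": "リン", "kun": "ハヤシ", "single": "ハヤシ"},
    "道": {"on": "ドウ", "kun": "ミチ", "single": "ミチ"},
    "学": {"on": "ガク", "kun": "マナ", "single": "ガク"},
}

# Hypothetical sound conversion table: non-kanji character -> phonetic symbol.
# Here hiragana are simply mapped to the corresponding katakana.
SOUND_CONVERSION_TABLE = {chr(c): chr(c + 0x60) for c in range(0x3041, 0x3097)}


def char_type(ch):
    """Classify one character as 'kanji', 'hiragana', or 'other'."""
    if "\u4e00" <= ch <= "\u9fff":
        return "kanji"
    if "\u3041" <= ch <= "\u3096":
        return "hiragana"
    return "other"


def select_attribute(prev_type, next_type):
    """Assumed rule: pick the reading attribute from neighbouring character types."""
    if prev_type == "kanji" or next_type == "kanji":
        return "on"      # part of a kanji compound
    if next_type == "hiragana":
        return "kun"     # followed by kana (e.g. okurigana)
    return "single"      # kanji appearing on its own


def estimate_reading(unregistered):
    """Estimate a reading for an unregistered character string."""
    out = []
    for i, ch in enumerate(unregistered):
        if char_type(ch) != "kanji":
            # Non-kanji characters go through the sound conversion table;
            # characters missing from the table are passed through unchanged.
            out.append(SOUND_CONVERSION_TABLE.get(ch, ch))
            continue
        readings = SINGLE_KANJI_DICT.get(ch)
        if readings is None:
            out.append(ch)  # kanji not in the dictionary: leave as-is
            continue
        prev_type = char_type(unregistered[i - 1]) if i > 0 else None
        next_type = char_type(unregistered[i + 1]) if i + 1 < len(unregistered) else None
        attr = select_attribute(prev_type, next_type)
        # Fall back to the first stored reading if the attribute is missing.
        out.append(readings.get(attr, next(iter(readings.values()))))
    return "".join(out)


print(estimate_reading("林道"))  # kanji compound
print(estimate_reading("林"))    # standalone kanji
print(estimate_reading("学び"))  # kanji followed by hiragana
```

Under these assumptions, a kanji compound is read with on-readings, a lone kanji with its standalone reading, and a kanji followed by hiragana with its kun-reading plus the converted kana. Selection by estimated part of speech, as in claims 2 and 10, is not covered by this sketch.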