JP4319851B2

JP4319851B2 - Reading apparatus, reading method and reading processing program

Info

Publication number: JP4319851B2
Application number: JP2003088862A
Authority: JP
Inventors: 功一郎福永; 優小原
Original assignee: Clarion Co Ltd
Current assignee: Faurecia Clarion Electronics Co Ltd
Priority date: 2003-03-27
Filing date: 2003-03-27
Publication date: 2009-08-26
Anticipated expiration: 2023-03-27
Also published as: JP2004294897A

Description

【０００１】
【発明の属する技術分野】
本発明は、メールなどの文章を読み上げる読み上げ装置、読み上げ方法及び読み上げ処理用プログラムに関する。
【０００２】
【従来の技術】
従来から用いられているＴＴＳ（Text To Speech）装置（以下、読み上げ装置という）は、例えば、図５に示すように構成されている。すなわち、単語文字表記とその発音データを１対１で定義した発音定義データベース１と、読み上げ対象となる文字列の解析を行う解析部２と、各単語についてその発音データを出力する発音データ変換部３と、その発音データに基づいて合成音声を出力する音声合成部４を備えている。
【０００３】
このような構成を有する従来の読み上げ装置は、図６に示すフローチャートに従って動作する。すなわち、かな／漢字／英数字等が混在した単語によって構成される文章が記されたテキスト文字列データが解析部２に入力され、その文章を構成する各単語に分けられる（ステップ６０１）。続いて、発音データ変換部３が、発音定義データベース１を参照して、文章を構成する各単語に対応した発音データを検索し、入力されたテキスト文章に対する発音データ列を構築して出力する（ステップ６０２）。出力された発音データ列は、音声合成部４に入力され、この発音データ列に基づいて、テキスト文字列データが読み上げられる（ステップ６０４）。
【０００４】
【特許文献１】
特開２００３−１４４８５号公報
【０００５】
【発明が解決しようとする課題】
しかしながら、上述したような従来の読み上げ装置には、以下に述べるような問題点があった。
すなわち、読み上げ装置に入力された文字列データが、例えば、図７に示したような文字列であった場合、下線を付した「A」、「AM」という文字は、同じ表記であっても、文脈上の構成からして、Ａ▲１▼は、アルファベットとして「エー」と発音し、A▲２▼は、英語の接頭語として「ア」と発音される。また、AM▲１▼は、英語の動詞として「アム」と発音し、AM▲２▼は、ラジオの変調方式として「エーエム」と発音される。従って、図７に示した文字列は、「エー、アイアムアメンバーオブエーエムトーキョーファンクラブ。」と読み上げるのが正しい。
【０００６】
しかし、従来の読み上げ装置では、図５の発音定義データベース１において、文字列とそれに対応した発音データを１対１でしか定義できないため、どちらか一方の読みしか実現できない。その結果、図７に示した文字列は、「エー、アイアムエーメンバーオブアムトーキョーファンクラブ。」と、明らかにおかしな読み上げがなされ、意味不明な音声情報を提供してしまう場合があった。
【０００７】
また、特許文献１はこのような従来技術の一例であって、母国語と現地語のように複数の読み方がある場合に、それぞれの読み方を（母国語と現地語の双方）を用いて進路案内文の音声出力を行うものである。しかし、一通りの読み方だけが要求される場合、複数の読み方が辞書にあるときには、いずれを発声するかは予め規則で定められているため、適切な発声がなされない場合もあった。
【０００８】
本発明は、上述したような従来技術の問題点を解消するために提案されたものであり、その目的は、同一表記異発音語句が存在する文章での誤った読み上げの発生を減らすことができる、精度の高い読み上げ装置、読み上げ方法及び読み上げ処理用プログラムを提供することにある。
【０００９】
【課題を解決するための手段】
上記目的を達成するために、請求項１に記載の読み上げ装置は、読み上げ対象となる文字列を構成する各単語についての１又は複数の読みを格納した発音定義データベースと、この発音定義データベースを参照して、各単語の発音データを検出する発音データ検出部と、各単語について検出された発音データが複数か否かを判断する判定部と、検出された発音データが複数の場合に、それらの発音データを合成して、中間的な発音データを作成する中間発音データ作成部と、前記判定部で単一の読みと判断された発音データ、及び／又は、前記中間発音データ作成部によって作成された中間発音データに基づいて、合成音声を出力する音声合成部とを有することを特徴とする。
【００１０】
また、請求項３に記載の読み上げ方法は、請求項１に記載の発明を方法の観点で捉えたものであって、読み上げ対象となる文字列を構成する各単語についての１又は複数の読みを格納した発音定義データベースを参照して、各単語の発音データを検出する発音データ検出処理と、各単語について検出された発音データが複数か否かを判断する判定処理と、検出された発音データが複数の場合に、それらの発音データを合成して、中間的な発音データを作成する中間発音データ作成処理と、前記判定処理において単一の読みと判断された発音データ、及び／又は、前記中間発音データ作成処理によって作成された中間発音データに基づいて、合成音声を出力する音声合成処理とを含むことを特徴とする。
【００１１】
また、請求項５に記載の発明は、請求項３に記載の発明をコンピュータプログラムという観点で捉えたものであって、コンピュータを制御することにより、読み上げ対象となる文字列を読み上げる読み上げ処理用プログラムにおいて、そのプログラムは前記コンピュータに、読み上げ対象となる文字列を構成する各単語についての１又は複数の読みを格納した発音定義データベースを参照して、各単語の発音データを検出する発音データ検出ステップと、各単語について検出された発音データが複数か否かを判断する判定ステップと、検出された発音データが複数の場合に、それらの発音データを合成して、中間的な発音データを作成する中間発音データ作成ステップと、前記判定ステップにおいて単一の読みと判断された発音データ、及び／又は、前記中間発音データ作成ステップにおいて作成された中間発音データに基づいて、合成音声を出力する音声合成ステップとを実行させるものであることを特徴とする。
【００１２】
上記のような構成を有する請求項１、請求項３及び請求項５の発明によれば、複数の読み方がある単語については、それら複数の読みを合成した中間発音データが作成されるので、使用者がどちらとも聞き取れるような音声として出力することができる。その結果、同一表記異発音語句が存在する文章での誤った読み上げの発生を大幅に減らすことができる。
【００１３】
請求項２に記載の読み上げ装置は、読み上げ対象となる文字列から、同一表記異発音単語を抽出する抽出部と、前記同一表記異発音単語についての中間発音データを定義した中間発音データ定義データベースと、この中間発音データ定義データベースを参照して、同一表記異発音単語を中間発音データに変換する第１の発音データ変換部と、単一の読みを有する単語についての発音データを定義した単一発音データ定義データベースと、この単一発音データ定義データベースを参照して、前記第１の発音データ変換部で変換されていない文字列データを、所定の発音データに変換する第２の発音データ変換部と、前記第１の発音データ変換部と第２の発音データ変換部のそれぞれによって変換された発音データに基づいて、合成音声を出力する音声合成部とを有することを特徴とする。
【００１４】
また、請求項４に記載の読み上げ方法は、請求項２に記載の発明を方法の観点で捉えたものであって、読み上げ対象となる文字列から、同一表記異発音単語を抽出する抽出処理と、前記同一表記異発音単語についての中間発音データを定義した中間発音データ定義データベースを参照して、同一表記異発音単語を中間発音データに変換する第１の発音データ変換処理と、単一の読みを有する単語についての発音データを定義した単一発音データ定義データベースを参照して、前記第１の発音データ変換処理によって未だ変換されていない文字列データを、所定の発音データに変換する第２の発音データ変換処理と、前記第１の発音データ変換処理と第２の発音データ変換処理のそれぞれによって変換された発音データに基づいて、合成音声を出力する音声合成処理とを含むことを特徴とする。
【００１５】
また、請求項６に記載の発明は、請求項４に記載の発明をコンピュータプログラムという観点で捉えたものであって、コンピュータを制御することにより、読み上げ対象となる文字列を読み上げる読み上げ処理用プログラムにおいて、そのプログラムは前記コンピュータに、読み上げ対象となる文字列から、同一表記異発音単語を抽出する抽出ステップと、前記同一表記異発音単語についての中間発音データを定義した中間発音データ定義データベースを参照して、同一表記異発音単語を中間発音データに変換する第１の発音データ変換ステップと、単一の読みを有する単語についての発音データを定義した単一発音データ定義データベースを参照して、前記第１の発音データ変換ステップによって未だ変換されていない文字列データを、所定の発音データに変換する第２の発音データ変換ステップと、前記第１の発音データ変換ステップと第２の発音データ変換ステップのそれぞれによって変換された発音データに基づいて、合成音声を出力する音声合成ステップとを実行させるものであることを特徴とする。
【００１６】
上記のような構成を有する請求項２、請求項４及び請求項６の発明によれば、複数の読み方がある単語については、予め中間的な発音データが定義されているため、その中間発音データ定義データベースを参照して、中間発音データを取得するだけで、使用者がどちらとも聞き取れるような音声として出力することができる。その結果、同一表記異発音語句が存在する文章での誤った読み上げの発生を大幅に減らすことができる。
【００１７】
【発明の実施の形態】
以下、本発明の実施の形態（以下、実施形態という）について、図面を参照して、具体的に説明する。
なお、本発明の各機能は、コンピュータを、ソフトウェアで制御することによって実現することが一般的である。この場合、コンピュータが備えるレジスタ、メモリ、外部記憶装置などの記憶装置が、いろいろな形式で、情報を一時的に保持したり永続的に保存する。そして、ＣＰＵが、前記ソフトウェアにしたがって、これらの情報に加工及び判断などの処理を加え、さらに、処理の順序を制御する。
【００１８】
また、コンピュータを制御するソフトウェアは、本出願の各請求項及び本明細書に記述する処理に対応した命令を組み合わせることによって作成され、作成されたソフトウェアは、アセンブルやコンパイルされた組み込みソフトウェアなどの形式で実行されることで、上記のようなハードウェア資源を活用する。
【００１９】
但し、本発明を実現するための上記のような態様はいろいろ変更することができ、例えば、本発明を実現するソフトウェアを記録したＲＯＭチップやＣＤ−ＲＯＭのような記録媒体は、それ単独でも本発明の一態様である。また、本発明の機能の一部をＬＳＩなどの物理的な電子回路で実現することも可能である。
【００２０】
（１）第１実施形態
（１−１）構成
図１は、本実施形態の読み上げ装置の全体構成を示すブロック図である。すなわち、本実施形態の読み上げ装置は、入力された文章（テキストデータ）から、処理対象となる単語を抽出する単語抽出部１０と、各単語の１又は複数の読みを格納した発音定義データベース１１と、この発音定義データベース１１を参照して、処理対象となる単語の発音データを検出する発音データ検出部１２と、その単語について検出された発音データが複数か否かを判断する判定部１３と、検出された発音データが複数の場合に、それらの発音データを合成して、中間的な発音データを作成する中間発音データ作成部１４と、各単語について、前記判定部１３で単一の読みと判断された発音データ、あるいは、前記中間発音データ作成部１４によって作成された中間発音データを記憶する発音データ記憶部１５と、各単語の発音データに基づいて、合成音声を出力する音声合成部１６とを備えている。
なお、前記中間発音データ作成部１４における複数の発音データの合成方法は、予め設定されているものとする。
【００２１】
（１−２）作用
上記のような構成を有する本実施形態の読み上げ装置における処理の流れを、図２に示したフローチャートを参照して説明する。
まず、単語抽出部１０が、読み上げ対象となる文字列（テキストデータ）を取得し、この文字列の中から、単語を一つ抽出する（ステップ２０１、２０２）。そして、抽出された各単語に対して、発音定義データベース１１を参照して、その単語の発音データを検出する（ステップ２０３）。
【００２２】
その単語について検出された発音データが複数であるか否かが判断され（ステップ２０４）、複数であった場合には、それら複数の発音データを合成して、中間発音データを作成し（ステップ２０５）、その発音データを記憶する（ステップ２０６）。一方、ステップ２０４において、その単語について検出された発音データが一つであった場合には、ステップ２０６に進み、その発音データを記憶する。
【００２３】
続いて、読み上げ対象となる文字列（テキストデータ）に含まれる全ての単語について発音データが取得されたか否かが判断され（ステップ２０７）、まだ終了していない場合には、ステップ２０２に戻り、ステップ２０２〜ステップ２０６の処理を繰り返す。一方、全ての単語について発音データが取得された場合には、それらの発音データに従って文字列（テキストデータ）が読み上げられる（ステップ２０８）。
【００２４】
（１−３）具体例
続いて、従来技術の項で説明した図７に示した文章を例にして、本実施形態の作用・効果を具体的に説明する。なお、図７に示した文字列中、下線で示された同一表記異発音の単語については、図２のステップ２０３において、全ての読みが検出され、ステップ２０４において、検出された発音データは複数であるとの判断がなされる。そして、この場合、中間発音データ作成部１４において、使用者がどちらとも聞き取れるような中間的な発音データ（中間発音データ）が合成される。
【００２５】
この具体例においては、単語抽出部１０が、読み上げ対象となる文字列（テキストデータ）を取得し、この文字列の中から単語を一つ抽出する。例えば、単語「Ａ」を抽出し、この単語について、発音定義データベース１１を参照して、その単語の発音データを検索すると、「ア」と「エー」の２通りの読み方が検出される。このように複数の読み方がある場合、中間発音データ作成部１４において、それら複数の発音データを合成して、「アェー」と聞こえるような中間的な発音データが作成される。
【００２６】
また、単語「ＡＭ」が抽出された場合も、その単語の発音データを検索すると、「アム」と「エーエム」の２通りの読み方が検出され、それら複数の発音データを合成して、「アェーム」と聞こえるような中間的な発音データが作成される。一方、単一の読み方しかない単語については、発音定義データベース１１からその発音データが検出される。
【００２７】
従って、図７に示した文字列は、「（＊＊１）． I （＊＊２）（＊＊１） MENBER OF （＊＊２）TOKYO FUN CLUB．」と変換される。なお、（＊＊１）は、「A」の中間的な発音データ≒「アェー」、（＊＊２）は、「AM」の中間的な発音データ≒「アェーム」である。
【００２８】
その結果、出力される音声は、「アェー（＊）、アイアェーム（＊）アェー（＊）メンバーオブアェーム（＊）トーキョーファンクラブ」と聞こえる音声となる。なお、（＊）を付した音声は、例としてそのように聞こえるという意味である。
【００２９】
なお、発音データは音響的なデータであるため、文字で表現するのは非常に困難であり、上記の例では、それぞれ「アェー」「アェーム」と表しているが、実際には２つの読みの中間的な音声となるような波形−周波数特性に調整された音響的なデータが作成される。
【００３０】
（１−４）効果
上述したように、本実施形態の読み上げ装置によれば、複数の読み方がある単語については、それら複数の読みを合成した中間発音データが作成されるので、使用者がどちらとも聞き取れるような音声として出力することができる。その結果、同一表記異発音語句が存在する文章での誤った読み上げの発生を大幅に減らすことができる。
【００３１】
（２）第２実施形態
（２−１）構成
図３は、本実施形態の読み上げ装置の全体構成を示すブロック図である。すなわち、本実施形態の読み上げ装置は、入力された文章（テキストデータ）から、例えば、「A」、「AM」のような同一表記異発音単語を抽出する同一表記異発音単語抽出部２１と、前記同一表記異発音単語についての中間発音データを定義した中間発音データ定義データベース２２と、この中間発音データ定義データベース２２を参照して、抽出された同一表記異発音単語に対して、使用者がどちらとも聞き取れるような中間発音データに変換する第１の発音データ変換部２３と、単一の読みを有する単語についての発音データを定義した単一発音データ定義データベース２４と、前記同一表記異発音単語のみについて変換がなされた変換済み発音データと、未変換のテキスト文字列が混在したテキストデータに対して、前記単一発音データ定義データベース２４を参照して、残りの文字列データを発音データに変換する第２の発音データ変換部２５と、前記第１の発音データ変換部２３と第２の発音データ変換部２５とによって変換された発音データに基づいて、合成音声を出力する音声合成部２６とを備えている。
【００３２】
（２−２）作用
上記のような構成を有する本実施形態の読み上げ装置における処理の流れを、図４に示したフローチャートを参照して説明する。
まず、同一表記異発音単語抽出部２１が、読み上げ対象となる文章（テキストデータ）を取得し、この文章の中から、中間発音データ定義データベース２２に予め登録されている同一表記異発音単語（例：「A」、「AM」）を抽出する（ステップＳ４０１、ステップＳ４０２）。
【００３３】
続いて、第１の発音データ変換部２３が、抽出された各単語に対して、中間発音データ定義データベース２２を参照して、予め設定された中間発音データに変換する（ステップＳ４０３）。
【００３４】
抽出された全ての同一表記異発音単語について、中間発音データへの変換が終了したか否かが判断され（ステップＳ４０４）、続いて、第２の発音データ変換部２５が、変換されていない残りの文字列について、単一発音データ定義データベース２４を参照して、通常の発音データ列に変換する（ステップＳ４０５）。
そして、ステップＳ４０３及びステップＳ４０５において変換された発音データに基づいて、文字列（テキストデータ）が読み上げられる（ステップ４０６）。
【００３５】
（２−３）具体例
続いて、従来技術の項で説明した図７に示した文章を例にして、本発明の作用・効果を具体的に説明する。なお、図７に示した文字列中、下線で示された同一表記異発音の語句は、予め、中間発音データ定義データベース２２に、複数の読み方を有する単語として定義されると共に、使用者がどちらとも聞き取れるような中間的な発音データも定義されている。
【００３６】
なお、この中間発音データは、例えば、上記の「Ａ」、「ＡＭ」の２通りの読みの場合では、以下のように定義されている。すなわち、表記「Ａ」について、読みが「ア」と「エー」の２通りある場合には、例えば、「アェー」と聞こえるような発音データが定義されている。また、表記「ＡＭ」で、「アム」と「エーエム」の２通りの読み方ができる場合には、例えば、「アェーム」と聞こえるような発音データが定義されている。
【００３７】
なお、発音データとは音響的なデータであるため、文字で表現するのは非常に困難であり、上記の例では、それぞれ「アェー」「アェーム」と表しているが、実際には２つの読みの中間的な音声となるような波形−周波数特性に調整された音響的なデータが作成される。
【００３８】
同一表記異発音単語抽出部２１は、このような中間発音データ定義データベース２２を参照して、入力された文字列中からこのデータベース２２に定義された中間発音すべき単語文字列（この例では「Ａ」と「ＡＭ」）を抽出し、以下のように、抽出した文字列部分のみを、予め設定された発音データ（中間発音データ）に変換する。
【００３９】
すなわち、図７に示した文字列は、「（＊＊１）． I （＊＊２）（＊＊１） MENBER OF （＊＊２）TOKYO FUN CLUB．」と変換される。なお、（＊＊１）は、「A」の発音データ≒「アェー」、（＊＊２）は、「AM」の発音データ≒「アェーム」である。
【００４０】
続いて、第２の発音データ変換部２５が、既に変換された発音データを無視して、変換されていない残りの文字列について、単一発音データ定義データベース２４を参照して、文章を構成する各単語に対応した発音データを検索し、入力されたテキスト文章に対する発音データ列を構築して出力し、音声合成部２６で発話音声データに変換し、音声として出力する。
【００４１】
その結果、出力される音声は、「アェー（＊）、アイアェーム（＊）アェー（＊）メンバーオブアェーム（＊）トーキョーファンクラブ」と聞こえる音声となる。なお、（＊）を付した音声は、例としてそのように聞こえるという意味である。
【００４２】
（２−４）効果
上述したように、本実施形態の読み上げ装置によれば、複数の読み方がある単語については、予め中間的な発音データが定義されているため、その中間発音データ定義データベースを参照して、中間発音データを取得するだけで、使用者がどちらとも聞き取れるような音声として出力することができる。その結果、同一表記異発音語句が存在する文章での誤った読み上げの発生を大幅に減らすことができる。
【００４３】
（３）他の実施形態
本発明は、上述したような実施形態に限定されるものではなく、以下のような変形例が可能である。すなわち、複数の読み方のある単語について、中間発音データを作成する方法は、特に限定されるものではなく、複数の読み方を単純に合成する方法や、いずれかの読みを強調して合成する方法等、適宜選択することができる。
【００４４】
【発明の効果】
以上説明したように、本発明によれば、同一表記異発音語句が存在する文章での誤った読み上げの発生を減らすことができる、精度の高い読み上げ装置、読み上げ方法及び読み上げ処理用プログラムを提供することができる。
【図面の簡単な説明】
【図１】本発明に係る読み上げ装置の第１実施形態の構成を示すブロック図
【図２】図１に示した第１実施形態の読み上げ装置における処理の流れを示すフローチャート
【図３】本発明に係る読み上げ装置の第２実施形態の構成を示すブロック図
【図４】図３に示した第２実施形態の読み上げ装置における処理の流れを示すフローチャート
【図５】従来の読み上げ装置の構成を示すブロック図
【図６】従来の読み上げ装置における処理の流れを示すフローチャート
【図７】読み上げ対象となる文章（テキストデータ）の一例を示す図
【符号の説明】
１０…単語抽出部
１１…発音定義データベース
１２…発音データ検出部
１３…判定部
１４…中間発音データ作成部
１５…発音データ記憶部
１６…音声合成部
２１…同一表記異発音単語抽出部
２２…中間発音データ定義データベース
２３…第１の発音データ変換部
２４…単一発音データ定義データベース
２５…第２の発音データ変換部
２６…音声合成部

[0001]
BACKGROUND OF THE INVENTION
The present invention relates to a reading device, a reading method, and a reading processing program for reading aloud text such as e-mail.
[0002]
[Prior art]
A conventional TTS (Text To Speech) device (hereinafter referred to as a reading device) is configured, for example, as shown in FIG. 5. That is, it comprises a pronunciation definition database 1 in which the written form of each word and its pronunciation data are defined one-to-one, an analysis unit 2 that analyzes the character string to be read aloud, a pronunciation data conversion unit 3 that outputs the pronunciation data for each word, and a speech synthesis unit 4 that outputs synthesized speech based on that pronunciation data.
[0003]
The conventional reading device having such a configuration operates according to the flowchart shown in FIG. 6. That is, text character string data containing a sentence made up of words in which kana, kanji, alphanumeric characters, and the like are mixed is input to the analysis unit 2 and divided into the individual words constituting the sentence (step 601). Subsequently, the pronunciation data conversion unit 3 refers to the pronunciation definition database 1, looks up the pronunciation data corresponding to each word of the sentence, and constructs and outputs a pronunciation data string for the input text (step 602). The output pronunciation data string is input to the speech synthesis unit 4, and the text character string data is read aloud based on this pronunciation data string (step 604).
[0004]
[Patent Document 1]
Japanese Patent Laid-Open No. 2003-14485
[0005]
[Problems to be solved by the invention]
However, the conventional reading apparatus as described above has the following problems.
That is, when the character string data input to the reading device is, for example, the character string shown in FIG. 7, the underlined strings “A” and “AM” are each written identically wherever they occur, yet the context dictates different pronunciations: A(1) is pronounced “ē” (エー) as a letter of the alphabet, whereas A(2) is pronounced “a” (ア) as an English prefix. Likewise, AM(1) is pronounced “amu” (アム) as the English verb, whereas AM(2) is pronounced “ē-emu” (エーエム) as the radio modulation scheme. The character string shown in FIG. 7 is therefore correctly read aloud as 「エー、アイアムアメンバーオブエーエムトーキョーファンクラブ。」 (“ē, ai amu a menbā obu ē-emu tōkyō fan kurabu”).
[0006]
However, in the conventional reading device, the pronunciation definition database 1 of FIG. 5 can define a character string and its pronunciation data only one-to-one, so only one of the two readings can be produced. As a result, the character string shown in FIG. 7 is read aloud as 「エー、アイアムエーメンバーオブアムトーキョーファンクラブ。」 (“ē, ai amu ē menbā obu amu tōkyō fan kurabu”), which is plainly wrong, and the device could end up delivering unintelligible audio.
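The limitation described above can be illustrated with a minimal sketch. The table contents and the function name `read_conventional` are hypothetical and chosen only for illustration; the patent does not disclose the actual database layout. The point is simply that a one-to-one table yields the same reading for every occurrence of “A” and “AM”.

```python
# Hypothetical one-to-one pronunciation table (stands in for database 1).
# Each spelling maps to exactly one reading, so context cannot change it.
ONE_TO_ONE = {
    "A": "エー",
    "I": "アイ",
    "AM": "アム",
    "MEMBER": "メンバー",
    "OF": "オブ",
    "TOKYO": "トーキョー",
    "FUN": "ファン",
    "CLUB": "クラブ",
}


def read_conventional(text: str) -> str:
    """Split on whitespace, strip trailing punctuation, look each word up once."""
    words = [w.strip(".,") for w in text.split()]
    return "".join(ONE_TO_ONE[w] for w in words if w)


print(read_conventional("A. I AM A MEMBER OF AM TOKYO FUN CLUB."))
# → エーアイアムエーメンバーオブアムトーキョーファンクラブ
```

Apart from punctuation, the result matches the incorrect reading quoted above: the second “A” comes out as エー and the second “AM” as アム.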
[0007]
Patent Document 1 is one example of such prior art. Where a name has several readings, such as one in the user's native language and one in the local language, it outputs the route guidance sentence by voice using each of the readings (both the native-language and the local-language reading). However, when only one reading is wanted and the dictionary holds several, the reading to be uttered is fixed in advance by rule, so the utterance was sometimes inappropriate.
[0008]
The present invention has been proposed in order to solve the problems of the prior art described above. Its object is to provide a highly accurate reading device, reading method, and reading processing program that can reduce the occurrence of erroneous readings in sentences containing same-notation, different-pronunciation words or phrases.
[0009]
[Means for Solving the Problems]
To achieve the above object, the reading device according to claim 1 comprises: a pronunciation definition database storing one or more readings for each word constituting a character string to be read aloud; a pronunciation data detection unit that refers to the pronunciation definition database and detects the pronunciation data of each word; a determination unit that determines whether more than one item of pronunciation data was detected for each word; an intermediate pronunciation data creation unit that, when more than one item was detected, synthesizes those pronunciation data to create intermediate pronunciation data; and a speech synthesis unit that outputs synthesized speech based on the pronunciation data determined by the determination unit to be a single reading and/or the intermediate pronunciation data created by the intermediate pronunciation data creation unit.
[0010]
The reading method according to claim 3 expresses the invention according to claim 1 as a method, and comprises: a pronunciation data detection process of detecting the pronunciation data of each word by referring to a pronunciation definition database storing one or more readings for each word constituting a character string to be read aloud; a determination process of determining whether more than one item of pronunciation data was detected for each word; an intermediate pronunciation data creation process of, when more than one item was detected, synthesizing those pronunciation data to create intermediate pronunciation data; and a speech synthesis process of outputting synthesized speech based on the pronunciation data determined in the determination process to be a single reading and/or the intermediate pronunciation data created by the intermediate pronunciation data creation process.
[0011]
The invention according to claim 5 expresses the invention according to claim 3 as a computer program. It is a reading processing program that reads aloud a character string by controlling a computer, and it causes the computer to execute: a pronunciation data detection step of detecting the pronunciation data of each word by referring to a pronunciation definition database storing one or more readings for each word constituting the character string to be read aloud; a determination step of determining whether more than one item of pronunciation data was detected for each word; an intermediate pronunciation data creation step of, when more than one item was detected, synthesizing those pronunciation data to create intermediate pronunciation data; and a speech synthesis step of outputting synthesized speech based on the pronunciation data determined in the determination step to be a single reading and/or the intermediate pronunciation data created in the intermediate pronunciation data creation step.
[0012]
According to the inventions of claims 1, 3, and 5 having the above configuration, intermediate pronunciation data obtained by synthesizing the several readings is created for any word that has more than one reading, so the word can be output as speech that the user can hear as either reading. As a result, the occurrence of erroneous readings in sentences containing same-notation, different-pronunciation words or phrases can be greatly reduced.
[0013]
The reading device according to claim 2 comprises: an extraction unit that extracts same-notation, different-pronunciation words from the character string to be read aloud; an intermediate pronunciation data definition database that defines intermediate pronunciation data for the same-notation, different-pronunciation words; a first pronunciation data conversion unit that refers to the intermediate pronunciation data definition database and converts the same-notation, different-pronunciation words into intermediate pronunciation data; a single pronunciation data definition database that defines pronunciation data for words having a single reading; a second pronunciation data conversion unit that refers to the single pronunciation data definition database and converts the character string data not converted by the first pronunciation data conversion unit into predetermined pronunciation data; and a speech synthesis unit that outputs synthesized speech based on the pronunciation data converted by each of the first and second pronunciation data conversion units.
[0014]
The reading method according to claim 4 expresses the invention according to claim 2 as a method, and comprises: an extraction process of extracting same-notation, different-pronunciation words from the character string to be read aloud; a first pronunciation data conversion process of converting the same-notation, different-pronunciation words into intermediate pronunciation data by referring to an intermediate pronunciation data definition database that defines intermediate pronunciation data for those words; a second pronunciation data conversion process of converting the character string data not yet converted by the first pronunciation data conversion process into predetermined pronunciation data by referring to a single pronunciation data definition database that defines pronunciation data for words having a single reading; and a speech synthesis process of outputting synthesized speech based on the pronunciation data converted by each of the first and second pronunciation data conversion processes.
[0015]
The invention according to claim 6 expresses the invention according to claim 4 as a computer program. It is a reading processing program that reads aloud a character string by controlling a computer, and it causes the computer to execute: an extraction step of extracting same-notation, different-pronunciation words from the character string to be read aloud; a first pronunciation data conversion step of converting the same-notation, different-pronunciation words into intermediate pronunciation data by referring to an intermediate pronunciation data definition database that defines intermediate pronunciation data for those words; a second pronunciation data conversion step of converting the character string data not yet converted by the first pronunciation data conversion step into predetermined pronunciation data by referring to a single pronunciation data definition database that defines pronunciation data for words having a single reading; and a speech synthesis step of outputting synthesized speech based on the pronunciation data converted by each of the first and second pronunciation data conversion steps.
[0016]
According to the inventions of claims 2, 4, and 6 having the above configuration, intermediate pronunciation data is defined in advance for words having more than one reading, so simply obtaining the intermediate pronunciation data from the intermediate pronunciation data definition database is enough to output speech that the user can hear as either reading. As a result, the occurrence of erroneous readings in sentences containing same-notation, different-pronunciation words or phrases can be greatly reduced.
[0017]
DETAILED DESCRIPTION OF THE INVENTION
Hereinafter, embodiments of the present invention (hereinafter referred to as embodiments) will be specifically described with reference to the drawings.
Each function of the present invention is generally realized by controlling a computer with software. In this case, storage devices of the computer such as registers, memory, and external storage hold information temporarily or store it permanently in various formats. The CPU then, in accordance with the software, applies operations such as modification and judgment to this information and also controls the order of processing.
[0018]
The software that controls the computer is created by combining instructions corresponding to the processing described in the claims of the present application and in this specification, and the created software makes use of the hardware resources described above by being executed in a form such as assembled or compiled embedded software.
[0019]
However, the above manner of realizing the present invention can be varied in many ways. For example, a recording medium such as a ROM chip or a CD-ROM on which software realizing the present invention is recorded is, by itself, one aspect of the present invention. Some of the functions of the present invention can also be realized by a physical electronic circuit such as an LSI.
[0020]
(1) First Embodiment
(1-1) Configuration
FIG. 1 is a block diagram showing the overall configuration of the reading device of this embodiment. That is, the reading device of this embodiment comprises: a word extraction unit 10 that extracts a word to be processed from an input sentence (text data); a pronunciation definition database 11 that stores one or more readings of each word; a pronunciation data detection unit 12 that refers to the pronunciation definition database 11 and detects the pronunciation data of the word to be processed; a determination unit 13 that determines whether more than one item of pronunciation data was detected for the word; an intermediate pronunciation data creation unit 14 that, when more than one item was detected, synthesizes those pronunciation data to create intermediate pronunciation data; a pronunciation data storage unit 15 that stores, for each word, either the pronunciation data determined by the determination unit 13 to be a single reading or the intermediate pronunciation data created by the intermediate pronunciation data creation unit 14; and a speech synthesis unit 16 that outputs synthesized speech based on the pronunciation data of each word.
It is assumed that the method of synthesizing the plurality of pronunciation data in the intermediate pronunciation data creation unit 14 is set in advance.
[0021]
(1-2) Operation
The flow of processing in the reading device of this embodiment having the above configuration will be described with reference to the flowchart shown in FIG. 2.
First, the word extraction unit 10 acquires the character string (text data) to be read aloud and extracts one word from it (steps 201 and 202). Then, for the extracted word, the pronunciation definition database 11 is referenced to detect the pronunciation data of that word (step 203).
[0022]
It is then determined whether more than one item of pronunciation data was detected for the word (step 204). If so, the several items of pronunciation data are synthesized to create intermediate pronunciation data (step 205), and that pronunciation data is stored (step 206). If, on the other hand, only one item of pronunciation data was detected for the word in step 204, the process proceeds directly to step 206 and that pronunciation data is stored.
[0023]
Subsequently, it is determined whether pronunciation data has been acquired for all the words included in the character string (text data) to be read aloud (step 207). If not, the process returns to step 202 and the processing of steps 202 to 206 is repeated. Once pronunciation data has been acquired for all the words, the character string (text data) is read aloud according to that pronunciation data (step 208).
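The loop of steps 202 to 207 can be sketched as follows. This is only an illustrative outline under stated assumptions: the dictionary `PRONUNCIATIONS`, the helper `blend_placeholder`, and the function `build_pronunciation_list` are hypothetical names, and the placeholder merely marks which readings would be blended rather than producing any audio.

```python
# Hypothetical stand-in for pronunciation definition database 11:
# one or more readings per word.
PRONUNCIATIONS: dict[str, list[str]] = {
    "A": ["ア", "エー"],
    "AM": ["アム", "エーエム"],
    "I": ["アイ"],
}


def blend_placeholder(readings: list[str]) -> str:
    """Stand-in for unit 14: records the readings to be blended, no audio."""
    return "<" + "|".join(readings) + ">"


def build_pronunciation_list(words, blend=blend_placeholder):
    stored = []                              # pronunciation data storage unit 15
    for word in words:                       # steps 202 and 207: one word at a time
        readings = PRONUNCIATIONS[word]      # step 203: detect pronunciation data
        if len(readings) > 1:                # step 204: more than one reading?
            stored.append(blend(readings))   # steps 205 and 206: blend, then store
        else:
            stored.append(readings[0])       # step 206: store the single reading
    return stored                            # step 208 would synthesize from this


print(build_pronunciation_list(["I", "AM", "A"]))
# → ['アイ', '<アム|エーエム>', '<ア|エー>']
```

Passing the blending function as a parameter reflects the statement that the synthesis method used by unit 14 is set in advance; any other method could be substituted without changing the loop.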
[0024]
(1-3) Specific Example
Next, the operation and effect of this embodiment will be described concretely, taking as an example the sentence shown in FIG. 7 that was discussed in the prior art section. For the underlined same-notation, different-pronunciation words in the character string shown in FIG. 7, all readings are detected in step 203 of FIG. 2, and in step 204 it is determined that more than one item of pronunciation data was detected. In this case, the intermediate pronunciation data creation unit 14 synthesizes in-between pronunciation data (intermediate pronunciation data) that the user can hear as either reading.
[0025]
In this specific example, the word extraction unit 10 acquires the character string (text data) to be read aloud and extracts one word from it. For example, when the word “A” is extracted and its pronunciation data is looked up in the pronunciation definition database 11, two readings, “a” (ア) and “ē” (エー), are detected. When there are several readings like this, the intermediate pronunciation data creation unit 14 synthesizes those pronunciation data to create intermediate pronunciation data that sounds like “Ah” (アェー).
[0026]
Similarly, when the word “AM” is extracted and its pronunciation data is looked up, two readings, “amu” (アム) and “ē-emu” (エーエム), are detected, and those pronunciation data are synthesized to create intermediate pronunciation data that sounds like “Ahm” (アェーム). For words that have only a single reading, on the other hand, that pronunciation data is simply detected from the pronunciation definition database 11.
[0027]
Accordingly, the character string shown in FIG. 7 is converted to “(**1). I (**2) (**1) MEMBER OF (**2) TOKYO FUN CLUB.” Here, (**1) is the intermediate pronunciation data for “A” (≈ “Ah”), and (**2) is the intermediate pronunciation data for “AM” (≈ “Ahm”).
[0028]
As a result, the output speech sounds like “Ah (*), I Ahm (*) Ah (*) Member of Ahm (*) Tokyo Fan Club”. The speech marked with (*) is given only as an example of how it sounds.
[0029]
Because pronunciation data is acoustic data, it is very difficult to express in written characters. In the above example it is written as “Ah” and “Ahm”, but what is actually created is acoustic data whose waveform and frequency characteristics are adjusted so that it sounds in between the two readings.
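The patent does not disclose how the two readings are acoustically combined, so the following is only one conceivable sketch, not the patented method. It assumes each reading is represented by a hypothetical per-frame feature track (for example, one formant frequency per frame), resamples both tracks to a common length, and mixes them with a weight. The names `resample`, `blend_tracks`, and `weight_a` are illustrative assumptions.

```python
def resample(track: list[float], n: int) -> list[float]:
    """Linearly resample a feature track to n points (n >= 2, len(track) >= 2)."""
    out = []
    for i in range(n):
        pos = i * (len(track) - 1) / (n - 1)   # fractional index into the track
        lo = int(pos)
        hi = min(lo + 1, len(track) - 1)
        frac = pos - lo
        out.append(track[lo] * (1 - frac) + track[hi] * frac)
    return out


def blend_tracks(a: list[float], b: list[float], weight_a: float = 0.5) -> list[float]:
    """Weighted mix of two tracks after stretching both to the longer length."""
    n = max(len(a), len(b))
    ra, rb = resample(a, n), resample(b, n)
    return [weight_a * x + (1 - weight_a) * y for x, y in zip(ra, rb)]


# Toy values only; real tracks would come from the stored pronunciation data.
print(blend_tracks([0.0, 2.0], [4.0, 4.0, 4.0]))
# → [2.0, 2.5, 3.0]
```

With `weight_a=0.5` the two readings contribute equally; moving the weight toward 0 or 1 would let one reading dominate, which is one way to emphasize a particular reading.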
[0030]
(1-4) Effect
As described above, according to the reading device of this embodiment, intermediate pronunciation data obtained by synthesizing the several readings is created for any word that has more than one reading, so the word can be output as speech that the user can hear as either reading. As a result, the occurrence of erroneous readings in sentences containing same-notation, different-pronunciation words or phrases can be greatly reduced.
[0031]
(2) Second Embodiment
(2-1) Configuration
FIG. 3 is a block diagram showing the overall configuration of the reading device of this embodiment. That is, the reading device of this embodiment comprises: a same-notation, different-pronunciation word extraction unit 21 that extracts same-notation, different-pronunciation words such as “A” and “AM” from an input sentence (text data); an intermediate pronunciation data definition database 22 that defines intermediate pronunciation data for the same-notation, different-pronunciation words; a first pronunciation data conversion unit 23 that refers to the intermediate pronunciation data definition database 22 and converts each extracted same-notation, different-pronunciation word into intermediate pronunciation data that the user can hear as either reading; a single pronunciation data definition database 24 that defines pronunciation data for words having a single reading; a second pronunciation data conversion unit 25 that takes the text data, in which pronunciation data already converted for the same-notation, different-pronunciation words is mixed with unconverted text, and converts the remaining character string data into pronunciation data by referring to the single pronunciation data definition database 24; and a speech synthesis unit 26 that outputs synthesized speech based on the pronunciation data converted by the first pronunciation data conversion unit 23 and the second pronunciation data conversion unit 25.
[0032]
(2-2) Operation
The flow of processing in the reading device of this embodiment having the above configuration will be described with reference to the flowchart shown in FIG. 4.
First, the same-notation, different-pronunciation word extraction unit 21 acquires the sentence (text data) to be read aloud and extracts from it the same-notation, different-pronunciation words registered in advance in the intermediate pronunciation data definition database 22 (for example, “A” and “AM”) (steps S401 and S402).
[0033]
Subsequently, the first pronunciation data conversion unit 23 converts each extracted word into preset intermediate pronunciation data with reference to the intermediate pronunciation data definition database 22 (step S403).
[0034]
It is determined whether conversion to intermediate pronunciation data has been completed for all of the extracted same-notation, different-pronunciation words (step S404). The second pronunciation data conversion unit 25 then converts the remaining, unconverted character strings into an ordinary pronunciation data string by referring to the single pronunciation data definition database 24 (step S405).
A character string (text data) is read out based on the pronunciation data converted in step S403 and step S405 (step 406).
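The two-pass conversion of steps S402 to S405 can be sketched as below. The dictionaries `INTERMEDIATE` and `SINGLE`, the function `convert`, and the regular-expression word split are all assumptions made for illustration; the katakana strings stand in for acoustic data that cannot be shown in text.

```python
import re

# Hypothetical stand-in for intermediate pronunciation data definition database 22.
INTERMEDIATE = {"A": "アェー", "AM": "アェーム"}
# Hypothetical stand-in for single pronunciation data definition database 24.
SINGLE = {
    "I": "アイ", "MEMBER": "メンバー", "OF": "オブ",
    "TOKYO": "トーキョー", "FUN": "ファン", "CLUB": "クラブ",
}


def convert(text: str) -> list[str]:
    tokens = re.findall(r"[A-Za-z]+", text)   # crude word split (assumption)
    # First pass (steps S402 to S404): convert only the registered ambiguous words.
    first = [("done", INTERMEDIATE[t]) if t in INTERMEDIATE else ("todo", t)
             for t in tokens]
    # Second pass (step S405): convert what is left, skipping finished entries.
    return [v if state == "done" else SINGLE[v] for state, v in first]


print("".join(convert("A. I AM A MEMBER OF AM TOKYO FUN CLUB.")))
# → アェーアイアェームアェーメンバーオブアェームトーキョーファンクラブ
```

Tagging each token as "done" or "todo" plays the role of the mixed text in which already-converted pronunciation data sits alongside unconverted character strings.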
[0035]
(2-3) Specific Example
Next, the operation and effect of the present invention will be described concretely, taking as an example the sentence shown in FIG. 7 that was discussed in the prior art section. The underlined same-notation, different-pronunciation words in the character string shown in FIG. 7 are defined in advance in the intermediate pronunciation data definition database 22 as words having more than one reading, together with in-between pronunciation data that the user can hear as either reading.
[0036]
The intermediate pronunciation data is defined as follows, for example, for the two words “A” and “AM” mentioned above. For the notation “A”, which has the two readings “ē” (エー, the name of the letter) and “a” (ア, the English word), pronunciation data that sounds like “Ah” is defined. Likewise, for the notation “AM”, which can be read either “amu” (アム, the English verb) or “ē-emu” (エーエム, the radio modulation scheme), pronunciation data that can be heard as “Ahm” is defined.
[0037]
Note that because pronunciation data is acoustic data, it is very difficult to express in characters; in the above example the intermediate sounds are written as “Ah” and “Ahm”, respectively. In practice, acoustic data whose waveform and frequency characteristics have been adjusted to produce such an intermediate sound is created.
[0038]
Referring to this intermediate pronunciation data definition database 22, the same-notation different-pronunciation word extraction unit 21 extracts from the input character string the word character strings defined in the database 22 as targets of intermediate pronunciation (in this example, “A” and “AM”), and only the extracted portions are converted into the preset pronunciation data (intermediate pronunciation data) as follows.
[0039]
That is, the character string shown in FIG. 7 is converted to “(** 1). I (** 2) (** 1) MEMBER OF (** 2) TOKYO FAN CLUB.”. Here, (** 1) is the pronunciation data for “A” (≈ “Ah”), and (** 2) is the pronunciation data for “AM” (≈ “Ahm”).
[0040]
Subsequently, the second pronunciation data conversion unit 25 skips the already-converted pronunciation data and, for the remaining unconverted character strings, refers to the single pronunciation data definition database 24 to look up the pronunciation data corresponding to each word making up the sentence. It then constructs and outputs a pronunciation data string for the input text; the speech synthesis unit 26 converts this string into speech data and outputs it as voice.
[0041]
As a result, the output is heard as “Ah(*), I Ahm(*) Ah(*) Member of Ahm(*) Tokyo Fan Club”. A sound marked with (*) is only an approximate written rendering of how it sounds.
[0042]
(2-4) Effects As described above, according to the reading apparatus of the present embodiment, intermediate pronunciation data is defined in advance for words having a plurality of readings, so that simply referring to this data allows such a word to be output as a sound that the user can hear as either reading. As a result, erroneous readings in sentences containing same-notation different-pronunciation words can be greatly reduced.
[0043]
(3) Other Embodiments The present invention is not limited to the above-described embodiments, and modifications such as the following are possible. For example, the method of creating intermediate pronunciation data for a word with a plurality of readings is not particularly limited; a method that simply combines the readings, a method that emphasizes one of the readings, or the like can be selected as appropriate.
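The difference between simply combining the readings and emphasizing one of them can be illustrated with a weighted average. The specification does not state how the acoustic data are actually combined, so the following is purely an assumed sketch: the function `blend`, the representation of each reading as an equal-length list of numeric acoustic features, and the weighting scheme are all hypothetical.

```python
def blend(readings, weights=None):
    """Weighted average of equal-length feature vectors, one per reading.

    Assumption of this sketch: each reading is already reduced to a list of
    numeric acoustic features of the same length. Real acoustic data would
    also need time alignment, which is not handled here.
    """
    if not readings:
        raise ValueError("at least one reading is required")
    if weights is None:
        # Simple combination: every reading contributes equally.
        weights = [1.0] * len(readings)
    if len(weights) != len(readings):
        raise ValueError("one weight per reading is required")
    total = sum(weights)
    if total <= 0:
        raise ValueError("weights must sum to a positive value")
    length = len(readings[0])
    if any(len(r) != length for r in readings):
        raise ValueError("readings must have the same length")
    # Larger weights pull the result toward the corresponding reading.
    return [
        sum(w * r[i] for w, r in zip(weights, readings)) / total
        for i in range(length)
    ]
```

Calling `blend([reading_a, reading_b])` corresponds to the simple combination, while `blend([reading_a, reading_b], [3, 1])` emphasizes the first reading.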
[0044]
[Effects of the Invention]
As described above, according to the present invention, it is possible to provide a high-accuracy reading apparatus, reading method, and reading processing program capable of reducing erroneous readings in sentences containing same-notation different-pronunciation words.
[Brief Description of the Drawings]
FIG. 1 is a block diagram showing the configuration of a first embodiment of a reading apparatus according to the present invention.
FIG. 2 is a flowchart showing the flow of processing in the reading apparatus of the first embodiment shown in FIG. 1.
FIG. 3 is a block diagram showing the configuration of a second embodiment of the reading apparatus according to the present invention.
FIG. 4 is a flowchart showing the flow of processing in the reading apparatus of the second embodiment shown in FIG. 3.
FIG. 5 is a block diagram showing the configuration of a conventional reading apparatus.
FIG. 6 is a flowchart showing the flow of processing in the conventional reading apparatus.
FIG. 7 is an example of a sentence (text data) to be read out.
[Description of Symbols]
10 ... Word extraction unit
11 ... Pronunciation definition database
12 ... Pronunciation data detection unit
13 ... Determination unit
14 ... Intermediate pronunciation data creation unit
15 ... Pronunciation data storage unit
16 ... Speech synthesis unit
21 ... Same-notation different-pronunciation word extraction unit
22 ... Intermediate pronunciation data definition database
23 ... First pronunciation data conversion unit
24 ... Single pronunciation data definition database
25 ... Second pronunciation data conversion unit
26 ... Speech synthesis unit

Claims

A reading apparatus comprising:
a pronunciation definition database that stores one or more readings for each word constituting a character string to be read out;
a pronunciation data detection unit that detects pronunciation data of each word with reference to the pronunciation definition database;
a determination unit that determines whether or not a plurality of pronunciation data have been detected for each word;
an intermediate pronunciation data creation unit that, when a plurality of pronunciation data have been detected, synthesizes those pronunciation data to create intermediate pronunciation data; and
a speech synthesis unit that outputs synthesized speech based on the pronunciation data determined by the determination unit to be a single reading and/or the intermediate pronunciation data created by the intermediate pronunciation data creation unit.

A reading apparatus comprising:
an extraction unit that extracts same-notation different-pronunciation words from a character string to be read out;
an intermediate pronunciation data definition database that defines intermediate pronunciation data for the same-notation different-pronunciation words;
a first pronunciation data conversion unit that converts the same-notation different-pronunciation words into intermediate pronunciation data with reference to the intermediate pronunciation data definition database;
a single pronunciation data definition database that defines pronunciation data for words having a single reading;
a second pronunciation data conversion unit that converts character string data not converted by the first pronunciation data conversion unit into predetermined pronunciation data with reference to the single pronunciation data definition database; and
a speech synthesis unit that outputs synthesized speech based on the pronunciation data converted by each of the first pronunciation data conversion unit and the second pronunciation data conversion unit.

A reading method comprising:
a pronunciation data detection process of detecting pronunciation data of each word with reference to a pronunciation definition database that stores one or more readings for each word constituting a character string to be read out;
a determination process of determining whether or not a plurality of pronunciation data have been detected for each word;
an intermediate pronunciation data creation process of, when a plurality of pronunciation data have been detected, synthesizing those pronunciation data to create intermediate pronunciation data; and
a speech synthesis process of outputting synthesized speech based on the pronunciation data determined in the determination process to be a single reading and/or the intermediate pronunciation data created in the intermediate pronunciation data creation process.

A reading method comprising:
an extraction process of extracting same-notation different-pronunciation words from a character string to be read out;
a first pronunciation data conversion process of converting the same-notation different-pronunciation words into intermediate pronunciation data with reference to an intermediate pronunciation data definition database that defines intermediate pronunciation data for the same-notation different-pronunciation words;
a second pronunciation data conversion process of converting character string data not converted by the first pronunciation data conversion process into predetermined pronunciation data with reference to a single pronunciation data definition database that defines pronunciation data for words having a single reading; and
a speech synthesis process of outputting synthesized speech based on the pronunciation data converted by each of the first pronunciation data conversion process and the second pronunciation data conversion process.

A reading processing program that controls a computer to read out a character string to be read out, the program causing the computer to execute:
a pronunciation data detection step of detecting pronunciation data of each word with reference to a pronunciation definition database that stores one or more readings for each word constituting the character string to be read out;
a determination step of determining whether or not a plurality of pronunciation data have been detected for each word;
an intermediate pronunciation data creation step of, when a plurality of pronunciation data have been detected, synthesizing those pronunciation data to create intermediate pronunciation data; and
a speech synthesis step of outputting synthesized speech based on the pronunciation data determined in the determination step to be a single reading and/or the intermediate pronunciation data created in the intermediate pronunciation data creation step.

A reading processing program that controls a computer to read out a character string to be read out, the program causing the computer to execute:
an extraction step of extracting same-notation different-pronunciation words from the character string to be read out;
a first pronunciation data conversion step of converting the same-notation different-pronunciation words into intermediate pronunciation data with reference to an intermediate pronunciation data definition database that defines intermediate pronunciation data for the same-notation different-pronunciation words;
a second pronunciation data conversion step of converting character string data not converted by the first pronunciation data conversion step into predetermined pronunciation data with reference to a single pronunciation data definition database that defines pronunciation data for words having a single reading; and
a speech synthesis step of outputting synthesized speech based on the pronunciation data converted by each of the first pronunciation data conversion step and the second pronunciation data conversion step.