JP2004171174A - Device and program for reading text aloud, and recording medium

Info

Publication number: JP2004171174A
Application number: JP2002334694A
Authority: JP
Inventors: Takao Kato; 隆夫加藤
Original assignee: Brother Industries Ltd
Current assignee: Brother Industries Ltd
Priority date: 2002-11-19
Filing date: 2002-11-19
Publication date: 2004-06-17

Abstract

PROBLEM TO BE SOLVED: To provide a text readout device that reads an unregistered unknown word aloud according to the reading that a user inputs.

SOLUTION: When a text to be read aloud contains an unknown word that cannot be analyzed, the text readout device 1 extracts the word and displays it on the text or on a screen, as shown in the figure. Once the reading of the displayed unknown word has been input, the unknown word and its reading are registered, and the text is thereafter read aloud according to the input reading.

COPYRIGHT: (C)2004,JPO

Description

[0001]
[Technical field of the invention]
The present invention relates to a text-to-speech apparatus that outputs text as synthesized speech using a speech synthesis technique, a program for reading aloud, and a recording medium on which the program is recorded.
[0002]
[Prior art]
Functions that read text aloud by outputting it as speech using speech synthesis technology (text-to-speech apparatuses) are increasingly being built into portable information terminals (PDAs). In such an apparatus, the text of a book (including newspapers and the like) to be read aloud may contain a word that has multiple readings, or a word that is not registered in the dictionary and whose reading is therefore unknown.
[0003]
In such a case, for example, the document reading device disclosed in Patent Document 1 detects words with multiple readings in the target document while reading it aloud, reads their candidate readings from a provisional learning data file, and displays them on a display device. When the user designates one of the candidates, that reading is updated to be the first-priority candidate, and subsequent reading aloud uses the updated reading.
[0004]
Thus, for a word with multiple readings, the conventional device adopts the reading with the highest priority described in the provisional learning data file. For words not in the provisional learning file, it reads aloud according to the first-priority reading described in the word dictionary.
[0005]
[Patent Document 1]
JP-A-6-332899 (abstract, paragraph (0017), paragraph (0022))
[0006]
[Problems to be solved by the invention]
In the conventional document reading device described above, a word with multiple readings can continue to be read aloud with the selected reading. However, this device cannot handle unknown words that are not in the word dictionary.
[0007]
The present invention has been made to solve the above problem. Its first object is to allow the user to input the reading of an unknown word while a text is being read aloud, so that the text is thereafter read aloud with the input reading.
[0008]
A second object is to attach an unknown word mark to each unknown word in advance, so that unknown words can be extracted easily and quickly during reading and the input processing for unknown words can be performed smoothly.
[0009]
[Means for Solving the Problems]
The invention of claim 1 is a text-to-speech apparatus that analyzes text, synthesizes speech according to the analysis, and reads the text aloud, comprising: means for extracting unanalyzable unknown words in the text to be read; means for displaying the extracted unknown word; means for inputting the reading of the displayed unknown word; and means for registering the input unknown word together with its reading. In the text-to-speech apparatus of claim 1, the unknown word is read aloud according to the registered reading.
[0010]
The invention of claim 2 is the text-to-speech apparatus of claim 1, further comprising means for assigning an unknown word mark to a word that cannot be analyzed in the morphological analysis of the text to be read. In the text-to-speech apparatus of claim 2, the means for extracting the unknown word performs the extraction by searching for the unknown word mark assigned to the unknown word.
[0011]
The invention of claim 3 is a program for a text-to-speech apparatus which, in order to analyze text, synthesize speech according to the analysis, and read the text aloud, causes a computer to execute: a procedure for extracting unanalyzable unknown words in the text to be read; a procedure for displaying the extracted unknown word on display means; a procedure for shifting to a state for inputting the reading of the displayed unknown word; a procedure for registering the extracted unknown word together with the input reading in storage means; and a procedure for generating data for reading the unknown word aloud according to the registered reading. According to the invention of claim 3, the above procedures are executed by a computer.
[0012]
The invention of claim 4 is a program for a text-to-speech apparatus which, in order to analyze text, synthesize speech according to the analysis, and read the text aloud, causes a computer to execute: a procedure for extracting unanalyzable unknown words in the text to be read; and a procedure for assigning an unknown word mark to the extracted unknown word. According to the invention of claim 4, the above procedures are executed by a computer.
[0013]
The invention of claim 5 causes the computer to execute, in addition to the procedures of claim 4, a procedure for detecting the unknown word mark and a procedure for shifting to a state for inputting a reading for the word to which the unknown word mark has been assigned. According to the invention of claim 5, the above procedures are executed by a computer.
[0014]
According to a sixth aspect of the present invention, there is provided a computer-readable recording medium storing the program according to any one of the third to fifth aspects. According to the fifth aspect of the invention, the same effects as those of the third or fourth aspect of the invention are provided.
[0015]
[Embodiments of the invention]
A text-to-speech apparatus according to an embodiment of the present invention will be described with reference to the drawings.
[0016]
FIG. 1 is a front view of a text-to-speech apparatus implemented as a portable terminal device according to an embodiment of the present invention. As shown, the text-to-speech apparatus 1 includes a liquid crystal display 2 on which characters and images are displayed, operation keys 3 with which the user performs various key operations, a touch panel 4 attached to the surface of the liquid crystal display 2 for user input with a stylus or the like, a speaker 5, and an output terminal unit (not shown) for audio output to an earphone (not shown).
[0017]
Next, the configuration of the text-to-speech apparatus 1 will be described with reference to FIG. 2, a block diagram showing its schematic configuration.
[0018]
The text-to-speech apparatus 1 includes a CPU 10, a ROM 6, a RAM 7, a display device 9 including the liquid crystal display 2, an input device 8 including the operation keys 3 and the touch panel 4, and an audio output device 5 including the speaker 5 and the output terminal unit.
[0019]
The ROM 6 is provided with various program storage areas, including an area 61 storing the OS (control program) of the text-to-speech apparatus 1 (apparatus program area), an area 62 storing an electronic book reader program that makes the apparatus 1 function as an electronic book (electronic book reader program area), and an area 63 storing a voice reading program for executing the text-to-speech process (voice reading program area). The ROM 6 is further provided with data storage areas for the data needed for morphological analysis of the text and for speech synthesis, including an area 65 storing speech synthesis dictionary data (speech synthesis dictionary data area), an area 66 storing speech synthesis phoneme data (speech synthesis phoneme data area), and an area 67 storing accent processing dictionary data (accent processing data area).
[0020]
The RAM 7 is provided with a work area 71 for storing intermediate results of data processing for speech synthesis, a user data area 72 for storing various data used by the user, and a book data area 73 for storing the book data to be read aloud by the text-to-speech apparatus 1. It is further provided with a word registration area 74 for registering (storing) information on words that cannot be analyzed (unknown words), and a word learning area 75 for storing, for a word with multiple readings, the result of the user's selection of a reading. The book data area 73 is configured so that it can also store book data imported from outside the text-to-speech apparatus 1.
[0021]
In the text-to-speech apparatus 1 according to the present embodiment, the CPU 10 controls the input device 8, the liquid crystal display 2, the speaker 5, and the like based on various programs stored in the ROM 6.
[0022]
The CPU 10 creates speech synthesis data from the book data using an electronic book reader program and a speech reading program in the ROM 6, supplies this data to the speaker 5, and outputs a synthesized sound. Of course, the synthesized sound may be output from the earphone instead of the speaker 5.
[0023]
Next, the software configuration of the text-to-speech apparatus 1 according to the embodiment will be described.
[0024]
FIG. 3 is a block diagram showing the relationship between programs related to the text-to-speech process of the text-to-speech apparatus 1.
[0025]
In the figure, the electronic book reader program is a program that causes the portable information terminal (PDA) to execute processing such as page turning, character typesetting, column layout, page size changes, font settings, and ruby display.
[0026]
The voice reading program is a program for reading book data (text data) with synthesized voice, and operates in conjunction with the electronic book reader program. The voice reading program further includes a voice parameter conversion program, a voice output program, and a ruby information cutout and registration program.
[0027]
The voice parameter conversion program creates, from the text to be read, a voice parameter character string for reading, using the speech synthesis word dictionary and the accent processing dictionary data. The voice output program creates output voice data from the voice parameter character string produced by the voice parameter conversion program, using the speech synthesis phoneme data.
[0028]
The unknown word reading acquisition and registration program performs the following series of processing. When, during morphological analysis of the document to be read, it finds a word that is not in the dictionary and therefore cannot be analyzed (an unknown word), it attaches an unknown word mark to that word (data indicating that the word is unknown, which is not read aloud as part of the text). While the document is being read aloud, it searches for the marked word and displays it on the liquid crystal display 2, the user inputs its reading, and the pair of the unknown word and its reading is registered in the word registration area 74.
[0029]
The reading selection and learning program works in the same way. When, during morphological analysis of the document to be read, it finds a word with multiple readings, it attaches a multiple-reading mark to that word (data indicating that the word has multiple readings, which is not read aloud as part of the text). While the document is being read aloud, it searches for the marked word and displays it on the liquid crystal display 2, the user selects the first-priority reading from the candidate readings, and the pair of the word and its reading is registered in the word learning area 75.
[0030]
Next, the text-to-speech processing of the text-to-speech apparatus 1 will be described with reference to the flowchart of FIG.
[0031]
The text-to-speech process is started, for example, by operating the operation keys 3 to input a start instruction according to the screen shown on the liquid crystal display 2 of the text-to-speech apparatus 1, and proceeds through the following steps.
[0032]
In step S101, the CPU 10 cuts out (extracts) from the book data, in practice from the sentences displayed on the liquid crystal display 2, the data for the one sentence to be read next.
[0033]
As an example, let the extracted sentence be 「今日は北京旅行の日です。」 ("Today is the day of the trip to Beijing.").
[0034]
In step S102, the data of one sentence is divided into words by referring to the speech synthesis dictionary data in the speech synthesis dictionary data area 65, and morphological analysis is performed to acquire the reading and accent information of each word.
[0035]
Here, for a word with multiple readings, if the user has already selected the appropriate reading and the selection is registered in the word learning area 75, that reading is used. If there is no such registration, a predetermined multiple-reading mark is attached so that the reading can be selected later.
[0036]
In addition, a word that cannot be analyzed in this morphological analysis (hereinafter, an unknown word) is given a predetermined unknown word mark, distinct from the multiple-reading mark, so that its reading can be acquired later.
[0037]
The analysis result of the above example is as follows.
[0038]
Word division: 今日／は／北京／旅行／の／日／です／。
Reading: キョー／ワ／ペキン／リョコー／ノ／ヒ／デス
Accent information: 2*10*0*
(A number indicates the position at which the accent falls. "0" indicates a flat accent, and "*" indicates that the word has no accent position information.)
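The marking behavior of step S102 can be sketched as follows. This is a minimal illustration, not the patented implementation: the greedy longest-match segmentation, the toy `DICTIONARY`, the `Token` class, and the mark values `UNKNOWN_MARK` and `MULTI_MARK` are all assumptions introduced for the example. The `learned` and `registered` arguments stand in for the word learning area 75 and the word registration area 74.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional

# Hypothetical toy dictionary: surface form -> candidate readings.
DICTIONARY: Dict[str, List[str]] = {
    "今日": ["キョー"], "は": ["ワ"], "北京": ["ペキン"], "旅行": ["リョコー"],
    "の": ["ノ"], "日": ["ヒ"], "です": ["デス"],
    "大勢": ["タイセイ", "オオゼイ"],
}
PUNCTUATION = "。、"
UNKNOWN_MARK = "UNKNOWN"  # assumed marker value for unanalyzable words
MULTI_MARK = "MULTI"      # assumed marker value for words with several readings


@dataclass
class Token:
    surface: str
    reading: Optional[str] = None
    mark: Optional[str] = None


def longest_match(sentence: str, start: int, registered: Dict[str, str]) -> Optional[str]:
    """Return the longest known word beginning at `start`, or None."""
    for end in range(len(sentence), start, -1):
        piece = sentence[start:end]
        if piece in registered or piece in DICTIONARY:
            return piece
    return None


def resolve(word: str, learned: Dict[str, str], registered: Dict[str, str]) -> Token:
    """Pick a reading for a known word, or mark it for later selection."""
    if word in registered:                 # reading entered earlier by the user
        return Token(word, registered[word])
    readings = DICTIONARY[word]
    if len(readings) == 1:
        return Token(word, readings[0])
    if word in learned:                    # reading selected earlier by the user
        return Token(word, learned[word])
    return Token(word, mark=MULTI_MARK)


def analyze(sentence: str, learned: Dict[str, str], registered: Dict[str, str]) -> List[Token]:
    tokens: List[Token] = []
    pending = ""  # run of characters that matched nothing so far
    i = 0

    def flush() -> None:
        nonlocal pending
        if pending:
            tokens.append(Token(pending, mark=UNKNOWN_MARK))
            pending = ""

    while i < len(sentence):
        ch = sentence[i]
        if ch in PUNCTUATION:
            flush()
            tokens.append(Token(ch))
            i += 1
            continue
        match = longest_match(sentence, i, registered)
        if match is None:
            pending += ch
            i += 1
            continue
        flush()
        tokens.append(resolve(match, learned, registered))
        i += len(match)
    flush()
    return tokens
```

Calling `analyze("朝青龍は大勢です。", {}, {})` marks 朝青龍 as unknown and 大勢 as having multiple readings. Passing a registered reading for the former and a learned reading for the latter yields a fully resolved token list with no marks.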
[0039]
Subsequently, in step S103, a voice parameter character string for reading is created from the per-word readings and accent information, with reference to the accent processing data in the accent processing data area 67.
[0040]
In the above example, it is as follows.
Accent-processed character string: キョ’ーワ／ペキンリョ’コーノ／ヒデ’ス｜ ("／" marks a boundary between reading units, "｜" marks a pause position, and "’" marks the position where the accent falls)
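The notation of the voice parameter character string can be illustrated with a small parser. This is only a sketch of how the three symbols described above might be interpreted; the function name, the `Unit` structure, and the choice to report the accent as a character offset are assumptions made for the example, and the Unicode code points are taken from the original Japanese text of the example string.

```python
from dataclasses import dataclass
from typing import List, Optional

ACCENT = "\u2019"    # ’ : position where the accent falls
UNIT_SEP = "\uFF0F"  # ／ : boundary between reading units
PAUSE = "\uFF5C"     # ｜ : pause position


@dataclass
class Unit:
    kana: str                     # reading unit with the accent symbol removed
    accent_offset: Optional[int]  # characters before the accent symbol, None if absent
    pause_after: bool             # True if a pause follows this unit


def parse_parameter_string(text: str) -> List[Unit]:
    units: List[Unit] = []
    current = ""
    for ch in text:
        if ch in (UNIT_SEP, PAUSE):
            if current:
                units.append(make_unit(current, pause_after=(ch == PAUSE)))
                current = ""
            elif ch == PAUSE and units:
                # A pause directly after a separator applies to the previous unit.
                units[-1].pause_after = True
        else:
            current += ch
    if current:
        units.append(make_unit(current, pause_after=False))
    return units


def make_unit(raw: str, pause_after: bool) -> Unit:
    offset = raw.find(ACCENT)
    return Unit(
        kana=raw.replace(ACCENT, ""),
        accent_offset=offset if offset >= 0 else None,
        pause_after=pause_after,
    )
```

Applied to the example string, the parser yields three reading units, with a pause flagged only after the last one.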
[0041]
In step S104, the voice parameter character string created above is read aloud by outputting speech sequentially from the beginning.
[0042]
At this time, if a word with the multiple-reading mark is present, its candidate readings are displayed on the liquid crystal display 2, the user selects the first-priority reading with the operation keys 3, and the readings are stored in the word learning area 75 with that priority.
[0043]
A word with an unknown word mark is likewise displayed on the liquid crystal display 2, the user inputs its reading with the operation keys 3, and the reading is registered in the word registration area 74. From then on, the text is read aloud using the registered reading. In other words, an unknown word mark is attached to an unknown word, the mark is later detected, and the apparatus shifts to a state in which the user inputs the reading of the marked word.
[0044]
In step S105, output voice data is created from the voice parameter character string by joining and transforming phonemes with reference to the data in the speech synthesis phoneme data area 66.
[0045]
In step S106, the output audio data is converted into an analog signal and output as audio from the speaker 5.
[0046]
FIG. 5 is a flowchart of the reading selection and unknown word reading acquisition processing.
In step S201, the voice parameter character string created in step S103 of FIG. 4 is searched word by word, and it is determined whether there is a word with the multiple-reading mark (S202). If there is (step S202, YES), the process proceeds to step S204.
[0047]
In step S204, the reading is temporarily stopped, and a reading selection list of the word is displayed on the liquid crystal display 2.
[0048]
For example, the word 「大勢」 has two readings, "たいせい" (taisei) and "おおぜい" (oozei). If the reading to use has not yet been learned (registered), reading is paused and the reading selection list is displayed, for example on the multiple-reading selection screen shown in FIG. 6.
[0049]
Next, in step S205, it is determined whether the user has selected a reading. If a reading has been selected (step S205, YES), the process proceeds to step S206, where the reading is stored in the word learning area 75. Reading aloud is then resumed (S209) and the reading process continues (S210). In step S205, the process does not proceed to step S206 until the user selects a reading.
[0050]
Even if there is no word with the multiple-reading mark in step S202 (S202, NO), if it is determined in step S203 that there is a word with the unknown word mark (step S203, YES), the process proceeds to step S207, where reading is paused and an input screen for entering the reading of the unknown word is displayed on the liquid crystal display 2. For example, if the reading of the word 「朝青龍」, which carries an unknown word mark, has not yet been registered, reading is paused and the input screen shown in FIG. 7 is displayed to prompt the user to input the reading.
[0051]
When it is determined in step S203 that there is no word with an unknown word mark (step S203, NO), the reading process simply continues (step S210).
[0052]
In step S208, when the user has input the reading of the unknown word (step S208, YES), the process proceeds to step S209, where the reading is registered and reading aloud is resumed, and the reading process continues (S210). If the reading of the unknown word has not been input (step S208, NO), the process does not proceed to step S209 until the user inputs it.
[0053]
In this way, the processing of steps S201 to S210 is sequentially performed, and when it is determined that the book data has been read out to the end (S211, YES), the processing ends.
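The control flow of FIG. 5 can be sketched as a single loop over the analyzed words. This is a hedged illustration under stated assumptions: tokens are modeled as `(surface, reading, mark)` tuples, the mark values `"MULTI"` and `"UNKNOWN"` are invented for the example, and the callbacks `select`, `ask`, and `speak` are hypothetical stand-ins for the selection screen of FIG. 6, the input screen of FIG. 7, and the speech output. The two dictionaries play the roles of the word learning area 75 and the word registration area 74.

```python
from typing import Callable, Dict, List, Optional, Tuple

Token = Tuple[str, Optional[str], Optional[str]]  # (surface, reading, mark)


def read_with_prompts(
    tokens: List[Token],
    candidates: Dict[str, List[str]],
    learned: Dict[str, str],
    registered: Dict[str, str],
    select: Callable[[str, List[str]], Optional[str]],
    ask: Callable[[str], str],
    speak: Callable[[str], None],
) -> None:
    for surface, reading, mark in tokens:            # S201: scan word by word
        if mark == "MULTI":                          # S202: multiple-reading mark?
            choice = None
            while choice not in candidates[surface]:  # S204/S205: wait for a valid selection
                choice = select(surface, candidates[surface])
            learned[surface] = choice                # S206: store the selection
            reading = choice
        elif mark == "UNKNOWN":                      # S203: unknown word mark?
            entered = ""
            while not entered:                       # S207/S208: wait until a reading is entered
                entered = ask(surface)
            registered[surface] = entered            # S209: register the reading
            reading = entered
        if reading is not None:                      # punctuation carries no reading
            speak(reading)                           # S210: continue reading aloud
    # S211: all tokens processed, reading ends
```

Because both loops block until the callback returns a usable value, reading stays paused exactly as the flowchart describes. A real device would drive these callbacks from its display and key input rather than from plain functions.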
[0054]
The processing described above is carried out by the CPU 10 of the text-to-speech apparatus 1, which, based on the instructions and data of the programs stored in the ROM 6, analyzes the input text, synthesizes speech according to the analysis, and reads the text aloud. In doing so, the embodiment of the present invention causes the CPU 10 to execute a procedure for extracting unanalyzable unknown words in the text to be read, a procedure for displaying the extracted unknown word on display means, a procedure for inputting the reading of the displayed unknown word, a procedure for registering the input unknown word together with its reading in storage means, and a procedure for reading the unknown word aloud according to the registered reading. It further causes the CPU 10 to execute a procedure for attaching an unknown word mark to words that cannot be analyzed in the morphological analysis of the text, and a procedure for extracting unknown words by searching for the attached unknown word marks. The embodiment thus also provides a program for a text-to-speech apparatus that causes these procedures to be executed.
[0055]
Further, the program can be provided recorded on any computer-readable recording medium, such as a conventionally known flexible disk, CD-ROM, DVD-ROM, MO, or HDD, or via a network such as the Internet.
[0056]
In the embodiment described above, a single apparatus equipped with the speaker 5 for producing synthesized speech executes all of the programs shown in FIGS. 4 and 5. However, individual programs, or parts of programs, may be processed by separate apparatuses, with the synthesized speech finally produced from the speaker 5. For example, a first computer may process steps S101 to S103 and output the voice parameter character string data to a second computer via the Internet, and the second computer may process steps S104 to S106. Furthermore, the apparatus that produces the synthesized speech is not limited to a dedicated text-to-speech apparatus; it may be a PDA, personal computer, mobile phone, car navigation terminal, TV, or other device that has functions other than reading aloud. The text to be read aloud is not limited to books and may be letters (including e-mail), route guidance, advertisements, song lyrics, and the like. The data stored in the RAM 7 is lost when the apparatus is powered off, but it may instead be retained after power-off. Steps S101 to S106 shown in FIG. 4 need not be processed one sentence at a time and may be performed on several sentences at a time.
[0057]
[Effects of the invention]
According to the invention described in claim 1 of the present application, when there is an unknown word that cannot be analyzed, an input screen for its reading is displayed during the text-to-speech process, and the reading is registered when the user inputs it. Unknown words can therefore also be read aloud with the correct reading through a procedure that is natural for the user, and reading aloud can be performed very naturally.
[0058]
According to the invention described in claim 2 of the present application, in addition to the effect of the invention described in claim 1, attaching marks to unknown words in advance when the text is analyzed allows unknown words to be extracted quickly during reading, so that processing during reading proceeds smoothly.
[0059]
According to the invention described in claim 3 of the present application, the computer is caused to execute each procedure and shifts to a state for inputting the reading of an unknown word, so that data for reading various words aloud can be generated using the input readings.
[0060]
According to the invention described in claim 4 of the present application, the computer is caused to execute each procedure and an unknown word mark is assigned to each unknown word, so that subsequent processing of the marked words can be performed quickly.
[0061]
According to the invention described in claim 5 of the present application, the computer is caused to execute further procedures in addition to those of claim 4: the assigned unknown word mark is detected and the apparatus enters a state for inputting the reading of the word, so that the input can be performed quickly.
[0062]
According to the invention described in claim 6 of the present application, the program according to any one of claims 3 to 5 is recorded on a recording medium, so that the program can be easily provided.
[Brief description of the drawings]
FIG. 1 is a front view of a text-to-speech apparatus according to an embodiment of the present invention.
FIG. 2 is a block diagram illustrating a schematic configuration of a text-to-speech apparatus according to an embodiment of the present invention.
FIG. 3 is a block diagram illustrating the software configuration of the reading software of the text-to-speech apparatus according to the embodiment of the present invention.
FIG. 4 is a flowchart of a reading process.
FIG. 5 is a flowchart of reading selection and unknown word reading acquisition processing.
FIG. 6 is a diagram of the text-to-speech apparatus displaying a list of multiple readings.
FIG. 7 is a diagram of a text-to-speech apparatus on an unknown word reading input screen.
[Explanation of symbols]
1 ... Text-to-speech apparatus, 2 ... Liquid crystal display, 3 ... Operation keys, 4 ... Touch panel, 5 ... Speaker, 6 ... ROM, 7 ... RAM, 8 ... Input device, 9 ... Display device, 10 ... CPU.

Claims

A text-to-speech apparatus that analyzes text, synthesizes speech according to the analysis, and reads the text aloud, comprising:
means for extracting unanalyzable unknown words in the text to be read aloud;
means for displaying the extracted unknown word;
means for inputting the reading of the displayed unknown word; and
means for registering the extracted unknown word together with the input reading,
wherein the unknown word is read aloud according to the registered reading.

The text-to-speech apparatus according to claim 1, further comprising
means for assigning an unknown word mark to a word that cannot be analyzed in the morphological analysis of the text to be read aloud,
wherein the means for extracting the unknown word performs the extraction by searching for the unknown word mark assigned to the unknown word.

A program for reading aloud that, in order to analyze text, synthesize speech according to the analysis, and read the text aloud, causes a computer to execute:
a procedure for extracting unanalyzable unknown words in the text to be read aloud;
a procedure for displaying the extracted unknown word on display means;
a procedure for shifting to a state for inputting the reading of the displayed unknown word;
a procedure for registering the extracted unknown word in storage means together with the input reading; and
a procedure for generating data for reading the unknown word aloud according to the registered reading.

A program for reading aloud that, in order to analyze text, synthesize speech according to the analysis, and read the text aloud, causes a computer to execute:
a procedure for extracting unanalyzable unknown words in the text to be read aloud; and
a procedure for assigning an unknown word mark to the extracted unknown word.

The program for reading aloud according to claim 4, further causing the computer to execute:
a procedure for detecting the unknown word mark; and
a procedure for shifting to a state for inputting a reading of the word to which the unknown word mark has been assigned.

A computer-readable recording medium on which the program according to claim 3 is recorded.