JP3857188B2 - Text-to-speech system and method

Info

Publication number: JP3857188B2
Application number: JP2002170595A
Authority: JP
Inventors: 伸之片江; 泰山崎; 篤志山本
Original assignee: Fujitsu Ltd
Current assignee: Fujitsu Ltd
Priority date: 2002-06-11
Filing date: 2002-06-11
Publication date: 2006-12-13
Anticipated expiration: 2022-06-11
Also published as: JP2004013122A

Abstract

PROBLEM TO BE SOLVED: To provide a text-to-speech method and system that read text aloud accurately even on a device with limited storage capacity.

SOLUTION: In an environment in which a server that manages data transmission and reception between terminals and a plurality of terminals that use the server are connected through a network, the server uses a large-scale word dictionary to perform word analysis on text data addressed to a given destination terminal. It extracts the words that are included in the large-scale word dictionary but not in the word dictionary provided on the destination terminal, together with reading information, that is, information on how each word is to be read. The server compiles the extracted words and reading information into a user dictionary for the destination terminal and delivers it to that terminal. The terminal refers to the delivered user dictionary and its previously prepared word dictionary to convert the text data contained in the transmitted data into speech.

COPYRIGHT: (C)2004,JPO

Description

【０００１】
【発明の属する技術分野】
本発明は、記憶容量に制限のある携帯端末等であっても正確な読み上げを可能とするテキスト読み上げシステム及び方法に関する。
【０００２】
【従来の技術】
昨今のコンピュータ環境の急速な進展に伴い、演算処理負荷の大きい音声合成技術を用いたアプリケーションの開発も多様になってきている。最近では、カーナビゲーションや電話応答システム等の従来からの応用分野に加えて、携帯電話やＰＤＡ（Personal Digital Assistant）等の携帯端末にも適用されており、電子メール等のテキストデータを読み上げるのに用いられている。かかるアプリケーションにおいては、視覚障害者や年配者がこれらの機材を使用する際の大きな手助けとなるとともに、健常者にとっても、画面に目を奪われずにテキストの内容を知ることができ、特にハンズフリーが要求される局面において利便性の高いものとなっている。
【０００３】
図１は、従来の一般的なテキスト読み上げシステムのブロック構成図である。図１において、テキストデータが入力されると、言語処理部１における単語解析部１１では、システムが共通して用意しているシステム辞書１２と、テキスト作成者が個々に設定したユーザ辞書１３の２つの単語辞書を使用して、テキストの文字列を単語に分割する。そして、構文解析部１４において、各単語の係り受け情報を解析し、ポーズのとり方等の情報を生成することになる。
【０００４】
次に、音響処理部２における韻律生成部２１では、テキスト中の各音の時間長やピッチ周波数パターンを設定し、波形生成部２２では、音声素片データ記憶部２３に記憶されている音声素片データを、韻律生成部２１で設定した各音の時間長やピッチ周波数パターン等に従って繋ぎ合わせることによって音声波形を生成する。
【０００５】
アプリケーションへの適用例として、テキスト読み上げシステムを搭載した電子メールシステムの構成例を図２に示す。図２において、通信ネットワーク２０１上にメールサーバ１０１と複数の端末Ａ３１〜Ｘ５１が接続されている。他の端末から端末Ａ３１に送信されたメールは、メール送受信部３０１で受信される。メールを音声で読み上げる場合には、音声合成部３０２がシステムにあらかじめ用意されたシステム辞書３０４とユーザが個々に設定するユーザ辞書３０５を使用して、音声波形を生成する。
【０００６】
ここで音声合成部３０２は、図１における単語解析部１１、構文解析部１４、韻律生成部２１、波形生成部２２、音声素片データ２３を含んでいる。生成された音声波形は、音声出力部３０３によって出力される。音声出力に読み誤りがある場合等については、ユーザが各々にユーザ辞書編集部３０６を使用することによって、ユーザ辞書３０５を更新する。
【０００７】
【発明が解決しようとする課題】
一般に、テキストデータの読み上げ精度は、上述した２つの単語辞書に格納されている語彙数に影響されるところが大きい。しかし、携帯端末等においては、軽量かつコンパクトであることが求められることから、記憶装置の小サイズ化に伴う記憶容量の削減が必須となる。したがって、テキスト読み上げシステムを携帯端末等に搭載する場合、単語辞書の語彙数が、ＰＣ向けソフトウェアや据え置き型の装置における単語辞書の語彙数よりも大きく削減されている場合が多い。そのため、テキストデータの読み上げ精度が、通常よりも低くなる傾向にあるという問題点があった。
【０００８】
例えば、人の姓や名は数十万種類あると考えられている。したがって、携帯端末等におけるシステム辞書の記憶装置の容量制限によって、頻度の高い数千種類の姓のみが格納されている場合には、当該システム辞書に格納されていない姓については誤って読み上げられる可能性が高くなる。
【０００９】
具体的には、「阿久津」という姓は、システム辞書に読みが「アクツ」であると格納されていれば正しく読み上げられるものの、システム辞書に「アクツ」という読みが格納されていなければ、単独の漢字の読みを組み合わせて「アヒサツ」と誤って読み上げられることになる。
【００１０】
一方、端末Ａ３１の使用者や知人に「阿久津」さんという人が実在する場合、端末Ａ３１に送信されるメールには、高頻度で「阿久津」という文字列が含まれることが多いものと考えられる。しかし、「阿久津」という姓は一般的に頻度の高い姓ではないことから、当該姓はシステム辞書ではなく、ユーザ辞書に登録して対応することになる。
【００１１】
ところが、ユーザ辞書の単語登録には、表記、読み、アクセント、品詞等の情報を登録する必要があり、特に携帯端末においては操作が複雑になることもあって、一般ユーザには登録作業がやや困難であるという問題点も残されている。
【００１２】
かかる問題点を解消するために、特開平９−１３５２６４号公報においては、テキスト変換サーバを用意し、電子メールのテキスト変換（言語処理）を全てサーバ側で行い、その結果である読み／韻律情報をユーザ端末に出力するシステムが開示されている。
【００１３】
しかし、かかるシステムにおいては、端末側に言語処理が用意されていないことから、ユーザがテキスト読み上げを行うごとにメールサーバにアクセスする必要があり、携帯端末としての利便性を損なうおそれが生じる。
【００１４】
あるいは、特開２０００−２０４１７号公報においては、メール発信者がユーザ辞書情報をメールに添付し、メールサーバが音声合成をする際に、システム辞書の他に添付されたユーザ辞書を使用する方法が開示されている。
【００１５】
しかし、かかる方法においては、メール発信者がユーザ辞書を編集する必要が生じるとともに、通信回線上をメール交換の都度ユーザ辞書が送受信されることになり、回線負荷が大きくなってしまうという問題点も残されていた。
【００１６】
本発明は、上記問題点を解決するために、携帯端末等の記憶容量に制限のある機器を使用する場合であっても、精度良くテキストを読み上げることができるテキスト読み上げシステム及び方法を提供することを目的とする。
【００１７】
【課題を解決するための手段】
上記目的を達成するために本発明にかかるテキスト読み上げシステムは、端末間のデータの送受信を管理するサーバと、サーバを利用する複数の端末がネットワークを介して接続されている環境におけるテキスト読み上げシステムであって、サーバが、大規模な単語辞書を使用して、所定の端末を宛先として送信されるテキストデータについて単語解析を行う単語解析部と、大規模な単語辞書には含まれているが宛先となる端末に用意されている単語辞書には含まれていない単語及び当該単語をどのように読み上げるのかに関する情報である読み情報を抽出し、宛先となる端末用のユーザ辞書として作成するユーザ辞書作成／編集部と、宛先となる端末用のユーザ辞書を端末へと配信するユーザ辞書配信部とを含み、端末において、配信されたサーバで作成された端末用のユーザ辞書と、事前に用意されている単語辞書を参照して、送信されたデータに含まれるテキストデータを音声に変換することを特徴とする。
【００１８】
かかる構成により、記憶容量に制限のある携帯端末等に、テキスト読み上げ精度を大容量システム辞書使用時と同等にする必要最小限のシステム辞書を構築することができ、記憶容量に制限のある場合であっても精度良くテキスト読み上げを実現することが可能となる。
【００１９】
また、本発明にかかるテキスト読み上げシステムは、サーバから端末に対して、端末用のユーザ辞書を定期的に送信することが好ましい。あるいは、端末からサーバに対して端末用のユーザ辞書の配信を請求し、サーバが請求に応じて端末用のユーザ辞書を配信することが好ましい。アプリケーションに応じて任意のタイミングでユーザ辞書を更新すれば足りるからである。
【００２０】
また、本発明にかかるテキスト読み上げシステムは、サーバが端末に対するデータ配信時にテキストデータと端末用のユーザ辞書を配信することが好ましい。特に電子メールシステム等においては、メールが送信されると同時に更新されたユーザ辞書を入手することができるからである。
【００２１】
また、本発明は、上記のようなテキスト読み上げシステムの機能をコンピュータの処理ステップとして実行するソフトウェアを特徴とするものであり、具体的には、端末間のデータの送受信を管理するサーバと、サーバを利用する複数の端末がネットワークを介して接続されている環境におけるテキスト読み上げ方法であって、サーバが、大規模な単語辞書を使用して、所定の端末を宛先として送信されるテキストデータについて単語解析を行う工程と、大規模な単語辞書には含まれているが宛先となる端末に用意されている単語辞書には含まれていない単語及び当該単語をどのように読み上げるのかに関する情報である読み情報を抽出し、宛先となる端末用のユーザ辞書として作成する工程と、宛先となる端末用のユーザ辞書を端末へと配信する工程とを含み、端末において、配信されたサーバで作成された端末用のユーザ辞書と、事前に用意されている単語辞書を参照して、送信されたデータに含まれるテキストデータを音声に変換するテキスト読み上げ方法並びにそのような工程を具現化するコンピュータ実行可能なプログラムであることを特徴とする。
【００２２】
かかる構成により、コンピュータ上へ当該プログラムをロードさせ実行することで、記憶容量に制限のある携帯端末等に、テキスト読み上げ精度を大容量システム辞書使用時と同等にする必要最小限のシステム辞書を構築することができ、記憶容量に制限のある場合であっても精度良くテキスト読み上げを行うことができるテキスト読み上げシステムを実現することが可能となる。
【００２３】
【発明の実施の形態】
以下、本発明の実施の形態にかかるテキスト読み上げシステムについて、図面を参照しながら説明する。図３は本発明の実施の形態にかかるテキスト読み上げシステムの構成図である。
【００２４】
図３に示すように、メールサーバ１０１を含むサーバ１１０に、単語解析部１０２、ユーザ辞書作成／編集部１０３、ユーザ辞書配信部１０４、システム辞書（大）１０５、システム辞書（小）１０６、端末Ａ〜端末Ｘ向けユーザ辞書１０７〜１０８を用意する。ここでシステム辞書（大）１０５は、大容量かつ豊富な語彙を有する単語辞書であり、システム辞書（小）１０６は、システム辞書（大）１０５のサブセットの語彙を有するやや小容量の単語辞書を意味する。したがって、システム辞書（小）１０６に登録されている語彙は全てシステム辞書（大）１０５に含まれているものである。そして、システム辞書（小）１０６は、端末Ａ〜端末Ｘに共通して用意されたシステム辞書（小）３０４と同一の内容となる。
【００２５】
通信ネットワーク２０１に接続された複数端末の一個である端末Ａ３１には、従来例と同様、メール送受信部３０１、音声合成部３０２、システム辞書（小）３０４、ユーザ辞書３０５、音声出力部３０３の他に、ユーザ辞書編集部３０６が用意されている。
【００２６】
メールサーバ１０１が、端末Ａ３１を宛先として送信されたメールを受信すると、単語解析部１０２において、システム辞書（大）１０５を参照しながら単語に分割される。そして、ユーザ辞書作成／編集部１０３は、当該メールに含まれる単語のうちシステム辞書（小）１０６に含まれない単語を、端末Ａ用ユーザ辞書１０７に格納する。端末Ｂ〜端末Ｘを宛先としてメールサーバに届いたメールに関しても同様の処理を行い、それぞれの端末用ユーザ辞書１０７〜１０８に、システム辞書（小）１０６に含まれていない単語を、それぞれ格納する。
【００２７】
こうして作成／編集された端末Ａ用ユーザ辞書の内容は、ユーザ辞書配信部１０４によって、メールと共に端末Ａに配信される。端末Ａ３１では、ユーザ辞書編集部３０６を使用して、端末のユーザ辞書３０５を更新することによって、メール配信時においては大容量のシステム辞書（大）１０５を使用するのと同様の精度で、メール内容（テキストデータ）の読み上げを行うことが可能となる。
【００２８】
また、ユーザ辞書の配信タイミングについても、様々なタイミングで配信することが考えられる。まず図４に示すように、定期的に配信を行うように制御することが考えられる。この場合、定期配信制御部１０９を設けることが必要となる。
【００２９】
定期配信制御部１０９は、各端末に対する各端末用ユーザ辞書１０７〜１０８の配信記録を有しており、前回配信されてから一定時間経過時において、定期的に各端末用ユーザ辞書１０７〜１０８を各端末へ送信するよう制御するものである。例えば端末Ａ用ユーザ辞書１０７は、ユーザ辞書配信部１０４によって一定期間ごとに端末Ａに配信されることになる。
【００３０】
また、時間的な間隔のみならず、例えば各端末用ユーザ辞書１０７〜１０８の更新量が一定値に到達した時点において、定期的に各端末用ユーザ辞書１０７〜１０８を各端末へ送信するよう制御するものであっても良い。
【００３１】
あるいは、図５に示すように、必要に応じてサーバ１１０にアクセスした時点で配信することも考えられる。この場合、端末Ａ３１にユーザ辞書請求部３０７を設けており、当該ユーザ辞書請求部３０７において、必要に応じてサーバ１１０にアクセスし、請求者が使用する端末用のユーザ辞書１０７〜１０８の配信を請求する。
【００３２】
一方、サーバ１１０は請求配信制御部１１１を有しており、端末Ａ３１からの請求に応じて、端末Ａ用ユーザ辞書１０７をユーザ辞書配信部１０４を介して端末Ａ３１に配信する。端末Ａ３１では、ユーザ辞書編集部３０６を使用して、端末におけるユーザ辞書３０５を更新することになる。
【００３３】
なお、それぞれの端末用ユーザ辞書１０７〜１０８に格納される内容としては、システム辞書（小）１０６に含まれていない単語に限定されるものではなく、例えばそれぞれの端末におけるユーザ辞書のレプリカをサーバ１１０上で保存しておき、当該辞書とマージした新たなユーザ辞書として格納することも考えられる。この場合には、配信されるのはそれぞれの端末用ユーザ辞書そのものとなる。
【００３４】
この場合、図６に示すように、サーバ１１０がユーザ辞書添付部１１２を有し、端末Ａ用ユーザ辞書１０７を端末Ａを宛先とするメールに添付して端末Ａ３１に送信する。端末Ａ３１では、ユーザ辞書編集部３０６において受信したメールから、添付された端末Ａ用ユーザ辞書１０７を抽出し、既存のユーザ辞書３０５と置換することになる。
【００３５】
次に、本発明の実施の形態にかかるテキスト読み上げシステムを実現するプログラムの処理の流れについてメールを例として説明する。図７に本発明の実施の形態にかかるテキスト読み上げシステムを実現するサーバ１１０におけるプログラムの処理の流れ図を示す。
【００３６】
図７に示すように、まず任意の端末を宛先とするメールを受信し（ステップＳ７０１）、システム辞書（大）１０５を参照しながらメール内容を単語単位に分割する（ステップＳ７０２）。
【００３７】
次に、システム辞書（小）１０６を照会して、当該メール内容から抽出された単語が含まれているか否かを確認する（ステップＳ７０３）。当該メール内容から抽出された単語がシステム辞書（小）１０６に含まれていない場合には、当該単語及びその読みを、当該メールの宛先である端末用のユーザ辞書１０７に格納する（ステップＳ７０４）。
【００３８】
そして、更新された当該メールの宛先である端末用ユーザ辞書の内容をメールと共に当該端末へ配信することになる（ステップＳ７０５）。当該端末用ユーザ辞書の内容を受信した端末では、当該内容に基づいて当該端末におけるユーザ辞書３０５を更新することによって、送信されてくるメールの読み上げを、サーバ１１０に存在するシステム辞書（大）１０５を用いる場合と同一の精度で行うことが可能となる。
【００３９】
以上のように本実施の形態によれば、記憶容量に制限のある携帯端末等に、テキスト読み上げ精度を大容量システム辞書使用時と同等にする必要最小限のシステム辞書を構築することができ、記憶容量に制限のある場合であっても精度良くテキスト読み上げを実現することが可能となる。
【００４０】
特に電子メールにおける読み上げシステムにおいては、読み精度を向上するためにユーザが行うべきユーザ辞書設定自体も簡便化することができ、精度の高いテキスト読み上げを容易に実現することが可能となる。
【００４１】
なお、本発明の実施の形態にかかるテキスト読み上げシステムを実現するプログラムは、図８に示すように、ＣＤ−ＲＯＭ８２−１やフレキシブルディスク８２−２等の可搬型記録媒体８２だけでなく、通信回線の先に備えられた他の記憶装置８１や、コンピュータ８３のハードディスクやＲＡＭ等の記録媒体８４のいずれに記憶されるものであっても良く、プログラム実行時には、プログラムはローディングされ、主メモリ上で実行される。
【００４２】
また、本発明の実施の形態にかかるテキスト読み上げシステムにより生成された個人別辞書等についても、図８に示すように、ＣＤ−ＲＯＭ８２−１やフレキシブルディスク８２−２等の可搬型記録媒体８２だけでなく、通信回線の先に備えられた他の記憶装置８１や、コンピュータ８３のハードディスクやＲＡＭ等の記録媒体８４のいずれに記憶されるものであっても良く、例えば本発明にかかるテキスト読み上げシステムを利用する際にコンピュータ８３により読み取られる。
【００４３】
（付記１）端末間のデータの送受信を管理するサーバと、前記サーバを利用する複数の端末がネットワークを介して接続されている環境におけるテキスト読み上げシステムであって、
前記サーバが、
大規模な単語辞書を使用して、所定の端末を宛先として送信されるテキストデータについて単語解析を行う単語解析部と、
前記大規模な単語辞書には含まれているが宛先となる端末に用意されている単語辞書には含まれていない単語及び前記単語をどのように読み上げるのかに関する情報である読み情報を抽出し、宛先となる前記端末用のユーザ辞書として作成するユーザ辞書作成／編集部と、
宛先となる前記端末用のユーザ辞書を前記端末へと配信するユーザ辞書配信部とを含み、
前記端末において、
配信された前記サーバで作成された前記端末用のユーザ辞書と、事前に用意されている単語辞書を参照して、送信されたデータに含まれるテキストデータを音声に変換することを特徴とするテキスト読み上げシステム。
【００４４】
（付記２）前記サーバから前記端末に対して、前記端末用のユーザ辞書を定期的に送信する付記１に記載のテキスト読み上げシステム。
【００４５】
（付記３）前記端末から前記サーバに対して前記端末用のユーザ辞書の配信を請求し、前記サーバが請求に応じて前記端末用のユーザ辞書を配信する付記１に記載のテキスト読み上げシステム。
【００４６】
（付記４）前記サーバが前記端末に対するデータ配信時に前記テキストデータと前記端末用のユーザ辞書を配信する付記１から３のいずれか一項に記載のテキスト読み上げシステム。
【００４７】
（付記５）端末間のデータの送受信を管理するサーバと、前記サーバを利用する複数の端末がネットワークを介して接続されている環境におけるテキスト読み上げ方法であって、
前記サーバが、
大規模な単語辞書を使用して、所定の端末を宛先として送信されるテキストデータについて単語解析を行う工程と、
前記大規模な単語辞書には含まれているが宛先となる端末に用意されている単語辞書には含まれていない単語及び前記単語をどのように読み上げるのかに関する情報である読み情報を抽出し、宛先となる前記端末用のユーザ辞書として作成する工程と、
宛先となる前記端末用のユーザ辞書を前記端末へと配信する工程とを含み、
前記端末において、
配信された前記サーバで作成された前記端末用のユーザ辞書と、事前に用意されている単語辞書を参照して、送信されたデータに含まれるテキストデータを音声に変換することを特徴とするテキスト読み上げ方法。
【００４８】
（付記６）端末間のデータの送受信を管理するサーバと、前記サーバを利用する複数の端末がネットワークを介して接続されている環境におけるテキスト読み上げ方法を具現化する前記サーバにおけるコンピュータ実行可能なプログラムであって、
大規模な単語辞書を使用して、所定の端末を宛先として送信されるテキストデータについて単語解析を行うステップと、
前記大規模な単語辞書には含まれているが宛先となる端末に用意されている単語辞書には含まれていない単語及び前記単語をどのように読み上げるのかに関する情報である読み情報を抽出し、宛先となる前記端末用のユーザ辞書として作成するステップと、
宛先となる前記端末用のユーザ辞書を前記端末へと配信するステップとを含むことを特徴とするコンピュータ実行可能なプログラム。
【００４９】
【発明の効果】
以上のように本発明にかかるテキスト読み上げシステムによれば、記憶容量に制限のある携帯端末等に、テキスト読み上げ精度を大容量システム辞書使用時と同等にする必要最小限のシステム辞書を構築することができ、記憶容量に制限のある場合であっても精度良くテキスト読み上げを実現することが可能となる。
【図面の簡単な説明】
【図１】従来のテキスト読み上げシステムのブロック構成図
【図２】従来のテキスト読み上げシステムを備えた電子メールシステムの構成例示図
【図３】本発明の実施の形態にかかるテキスト読み上げシステムの構成図
【図４】本発明の実施例にかかるテキスト読み上げシステムの構成図
【図５】本発明の他の実施例にかかるテキスト読み上げシステムの構成図
【図６】本発明の他の実施例にかかるテキスト読み上げシステムの構成図
【図７】本発明の実施の形態にかかるテキスト読み上げシステムを実現するサーバにおけるプログラムの処理の流れ図
【図８】コンピュータ環境の例示図
【符号の説明】
１言語処理部
２音響処理部
１１、１０２単語解析部
１２システム辞書
１３、３０５ユーザ辞書
１４構文解析部
２１韻律生成部
２２波形生成部
２３音声素片データ記憶部
３１端末Ａ
４１端末Ｂ
５１端末Ｘ
１０１メールサーバ
１０３ユーザ辞書作成／編集部
１０４ユーザ辞書配信部
１０５システム辞書（大）
１０６、３０４システム辞書（小）
１０７端末Ａ用ユーザ辞書
１０８端末Ｘ用ユーザ辞書
１０９定期配信制御部
１１０サーバ
１１１請求配信制御部
１１２ユーザ辞書添付部
２０１ネットワーク
３０１メール送受信部
３０２音声合成部
３０３音声出力部
３０６ユーザ辞書編集部
３０７ユーザ辞書請求部
８１回線先の記憶装置
８２ＣＤ−ＲＯＭやフレキシブルディスク等の可搬型記録媒体
８２−１ＣＤ−ＲＯＭ
８２−２フレキシブルディスク
８３コンピュータ
８４コンピュータ上のＲＡＭ／ハードディスク等の記録媒体

[0001]
BACKGROUND OF THE INVENTION
The present invention relates to a text-to-speech system and method that enable accurate text-to-speech even for portable terminals and the like with limited storage capacity.
[0002]
[Prior art]
With the rapid progress of the computing environment in recent years, applications that use speech synthesis technology, which imposes a heavy processing load, have become increasingly diverse. In addition to conventional application fields such as car navigation and telephone answering systems, speech synthesis has recently been applied to mobile terminals such as mobile phones and PDAs (Personal Digital Assistants), where it is used to read aloud text data such as e-mail. Such applications are a great help to visually impaired and elderly people when they use these devices. They also let other users learn the content of a text without having to keep their eyes on the screen, which is highly convenient, particularly in situations where hands-free operation is required.
[0003]
FIG. 1 is a block diagram of a conventional general text-to-speech system. In FIG. 1, when text data is input, the word analysis unit 11 in the language processing unit 1 splits the text string into words using two word dictionaries: a system dictionary 12 that the system provides in common, and a user dictionary 13 that is set individually by the text creator. The syntax analysis unit 14 then analyzes the dependency relations between the words and generates information such as where pauses should be placed.
[0004]
Next, the prosody generation unit 21 in the acoustic processing unit 2 sets the duration and pitch frequency pattern of each sound in the text. The waveform generation unit 22 generates a speech waveform by concatenating the speech segment data stored in the speech segment data storage unit 23 in accordance with the durations, pitch frequency patterns, and other settings made by the prosody generation unit 21.
[0005]
As an application example, FIG. 2 shows a configuration example of an e-mail system equipped with a text-to-speech system. In FIG. 2, a mail server 101 and a plurality of terminals A31 to X51 are connected to a communication network 201. Mail sent from another terminal to the terminal A31 is received by the mail transmission / reception unit 301. When the mail is to be read aloud, the speech synthesis unit 302 generates a speech waveform using the system dictionary 304 prepared in advance in the system and the user dictionary 305 set individually by each user.
[0006]
Here, the speech synthesis unit 302 includes the word analysis unit 11, the syntax analysis unit 14, the prosody generation unit 21, the waveform generation unit 22, and the speech segment data 23 of FIG. 1. The generated speech waveform is output by the voice output unit 303. When the speech output contains a misreading, for example, each user updates the user dictionary 305 using the user dictionary editing unit 306.
[0007]
[Problems to be solved by the invention]
In general, the reading accuracy for text data depends heavily on the number of words stored in the two word dictionaries described above. Portable terminals and the like, however, are required to be lightweight and compact, so the storage device must be made smaller and the storage capacity reduced accordingly. Consequently, when a text-to-speech system is installed on a portable terminal or the like, the vocabulary of its word dictionary is often far smaller than that of the word dictionaries used in PC software or stationary devices. As a result, the reading accuracy for text data tends to be lower than usual.
[0008]
For example, hundreds of thousands of distinct family names and given names are thought to exist. If, because of the capacity limit of the storage device, the system dictionary of a portable terminal or the like stores only the few thousand most frequent surnames, a surname that is not stored in the system dictionary is more likely to be read incorrectly.
[0009]
Specifically, the surname "阿久津" (Akutsu) is read correctly if the system dictionary stores its reading as "アクツ" (Akutsu). If that reading is not stored, the readings of the individual kanji are combined and the name is misread as "アヒサツ" (Ahisatsu).
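The misreading described above can be reproduced with a minimal Python sketch. The function name `read_word` and the single-reading table `KANJI_READINGS` are hypothetical and exist only for illustration; a real dictionary would hold several readings per character.

```python
# Hypothetical per-character readings, one per kanji (an assumption for illustration).
KANJI_READINGS = {"阿": "ア", "久": "ヒサ", "津": "ツ"}


def read_word(word, word_dict):
    """Return the stored reading, or fall back to joining per-character readings."""
    if word in word_dict:
        return word_dict[word]
    # Fallback: the word is unknown, so each character is read on its own.
    return "".join(KANJI_READINGS.get(ch, ch) for ch in word)


small_dict = {}                     # the surname is not stored
large_dict = {"阿久津": "アクツ"}    # the surname is stored with its reading

print(read_word("阿久津", small_dict))  # アヒサツ (misreading)
print(read_word("阿久津", large_dict))  # アクツ (correct reading)
```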
[0010]
On the other hand, if the user of the terminal A31 or one of the user's acquaintances is actually named "Akutsu", mail sent to the terminal A31 is likely to contain the character string "阿久津" frequently. However, because "Akutsu" is not a common surname in general, it has to be handled by registering it in the user dictionary rather than in the system dictionary.
[0011]
However, registering a word in the user dictionary requires entering information such as its notation, reading, accent, and part of speech. On a portable terminal in particular the operation becomes complicated, so the remaining problem is that registration is somewhat difficult for ordinary users.
[0012]
To solve this problem, Japanese Patent Application Laid-Open No. 9-135264 discloses a system in which a text conversion server is provided, all text conversion (language processing) of e-mail is performed on the server side, and the resulting reading/prosody information is output to the user terminal.
[0013]
However, in such a system, since language processing is not prepared on the terminal side, it is necessary to access the mail server every time the user reads out text, and the convenience as a portable terminal may be impaired.
[0014]
Alternatively, Japanese Patent Laid-Open No. 2000-20417 discloses a method in which the mail sender attaches user dictionary information to a mail, and the mail server uses the attached user dictionary in addition to the system dictionary when it performs speech synthesis.
[0015]
However, this method requires the mail sender to edit the user dictionary, and the user dictionary is transmitted over the communication line every time mail is exchanged, so the remaining problem is that the load on the line increases.
[0016]
To solve the above problems, an object of the present invention is to provide a text-to-speech system and method that can read text aloud accurately even when a device with limited storage capacity, such as a portable terminal, is used.
[0017]
[Means for Solving the Problems]
To achieve the above object, a text-to-speech system according to the present invention is a text-to-speech system in an environment in which a server that manages transmission and reception of data between terminals and a plurality of terminals that use the server are connected via a network. The server includes: a word analysis unit that uses a large-scale word dictionary to perform word analysis on text data transmitted to a predetermined terminal as its destination; a user dictionary creation / editing unit that extracts words that are included in the large-scale word dictionary but not in the word dictionary provided on the destination terminal, together with reading information, which is information on how each word is to be read aloud, and creates from them a user dictionary for the destination terminal; and a user dictionary distribution unit that distributes the user dictionary for the destination terminal to that terminal. The terminal converts the text data included in the transmitted data into speech by referring to the distributed user dictionary created by the server and to the word dictionary prepared in advance.
[0018]
With this configuration, the minimum necessary system dictionary that makes text-to-speech accuracy equivalent to that obtained with a large-capacity system dictionary can be built on a portable terminal or the like with limited storage capacity, so text can be read aloud accurately even when the storage capacity is limited.
[0019]
The text-to-speech system according to the present invention preferably periodically transmits a user dictionary for the terminal from the server to the terminal. Alternatively, it is preferable that the terminal requests the server to distribute the user dictionary for the terminal, and the server distributes the user dictionary for the terminal in response to the request. This is because it is sufficient to update the user dictionary at an arbitrary timing according to the application.
[0020]
In the text-to-speech system according to the present invention, it is preferable that the server distributes the text data and the user dictionary for the terminal when data is distributed to the terminal. This is because, in particular, in an electronic mail system or the like, an updated user dictionary can be obtained at the same time as mail is transmitted.
[0021]
The present invention is also characterized by software that executes the functions of the text-to-speech system described above as processing steps of a computer. Specifically, it is a text-to-speech method in an environment in which a server that manages transmission and reception of data between terminals and a plurality of terminals that use the server are connected via a network. In the method, the server performs: a step of using a large-scale word dictionary to perform word analysis on text data transmitted to a predetermined terminal as its destination; a step of extracting words that are included in the large-scale word dictionary but not in the word dictionary provided on the destination terminal, together with reading information, which is information on how each word is to be read aloud, and creating from them a user dictionary for the destination terminal; and a step of distributing the user dictionary for the destination terminal to that terminal. The terminal converts the text data included in the transmitted data into speech by referring to the distributed user dictionary created by the server and to the word dictionary prepared in advance. The invention is further characterized by a computer-executable program that embodies these steps.
[0022]
With this configuration, loading the program onto a computer and executing it makes it possible to build, on a portable terminal or the like with limited storage capacity, the minimum necessary system dictionary that makes text-to-speech accuracy equivalent to that obtained with a large-capacity system dictionary, and thus to realize a text-to-speech system that reads text aloud accurately even when the storage capacity is limited.
[0023]
DETAILED DESCRIPTION OF THE INVENTION
Hereinafter, a text-to-speech system according to an embodiment of the present invention will be described with reference to the drawings. FIG. 3 is a configuration diagram of the text-to-speech system according to the embodiment of the present invention.
[0024]
As shown in FIG. 3, a server 110 that includes the mail server 101 is provided with a word analysis unit 102, a user dictionary creation / editing unit 103, a user dictionary distribution unit 104, a system dictionary (large) 105, a system dictionary (small) 106, and user dictionaries 107 to 108 for the terminals A to X. Here, the system dictionary (large) 105 is a large-capacity word dictionary with a rich vocabulary, and the system dictionary (small) 106 is a somewhat smaller word dictionary whose vocabulary is a subset of that of the system dictionary (large) 105. Accordingly, every word registered in the system dictionary (small) 106 is also included in the system dictionary (large) 105. The system dictionary (small) 106 has the same contents as the system dictionary (small) 304 provided in common on the terminals A to X.
[0025]
The terminal A31, which is one of the plurality of terminals connected to the communication network 201, is provided with a mail transmission / reception unit 301, a speech synthesis unit 302, a system dictionary (small) 304, a user dictionary 305, and a voice output unit 303, as in the conventional example, and also with a user dictionary editing unit 306.
[0026]
When the mail server 101 receives a mail addressed to the terminal A31, the word analysis unit 102 divides the mail into words while referring to the system dictionary (large) 105. The user dictionary creation / editing unit 103 then stores, in the terminal A user dictionary 107, those words contained in the mail that are not included in the system dictionary (small) 106. The same processing is performed for mail that arrives at the mail server addressed to the terminals B to X, and the words not included in the system dictionary (small) 106 are stored in the respective terminal user dictionaries 107 to 108.
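The server-side processing in this paragraph can be sketched in Python as follows. This is only an illustration under stated assumptions: the patent does not specify the word analysis algorithm, so greedy longest-match segmentation stands in for it, and the names `segment` and `update_user_dictionary` as well as the toy dictionaries are hypothetical.

```python
def segment(text, dictionary, max_len=8):
    """Greedy longest-match segmentation (an assumed stand-in for word analysis).

    Characters that start no dictionary word become single-character tokens.
    """
    words, i = [], 0
    while i < len(text):
        for length in range(min(max_len, len(text) - i), 0, -1):
            candidate = text[i:i + length]
            if length == 1 or candidate in dictionary:
                words.append(candidate)
                i += length
                break
    return words


def update_user_dictionary(mail_text, large_dict, small_dict, user_dict):
    """Store words found in the large dictionary but missing from the small one."""
    for word in segment(mail_text, large_dict):
        if word in large_dict and word not in small_dict:
            user_dict[word] = large_dict[word]  # word plus its reading
    return user_dict


# Toy dictionaries mapping a word to its reading (hypothetical content).
large = {"阿久津": "アクツ", "さん": "サン", "から": "カラ", "連絡": "レンラク"}
small = {"さん": "サン", "から": "カラ", "連絡": "レンラク"}

user_dict_a = update_user_dictionary("阿久津さんから連絡", large, small, {})
print(user_dict_a)  # {'阿久津': 'アクツ'}
```

Only the surname ends up in the dictionary for terminal A, because every other word in the mail is already covered by the small dictionary on the terminal.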
[0027]
The contents of the terminal A user dictionary created and edited in this way are distributed to the terminal A together with the mail by the user dictionary distribution unit 104. The terminal A31 uses the user dictionary editing unit 306 to update its user dictionary 305, so that by the time the mail is delivered it can read the mail content (text data) aloud with the same accuracy as if the large-capacity system dictionary (large) 105 were used.
[0028]
Also, it is conceivable that the user dictionary is distributed at various timings. First, as shown in FIG. 4, it is conceivable to perform control so that distribution is performed periodically. In this case, it is necessary to provide the regular distribution control unit 109.
[0029]
The regular distribution control unit 109 holds a record of when each of the terminal user dictionaries 107 to 108 was distributed to its terminal, and performs control so that the terminal user dictionaries 107 to 108 are transmitted to their terminals periodically, each time a predetermined time has elapsed since the previous distribution. For example, the terminal A user dictionary 107 is distributed to the terminal A by the user dictionary distribution unit 104 at regular intervals.
[0030]
Control need not be based on time intervals alone. For example, the terminal user dictionaries 107 to 108 may be transmitted to their terminals each time the amount of updates to them reaches a fixed value.
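The two delivery triggers described for the regular distribution control unit 109, elapsed time and accumulated update amount, could be combined as in the following Python sketch. The record layout, the function names, and the default values (a one-day interval, 50 pending entries) are assumptions made for illustration and do not come from the patent.

```python
from dataclasses import dataclass


@dataclass
class DeliveryRecord:
    """Per-terminal delivery record (hypothetical layout)."""
    last_delivered: float = 0.0  # time of the previous delivery, in seconds
    pending_updates: int = 0     # entries added since the previous delivery


def should_deliver(record, now, interval=86400.0, update_threshold=50):
    """Deliver when the interval has elapsed or enough updates have accumulated."""
    interval_elapsed = now - record.last_delivered >= interval
    threshold_reached = record.pending_updates >= update_threshold
    return interval_elapsed or threshold_reached


def mark_delivered(record, now):
    """Reset the record after the dictionary has been sent."""
    record.last_delivered = now
    record.pending_updates = 0


record = DeliveryRecord(last_delivered=1000.0, pending_updates=3)
print(should_deliver(record, now=2000.0))   # False: too early, too few updates
record.pending_updates = 50
print(should_deliver(record, now=2000.0))   # True: update threshold reached
mark_delivered(record, now=2000.0)
print(should_deliver(record, now=90000.0))  # True: interval elapsed
```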
[0031]
Alternatively, as shown in FIG. 5, the user dictionary may be distributed when a terminal accesses the server 110 as needed. In this case, the terminal A31 is provided with a user dictionary request unit 307, which accesses the server 110 as needed and requests distribution of the user dictionary 107 to 108 for the terminal used by the requester.
[0032]
The server 110, for its part, has a request distribution control unit 111 and, in response to a request from the terminal A31, distributes the terminal A user dictionary 107 to the terminal A31 via the user dictionary distribution unit 104. The terminal A31 then uses the user dictionary editing unit 306 to update its user dictionary 305.
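The on-request exchange of FIG. 5 can be modeled in Python as two small classes. The class and method names are hypothetical, the network transport is replaced by a direct method call, and an unknown terminal is assumed to receive an empty dictionary; none of these details are specified in the patent.

```python
class DictionaryServer:
    """Holds one user dictionary per terminal (assumed in-memory storage)."""

    def __init__(self):
        self.user_dicts = {}  # terminal id -> {word: reading}

    def handle_request(self, terminal_id):
        """Return a copy of the dictionary prepared for the requesting terminal."""
        return dict(self.user_dicts.get(terminal_id, {}))


class Terminal:
    """Terminal that pulls its user dictionary from the server when needed."""

    def __init__(self, terminal_id):
        self.terminal_id = terminal_id
        self.user_dict = {}

    def request_dictionary(self, server):
        """Request the dictionary and merge the delivered entries locally."""
        delivered = server.handle_request(self.terminal_id)
        self.user_dict.update(delivered)
        return len(delivered)


server = DictionaryServer()
server.user_dicts["A"] = {"阿久津": "アクツ"}

terminal_a = Terminal("A")
terminal_a.request_dictionary(server)
print(terminal_a.user_dict)  # {'阿久津': 'アクツ'}
```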
[0033]
The contents stored in each of the terminal user dictionaries 107 to 108 are not limited to words that are not included in the system dictionary (small) 106. For example, a replica of the user dictionary of each terminal may be kept on the server 110, and the extracted words may be merged with it and stored as a new user dictionary. In this case, what is distributed is the complete user dictionary for each terminal.
[0034]
In this case, as shown in FIG. 6, the server 110 has a user dictionary attachment unit 112, which attaches the terminal A user dictionary 107 to a mail addressed to the terminal A and transmits it to the terminal A31. In the terminal A31, the user dictionary editing unit 306 extracts the attached terminal A user dictionary 107 from the received mail and replaces the existing user dictionary 305 with it.
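The merge-and-attach variant of FIG. 6 can be sketched with Python's standard `email` package. The JSON encoding of the dictionary, the attachment name `user_dict.json`, the address, and the function names are all assumptions for illustration; the patent does not define a serialization format.

```python
import json
from email import message_from_bytes, policy
from email.message import EmailMessage

ATTACHMENT_NAME = "user_dict.json"  # hypothetical attachment name


def build_mail(body, replica, new_entries):
    """Server side: merge the replica with new entries and attach the result."""
    merged = {**replica, **new_entries}
    msg = EmailMessage()
    msg["To"] = "terminal-a@example.com"  # placeholder address
    msg.set_content(body)
    payload = json.dumps(merged, ensure_ascii=False).encode("utf-8")
    msg.add_attachment(payload, maintype="application", subtype="json",
                       filename=ATTACHMENT_NAME)
    return msg.as_bytes()


def receive_mail(raw):
    """Terminal side: return the mail body and the attached user dictionary."""
    msg = message_from_bytes(raw, policy=policy.default)
    body = msg.get_body(preferencelist=("plain",)).get_content()
    user_dict = {}
    for part in msg.iter_attachments():
        if part.get_filename() == ATTACHMENT_NAME:
            # The attached dictionary replaces the existing one as a whole.
            user_dict = json.loads(part.get_content().decode("utf-8"))
    return body, user_dict


raw = build_mail("阿久津さんから連絡", {"富士通": "フジツウ"}, {"阿久津": "アクツ"})
body, user_dict_305 = receive_mail(raw)
print(body.strip())
print(user_dict_305)
```

Because the whole merged dictionary travels with the mail, the terminal can simply replace its local dictionary instead of applying individual entries.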
[0035]
Next, the processing flow of the program that implements the text-to-speech system according to the embodiment of the present invention will be described using mail as an example. FIG. 7 shows a flowchart of processing of a program in the server 110 that realizes the text-to-speech system according to the embodiment of the present invention.
[0036]
As shown in FIG. 7, first, a mail addressed to an arbitrary terminal is received (step S701), and the mail content is divided into words while referring to the system dictionary (large) 105 (step S702).
[0037]
Next, the system dictionary (small) 106 is consulted to check whether each word extracted from the mail content is included in it (step S703). If a word extracted from the mail content is not included in the system dictionary (small) 106, the word and its reading are stored in the user dictionary 107 for the terminal that is the destination of the mail (step S704).
[0038]
The contents of the updated user dictionary for the terminal that is the destination of the mail are then distributed to that terminal together with the mail (step S705). The terminal that receives the contents of the terminal user dictionary updates its user dictionary 305 accordingly, and can thus read the incoming mail aloud with the same accuracy as if the system dictionary (large) 105 on the server 110 were used.
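The terminal-side counterpart of step S705 can be illustrated with the Python sketch below: the delivered entries are merged into the local user dictionary, and the mail text is then converted to a reading using both dictionaries. Longest-match lookup, the precedence of user entries over system entries, and the pass-through of unknown characters are assumptions; the function names and sample data are hypothetical.

```python
def apply_update(user_dict, delivered_entries):
    """Merge the entries delivered with the mail into the local user dictionary."""
    user_dict.update(delivered_entries)


def read_aloud(text, user_dict, system_dict_small, max_len=8):
    """Return the reading of the text using both terminal-side dictionaries."""
    # Assumption: a user dictionary entry overrides a system dictionary entry.
    merged = {**system_dict_small, **user_dict}
    readings, i = [], 0
    while i < len(text):
        for length in range(min(max_len, len(text) - i), 0, -1):
            candidate = text[i:i + length]
            if candidate in merged:
                readings.append(merged[candidate])
                i += length
                break
        else:
            readings.append(text[i])  # unknown character passed through unchanged
            i += 1
    return "".join(readings)


system_small = {"さん": "サン", "から": "カラ", "連絡": "レンラク"}
user_dict_305 = {}
mail = "阿久津さんから連絡"

print(read_aloud(mail, user_dict_305, system_small))  # 阿久津サンカラレンラク
apply_update(user_dict_305, {"阿久津": "アクツ"})
print(read_aloud(mail, user_dict_305, system_small))  # アクツサンカラレンラク
```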
[0039]
As described above, according to the present embodiment, it is possible to construct a minimum required system dictionary that makes text reading accuracy equivalent to that when using a large-capacity system dictionary on a portable terminal having a limited storage capacity, Even when the storage capacity is limited, it is possible to realize text reading with high accuracy.
[0040]
Particularly in a reading system for e-mail, user dictionary setting itself to be performed by the user in order to improve reading accuracy can be simplified, and high-precision text reading can be easily realized.
[0041]
As shown in FIG. 8, the program that realizes the text-to-speech system according to the embodiment of the present invention may be stored not only on a portable recording medium 82 such as a CD-ROM 82-1 or a flexible disk 82-2, but also in another storage device 81 at the far end of a communication line, or on a recording medium 84 such as the hard disk or RAM of a computer 83. When the program is executed, it is loaded and run in main memory.
[0042]
Likewise, as shown in FIG. 8, the per-user dictionaries and other data generated by the text-to-speech system according to the embodiment of the present invention may be stored not only on a portable recording medium 82 such as a CD-ROM 82-1 or a flexible disk 82-2, but also in another storage device 81 at the far end of a communication line, or on a recording medium 84 such as the hard disk or RAM of the computer 83. They are read by the computer 83 when, for example, the text-to-speech system according to the present invention is used.
[0043]
(Supplementary Note 1) A text-to-speech system in an environment in which a server that manages transmission and reception of data between terminals and a plurality of terminals that use the server are connected via a network, wherein
the server includes:
a word analysis unit that uses a large-scale word dictionary to perform word analysis on text data transmitted to a predetermined terminal as its destination;
a user dictionary creation / editing unit that extracts words that are included in the large-scale word dictionary but not included in the word dictionary provided on the destination terminal, together with reading information that is information on how the words are to be read aloud, and creates a user dictionary for the destination terminal; and
a user dictionary distribution unit that distributes the user dictionary for the destination terminal to the terminal, and
the terminal
converts text data included in the transmitted data into speech by referring to the distributed user dictionary for the terminal created by the server and to a word dictionary prepared in advance.
[0044]
(Supplementary note 2) The text-to-speech system according to supplementary note 1, in which a user dictionary for the terminal is periodically transmitted from the server to the terminal.
[0045]
(Supplementary note 3) The text-to-speech system according to supplementary note 1, wherein the terminal requests the server to distribute the user dictionary for the terminal, and the server distributes the user dictionary for the terminal in response to the request.
[0046]
(Supplementary note 4) The text-to-speech system according to any one of supplementary notes 1 to 3, wherein the server distributes the text data and the user dictionary for the terminal during data distribution to the terminal.
[0047]
(Supplementary Note 5) A text-to-speech method in an environment in which a server that manages transmission and reception of data between terminals and a plurality of terminals that use the server are connected via a network,
wherein the server performs the steps of:
performing word analysis, using a large-scale word dictionary, on text data to be transmitted to a predetermined destination terminal;
extracting words that are included in the large-scale word dictionary but not in the word dictionary prepared in the destination terminal, together with reading information, which is information on how to read those words, and creating a user dictionary for the destination terminal; and
distributing the user dictionary for the destination terminal to that terminal,
and wherein the terminal
converts text data included in the transmitted data into speech with reference to the distributed user dictionary for the terminal, created by the server, and a word dictionary prepared in advance.
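The terminal-side lookup described above can be sketched as follows. This is an illustrative simplification: it only maps text to a reading string (the input to later prosody and waveform generation), the function name `to_reading` is hypothetical, and consulting the user dictionary before the built-in dictionary is an assumption, since the source does not state a lookup order.

```python
from collections import ChainMap


def to_reading(text, user_dict, system_dict):
    """Convert text to a reading string using both terminal-side dictionaries."""
    # Assumption: user dictionary entries are consulted first.
    lexicon = ChainMap(user_dict, system_dict)
    max_len = max(map(len, lexicon), default=1)
    out, i = [], 0
    while i < len(text):
        for n in range(min(max_len, len(text) - i), 0, -1):
            candidate = text[i:i + n]
            if candidate in lexicon:
                out.append(lexicon[candidate])
                i += n
                break
        else:
            out.append(text[i])  # unknown character is passed through unchanged
            i += 1
    return "".join(out)


# Hypothetical sample data: word -> reading.
SYSTEM_DICT = {"の": "ノ", "音声": "オンセイ"}
USER_DICT = {"富士通": "フジツウ", "合成": "ゴウセイ"}

with_user_dict = to_reading("富士通の音声合成", USER_DICT, SYSTEM_DICT)
without_user_dict = to_reading("富士通の音声合成", {}, SYSTEM_DICT)
```

Comparing the two results shows the effect of the distributed user dictionary: without it, words missing from the small built-in dictionary are left unresolved.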
[0048]
(Supplementary Note 6) A computer-executable program in a server for realizing a text-to-speech method in an environment in which the server, which manages transmission and reception of data between terminals, and a plurality of terminals that use the server are connected via a network, the program causing a computer to execute the steps of:
performing word analysis, using a large-scale word dictionary, on text data to be transmitted to a predetermined destination terminal;
extracting words that are included in the large-scale word dictionary but not in the word dictionary prepared in the destination terminal, together with reading information, which is information on how to read those words, and creating a user dictionary for the destination terminal; and
distributing the user dictionary for the destination terminal to that terminal.
[0049]
[Effects of the Invention]
As described above, the text-to-speech system according to the present invention makes it possible to construct, on a portable terminal with limited storage capacity, the minimum necessary system dictionary that gives text-to-speech accuracy equivalent to that obtained with a large-capacity system dictionary. Accurate text-to-speech can therefore be realized even when storage capacity is limited.
[Brief Description of the Drawings]
FIG. 1 is a block diagram of a conventional text-to-speech system.
FIG. 2 is a configuration diagram of an e-mail system having a conventional text-to-speech system.
FIG. 3 is a block diagram of a text-to-speech system according to an embodiment of the present invention.
FIG. 4 is a block diagram of a text-to-speech system according to an embodiment of the present invention.
FIG. 5 is a block diagram of a text-to-speech system according to another embodiment of the present invention.
FIG. 6 is a configuration diagram of a text-to-speech system.
FIG. 7 is a flowchart of the processing of a program in the server that realizes the text-to-speech system according to the embodiment of the present invention.
[Description of Symbols]
1 Language processing unit
2 Acoustic processing unit
11, 102 Word analysis unit
12 System dictionary
13, 305 User dictionary
14 Syntax analysis unit
21 Prosody generation unit
22 Waveform generation unit
23 Speech unit data storage unit
31 Terminal A
41 Terminal B
51 Terminal X
101 Mail server
103 User dictionary creation/editing unit
104 User dictionary distribution unit
105 System dictionary (large)
106, 304 System dictionary (small)
107 User dictionary for terminal A
108 User dictionary for terminal X
109 Regular distribution control unit
110 Server
111 Request distribution control unit
112 User dictionary attachment unit
201 Network
301 Mail transmission/reception unit
302 Speech synthesis unit
303 Speech output unit
306 User dictionary editing unit
307 User dictionary requesting unit
81 Storage device reached over a communication line
82 Portable recording medium such as a CD-ROM or flexible disk
82-1 CD-ROM
82-2 Flexible disk
83 Computer
84 Recording medium on the computer, such as RAM or a hard disk

Claims

A text-to-speech system in an environment in which a server that manages transmission and reception of data between terminals and a plurality of terminals that use the server are connected via a network,
wherein the server comprises:
a large-scale word dictionary;
a small-capacity word dictionary having the same contents as the word dictionary common to the terminals;
a user dictionary for each terminal;
a word analysis unit that performs word analysis, using the large-scale word dictionary, on text data to be transmitted to a predetermined destination terminal;
a user dictionary creation/editing unit that, referring to the small-capacity word dictionary, stores in the user dictionary for the destination terminal each word that is included in the large-scale word dictionary but not in the word dictionary prepared in the destination terminal, together with reading information, which is information on how to read that word; and
a user dictionary distribution unit that distributes the user dictionary for the destination terminal to that terminal,
and wherein the terminal
converts text data included in the transmitted data into speech with reference to the distributed user dictionary for the terminal, created by the server, and the word dictionary common to the terminals.

The text-to-speech system according to claim 1, wherein the terminal requests the server to distribute the user dictionary for the terminal, and the server distributes the user dictionary for the terminal in response to the request.

The text-to-speech system according to claim 1 or 2, wherein the server distributes the text data and the user dictionary for the terminal when data is distributed to the terminal.

A text-to-speech method in an environment in which a server that manages transmission and reception of data between terminals and a plurality of terminals that use the server are connected via a network,
wherein the server, which comprises a large-scale word dictionary, a small-capacity word dictionary having the same contents as the word dictionary common to the terminals, and a user dictionary for each terminal,
performs word analysis, using the large-scale word dictionary, on text data to be transmitted to a predetermined destination terminal,
stores, referring to the small-capacity word dictionary, in the user dictionary for the destination terminal each word that is included in the large-scale word dictionary but not in the word dictionary prepared in the destination terminal, together with reading information, which is information on how to read that word, and
distributes the user dictionary for the destination terminal to that terminal,
and wherein the terminal
converts text data included in the transmitted data into speech with reference to the distributed user dictionary for the terminal, created by the server, and the word dictionary common to the terminals.

A computer-executable program in a server, for realizing a text-to-speech method in an environment in which the server, which comprises a large-scale word dictionary, a small-capacity word dictionary having the same contents as the word dictionary common to the terminals, and a user dictionary for each terminal, and which manages transmission and reception of data between terminals, and a plurality of terminals that use the server are connected via a network, the program causing a computer to execute the steps of:
performing word analysis, using the large-scale word dictionary, on text data to be transmitted to a predetermined destination terminal;
storing, referring to the small-capacity word dictionary, in the user dictionary for the destination terminal each word that is included in the large-scale word dictionary but not in the word dictionary prepared in the destination terminal, together with reading information, which is information on how to read that word; and
distributing the user dictionary for the destination terminal to that terminal.