JP2004129174A

JP2004129174A - Information communication instrument, information communication program, and recording medium

Info

Publication number: JP2004129174A
Application number: JP2002320012A
Authority: JP
Inventors: Yoshinaga Kato; 加藤　喜永; Atsushi Ito; 伊藤　篤
Original assignee: Ricoh Co Ltd
Current assignee: Ricoh Co Ltd
Priority date: 2002-08-06
Filing date: 2002-11-01
Publication date: 2004-04-22

Abstract

PROBLEM TO BE SOLVED: To provide a two-party information communication instrument that avoids disturbing people nearby by letting the user converse by either voice or text, depending on the environment at the time of communication.

SOLUTION: The information communication instrument 100 is a portable telephone device such as a mobile phone. It comprises a text input means 1 for entering text with the phone's keys or the like, a text-to-speech synthesis means 2 that converts text information into voice data, a voice communication means 3 that handles communication with the outside, a voice output means 4 that outputs the voice data to a speaker or an earphone jack, and an earphone 5. The information communication instrument 110 comprises a voice communication means 6 that handles communication with the outside, a voice output means 8 that outputs voice data to a speaker or an earphone jack, and a voice input means 7 such as a microphone.

COPYRIGHT: (C)2004, JPO

Description

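[0001]
[Technical Field of the Invention]
The present invention relates to an information communication device, and more particularly to an information communication device that converts between text information and voice information.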
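[0002]
[Prior Art]
In recent years mobile phones have spread remarkably, and the number of subscribers grows every year. One reason for this explosive spread is that a mobile phone can be used with good call quality anywhere within communication range of a base station. With this spread, however, it has been pointed out that, depending on where the phone is used, the voice and radio waves of a call can bother others. For example, calls in a crowded train or bus, or in a place where many people are gathered, not only make others uncomfortable but may also adversely affect people carrying medical devices such as pacemakers. Despite this problem, some users still make calls in trains and similar places, because being able to talk while on the move is an extremely convenient means of communication. One way to address the problem is to use e-mail instead of a voice call. E-mail works if the other party can also use a mobile phone or a PC. However, when a call comes from a device that can exchange information only by voice, such as a home landline telephone, the mobile phone user on the train must either take the call anyway or check the incoming-call record later and call back.
To address this problem, JP-A-10-303969 and JP-A-11-112550 describe systems that allow electronic mail to be used even from a landline telephone or similar device that can handle only voice communication. These publications disclose a technique that converts speech into text by speech recognition and displays the text on the screen of a mobile phone or the like, and a technique that converts e-mail sent from a mobile phone into voice data by text-to-speech synthesis and transmits it to a landline telephone.
[Patent Document 1] JP-A-10-303969
[Patent Document 2] JP-A-11-112550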
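[0003]
[Problems to Be Solved by the Invention]
As described above, the inventions of JP-A-10-303969 and JP-A-11-112550 can recognize speech, convert it into text information, and send it to a PC as e-mail, or conversely convert text information into synthesized speech for a call. E-mail, however, is not suited to immediate forms of communication such as a spoken conversation: a sent message stays on the mail server until the recipient decides to read it, so the recipient cannot react at once. Users seeking close communication therefore find an exchange of e-mails unsatisfying because it cannot match a voice call, and at present no two-party information communication device fully satisfies users.
In view of this problem, an object of the present invention is to provide a two-party information communication device that avoids disturbing people nearby by letting the user converse by either voice or text, depending on the environment at the time of communication.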
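[0004]
[Means for Solving the Problems]
To solve this problem, claim 1 provides an information communication device that exchanges voice or text information by wire or wirelessly, further comprising text-to-speech synthesis means for converting input text information into voice data and input switching means for switching between a voice input mode and a text input mode. When information is transmitted, the input switching means switches to the text input mode, text information is entered in place of speech, and the text-to-speech synthesis means converts the text information into voice data, which is transmitted.
For example, when the device is a mobile phone and a call arrives on a train, the user may be unable to speak. In that case the user switches the phone to the text input mode with the input switching means and types text information on the keys while listening to the other party's voice through an earphone. The phone's text-to-speech synthesis means converts that text into voice data and sends it to the other party.
According to this invention, because the mobile phone carries a text-to-speech synthesis means, the user enters text on the keys instead of speaking, and the text is converted into speech and transmitted, so a spoken call is possible without disturbing people nearby.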
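[0005]
Claim 2 provides an information communication device that exchanges voice or text information by wire or wirelessly, further comprising text-to-speech synthesis means for converting input text information into voice data, speech recognition means for converting received voice data into text information, and input switching means for switching between a voice input mode and a text input mode. When information is transmitted, the input switching means switches to the text input mode, text information is entered in place of speech, and the text-to-speech synthesis means converts it into voice data for transmission. When information is received, the speech recognition means converts the received voice data into text information, which is displayed.
In the invention of claim 1, reception uses ordinary speech and only the transmitting side converts text information into speech. In that case, however, because replies are typed on the keys, the user must listen to the other party through an earphone, which is inconvenient and requires carrying an earphone at all times. To eliminate this, the invention of claim 2 further includes speech recognition means that converts the other party's speech into text information on reception and displays it.
According to this invention, both transmission and reception use text information, so no earphone is needed, and information can be exchanged without disturbing people nearby.
Claim 3 provides an information communication device that exchanges voice or text information by wire or wirelessly, further comprising text-to-speech synthesis means for converting received text information into voice data, speech recognition means for converting input voice data into text information, and input switching means for switching between a voice input mode and a text input mode. When information is transmitted, the input switching means switches to the voice input mode and voice data is entered; the speech recognition means converts that voice data into text information, which is transmitted. When information is received, the text-to-speech synthesis means converts the received text information into voice data.
For example, when information is exchanged between a mobile phone on a train and a landline telephone without text functions, or between a mobile phone on a train and a mobile phone in an environment where speaking bothers no one, the phone on the train can send and receive only text information. The landline telephone or the other mobile phone is therefore equipped with text-to-speech synthesis means and speech recognition means: on reception, the text-to-speech synthesis means converts text information into speech, and on transmission, the speech recognition means converts voice data into text information.
According to this invention, text information is converted into speech and voice data into text information, so even a landline telephone without text functions can communicate with a mobile phone on a train that exchanges information entirely as text.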
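[0006]
Claim 4 is characterized in that, when voice data received by the speech recognition means is converted into text information and displayed, the transmitted text information and the received text information are displayed separately in the upper and lower parts of the same display screen.
A mobile phone always has a display such as a liquid crystal panel, and typing on the keys while watching that screen to send text such as e-mail is a known technique. The present invention makes the display easy to read by showing the transmitted text information and the text information received from the other party separately in the upper and lower parts of the display.
According to this invention, transmitted and received text information are shown separately in the upper and lower parts of the same screen, so both are easy to read and operating errors are reduced.
Claim 5 is characterized in that, when voice data received by the speech recognition means is converted into text information and displayed, the transmitted text information and the received text information are displayed on the same screen in chronological order.
The Internet has a mode of communication called chat, in which exchanged messages are displayed as text in chronological order in real time, so that users communicate as if talking on the telephone. In the same way, the present invention displays the transmitted text information and the converted received text information in chronological order.
According to this invention, transmitted and received text information appears on the display in chronological order, so the flow of the conversation is easy to follow and users can communicate as if talking on the telephone.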
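[0007]
Claim 6 is characterized in that a character or symbol identifying the sender of the received text information is displayed at the beginning of that text information.
In the case of claim 5, the text information received from the other party cannot be told apart from the transmitted text information, so even in chronological order it is unclear which side sent each message. The present invention therefore displays, at the beginning of the other party's received text information, a character, symbol, or the like that distinguishes it. For example, if the other party's name is registered together with a telephone number or e-mail address, the name can be displayed.
According to this invention, identifying characters, symbols, or the like are added to the other party's text information, so received and transmitted text information are clearly distinguished and communication becomes more efficient.
Claim 7 further comprises a text database that stores predetermined combinations of special keys and the text information associated with each combination. When special keys are entered in the text input mode, the text information corresponding to those special keys is retrieved from the text database, and the result is either output to the text-to-speech synthesis means or transmitted as text information unchanged.
The greatest drawback of communicating by text information is that, unlike conversation, it lacks real-time responsiveness, because input is made with keys. To mitigate this as far as possible and reduce the key-input burden, the present invention registers in a database text information frequently used in conversation, such as nods, exclamations, and replies, each associated with seldom-used special keys. When those special keys are entered, the registered text information is retrieved from the database and output.
According to this invention, special keys are registered in the database in association with frequently used text information, so the real-time responsiveness of text-based communication improves.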
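[0008]
Claim 8 further comprises a voice database that stores predetermined combinations of special keys and the voice data associated with each combination. When special keys are entered in the text input mode, the voice data corresponding to those special keys is retrieved from the voice database and transmitted.
Whereas claim 7 registers special keys in the database in association with text information, the present invention registers special keys in association with voice data; when the special keys are entered, the registered voice data is retrieved from the database and output.
According to this invention, special keys are registered in the database in association with frequently used voice data, so the real-time responsiveness of text-based communication improves.
Claim 9 is characterized in that the text-to-speech synthesis means comprises phoneme segment dictionaries in which a plurality of different voice types are registered individually, sender identification means for selecting a given phoneme segment dictionary on the basis of sender information that includes information from which gender can be determined, and a speech synthesis engine that synthesizes speech corresponding to the input text. When text information is entered in the text input mode, the speech synthesis engine synthesizes the text information using the phoneme segment dictionary selected by the sender identification means.
When speech is synthesized from text information and sent to the other party, the synthesized voice is generally mechanical-sounding. Such a voice is sometimes appropriate, but it feels flat in communication among a small circle of companions or close friends. The present invention therefore registers different voice types, for example male, female, a mechanical synthesized voice, or the user's own voice, in databases called phoneme segment dictionaries, selects a phoneme segment dictionary according to the sender information, and converts the text information into speech in the selected voice type.
According to this invention, text information is synthesized and output in the selected voice type, enabling even closer communication.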
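[0009]
Claim 10 is characterized in that the sender identification means selects a phoneme segment dictionary corresponding to the gender of the transmission destination.
It is the sender identification means that selects particular phonemes from the phoneme segment dictionaries, and it is the sender information that instructs the sender identification means. The sender information mainly specifies the gender of the transmission destination; if the sender is male, male phonemes are selected.
According to this invention, phonemes are selected according to the sender's gender, which gives the communication a natural feel.
Claim 11 is characterized in that the input switching means automatically switches to the voice input mode when communication ends.
The communication device according to the present invention has an ordinary voice input mode and a text input mode. For example, when a call arrives on the mobile phone on a train and voice communication is impossible, the input switching means switches to the text input mode so that communication can proceed. It is therefore necessary to return automatically to the voice input mode when communication ends.
According to this invention, the device automatically returns to the voice input mode when communication ends, so incoming calls can always be answered in the ordinary voice mode.
Claim 12 is characterized by causing a computer to function as the information communication device according to any one of claims 1 to 11.
To implement the functions of the present invention on a computer, they must be programmed in software suited to the computer's OS (operating system) environment.
According to this invention, any computer with the same OS environment can be made to function as the information communication device of the present invention, anywhere, using the same program.
Claim 13 is characterized by recording the information communication program according to claim 12.
So that the program can run on any computer with the same OS environment, it is preferably recorded on a portable recording medium such as a flexible disk, CD, MO, or MD.
According to this invention, the information communication program of the present invention is recorded on a portable recording medium, so any computer with the same OS environment can be made to function as the information communication device anywhere.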
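[0010]
[Description of the Embodiments]
The present invention is described in detail below using the embodiments shown in the drawings. Unless specifically stated otherwise, the components, types, combinations, shapes, relative arrangements, and so on described in these embodiments are merely illustrative examples and are not intended to limit the scope of the invention to them.
FIG. 1 is a block diagram of an information communication device according to a first embodiment of the present invention. In the description of each embodiment below, the information communication device of the present invention is described as incorporated into a telephone (a landline or mobile phone). The device is not limited to telephones, however; similar effects can be expected by incorporating it into any device with information communication functions, such as a personal digital assistant (PDA).
The information communication device 100 of the present invention is a portable telephone device such as a mobile phone, and comprises a text input means 1 with which text is entered using the phone's keys or the like, a text-to-speech synthesis means 2 that converts text information into voice data, a voice communication means 3 that handles communication with the outside, a voice output means 4 that outputs voice data to a speaker or an earphone jack, and an earphone 5. The information communication device 110 comprises a voice communication means 6 that handles communication with the outside, a voice output means 8 that outputs voice data to a speaker or an earphone jack, and a voice input means 7 such as a microphone.
In this embodiment, the text input means 1 of the information communication device 100 accepts input from the phone's keys or the like; since this is a known technique, details are omitted. The text entered in this way is converted into voice data by the text-to-speech synthesis means 2. The voice communication means 3 delivers the converted voice data to the information communication device 110, whose voice output means 8 outputs it through a speaker or earphone. The information communication device 110 is simply an ordinary telephone. Conversely, voice entered through the voice input means 7 (such as a microphone) of the information communication device 110 is sent by the voice communication means 6 to the information communication device 100 and output by the voice output means 4 connected to the earphone 5. In this way, even in a place that requires good manners, such as a train, the user of the information communication device 100 does not speak aloud and can therefore convey content to the information communication device 110 without disturbing people nearby; and because the user listens through the earphone 5 or the like, content from the information communication device 110 can also be received without disturbing them.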
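The first embodiment's sending path can be sketched as follows. This is a minimal illustration under stated assumptions, not the patented implementation: the class `TextCallTerminal`, the `synthesize` callable standing in for the text-to-speech synthesis means 2, and the `send` callable standing in for the voice communication means 3 are hypothetical names introduced here. In text input mode, typed text is synthesized and transmitted as ordinary call audio; in voice input mode, microphone audio is forwarded unchanged.

```python
from enum import Enum
from typing import Callable, List


class InputMode(Enum):
    VOICE = "voice"
    TEXT = "text"


class TextCallTerminal:
    """Hypothetical model of device 100: typed text is sent as synthesized speech."""

    def __init__(self, synthesize: Callable[[str], bytes], send: Callable[[bytes], None]):
        self.synthesize = synthesize  # stands in for text-to-speech synthesis means 2
        self.send = send              # stands in for voice communication means 3
        self.mode = InputMode.VOICE

    def switch_mode(self, mode: InputMode) -> None:
        # Input switching means: choose between voice and text input.
        self.mode = mode

    def on_text_entered(self, text: str) -> None:
        if self.mode is not InputMode.TEXT:
            raise RuntimeError("text input is only accepted in text input mode")
        # Convert the typed text to voice data and transmit it like microphone audio.
        self.send(self.synthesize(text))

    def on_microphone_audio(self, audio: bytes) -> None:
        # Assumption: the microphone is only forwarded in voice input mode.
        if self.mode is InputMode.VOICE:
            self.send(audio)


# Usage with stand-in components (a fake synthesizer that just tags the text):
sent: List[bytes] = []
terminal = TextCallTerminal(synthesize=lambda t: ("<speech:" + t + ">").encode(), send=sent.append)
terminal.switch_mode(InputMode.TEXT)
terminal.on_text_entered("I'm on the train")
terminal.on_microphone_audio(b"background noise")  # ignored in text input mode
```

Keeping the synthesizer and transport as injected callables reflects that device 110 needs no changes: whatever `send` carries is ordinary voice data.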
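[0011]
FIG. 2 is a block diagram of an information communication device according to a second embodiment of the present invention. Identical components carry identical reference numerals, and duplicate descriptions are omitted. The information communication device 120 of the present invention is a portable telephone device such as a mobile phone, and comprises a text input means 10 with which text is entered using the phone's keys or the like, a text-to-speech synthesis means 11 that converts text information into voice data, a voice communication means 14 that handles communication with the outside, a speech recognition means 13 that converts voice data into text information, and a text output means 12 that displays or prints text information. The information communication device 110 is the same as in the first embodiment, so its description is omitted. FIG. 2 differs from FIG. 1 in that the speech recognition means 13 and the text output means 12 replace the voice output means 4 and the earphone 5, and voice data received by the information communication device 120 is converted into text by the speech recognition means 13. By reading this text, the user can converse with user B in a setting that calls for good manners without having to wear an earphone or similar device.
FIG. 3 shows an example screen of the information communication device 120 in this case. In the upper part 21 of the screen 20, text converted from the voice data sent by the information communication device 110 is displayed in order. The lower part 22 of the screen 20 displays the text to be sent from the information communication device 120. In the example, the kana "てれび" (terebi, "TV") 23 have just been entered and are about to undergo kana-to-kanji conversion. Each time a sentence has been entered, the text input means 10 passes it to the text-to-speech synthesis means 11, which converts it into voice data. In this way, the user of the information communication device 120 can converse using text alone.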
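The description says only that text is handed to the synthesizer "each time a sentence has been entered". A minimal sketch of that buffering, assuming that a sentence ends at one of the terminators 。！？.!? (an assumption; the source does not define sentence boundaries), might look like this. `SentenceBuffer` is a hypothetical name.

```python
from typing import List

# Assumed sentence terminators; the source does not specify how a sentence end is detected.
SENTENCE_END = set("。！？.!?")


class SentenceBuffer:
    """Hypothetical helper: accumulates typed characters and releases complete sentences."""

    def __init__(self) -> None:
        self._pending: List[str] = []

    def feed(self, chars: str) -> List[str]:
        """Add typed characters; return every sentence they complete, in order."""
        done: List[str] = []
        for ch in chars:
            self._pending.append(ch)
            if ch in SENTENCE_END:
                done.append("".join(self._pending).strip())
                self._pending.clear()
        return done

    @property
    def pending(self) -> str:
        # Text still being composed, e.g. what the lower part 22 of the screen shows.
        return "".join(self._pending)
```

Each sentence returned by `feed` would be passed to the text-to-speech synthesis means 11, while `pending` holds the unfinished input still shown for editing.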
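[0012]
FIG. 4 shows another example screen. In this example, the sent and received text 25 is displayed in chronological order. Text 26 received from the information communication device 110 is displayed with the sender's name at its head; in this example the name is shown as "Taro" 26a. The sender's name can be obtained by registering it in an address book and displaying the name of the person who owns the calling telephone number; this is a widely used, known method, for example in caller display on mobile phones. For a sender not registered in the address book, "Other party" or "Unknown" may be displayed instead. In this way the user can follow the chronology while conversing by text alone.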
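The chronological display with a sender prefix can be sketched as below. The `Message` record, the `format_timeline` function, the `"name: text"` layout, and the dictionary keyed by telephone number are illustrative assumptions; only the idea (look up the caller's number in the address book, fall back to a label such as "Unknown") comes from the description.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass
class Message:
    outgoing: bool                       # True if typed on this device, False if received
    text: str
    caller_number: Optional[str] = None  # number of the party who sent a received message


def format_timeline(messages: List[Message], address_book: Dict[str, str],
                    unknown_label: str = "Unknown") -> List[str]:
    """Render messages in arrival order, prefixing received ones with the sender's name."""
    lines: List[str] = []
    for msg in messages:
        if msg.outgoing:
            lines.append(msg.text)
        else:
            # Caller-ID style lookup; unregistered numbers get the fallback label.
            name = address_book.get(msg.caller_number or "", unknown_label)
            lines.append(f"{name}: {msg.text}")
    return lines
```

For example, with an address book mapping a (hypothetical) number to "Taro", a received message renders as `"Taro: ..."` while the user's own lines appear unprefixed.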
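FIG. 5 is a block diagram of an information communication device according to a third embodiment of the present invention. The information communication device 140 of the present invention is a portable telephone device such as a mobile phone, and comprises a text input means 30 with which text is entered using the phone's keys or the like, a text communication means 32 that handles communication with the outside, and a text output means 31 that displays or prints text information. The information communication device 150 comprises a text communication means 33 that handles communication with the outside, a text-to-speech synthesis means 34 that converts text information into voice data, a voice output means 35 that outputs voice data to a speaker or an earphone jack, a voice input means 37 such as a microphone, and a speech recognition means 36 that converts voice data into text information.
Unlike the devices of FIGS. 1 and 2, this embodiment communicates by text. The information communication device 140 is a conventional mobile phone capable of text communication. The information communication device 150 is a landline telephone, or a mobile phone in a situation where manner mode is unnecessary. Text entered on the text input means 30 of the information communication device 140 is sent unchanged by the text communication means 32 to the information communication device 150, where the text-to-speech synthesis means 34 converts it into voice data and the voice output means 35 delivers it to the user through a speaker or the like. Speech entered at the voice input means 37 of the information communication device 150 is converted into voice data, converted into text by the speech recognition means 36, and sent by the text communication means 33 to the information communication device 140, where the text output means 31 shows it on the screen. The mobile phone side can therefore use an existing device as is, and can communicate without worrying about manners even with a party who needs spoken conversation, such as a landline user.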
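In the third embodiment all conversion happens on the voice side (device 150), so device 140 can stay an ordinary text-capable phone. A minimal sketch of that side follows; `VoiceSideBridge` and the four injected callables are hypothetical stand-ins for the means 33 to 37, not a real speech API.

```python
from typing import Callable


class VoiceSideBridge:
    """Hypothetical model of device 150: incoming text is spoken, local speech is sent as text."""

    def __init__(self, synthesize: Callable[[str], bytes], recognize: Callable[[bytes], str],
                 play: Callable[[bytes], None], send_text: Callable[[str], None]):
        self.synthesize = synthesize  # text-to-speech synthesis means 34
        self.recognize = recognize    # speech recognition means 36
        self.play = play              # voice output means 35
        self.send_text = send_text    # text communication means 33

    def on_text_received(self, text: str) -> None:
        # Text from device 140 arrives unchanged and is read aloud to the local user.
        self.play(self.synthesize(text))

    def on_speech_captured(self, audio: bytes) -> None:
        # Speech from voice input means 37 is recognized and sent to device 140 as text.
        self.send_text(self.recognize(audio))
```

With stub callables (for example `str.encode` and `bytes.decode`) the bridge can be exercised without any real synthesizer or recognizer.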
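[0013]
FIG. 6 is a block diagram illustrating the text input means of the information communication devices according to the first to third embodiments of the present invention. The text input means 40 comprises a special key input means 41 for entering seldom-used special keys, a text registration means 42 for registering text information corresponding to those special keys, and a text database 43 that stores combinations of special keys and the corresponding text information; its output goes to a text-to-speech synthesis means 44 and a text communication means 45. In this embodiment, text is first registered by the text registration means 42 in the text database 43 inside the device. Each entry pairs keys with text information, as follows. The keys may be, for example, a sequence that includes keys not normally used, such as # and *. A new key may also be added to the device (for example, the "☆" in FIG. 7). Known techniques, such as the stock-phrase registration function of mobile phones, can be used to register text, so their description is omitted here. Each time a key is entered, the special key input means 41 checks the entries of the text database 43 in order to see whether any of them matches. When a match is found, the text corresponding to those keys is output to the text-to-speech synthesis means 44 or the text communication means 45. Because pre-registered text can be output with a simple key operation, the user can respond immediately even while the other party is speaking. As in the example of FIG. 7, the text database 43 registers a nodding text such as "uh-huh" (うん) as the text item 47 for # in the key item 46, so the user can keep the conversation natural without pauses. Likewise, a text expressing disagreement such as "No way." (えええー。) is registered as the text item 47 for *# in the key item 46, so the user can immediately signal objection to what the other party said. A text expressing a question such as "Why?" (なんで？) is registered as the text item 47 for **# in the key item 46, so the user can immediately signal doubt about what the other party said.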
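The in-order matching of key sequences against the text database can be sketched as follows. The entries mirror the FIG. 7 examples; the class name `SpecialKeyInput`, the exact-match rule on the accumulated keys, and the reset of sequences that can never match are assumptions made for this illustration.

```python
from typing import List, Optional, Tuple

# Example entries modelled on FIG. 7 (key item 46 -> text item 47); list order is search order.
TEXT_DB: List[Tuple[str, str]] = [
    ("#", "うん"),        # nod: "uh-huh"
    ("*#", "えええー。"),  # disagreement: "No way."
    ("**#", "なんで？"),   # question: "Why?"
]


class SpecialKeyInput:
    """Hypothetical sketch of special key input means 41 backed by text database 43."""

    def __init__(self, db: List[Tuple[str, str]]):
        self.db = db
        self.buffer = ""

    def press(self, key: str) -> Optional[str]:
        """Register one key press; return the registered text once the sequence matches."""
        self.buffer += key
        for keys, text in self.db:  # check the entries in order, as the description states
            if keys == self.buffer:
                self.buffer = ""
                return text
        if not any(keys.startswith(self.buffer) for keys, _ in self.db):
            self.buffer = ""  # assumption: discard sequences that can no longer match
        return None
```

A returned string would then be routed to the text-to-speech synthesis means 44 (first and second embodiments) or sent as text by the text communication means 45 (third embodiment).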
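[0014]
FIG. 8 is a block diagram illustrating the voice input means in the text input mode of an information communication device according to a fourth embodiment of the present invention. In this embodiment, the arrangement of FIG. 8 replaces the text input means and the text-to-speech synthesis means of the first and second embodiments. The voice input means 50 comprises a special key input means 51 for entering seldom-used special keys, a voice data registration means 52 for registering voice information corresponding to those special keys, and a voice database 53 that stores combinations of special keys and the corresponding voice information; its output goes to a voice communication means 54. In this embodiment, voice data is first registered by the voice data registration means 52. This can be achieved by recording through the phone's microphone, using known techniques such as those for recording answering-machine greetings. Other data may also be registered.
The voice database 53 is as shown in FIG. 9. For example, a nodding voice clip such as "uh-huh" (うん) is registered as the voice data item 57 for # in the key item 55, so the user can keep the conversation natural without pauses. A voice clip expressing disagreement such as "No way." (えええー。) is registered as the voice data item 57 for *# in the key item 55, so the user can immediately signal objection to what the other party said. A voice clip expressing a question such as "Why?" (なんで？) is registered as the voice data item 57 for **# in the key item 55, so the user can immediately signal doubt about what the other party said. Voice data for entries whose changeable item 56 is set to "No" is registered in advance and cannot be changed, while some other entries can be registered by the user recording them through the microphone. When a special key is pressed, the database 53 is searched and the voice data matching the keys is output to the voice communication means. Because prerecorded voice data is output, the user can converse without worrying about people nearby even in places that call for good manners, such as a train.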
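The "changeable" column of FIG. 9 suggests a database in which preset clips are locked and user recordings can be replaced. A minimal sketch, assuming clips are plain byte strings and that overwriting a locked entry should raise an error (both assumptions; `VoiceDatabase` and `VoiceEntry` are hypothetical names):

```python
from dataclasses import dataclass
from typing import Dict, Optional


@dataclass
class VoiceEntry:
    audio: bytes      # recorded or factory-supplied clip
    changeable: bool  # changeable item 56: False means preset and locked


class VoiceDatabase:
    """Hypothetical sketch of voice database 53 with user re-recording of unlocked entries."""

    def __init__(self) -> None:
        self._entries: Dict[str, VoiceEntry] = {}

    def preload(self, keys: str, audio: bytes) -> None:
        # Preset clip that the user cannot overwrite.
        self._entries[keys] = VoiceEntry(audio, changeable=False)

    def record(self, keys: str, audio: bytes) -> None:
        # Voice data registration means 52: store a clip recorded through the microphone.
        entry = self._entries.get(keys)
        if entry is not None and not entry.changeable:
            raise PermissionError(f"entry {keys!r} is preset and cannot be changed")
        self._entries[keys] = VoiceEntry(audio, changeable=True)

    def lookup(self, keys: str) -> Optional[bytes]:
        # Clip to hand to the voice communication means, or None if nothing is registered.
        entry = self._entries.get(keys)
        return entry.audio if entry else None
```

Key-sequence matching itself can work exactly as for the text database; only the payload changes from text to audio.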
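[0015]
FIG. 10 is a block diagram illustrating the text-to-speech synthesis means of an information communication device according to a fifth embodiment of the present invention. The text-to-speech synthesis means 61 comprises phoneme segment dictionaries 65 in which a plurality of different voice types are registered individually, a sender identification means 64 that selects a given phoneme segment dictionary from the phoneme segment dictionaries 65 on the basis of sender information 63 from which gender can be determined, and a speech synthesis engine 62 that synthesizes speech corresponding to the text entered through the text input means 60; the voice data synthesized by the speech synthesis engine 62 is output to a voice communication means 66. Three phoneme segment dictionaries 65 representing gender are provided: male 65a, female 65b, and robot 65c. In this configuration, when text is entered into the speech synthesis engine 62 through the text input means 60, the speech synthesis engine 62 reads the input text aloud in the voice type of the phoneme segment dictionary selected by the sender identification means 64 and outputs it as voice data. The sender identification means 64 selects the dictionary using the sender information 63, which may be any information from which gender can be determined. On a mobile phone, for example, gender can be registered in an auxiliary-information field of the registration list, as in FIG. 11. Like a conventional list, the list in FIG. 11 holds a registered name 70 and a telephone number 73, plus auxiliary information such as an e-mail address 72. Here a gender field 71 is added as auxiliary information, and the gender corresponding to each registered name is recorded. The first row registers the sender (the device owner). If the symbol in the gender field 71 is ♂, the male phoneme segment dictionary is selected; if it is ♀, the female phoneme segment dictionary is selected.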
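The FIG. 11 registration list can be modelled as a list of records whose first row is the device owner. The sketch below is illustrative only: `ListEntry`, `gender_for`, and matching by exact telephone-number string are assumptions, and `None` stands for "gender not registered or caller not in the list".

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class ListEntry:
    name: str              # registered name 70
    phone: str             # telephone number 73
    email: Optional[str]   # e-mail 72 (auxiliary information)
    gender: Optional[str]  # gender field 71: "♂", "♀", or None if not registered


def gender_for(entries: List[ListEntry], phone: Optional[str] = None) -> Optional[str]:
    """Return the gender for a telephone number, or the owner's (first row) if phone is None."""
    if phone is None:
        return entries[0].gender if entries else None
    for entry in entries:
        if entry.phone == phone:
            return entry.gender
    return None  # caller is not in the registration list
```

Calling `gender_for(entries)` yields the owner's own gender, while `gender_for(entries, number)` yields the other party's.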
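[0016]
In the first and second embodiments, the voice data is synthesized to match the attributes of the sender, so the voice type need only match the sender's own gender, and the gender recorded in the sender's own row of the registration list in FIG. 11 is passed on. In this embodiment the sender's gender is ♂, so the sender identification means 64 selects the male phoneme segment dictionary 65a. In the third embodiment of FIG. 5, on the other hand, the voice data is synthesized on the receiving side. For example, when the information communication device 150 in FIG. 5 is talking with Party 1 in FIG. 11, whose gender is ♀, the sender identification means selects the female phoneme segment dictionary 65b. As long as gender is recorded in the registration list, entered text can thus be converted into a synthesized voice matching the attribute (gender, in this example) of the user or the other party. However, gender cannot be determined if the party is not in the list or, like Party 2 in FIG. 11, has no gender registered. In such cases the robot voice 65c is selected, so the voice type of the speech synthesized on the receiving side is treated as neutral.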
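In the example above the registration list serves as the sender information, but the sender information may also be entered into the sender identification means 64 manually, using a button assigned on the mobile phone. In that case, for example in the configurations of FIGS. 1 and 2, a sender who does not want to reveal his or her attributes to the other party can manually switch to the robot voice for the conversation. Also, although this embodiment uses gender as the attribute and provides phoneme segment dictionaries expressing it, the attributes are not limited to gender. For example, age group could be the attribute, with phoneme segment dictionaries for voice types such as elderly people and children; in that case age-group information is input to the sender identification means 64 as the sender information. Text can thus be synthesized in a voice type that matches the attributes of the user or the other party, so even when one side converses by text, the two parties can communicate more intimately than before.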
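The selection rule of the sender identification means 64 (gender symbol to dictionary, robot voice as the neutral fallback, and an optional manual override from a button) can be sketched as a small function. `VoiceType` and `select_dictionary` are hypothetical names; which party's gender is passed in (the owner's in the first and second embodiments, the other party's in the third) is left to the caller.

```python
from enum import Enum
from typing import Optional


class VoiceType(Enum):
    MALE = "male"      # phoneme segment dictionary 65a
    FEMALE = "female"  # phoneme segment dictionary 65b
    ROBOT = "robot"    # phoneme segment dictionary 65c


def select_dictionary(gender: Optional[str], manual: Optional[VoiceType] = None) -> VoiceType:
    """Sketch of sender identification means 64 choosing a phoneme segment dictionary."""
    if manual is not None:
        return manual  # button-selected override, e.g. to hide one's own attributes
    if gender == "♂":
        return VoiceType.MALE
    if gender == "♀":
        return VoiceType.FEMALE
    return VoiceType.ROBOT  # gender unknown or unregistered: neutral robot voice
```

Extending the attribute to age group, as the description suggests, would amount to adding further enum members (for example elderly or child voices) and the corresponding branches.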
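[0017]
The communication device according to the present invention has an ordinary voice input mode and a text input mode. The mode at the start of communication can be chosen according to use, as follows.
(1) Make the voice input mode the default. To converse by text, always switch to the text input mode before use. If communication took place in the text input mode, the device automatically switches back to the voice input mode when communication ends. This suits people who mainly use voice calls.
(2) Make the text input mode the default. To hold a voice call, always switch to the voice input mode before use. If communication took place in the voice input mode, the device automatically switches back to the text input mode when communication ends. This suits people who mainly converse by text.
(3) Retain the previous mode. If a voice call was in progress, the device stays in the voice input mode when communication ends; conversely, if a text conversation was in progress, communication ends with the device still in the text input mode. If the device is not in the desired mode when communication starts, switch to the desired mode before use. This suits people who use both modes about equally often.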
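The three policies above reduce to a rule for which input mode the device is left in once a call ends. A minimal sketch, with `Mode`, `StartPolicy`, and `mode_after_call` as hypothetical names:

```python
from enum import Enum


class Mode(Enum):
    VOICE = "voice"
    TEXT = "text"


class StartPolicy(Enum):
    VOICE_DEFAULT = 1  # option (1): voice input mode is the default
    TEXT_DEFAULT = 2   # option (2): text input mode is the default
    KEEP_LAST = 3      # option (3): keep the mode used in the previous call


def mode_after_call(policy: StartPolicy, mode_used: Mode) -> Mode:
    """Return the input mode the device should be left in once a call ends."""
    if policy is StartPolicy.VOICE_DEFAULT:
        return Mode.VOICE
    if policy is StartPolicy.TEXT_DEFAULT:
        return Mode.TEXT
    return mode_used  # KEEP_LAST: stay in whatever mode the call used
```

Option (1) corresponds to the automatic return to the voice input mode of claim 11.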
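[0018]
FIG. 12 illustrates an embodiment in which each function of the embodiments above is implemented as a program. The object of the present invention can also be achieved by writing these programs in advance onto a recording medium such as a CD-ROM 84, loading the CD-ROM 84 into a computer equipped with a media drive such as a CD-ROM drive 83, storing the programs in a storage device of the computer such as the memory 81 or the hard disk 82, and executing them. In this case, the program read from the recording medium itself implements the functions of the embodiments above, so both the program and the recording medium on which it is recorded constitute the present invention. The recording medium may be a semiconductor medium (for example, ROM or a nonvolatile memory card), an optical medium (for example, DVD, MO, MD, or CD-R), or a magnetic medium (for example, magnetic tape or a flexible disk). The invention also covers cases in which the functions of the embodiments are implemented not only by executing the loaded program but also by an operating system (OS) 82b or the like performing some or all of the actual processing on the basis of instructions from the program 82a.
The invention also covers cases in which the program 82a implementing the functions of the embodiments is loaded into memory on a function-expansion board or function-expansion unit, and a CPU or the like on that board or unit performs some or all of the actual processing on the basis of the program's instructions, thereby implementing the functions of the embodiments. Further, when the program is stored in a storage device such as a magnetic disk of a server computer and distributed by download to users' computers connected through a communication network, the storage device of that server computer is also included in the recording medium of the present invention.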
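[0019]
[Effects of the Invention]
As described above, according to the invention of claim 1, because the mobile phone carries a text-to-speech synthesis means, the user enters text on the keys instead of speaking, and the text is converted into speech and transmitted, so a spoken call is possible without disturbing people nearby.
According to claim 2, both transmission and reception use text information, so no earphone is needed and information can be exchanged without disturbing people nearby.
According to claim 3, text information is converted into speech and voice data into text information, so even a landline telephone without text functions can communicate with a mobile phone on a train that exchanges information entirely as text.
According to claim 4, transmitted and received text information are shown separately in the upper and lower parts of the same screen, so both are easy to read and operating errors are reduced.
According to claim 5, transmitted and received text information appears on the display in chronological order, so the flow of the conversation is easy to follow and users can communicate as if talking on the telephone.
According to claim 6, identifying characters, symbols, or the like are added to the other party's text information, so received and transmitted text information are clearly distinguished and communication becomes more efficient.
According to claim 7, special keys are registered in a database in association with frequently used text information, so the real-time responsiveness of text-based communication improves.
According to claim 8, special keys are registered in a database in association with frequently used voice data, so the real-time responsiveness of text-based communication improves.
According to claim 9, text information is synthesized and output in the selected voice type, enabling even closer communication.
According to claim 10, phonemes are selected according to the sender's gender, which gives the communication a natural feel.
According to claim 11, the device automatically returns to the voice input mode when communication ends, so calls can always be held in the ordinary voice mode.
According to claim 12, any computer with the same OS environment can be made to function as the information communication device of the present invention, anywhere, using the same program.
According to claim 13, the information communication program of the present invention is recorded on a portable recording medium, so any computer with the same OS environment can be made to function as the information communication device anywhere.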
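[Brief Description of the Drawings]
[FIG. 1] A block diagram of an information communication device according to the first embodiment of the present invention.
[FIG. 2] A block diagram of an information communication device according to the second embodiment of the present invention.
[FIG. 3] A diagram showing an example screen of the information communication device of the present invention.
[FIG. 4] A diagram showing another example screen of the information communication device of the present invention.
[FIG. 5] A block diagram of an information communication device according to the third embodiment of the present invention.
[FIG. 6] A block diagram illustrating the text input means of the information communication device according to the third embodiment of the present invention.
[FIG. 7] A diagram showing an example of the text database of the present invention.
[FIG. 8] A block diagram illustrating the voice input means in the text input mode of an information communication device according to the fourth embodiment of the present invention.
[FIG. 9] A diagram showing an example of the voice database of the present invention.
[FIG. 10] A block diagram illustrating the text-to-speech synthesis means of an information communication device according to the fifth embodiment of the present invention.
[FIG. 11] A diagram showing an example of the registration list of the present invention.
[FIG. 12] A diagram illustrating an embodiment of the present invention as a program.
[Explanation of Reference Signs]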
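1 text input means; 2 text-to-speech synthesis means; 3, 6 voice communication means; 4, 8 voice output means; 5 earphone; 7 voice input means; 110 information communication device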
TECHNICAL FIELD OF THE INVENTION
The present invention relates to an information communication device, and more particularly, to an information communication device for mutually converting text information and voice information.
[0002]
[Prior art]
In recent years, the spread of mobile phones has been remarkable, and the number of owners has been increasing year by year. The reason for the explosive spread is that it can be used in a good communication state anywhere as long as communication is possible with the base station. However, it has been pointed out that with the spread, voices and radio waves accompanying a call may cause trouble to others depending on the usage environment of the mobile phone. For example, calls on crowded trains and buses or places where many people meet together may not only cause discomfort to others, but also adversely affect people carrying medical devices such as pacemakers. There is also. However, in spite of such a problem, there are still users who make a call in a vehicle or the like because a call when moving provides a great convenience as a communication means. One way to solve this problem is to use e-mail instead of voice call. In the case of e-mail, if the other party can use a mobile phone or PC, there is no problem because e-mail can be handled. However, if a call comes in from a device that can only exchange information by voice, such as a home fixed telephone, the in-vehicle mobile phone user has to inevitably have to make a call or check the incoming information later and call back again. Was.
In order to solve this problem, Japanese Patent Application Laid-Open Nos. H10-303969 and H11-112550 describe systems that can use electronic mail even on a fixed telephone or the like that can only handle voice communication. This gazette discloses a technique for converting speech into text using text speech recognition and displaying the text on a screen of a mobile phone or the like. A technique of converting the data into data and transmitting the data to a fixed telephone is disclosed.
[Patent Document 1] JP-A-10-303969
[Patent Document 2] JP-A-11-112550
[0003]
[Problems to be solved by the invention]
As described above, in the inventions disclosed in JP-A-10-303969 and JP-A-11-112550, a voice is recognized and converted into text information, and an e-mail is sent to a PC. You can make a call by converting it into a synthesized voice. However, e-mail is not suitable for an instant communication form such as a voice conversation. That is, the transmitted mail remains in the mail server until the recipient attempts to read his / her own mail, and cannot respond immediately. Therefore, for users seeking intimate communication, the exchange of e-mails is unsatisfactory in that it is not possible to perform communication at the same level as voice communication, and it has not been possible to provide a two-way information communication device that is sufficiently satisfactory for users. Is the current situation.
An object of the present invention is to provide a two-party information communication device that does not disturb the surroundings by performing conversation using either voice or text according to the environment at the time of communication in view of the above problem.
[0004]
[Means for Solving the Problems]
In order to solve the above-mentioned problem, the present invention provides a text-to-speech synthesizing unit that converts voice or text information into voice data based on input text information in an information communication apparatus that transmits and receives voice or text information. And input switching means for switching between a voice input mode and a text input mode, further comprising: upon transmitting information, after switching to the text input mode by the input switching means, inputting text information instead of voice. The text information is converted into voice data by the text-to-speech synthesis means and transmitted.
For example, when the information communication device is a mobile phone, it may not be possible to make a voice call when there is an incoming call on the mobile phone in a train or the like. In such a case, the mobile phone is switched to the text input mode by the input switching means, and text information is input by the keys while listening to the voice of the other party with the earphone. The information is converted into voice data by the text-to-speech synthesis means of the mobile phone and transmitted to the other party.
According to this invention, text information is input from a key instead of a voice by installing the text-to-speech synthesizing means in the mobile phone, and the information is converted into a voice and transmitted. , A call can be made by voice.
[0005]
According to a second aspect of the present invention, there is provided an information communication apparatus for transmitting or receiving voice or text information by wire or wirelessly, a text-to-speech synthesizing means for converting input text information into voice data, and text information based on received voice data. Further comprising: a voice recognition unit for converting the input data into a text input mode; and an input switching unit for switching between a voice input mode and a text input mode. The text information is converted into voice data by the text-to-speech synthesizing unit and transmitted by inputting, and upon receiving the information, the received voice data is converted into text information by the voice recognition unit and displayed. It is characterized by.
According to the first aspect of the present invention, the reception is performed by a normal voice, and only the transmission is performed by converting the text information into a voice and talking. However, in this case, since the answer is input from the key, the voice of the other party must be heard through the earphone, which is troublesome, and the earphone must be provided at all times. In order to solve this problem, the invention according to claim 2 further comprises a voice recognizing means, which converts the voice of the partner at the time of reception into text information and displays the text information.
According to this invention, transmission and reception are performed by text information, so that an earphone is not required and information can be transmitted and received without disturbing the surroundings.
According to a third aspect of the present invention, in the information communication apparatus for transmitting and receiving voice or text information by wire or wirelessly, a text-to-speech synthesizing means for converting received text information into voice data, and converting the text information based on the input voice data. The apparatus further comprises a voice recognition unit for converting, and an input switching unit for switching between a voice input mode and a text input mode.In transmitting information, after switching to the voice input mode by the input switching unit, inputting voice data. The voice data is converted into text information by the voice recognition means and transmitted, and upon receiving the information, the received text information is converted into voice data by the text voice synthesis means.
For example, when exchanging information between a mobile phone in a car and a landline phone without a text function, or between a mobile phone in a car and a mobile phone in an environment that does not disturb the surroundings, text information is sent from the mobile phone in the car. Since it is in a situation where transmission and reception cannot be performed only, fixed telephones or other mobile phones are equipped with text-to-speech synthesis means and speech recognition means. Converts voice data into text information by voice recognition means.
According to this invention, since text information is converted into voice and voice data is converted into text information, it is possible to perform communication with a mobile phone in a vehicle that exchanges information only with text information even on a fixed telephone without a text function. Become.
[0006]
Preferably, when the speech data received by the speech recognition means is converted into text information and displayed, the transmitted text information and the received text information are displayed separately on the upper and lower sides of the same display screen. And
In the case of a mobile phone, a display unit such as a liquid crystal is always provided. Therefore, when sending text information such as e-mail, it is a known technique to perform key operations while looking at the display screen. The present invention realizes a display unit which is easy to see by displaying the text information of transmission and the text information received from the other party separately above and below the display unit.
According to this invention, the transmitted / received text information is displayed vertically separated on the same display screen, so that the transmitted information and the received information are easy to see, and an operation error can be reduced.
According to a fifth aspect of the present invention, when the voice data received by the voice recognition unit is converted into text information and displayed, the transmitted text information and the received text information are displayed on the same display screen in chronological order. .
There is a communication mode called chat on the Internet. In this method, the transmission and reception of e-mails is displayed in text in chronological order in real time, and communication is performed as if a telephone conversation is being performed. Similarly, the present invention displays the transmitted text information and the converted received text information in chronological order.
According to this invention, the transmitted and received text information is displayed on the display unit in chronological order, so that the flow of information can be easily understood, and communication can be performed as if the user is talking on the telephone.
[0007]
A sixth aspect of the present invention is characterized in that a character or a code specifying a transmission source of the text information is displayed at a head portion of the received text information.
In the case of the fifth aspect, it is not possible to distinguish the reception text information and the transmission text information of the other party, and it is not clear which is the transmission side or the reception side even if they are displayed in chronological order. Therefore, in the present invention, a character, a code and the like for distinguishing the received text information at the head of the other party are displayed at the same time. For example, if the name of the other party is registered at the same time as the telephone number or the mail address, the name can be displayed.
According to this invention, since characters, codes, and the like for identification are added to the text information of the receiving party, the distinction between the received and transmitted text information becomes clear and communication efficiency can be improved.
Claim 7 further comprises a text database that stores a combination of a predetermined special key and text information associated with the combination, and corresponds to the special key when the special key is input in the text input mode. The text database to be searched is searched by the text database, and the search result is output to the text-to-speech synthesis means, or the text information is transmitted as it is.
When communicating using text information, the biggest drawback is that, unlike conversation, real-time properties are poor. This is because the input means uses keys. Therefore, in the present invention, in order to solve this problem as much as possible, in order to reduce the load of key input, special keys that are not often used are frequently used in conversations such as "nod", "exclamation", "reply". Are registered in the database in association with each other. When the special key is input, the registered text information is retrieved from the database and output.
According to this invention, since the special key is registered in the database in association with frequently used text information, real-time communication of text information can be improved.
[0008]
Claim 8 further comprises a voice database that stores a combination of a predetermined special key and voice data associated with the combination, and corresponds to the special key when the special key is input in the text input mode. Searching for the voice data to be performed based on the voice database, and transmitting the search result.
In claim 7, the special key and the text information are registered in the database in association with each other. However, in the present invention, the special key and the voice data are registered in the database in association with each other. It searches and outputs registered voice data.
According to this invention, since the special key is registered in the database in association with the frequently used voice data, the real-time communication of the text information can be improved.
In the ninth aspect, the text-to-speech synthesizing unit may include a speech segment dictionary in which a plurality of different voice types are individually registered, and the speech segment dictionary based on sender information including gender-determinable information. A sender identification unit for selecting a predetermined phoneme dictionary, and a speech synthesis engine for synthesizing speech corresponding to the input text, wherein when text information is inputted in the text input mode, the speech synthesis engine Is characterized in that the text information is speech-synthesized using a phoneme segment dictionary selected by the sender specifying means.
When voice is synthesized from text information and transmitted as voice to the other party, the type of voice for voice synthesis is generally a mechanical synthesized voice. However, although this synthesized sound is necessary in some cases, it is dull in communication between a limited number of friends and close friends. Therefore, in the present invention, different kinds of sound types, for example, male, female, mechanical synthetic sounds, or voices of the person themselves are registered in a database called a phoneme segment dictionary, and the phoneme segment dictionary is registered based on the sender information. Is selected, and the text information is converted and output using the selected sound type.
According to this invention, since the text information is synthesized and output by the selected sound type, further intimate communication can be realized.
[0009]
A tenth aspect is characterized in that the sender specifying means selects a phoneme segment dictionary corresponding to the gender of the transmission destination.
The sender specifying means selects a specific phoneme segment dictionary, and it does so according to the sender information. The sender information mainly specifies the gender of the transmission destination; for example, if the sender is male, the male phoneme segment dictionary is selected.
According to this invention, since a phoneme segment dictionary is selected according to the gender of the sender, communication can take place in a natural atmosphere.
An eleventh aspect is characterized in that the input switching means automatically switches to the voice input mode when the communication is completed.
The communication device according to the present invention has a normal voice input mode and a text input mode. For example, when a call from the other party arrives on a mobile phone in a train or other vehicle and voice communication is not possible, communication can proceed by switching to the text input mode with the input switching means. The device therefore needs to return automatically to the voice input mode when the communication ends.
According to this invention, the mode automatically returns to the voice input mode when the communication ends, so that incoming calls can always be handled in the normal voice mode.
According to a twelfth aspect, a computer is caused to function as the information communication device according to any one of the first to eleventh aspects.
To realize the functions of the present invention on a computer, they must be implemented as software suited to the computer's OS (Operating System) environment.
According to this invention, with the same program, any computer running the same OS environment can function as the information communication apparatus of the present invention, wherever it is.
A thirteenth aspect is a recording medium on which the information communication program according to the twelfth aspect is recorded.
For the program to run on any computer with the same OS environment, it is preferable to record it on a portable recording medium such as a flexible disk, CD, MO, or MD.
According to this invention, since the information communication program of the present invention is recorded on a portable recording medium, any computer with the same OS environment can function as an information communication device.
[0010]
BEST MODE FOR CARRYING OUT THE INVENTION
Hereinafter, the present invention will be described in detail using the embodiments shown in the drawings. However, unless otherwise specified, the components, types, combinations, shapes, relative arrangements, and the like described in these embodiments are merely illustrative examples and are not intended to limit the scope of the present invention.
FIG. 1 is a configuration diagram of the information communication device according to the first embodiment of the present invention. In the following, the embodiments are described assuming that the information communication apparatus of the present invention is incorporated in a telephone (fixed telephone or mobile telephone). The information communication device of the present invention is not limited to telephones, however; a similar effect can be expected by incorporating it into any device having an information communication function, such as a small personal digital assistant (PDA).
The information communication device 100 of the present invention is a portable telephone device such as a mobile phone, and comprises a text input means 1 through which text is entered with means such as the keys of a mobile phone, a text-to-speech synthesis means 2 that converts text information into voice data, a voice communication means 3 that manages communication with the outside, a voice output means 4 that outputs voice data to a speaker or an earphone connection terminal, and an earphone 5. The information communication device 110 comprises a voice communication means 6 that manages communication with the outside, a voice output means 8 that outputs voice data to a speaker or an earphone connection terminal, and a voice input means 7 such as a microphone.
In this embodiment, the text input means 1 of the information communication device 100 accepts input from means such as the keys of a mobile phone; since this is a well-known technique, details are omitted. The input text is converted into voice data by the text-to-speech synthesis means 2. The converted voice data is transmitted to the information communication device 110 by the voice communication means 3 and output through a speaker or earphone by the voice output means 8 of the information communication device 110. The information communication device 110 is itself an ordinary telephone. In the information communication device 110, voice data input through the voice input means 7, such as a microphone, is sent to the information communication device 100 by the voice communication means 6 and output by the voice output means 4 connected to the earphone 5. Thus, even in a place where manners are required, such as a train, the user of the information communication device 100 does not need to speak, and can convey content to the information communication device 110 without disturbing others; and because the content from the information communication device 110 is heard through the earphone 5 or the like, the user can also receive it without disturbing others.
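The first embodiment's data flow can be sketched as a small model, assuming a text-to-speech function that returns audio bytes. The class name `Device100`, the `fake_tts` placeholder, and the list-based `outbox` and `earphone` queues are hypothetical stand-ins, since the description names neither a synthesis engine nor a transport.

```python
from dataclasses import dataclass, field
from typing import Callable, List

# Hypothetical: any function that turns text into audio bytes can play the role
# of text-to-speech synthesis means 2; the description names no specific engine.
Synthesizer = Callable[[str], bytes]


@dataclass
class Device100:
    """Sketch of device 100: keypad text goes out as voice, incoming voice goes to the earphone."""
    synthesize: Synthesizer
    outbox: List[bytes] = field(default_factory=list)    # stands in for voice communication means 3
    earphone: List[bytes] = field(default_factory=list)  # stands in for voice output means 4 / earphone 5

    def enter_text(self, text: str) -> None:
        # Text typed on the keys is synthesized and transmitted; the microphone is never used.
        self.outbox.append(self.synthesize(text))

    def receive_voice(self, audio: bytes) -> None:
        # Voice from device 110 (an ordinary telephone) is played only through the earphone.
        self.earphone.append(audio)


def fake_tts(text: str) -> bytes:
    # Placeholder "synthesis" that just tags the text so the flow can be traced.
    return b"VOICE:" + text.encode("utf-8")
```

Keeping the synthesizer injectable mirrors the point that device 110 needs no special function: every conversion happens on device 100.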
[0011]
FIG. 2 is a configuration diagram of the information communication device according to the second embodiment of the present invention. The same components are denoted by the same reference numerals, and duplicate description is omitted. The information communication device 120 of the present invention is a portable telephone device such as a mobile phone, and comprises a text input unit 10 through which text is entered with means such as the keys of a mobile phone, a text-to-speech synthesis unit 11 that converts text information into voice data, a voice communication unit 14 that manages communication with the outside, a voice recognition unit 13 that converts voice data into text information, and a text output unit 12 that displays or prints text information. The information communication device 110 is the same as in the first embodiment, and its description is omitted. FIG. 2 differs from FIG. 1 in that the voice recognition unit 13 and the text output unit 12 are provided instead of the voice output unit 4 and the earphone 5, and voice data received by the information communication device 120 is converted into text by the voice recognition unit 13. The user can therefore talk with user B in an environment where manners are required by reading the screen, without wearing a device such as an earphone.
FIG. 3 shows an example of the screen of the information communication device 120 in this case. The upper portion 21 of the screen 20 successively displays text converted from the voice data transmitted from the information communication device 110, and the lower portion 22 of the screen 20 displays the text sent from the information communication device 120. In the state shown, the user is entering the characters 23 for "TV" and is performing kana-kanji conversion. Each time one sentence has been input, the text input unit 10 sends it to the text-to-speech synthesis unit 11, which converts it into voice data. In this way, the user of the information communication device 120 can hold the conversation using text alone.
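The FIG. 3 behavior, with received text on top, sent text below, and one synthesis call per completed sentence, can be sketched as follows. This is a minimal sketch: it assumes a sentence ends at one of the punctuation marks in the hypothetical `SENTENCE_END` tuple (the description only says "each time one sentence is input"), and the class and method names are illustrative.

```python
from typing import Callable, List

# Assumed sentence terminators; the description does not define what ends a sentence.
SENTENCE_END = ("。", "？", "！", ".", "?", "!")


class SplitScreenChat:
    """Sketch of the FIG. 3 screen: received text on top, sent text below."""

    def __init__(self, synthesize: Callable[[str], bytes]) -> None:
        self.synthesize = synthesize        # text-to-speech synthesis unit 11
        self.upper: List[str] = []          # upper portion 21: recognized incoming speech
        self.lower: List[str] = []          # lower portion 22: text the user has sent
        self.sent_audio: List[bytes] = []   # voice data handed to voice communication unit 14
        self._buffer = ""                   # characters of the sentence being typed

    def type_chars(self, chars: str) -> None:
        for ch in chars:
            self._buffer += ch
            if ch in SENTENCE_END:
                # A sentence is complete: show it and send it as synthesized voice.
                sentence = self._buffer.strip()
                self._buffer = ""
                self.lower.append(sentence)
                self.sent_audio.append(self.synthesize(sentence))

    def show_received(self, recognized_text: str) -> None:
        self.upper.append(recognized_text)

    def render(self) -> str:
        return "\n".join(self.upper + ["-" * 20] + self.lower)
```

Text still being typed (for example, during kana-kanji conversion) stays in the buffer and is not spoken until its sentence is finished.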
[0012]
FIG. 4 shows another screen example, in which the input and output text 25 is displayed in chronological order. Information 26 received from the information communication device 110 is displayed with the sender's name at the beginning; in this example the name is displayed as "Taro" 26a. The sender's name can be obtained by registering it in the address book and looking up the name associated with the caller's telephone number, a widely known method used for displaying incoming calls on mobile phones. For a sender not registered in the address book, "other party" or "unknown" may be displayed. In this way, the user can converse using text alone while following the conversation in time order.
FIG. 5 is a configuration diagram of the information communication device according to the third embodiment of the present invention. The information communication device 140 of the present invention is a portable telephone device such as a mobile phone, and comprises a text input unit 30 through which text is entered with means such as the keys of a mobile phone, a text communication unit 32 that manages communication with the outside, and a text output unit 31 that displays or prints text information. The information communication device 150 comprises a text communication unit 33 that manages communication with the outside, a text-to-speech synthesizing unit 34 that converts text information into voice data, a voice output unit 35 that outputs voice data to a speaker or an earphone connection terminal, a voice input unit 37 such as a microphone, and a voice recognition unit 36 that converts voice data into text information.
In this embodiment, unlike the apparatuses of the preceding embodiments, the information communication device 140 is a conventional mobile phone capable of communicating by text, and the information communication device 150 is a fixed telephone, or a mobile phone in a situation where manner mode is unnecessary. Text input through the text input unit 30 of the information communication device 140 is sent as is to the information communication device 150 by the text communication unit 32. In the information communication device 150, the text is converted into voice data by the text-to-speech synthesizing unit 34 and presented to the user through a speaker or the like by the voice output unit 35. Voice input through the voice input unit 37 of the information communication device 150 is converted into voice data, converted into text by the voice recognition unit 36, and sent to the information communication device 140 by the text communication unit 33, where the text output unit 31 displays it on the screen. As a result, a conventional mobile phone can be used as is, and its user can communicate with a party who needs a voice conversation, such as a fixed-telephone user, without worrying about manners toward the people nearby.
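The FIG. 4 layout can be sketched as a function that lists the conversation in time order and prefixes each received line with the caller's name from a caller-ID style lookup. This is a minimal sketch under stated assumptions: the `name: text` separator, leaving sent lines unprefixed, and the `LogEntry` and `render_timeline` names are all hypothetical, since the description only fixes the name at the head of received text.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class LogEntry:
    incoming: bool  # True for text recognized from device 110's voice
    text: str


def render_timeline(entries: List[LogEntry], caller_number: str,
                    address_book: Dict[str, str], unknown_label: str = "unknown") -> List[str]:
    """Lay out the conversation in time order, prefixing received lines with the sender's name."""
    # Caller-ID style lookup: telephone number -> registered name, with a fallback label.
    name = address_book.get(caller_number, unknown_label)
    return [f"{name}: {e.text}" if e.incoming else e.text for e in entries]
```

An unregistered caller falls back to `unknown_label`, matching the "other party" or "unknown" display mentioned above.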
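What distinguishes the third embodiment is where the conversions happen: device 140 only sends and receives text, and device 150 performs both synthesis and recognition. A minimal sketch, assuming injected `synthesize` and `recognize` callables (hypothetical; the description names no engines) and a list standing in for the speaker:

```python
from typing import Callable, List


class Device150:
    """Sketch of device 150: it does both conversions, so device 140 can stay a plain text phone."""

    def __init__(self, synthesize: Callable[[str], bytes],
                 recognize: Callable[[bytes], str]) -> None:
        self.synthesize = synthesize    # text-to-speech synthesizing unit 34
        self.recognize = recognize      # voice recognition unit 36
        self.speaker: List[bytes] = []  # audio handed to voice output unit 35

    def on_text_received(self, text: str) -> None:
        # Text arrives unchanged from device 140 and is spoken to the local user.
        self.speaker.append(self.synthesize(text))

    def on_microphone(self, audio: bytes) -> str:
        # The local user's speech is recognized; the returned text is what
        # text communication unit 33 would send back to device 140.
        return self.recognize(audio)
```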
[0013]
FIG. 6 is a configuration diagram illustrating a text input unit of the information communication apparatus according to the first to third embodiments of the present invention. The text input means 40 comprises a special key input means 41 for inputting special keys that are not often used, a text registration means 42 for registering text information corresponding to a special key, and a text database 43 that stores combinations of special keys and the corresponding text information; its output is sent to a text-to-speech synthesis means 44 or a text communication means 45. In this embodiment, text is first registered in the text database 43 of the device by the text registration means 42 as combinations of keys and text information, as shown in FIG. 7. For example, the key may be a sequence of keys that includes rarely used keys such as # and *. Alternatively, a new key (for example, "☆" in FIG. 7) may be provided on the device. A known technique such as the fixed-phrase registration function of a mobile phone can be used to register the text, and its description is omitted here. Each time a key is input, the special key input means 41 checks the text database 43, item by item from the top, for a match. If a match is found, the text corresponding to the key is output to the text-to-speech synthesis means 44 or the text communication means 45. Because previously registered text can thus be output with a simple key operation, the user can respond immediately even when the other party is speaking. In the example of FIG. 7, a text signifying a nod, such as "No" in the text item 47, is registered in correspondence with # in the key item 46, so that the conversation can continue naturally without pauses. 
Likewise, a text signifying negation, such as "Ehh," is registered in correspondence with *# in the key item 46, so that disagreement with the other party's remarks can be shown immediately during the conversation; and a text signifying a question, such as "Why?", is registered in correspondence with **# in the key item 46, so that any doubt about the other party's remarks can be shown immediately.
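The key-by-key check against the text database can be sketched as follows. The key sequences follow FIG. 7, but the English phrases are hypothetical stand-ins for the registered texts, and discarding a sequence once it can no longer match any entry is an assumption the description does not state.

```python
from typing import List, Optional, Tuple

# Illustrative contents modeled on FIG. 7 as (key sequence, registered text),
# kept in database order. The English phrases are hypothetical stand-ins.
TEXT_DATABASE: List[Tuple[str, str]] = [
    ("#", "Uh-huh"),     # nod
    ("*#", "Ehh..."),    # negation
    ("**#", "Why?"),     # question
]


class SpecialKeyInput:
    """Sketch of special key input means 41 searching text database 43."""

    def __init__(self, database: List[Tuple[str, str]]) -> None:
        self.database = database
        self.pressed = ""

    def press(self, key: str) -> Optional[str]:
        """Record one key press; return the registered text when a sequence matches."""
        self.pressed += key
        # Compare against the entries in database order after every key press.
        for keys, text in self.database:
            if self.pressed == keys:
                self.pressed = ""
                return text
        # Assumption: discard the sequence once it can no longer match any entry.
        if not any(keys.startswith(self.pressed) for keys, _ in self.database):
            self.pressed = ""
        return None
```

The returned text would then go to the text-to-speech synthesis means 44 (devices 100 and 120) or directly to the text communication means 45 (device 140).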
[0014]
FIG. 8 is a configuration diagram illustrating part of a voice input unit used in the text input mode of an information communication apparatus according to a fourth embodiment of the present invention. In this embodiment, the text input means and the text-to-speech synthesis means of the first and second embodiments are replaced by the configuration of FIG. 8. The voice input means 50 comprises a special key input means 51 for inputting special keys that are not often used, a voice data registration means 52 for registering voice information corresponding to a special key, and a voice database 53 that stores combinations of special keys and the corresponding voice information; its output is sent to the voice communication means 54. In this embodiment, voice data is first registered by the voice data registration means 52. This can be done by recording through the microphone of the mobile phone, using the technique known from answering-machine greeting recording. Other data may also be registered.
The voice database 53 is as shown in FIG. 9. For example, voice data signifying a nod, such as "No" in the voice data item 57, is registered in correspondence with # in the key item 55, so that the conversation can continue naturally without pauses. Voice data signifying negation, such as "Eh, eh," is registered in correspondence with *# in the key item 55, so that disagreement with the other party's remarks can be shown immediately during the conversation. Voice data signifying a question, such as "Why?", is registered in correspondence with **# in the key item 55, so that any doubt about the other party's remarks can be shown immediately. The voice data of items whose changeable item 56 is "impossible" is registered in advance and cannot be changed, while other data can be registered by the user by recording from the microphone. When a special key is pressed, the voice database 53 is searched and the voice data corresponding to the key is output to the voice communication means 54. Because prerecorded voice data is output, the user can hold a conversation even in a place where manners are required, such as a train, without worrying about the surroundings.
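The voice database of FIG. 9, including its changeable column, can be modeled as a small table from key sequences to recordings. This is a minimal sketch: the class names and the byte-string placeholders for recordings are hypothetical, and in a real device the recordings would come from the microphone.

```python
from dataclasses import dataclass
from typing import Dict, Optional


@dataclass
class VoiceEntry:
    audio: bytes       # recorded or preinstalled voice data (voice data item 57)
    changeable: bool   # changeable item 56; False models an "impossible" (fixed) entry


class VoiceDatabase:
    """Sketch of voice database 53: key sequences mapped to voice data."""

    def __init__(self) -> None:
        self.entries: Dict[str, VoiceEntry] = {}

    def register(self, keys: str, audio: bytes, changeable: bool = True) -> bool:
        """Store a recording under a key sequence; refuse to overwrite a fixed entry."""
        current = self.entries.get(keys)
        if current is not None and not current.changeable:
            return False
        self.entries[keys] = VoiceEntry(audio, changeable)
        return True

    def lookup(self, keys: str) -> Optional[bytes]:
        """Return the voice data to hand to voice communication means 54, if any."""
        entry = self.entries.get(keys)
        return entry.audio if entry is not None else None
```

Because the stored item is already voice data, no synthesis step is needed before transmission, which is the difference from the text database of the preceding embodiment.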
[0015]
FIG. 10 is a configuration diagram illustrating part of a text-to-speech synthesis unit of an information communication device according to a fifth embodiment of the present invention. The text-to-speech synthesizing unit 61 comprises a phoneme segment dictionary 65 in which a plurality of different voice types are individually registered, a sender specifying means 64 that selects a predetermined dictionary from the phoneme segment dictionary 65 based on sender information 63 consisting of information from which gender can be determined, and a speech synthesis engine 62 that synthesizes speech corresponding to text input through the text input means 60. The speech data synthesized by the speech synthesis engine 62 is output to the communication means 66. Here, three phoneme segment dictionaries 65 are prepared: male 65a, female 65b, and robot 65c. In this configuration, when text is input to the speech synthesis engine 62 through the text input means 60, the speech synthesis engine 62 reads out the input text in the voice type of the phoneme segment dictionary selected by the sender specifying means 64 and outputs it as voice data. The sender specifying means 64 selects the dictionary using the sender information 63, which may be any information from which gender can be determined. For example, for a mobile phone as shown in FIG. 11, the gender may be registered in the auxiliary information column of the registration list. In FIG. 11, as in a conventional list, auxiliary information such as an E-mail address 72 can be registered in addition to the registered name 70 and the telephone number 73. Here, a gender column 71 is provided as auxiliary information, and the gender corresponding to each registered name is registered. The first line registers the sender himself. 
If the symbol in the gender column 71 is ♂, the male phoneme segment dictionary is selected; if it is ♀, the female phoneme segment dictionary is selected.
[0016]
In the first and second embodiments, the synthesized voice data should match the attributes of the sender, so the voice type may be determined by the sender's gender; the gender information in the first line of the list is therefore sent to the sender specifying means 64. In this example, since the user's gender is ♂, the sender specifying means 64 selects the male phoneme segment dictionary 65a. In the third embodiment of FIG. 5, on the other hand, voice data is synthesized on the receiver's side. For example, when the information communication apparatus 150 of FIG. 5 is talking with the other party 1 of FIG. 11, whose gender is ♀, the sender specifying means selects the female phoneme segment dictionary 65b. If gender is registered in the registration list in this way, the input text can be converted into a synthetic voice that matches the attribute (gender, in this example) of the user or the other party for the conversation. However, if the party is not registered in the list, or is registered without a gender as with the other party 2 of FIG. 11, the gender cannot be determined. In such a case, the robot voice 65c is selected, so that the voice type of the synthesized speech remains neutral.
In the above example, the registration list is used as the sender information. However, the sender information may instead be input manually to the sender specifying means 64 with a button assigned on the mobile phone. In that case, for example in the configurations shown in FIGS. 1 and 2, a user who does not want to reveal his or her attributes to the other party can manually switch to the robot voice for the conversation. In this embodiment, gender is used as the attribute and phoneme segment dictionaries expressing it are prepared, but the attribute is not limited to gender. For example, with age as the attribute, phoneme segment dictionaries with voice types such as an elderly man and a child may be prepared; in this case, age information may be input to the sender specifying means 64 as the sender information. The text can thus be synthesized in a voice type that matches the attribute of the user or the other party, so that more intimate communication than in the related art is possible even when one of the two parties is conversing by text.
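The dictionary selection just described, using the user's own first-line entry in the first and second embodiments, the other party's entry in the third, and the robot voice when gender is unknown, can be sketched as one function. This is a minimal sketch: the dictionary identifiers such as `"male_65a"`, the phone-number lookup, and all names are hypothetical.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass
class ListEntry:
    name: str              # registered name 70
    phone: str             # telephone number 73
    gender: Optional[str]  # gender column 71: "♂", "♀", or None if left blank


# Hypothetical identifiers for phoneme segment dictionaries 65a to 65c.
DICTIONARIES: Dict[str, str] = {"♂": "male_65a", "♀": "female_65b"}
ROBOT = "robot_65c"


def select_dictionary(registration_list: List[ListEntry],
                      use_own_entry: bool, partner_phone: str) -> str:
    """Choose a dictionary as the sender specifying means 64 is described to do."""
    entry: Optional[ListEntry]
    if use_own_entry:
        # First and second embodiments: the first line holds the user himself.
        entry = registration_list[0]
    else:
        # Third embodiment: look up the other party by telephone number.
        entry = next((e for e in registration_list if e.phone == partner_phone), None)
    if entry is None or entry.gender not in DICTIONARIES:
        return ROBOT  # gender unknown: fall back to the neutral robot voice
    return DICTIONARIES[entry.gender]
```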
[0017]
The communication device according to the present invention has a normal voice input mode and a text input mode. The mode at the start of communication can be determined as follows depending on the application.
(1) The voice input mode is the default. To converse by text, the user switches to the text input mode before use. If communication has been performed in the text input mode, the device automatically switches back to the voice input mode when the communication ends. This is useful for people who mainly make voice calls.
(2) The text input mode is the default. To make a voice call, the user switches to the voice input mode before use. If communication has been performed in the voice input mode, the device automatically switches back to the text input mode when the communication ends. This is useful for people who mainly converse by text.
(3) The previous communication mode is retained. If a voice call was made, the voice input mode remains in effect after the communication ends; if the conversation was in text, the text input mode remains. If the desired mode is not set at the start of communication, the user switches to it before use. This is useful for people who use both modes about equally often.
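The three options above reduce to a single rule for the input mode left in place when a communication ends. A minimal sketch, in which the enum and function names are hypothetical and treating the option as a user setting is an assumption:

```python
from enum import Enum


class Mode(Enum):
    VOICE = "voice"
    TEXT = "text"


class Policy(Enum):
    VOICE_DEFAULT = 1  # option (1)
    TEXT_DEFAULT = 2   # option (2)
    KEEP_LAST = 3      # option (3)


def mode_after_call(policy: Policy, mode_during_call: Mode) -> Mode:
    """Return the input mode the device is left in when a communication ends."""
    if policy is Policy.VOICE_DEFAULT:
        return Mode.VOICE
    if policy is Policy.TEXT_DEFAULT:
        return Mode.TEXT
    return mode_during_call  # option (3): keep whichever mode was last used
```

Option (1) is the behavior of the eleventh aspect: whatever mode was used, the device returns to voice input.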
[0018]
As shown in FIG. 12, the object of the present invention can also be achieved by programming each function constituting the above-described embodiments, writing the programs in advance to a recording medium such as a CD-ROM 84, loading the CD-ROM 84 into a medium drive such as a CD-ROM drive 83, installing the programs on a computer, storing them in a storage device such as the memory 81 or the hard disk 82 of the computer, and executing them. In this case, the program itself read from the recording medium realizes the functions of the above-described embodiments, and both the program and the recording medium on which it is recorded constitute the present invention. The recording medium may be any of a semiconductor medium (for example, a ROM or a nonvolatile memory card), an optical medium (for example, a DVD, MO, MD, or CD-R), or a magnetic medium (for example, a magnetic tape or a flexible disk). The invention covers not only the case where the functions of the above-described embodiments are realized by executing the loaded program, but also the case where the operating system (OS) 82b performs part or all of the actual processing based on instructions of the program 82a and the functions are realized by that processing.
The invention further covers the case where the program 82a for realizing the functions of the above-described embodiments is loaded into memory provided on a function expansion board or in a function expansion unit, a CPU or the like provided on that board or unit performs part or all of the actual processing based on the program's instructions, and the functions of the above-described embodiments are realized by that processing. Further, when the above program is stored in a storage device such as a magnetic disk of a server computer and distributed by download to a user's computer connected via a communication network, the storage device of the server computer is also included in the recording medium of the present invention.
[0019]
[Effect of the Invention]
As described above, according to the first aspect of the present invention, the text-to-speech synthesis means is provided in the mobile phone, so text information is entered with keys instead of by voice, converted into voice, and transmitted; voice calls can therefore be made without disturbing others.
According to the second aspect, since transmission and reception are performed by text information, it is possible to exchange information without an earphone and without disturbing the surroundings.
According to the third aspect, since text information is converted into voice and voice data is converted into text information, even a fixed telephone without a text function can communicate with a mobile phone in a vehicle that exchanges information only as text.
According to the fourth aspect, the transmitted and received text information is displayed separately on the upper and lower sides of the same display screen, so that the transmitted information and the received information can be easily seen, and an operation error can be reduced.
According to the fifth aspect, text information to be transmitted and received is displayed on the display unit in chronological order, so that the flow of information can be easily understood, and communication can be performed as if the user is talking on the telephone.
According to the sixth aspect, since a character, code, or the like identifying the sender is added at the head of the received text information, received and transmitted text can be clearly distinguished, and communication efficiency can be improved.
According to the seventh aspect, since the special key is registered in the database in association with frequently used text information, real-time communication of text information can be improved.
According to the eighth aspect, since the special key is registered in the database in association with frequently used voice data, real-time communication of text information can be improved.
According to the ninth aspect, the text information is synthesized and output in the selected voice type, so that more intimate communication can be realized.
According to the tenth aspect, a phoneme segment dictionary is selected according to the gender of the sender, so that communication can take place in a natural atmosphere.
According to the eleventh aspect, when the communication is completed, the mode automatically returns to the voice input mode, so that the call can always be performed in the normal voice mode.
According to the twelfth aspect of the present invention, with the same program, any computer under the same OS environment can function as the information communication device of the present invention, wherever it is.
According to the thirteenth aspect, the information communication program of the present invention is recorded on a portable recording medium, so that any computer having the same OS environment can function as an information communication device.
[Brief description of the drawings]
FIG. 1 is a configuration diagram of an information communication device according to a first embodiment of the present invention.
FIG. 2 is a configuration diagram of an information communication device according to a second embodiment of the present invention.
FIG. 3 is a diagram illustrating an example of a screen in the information communication device of the present invention.
FIG. 4 is a diagram illustrating another example of a screen in the information communication device of the present invention.
FIG. 5 is a configuration diagram of an information communication device according to a third embodiment of the present invention.
FIG. 6 is a configuration diagram illustrating a text input unit of an information communication device according to a third embodiment of the present invention.
FIG. 7 is a diagram showing an example of a text database according to the present invention.
FIG. 8 is a configuration diagram illustrating a part of a voice input unit in a text input mode of an information communication device according to a fourth embodiment of the present invention.
FIG. 9 is a diagram showing an example of a speech database according to the present invention.
FIG. 10 is a configuration diagram illustrating a part of a text-to-speech synthesis unit of an information communication device according to a fifth embodiment of the present invention.
FIG. 11 is a diagram showing an example of a registration list according to the present invention.
FIG. 12 is a diagram illustrating an embodiment of the present invention as a program.
[Explanation of symbols]
DESCRIPTION OF SYMBOLS 1 Text input means, 2 Text voice synthesis means, 3, 6 Voice communication means, 4, 8 Voice output means, 5 Earphones, 7 Voice input means, 110 Information communication device

Claims

An information communication device that transmits and receives voice or text information by wire or wirelessly, further comprising: a text-to-speech synthesizing means that converts input text information into voice data; and an input switching means that switches between a voice input mode and a text input mode,
wherein, when information is transmitted, after the input switching means switches to the text input mode, text information is input instead of voice, converted into voice data by the text-to-speech synthesizing means, and transmitted.

An information communication device that transmits and receives voice or text information, further comprising: a text-to-speech synthesizing means that converts input text information into voice data; a voice recognition means that converts received voice data into text information; and an input switching means that switches between a voice input mode and a text input mode,
wherein, when information is transmitted, after the input switching means switches to the text input mode, text information is input instead of voice, converted into voice data by the text-to-speech synthesizing means, and transmitted; and when voice data is received, the received voice data is converted into text information by the voice recognition means and displayed.

An information communication device that transmits and receives voice or text information by wire or wirelessly, further comprising: a text-to-speech synthesizing means that converts received text information into voice data; a voice recognition means that converts input voice data into text information; and an input switching means that switches between a voice input mode and a text input mode,
wherein, when information is transmitted, after the input switching means switches to the voice input mode, input voice data is converted into text information by the voice recognition means and transmitted; and when information is received, the received text information is converted into voice data by the text-to-speech synthesizing means.

The information communication device according to claim 1, 2 or 3, wherein, when received voice data is converted into text information by the voice recognition means and displayed, the transmitted text information and the received text information are displayed separately on the upper and lower parts of the same display screen.

The information communication device according to claim 1, 2 or 3, wherein, when received voice data is converted into text information by the voice recognition means and displayed, the transmitted text information and the received text information are displayed on the same display screen in time series.

The information communication device according to claim 5, wherein a character or code identifying the transmission source of the text information is displayed at the beginning of the received text information.
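The chronological display described in the two claims above, with a source identifier at the head of each received line, can be sketched as follows. The `Message` record and the `[...]` prefix format are assumptions made for illustration; the claims only require that some character or code identifying the source appear first.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Message:
    timestamp: float   # seconds; any monotonic clock would do
    outgoing: bool     # True for text this device sent
    text: str
    sender: str = ""   # identifier of the transmission source (received messages)


def timeline(messages: List[Message]) -> List[str]:
    """Render sent and received text on one screen in chronological order.

    Received lines are prefixed with a code identifying the transmission
    source; the "[...]" format is an arbitrary choice for this sketch.
    """
    lines = []
    for m in sorted(messages, key=lambda msg: msg.timestamp):
        lines.append(m.text if m.outgoing else f"[{m.sender}] {m.text}")
    return lines


log = timeline([
    Message(2.0, False, "where are you?", sender="Home"),
    Message(1.0, True, "on the train"),
])
```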

The information communication device according to claim 1, further comprising a text database that stores combinations of predetermined special keys together with the text information associated with each combination, wherein, when a special key is input in the text input mode, the text information corresponding to the special key is retrieved from the text database, and the retrieved text is either output to the text-to-speech synthesis means or transmitted as text information without conversion.
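The special-key lookup in the claim above amounts to a dictionary keyed by key sequences. A minimal sketch, assuming hypothetical key combinations (`*` followed by a digit), hypothetical canned phrases, and a stand-in `fake_tts` function; the claim specifies none of these.

```python
from typing import Callable, Dict, Optional, Tuple, Union

# Hypothetical key combinations and canned phrases; the claim leaves both unspecified.
TEXT_DB: Dict[Tuple[str, ...], str] = {
    ("*", "1"): "I'm on a train. I'll reply by text.",
    ("*", "2"): "I'll call you back later.",
}


def expand_special_key(
    keys: Tuple[str, ...],
    db: Dict[Tuple[str, ...], str],
    tts: Callable[[str], bytes],
    synthesize: bool = True,
) -> Optional[Union[bytes, str]]:
    """Look up a special-key combination entered in text input mode.

    Returns synthesized voice data (synthesize=True) or the stored text itself
    for transmission as text; None if the combination is not registered.
    """
    text = db.get(keys)
    if text is None:
        return None
    return tts(text) if synthesize else text


def fake_tts(s: str) -> bytes:
    """Stand-in for the text-to-speech synthesis means."""
    return s.encode("utf-8")


voice = expand_special_key(("*", "2"), TEXT_DB, fake_tts)
plain = expand_special_key(("*", "1"), TEXT_DB, fake_tts, synthesize=False)
```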

The information communication device according to claim 1, further comprising a voice database that stores combinations of predetermined special keys together with the voice data associated with each combination, wherein, when a special key is input in the text input mode, the voice data corresponding to the special key is retrieved from the voice database and transmitted.
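Unlike the text database, the voice database in the claim above already holds voice data, so a match can be sent without any synthesis step. A minimal sketch, assuming hypothetical key combinations, placeholder clip bytes, and a `transmit` callback standing in for the voice communication means:

```python
from typing import Callable, Dict, List, Tuple

# Hypothetical pre-recorded clips; real entries would hold encoded audio.
VOICE_DB: Dict[Tuple[str, ...], bytes] = {
    ("*", "1"): b"<clip: on a train, will text>",
    ("*", "2"): b"<clip: will call back>",
}


def send_recorded_reply(
    keys: Tuple[str, ...],
    db: Dict[Tuple[str, ...], bytes],
    transmit: Callable[[bytes], None],
) -> bool:
    """Transmit the stored voice data for a special-key combination.

    No synthesis step is needed because the database already holds voice data.
    Returns False when the combination is not registered.
    """
    clip = db.get(keys)
    if clip is None:
        return False
    transmit(clip)
    return True


channel: List[bytes] = []          # stands in for the voice communication means
ok = send_recorded_reply(("*", "2"), VOICE_DB, channel.append)
```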

The information communication device according to any one of claims 1, 4, 6 and 7, wherein the text-to-speech synthesis means includes: phoneme segment dictionaries in which a plurality of different voice types are individually registered; a sender identification unit that selects a predetermined phoneme segment dictionary from the phoneme segment dictionaries based on sender information that includes information from which gender can be determined; and a speech synthesis engine that synthesizes speech corresponding to the input text, and wherein, when text information is input in the text input mode, the speech synthesis engine synthesizes speech from the text information using the phoneme segment dictionary selected by the sender identification unit.

The information communication device according to claim 9, wherein the sender identification unit selects the phoneme segment dictionary corresponding to the gender of the transmission destination.
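The two claims above have the synthesizer pick one of several phoneme segment dictionaries according to gender information. The following is a toy concatenative sketch under loudly stated assumptions: characters stand in for phonemes, placeholder bytes stand in for recorded waveform segments, and the `profile` mapping with its `gender` key, as well as the `"female"` fallback, are hypothetical choices not taken from the source.

```python
from typing import Dict, Mapping

# Toy "phoneme segment dictionaries": each maps a unit (here simply a character,
# standing in for a phoneme) to a waveform segment (here placeholder bytes).
SegmentDict = Dict[str, bytes]

DICTIONARIES: Dict[str, SegmentDict] = {
    "male": {c: b"M" + c.encode() for c in "abcdefghijklmnopqrstuvwxyz "},
    "female": {c: b"F" + c.encode() for c in "abcdefghijklmnopqrstuvwxyz "},
}


def select_dictionary(profile: Mapping[str, str], default: str = "female") -> SegmentDict:
    """Sender identification step: choose a dictionary from a gender field."""
    return DICTIONARIES.get(profile.get("gender", default), DICTIONARIES[default])


def synthesize(text: str, segments: SegmentDict) -> bytes:
    """Concatenate the stored segment for each unit; unknown units are skipped."""
    return b"".join(segments[c] for c in text.lower() if c in segments)


voice = synthesize("Hi", select_dictionary({"gender": "male"}))
```

A real engine would also smooth the joins between segments and handle prosody; the sketch only shows where the dictionary choice enters the pipeline.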

The information communication device according to claim 1, wherein the input switching means automatically switches to the voice input mode when communication ends.
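The automatic reset in the claim above can be sketched as a hook on the end-of-call event. The class and method names below are hypothetical; the comment on why the reset helps is an inference (presumably so the next call starts as an ordinary voice call), not a statement from the source.

```python
from enum import Enum


class InputMode(Enum):
    VOICE = "voice"
    TEXT = "text"


class InputSwitcher:
    """Input switching means that reverts to voice input when a call ends."""

    def __init__(self) -> None:
        self.mode = InputMode.VOICE

    def select(self, mode: InputMode) -> None:
        self.mode = mode             # user picks text mode, e.g. on a crowded train

    def on_call_ended(self) -> None:
        self.mode = InputMode.VOICE  # presumably so the next call starts as a voice call


switcher = InputSwitcher()
switcher.select(InputMode.TEXT)
switcher.on_call_ended()
```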

An information communication program for causing a computer to function as the information communication device according to claim 1.

A computer-readable recording medium on which the information communication program according to claim 12 is recorded.