JP2004133119A

JP2004133119A - Portable terminal device

Info

Publication number: JP2004133119A
Application number: JP2002296198A
Authority: JP
Inventors: Kazunori Hayashi; 林　和典; Masaru Mase; 間瀬　優
Original assignee: Matsushita Electric Industrial Co Ltd
Current assignee: Panasonic Holdings Corp
Priority date: 2002-10-09
Filing date: 2002-10-09
Publication date: 2004-04-30

Abstract

PROBLEM TO BE SOLVED: To provide a portable terminal device with a storage device that can prevent unnecessary information from being read aloud and can also prevent synthesized speech from being used illegally.

SOLUTION: The portable terminal device with a storage device comprises a portable terminal device 1 and a storage device 2 that is attachable to and detachable from the portable terminal device. The storage device has a phoneme database storage part 121, a synthesis purpose data storage part 122, and a terminal device interface part 120. The portable terminal device has a storage device interface part 103, a system control part 101 that controls the whole device, a reading-aloud selection processing part 110 that selects a range of the synthesis purpose data, a speech synthesis processing part 102 that generates synthesized speech data from the phoneme database read out through the interface parts and from the synthesis purpose data read out according to the selected range, and a speech output processing part 104 that receives the generated synthesized speech data and outputs it as a synthesized speech signal.

COPYRIGHT: (C)2004,JPO

Description

【０００１】
【発明の属する技術分野】
本発明は、朗読対象である合成目的データを記憶装置に保存し、その記憶装置から読み出した合成目的データを音声に変換する携帯端末装置に関するものである。
【０００２】
【従来の技術】
従来、電子メールやワープロの文章を読み上げる装置としては、記憶容量の豊富さや処理能力の高さ、ネットワーク機能の充実度等からパーソナルコンピュータにて実現していた。加えて、出力される合成音声の声色は男性や女性といった一般的なものであったが、人間の発声に近い合成音声を生成することができる音声合成方法も開示されている。すなわち、辞書の中に読み仮名、アクセント型等の情報をととも、アクセント指令値及び又は音韻継続時間長情報を予め用意しておき、音韻の継続時間長を用いて音素片データのパラメータ列を生成し、それらを基に音声波形を合成することにより、人間の発声に近い合成音声を出力するものである（例えば、特許文献１参照）。
【０００３】
しかしながら、パーソナルコンピュータを歩きながら使用することは、大きさ、重量の問題から大変不便であるし、その操作も容易とは言い難い面がある。また、朗読対象の合成目的データを音声変換するのみの機能であれば、コストパフォーマンスに欠けるという問題がある。また、出力される音声も男性や女性といった一般的なものであり、必ずしもユーザが所望する声色での音声出力ではないので、ユーザが聴いていて楽しさを感じにくい面があった。
【０００４】
そこで、これらの不具合を是正するための装置として、合成目的データを入力する入力部（メモリカードや光ディスク・磁気ディスク等の記憶装置から入力する入力部、ネットワークから入力する入力部、キーボードから入力する入力部等）と、音声合成された音を出力する出力部（スピーカやヘッドフォン）と、合成目的データから合成音声データを生成する音声合成処理部とを備えた携帯端末装置が提案されており、この携帯端末装置における音声合成処理部は、実際の人物の肉声をサンプリングし、そのサンプリングデータをデータベース化した音素データベースを使用する。また情報処理装置への取り付け取り外しが簡単で、小型の情報処理装置（小型パーソナルコンピユータ等）にも内蔵でき、且つ小型軽量で持ち運びができると共に単体でも文章読み上げ機能を持つＩＣカード形態の文章読み上げシステムが公知である。（例えば、特許文献２参照）
【０００５】
【特許文献１】
特開平７−１４０９９９号公報
【特許文献２】
特開平６−３３７７７４号公報
【０００６】
【発明が解決しようとする課題】
しかしながら、上記携帯端末装置では、合成目的データの作成年月日や筆者の経歴等の付帯情報を含む場合があり、これらも朗読されることになるが、ユーザにとっては必ずしも、その情報の朗読は必要なものではないという問題点を有していた。また、携帯端末装置から出力される合成音声は音素データベースの提供者である実際の人物の肉声に近い為、この装置の使用者が音素提供者になりすまし、音声認証等で不正使用を行う恐れがあった。
【０００７】
この記憶装置付き携帯端末装置では、不必要な情報の朗読を防止し、また合成音声の不正な使用を防止することが要求されている。
【０００８】
本発明は、この要求を満たすため、不必要な情報の朗読を防止することができ、また合成音声の不正な使用を防止することができる記憶装置付き携帯端末装置を提供することを目的とする。
【０００９】
【課題を解決するための手段】
上記課題を解決するために本発明の記憶装置付き携帯端末装置は、携帯端末装置と、携帯端末装置に対して着脱可能な記憶装置とを備えた記憶装置付き携帯端末装置であって、記憶装置は、音素データをデータベース化した音素データベースを保存する音素データベース保存部と、朗読対象となる合成目的データを保存する合成目的データ保存部と、携帯端末装置とデータの授受を行う端末装置インタフェース部とを有し、携帯端末装置は、記憶装置とデータの授受を行う記憶装置インタフェース部と、全体を制御するシステム制御部と、合成目的データの範囲を選択する朗読選択処理部と、記憶装置インタフェース部と端末装置インタフェース部を介して音素データベース保存部から読み出した音素データベースおよび選択した範囲に基づいて合成目的データ保存部から読み出した合成目的データから合成音声データを生成する音声合成処理部と、生成した合成音声データを入力して合成音声信号として出力する音声出力処理部とを有する構成を備えている。
【００１０】
これにより、不必要な情報の朗読を防止することができる記憶装置付き携帯端末装置が得られる。
【００１１】
【発明の実施の形態】
本発明の請求項１に記載の記憶装置付き携帯端末装置は、携帯端末装置と、携帯端末装置に対して着脱可能な記憶装置とを備えた記憶装置付き携帯端末装置であって、記憶装置は、音素データをデータベース化した音素データベースを保存する音素データベース保存部と、朗読対象となる合成目的データを保存する合成目的データ保存部と、携帯端末装置とデータの授受を行う端末装置インタフェース部とを有し、携帯端末装置は、記憶装置とデータの授受を行う記憶装置インタフェース部と、全体を制御するシステム制御部と、合成目的データの範囲を選択する朗読選択処理部と、記憶装置インタフェース部と端末装置インタフェース部を介して音素データベース保存部から読み出した音素データベースおよび選択した範囲に基づいて合成目的データ保存部から読み出した合成目的データから合成音声データを生成する音声合成処理部と、生成した合成音声データを入力して合成音声信号として出力する音声出力処理部とを有することとしたものである。
【００１２】
この構成により、ユーザーは朗読対象情報としての合成目的データの範囲を選択して自分に不要な部分の朗読を省く事ができるので、時間の無駄なく効率的に自分が所望する小説等の朗読を聴くことができるという作用を有する。
【００１３】
請求項２に記載の記憶装置付き携帯端末装置は、請求項１に記載の記憶装置付き携帯端末装置において、朗読選択処理部に代えて識別音合成処理部を備え、識別音合成処理部は、合成音声信号であること示す可聴範囲外の識別音信号を合成音声信号に含めることとしたものである。
【００１４】
この構成により、携帯端末装置からは音声合成の出力音声であることを示す可聴範囲外の識別音を含んだ合成音声が出力されるので、仮に出力音声が不正な目的で使用されても、音声認証装置に音声の周波数解析機能を備えることにより、実際の人物の肉声には無い識別音の周波数成分が含まれていることを検知することができるので、合成音声の不正な使用を防止することができるという作用を有する。
【００１５】
請求項３に記載の記憶装置付き携帯端末装置は、請求項１に記載の記憶装置付き携帯端末装置朗読選択処理部に代えて識別音合成処理部を備え、識別音合成処理部は、合成音声信号であること示す可聴範囲外の識別音信号を合成音声信号のすべてに渡り出力するのではなく、合成目的データ中にある識別符号をトリガとして合成音声信号に含めることとしたものである。
【００１６】
この構成により、仮に携帯端末装置の出力音声が不正な目的で使用されても、音声認証装置に音声の周波数解析機能を備えることにより、識別符号のある合成音声信号の周波数解析のみを行うだけで、実際の人物の肉声には無い識別音の周波数成分が含まれていることを検知することができるので、より少ない信号の周波数解析により、合成音声の不正な使用を防止することができるという作用を有する。
【００１７】
以下、本発明の実施の形態について、図１〜図６を用いて説明する。
【００１８】
（実施の形態１）
図１は、携帯端末装置、サーバ装置、パソコン等を有する通信システムを示す構成図である。
【００１９】
図１において、１は表示部や操作部、ヘッドフォン・スピーカ等の音声出力部等を備えた携帯端末装置、２は合成目的データ及び音素データベースを記憶するメモリカード等の記憶装置である。記憶装置２は、携帯端末装置１とは脱着可能である。記憶装置２には、小説等の合成目的データを保存する合成目的データ保存部（後述）と音素データから成る音素データベースを保存する音素データベース保存部がある。音素データベースは、すでに前述しているが、実際の人物の肉声をサンプリングしたもので、音素毎に音の継続時間長や基本周波数、パワー等の音に関する情報や、その音素が属するデータファイル名およびファイル中におけるその音素の開始位置や終了位置の情報等を、ラベル付けして、任意のフォーマットに沿った形でデータベース化されているものである。この音素データベースは、この携帯端末装置１から出力される合成音声の声色や抑揚を決定する重要な要素となる。３は合成目的データａや音素データベースｂを提供するインターネット５上のサーバ装置である。ここでは一つのサーバ装置しか記載してないが、複数のサーバ装置で合成目的データと音素データベースを分けて提供する場合もある。４は合成目的データａと音素データベースｂをインターネット５上のサーバ装置から公衆回線６を介してダウンロードし、記憶装置２に記憶させる為のパソコンである。
【００２０】
図１において、ユーザはまずパソコン４を通じ、自分が所望する小説等の合成目的データや音声キャラクタの音素データベースをインターネット５上のサーバ装置３からダウンロードし、パソコン４を通じてそのデータをメモリカード等の記憶装置２に記録する。次に、記憶装置２を携帯端末装置１に挿入し、携帯端末装置１上で朗読させる合成目的の部分を選択して再生の操作を行うと、携帯端末装置１上からユーザが所望するキャラクタの音声で選択した合成目的範囲のデータの音声合成音が出力される。
【００２１】
図２は、本発明の実施の形態１による記憶装置付き携帯端末装置を示すブロック図であり、図１の携帯端末装置１と記憶装置２を詳細に示す。
【００２２】
図２において、１は図１と同様の携帯端末装置、２は図１と同様の記憶装置、１０１は携帯端末装置１内の各処理部とデータのやり取りを行い、装置全体の制御を行うシステム制御部である。１０２は合成目的データの解析を行って、各文字データに最適な音素データを抽出および連結して合成音声データを生成し、そのデータを後述の音声出力処理部１０３に渡せるようにデータ変換する音声合成処理部、１０３はシステム制御部１０１から指示を受け、記憶装置２へのデータを読み書きする記憶装置インタフェース部（記憶装置Ｉ／Ｆ部）である。１０４は音声合成処理部１０２からデータを受け、データのフォーマット変換を行い、スピーカ１０８またはヘッドフォン（図示せず）に出力する音声出力処理部である。１０５は携帯端末装置１を制御するプログラムの保存領域やデータ処理の際の作業領域として用いられる記憶部、１０６はユーザが携帯端末装置１に自分の指示を与える操作部、１０７は携帯端末装置１の動作状態等をユーザに表示する表示部、１０９は携帯端末装置１に電源を供給する為の電源部、１１０はユーザが操作部１０６から選択した合成目的データの範囲選択情報に基づき、合成目的データの構成分析を行い、選択範囲の合成目的データの抽出を行う朗読選択処理部、１２０は記憶装置インタフェース部１０３と共に携帯端末装置１とデータのやり取りを行う端末装置インタフェース部（端末装置Ｉ／Ｆ部）、１２１は音素データベースを保存する音素データベース保存部、１２２は合成目的データを保存する合成目的データ保存部である。
【００２３】
このように構成された記憶装置付き携帯端末装置について、その動作を図３を用いて行う。図３は、図２の記憶装置付き携帯端末装置の動作（システム制御部１０１の動作）を示すフローチャートである。
【００２４】
ユーザが操作部１０６にて装置１の電源をオンすると、システム制御部１０１は、記憶装置インタフェース部１０３に指令を出し、記憶装置２が携帯端末装置１に接続しているか否かを調べさせる（Ｓ１）。接続していないと判定した場合は記憶部１０５からフォントデータを取り出し、表示部１０７に「メモリカードを挿入してください。」等のようなメッセージを表示し（Ｓ２）、ユーザに対して携帯端末装置１に記憶装置２の接続を行うように促す。接続していると判定した場合、システム制御部１０１は、操作部１０６から再生の操作が行われるのを待つ（Ｓ３）。再生の操作が行われると、「朗読する対象データを選択してください。１．全文（付帯情報付き）　２．本文のみ（付帯情報を除く）．．」等のメッセージを表示部に表示し（Ｓ４）、ユーザに合成目的データの朗読対象部分を選択させる（Ｓ５）。選択されたら、合成目的データの範囲選択情報を記憶部１０５に記憶する。次に、システム制御部１０１は、記憶装置インタフェース部１０３に対して、記憶装置２内にある合成目的データを読み出すように指令を出す。記憶装置インタフェース部１０３は、記憶装置２内にある端末装置インタフェース部１２０とやり取りをしながら合成目的データを読み出し、携帯端末装置１内の記憶部１０５に記録する（Ｓ６）。
【００２５】
次に、システム制御部１０１は、朗読選択処理部１１０に処理を開始させる。朗読選択処理部１１０は、記憶部１０５から合成目的データを読み出してデータの構成解析を行い、データをユーザの選択部分のみに最適化して記憶部１０５に記録する（Ｓ７）。例えば、ユーザからの選択情報が本文のみの場合であった場合は、その合成目的データの作者や作成年月日等の付帯情報等を省き、処理した合成目的データを改めて記憶部１０５に記憶する。次に、システム制御部１０１は、音声合成処理部１０２に指令を出し、処理を開始させる。音声合成処理部１０２は、記憶部１０５から朗読選択処理部１１０が処理した合成目的データを順次読み出しながら解析を行い、各文字データに最も適する音素データを記憶部１０５または記憶装置２から読み出して、繋ぎ合わせ、そのデータを音声出力処理部１０４が処理できるデータに変換して合成音声データを作成する（Ｓ８）。音声出力処理部１０４は、音声合成処理部１０２から合成音声データを受け取り、データをフォーマット変換し、スピーカ１０８またはヘッドフォンに出力する（Ｓ９）。
【００２６】
このようにしてユーザは所望するキャラクタ音声にて選択した範囲の合成目的データの朗読を聴くことができる。
【００２７】
以上のように本実施の形態によれば、記憶装置２は、音素データをデータベース化した音素データベースを保存する音素データベース保存部１２１と、朗読対象となる合成目的データを保存する合成目的データ保存部１２２と、携帯端末装置１とデータの授受を行う端末装置インタフェース部１２０とを有し、携帯端末装置１は、記憶装置２とデータの授受を行う記憶装置インタフェース部１０３と、全体を制御するシステム制御部１０１と、合成目的データの範囲を選択する朗読選択処理部１１０と、記憶装置インタフェース部１０３と端末装置インタフェース部１２０を介して音素データベース保存部１２１から読み出した音素データベースおよび選択した範囲に基づいて合成目的データ保存部１２２から読み出した合成目的データから合成音声データを生成する音声合成処理部１０２と、生成した合成音声データを入力して合成音声信号として出力する音声出力処理部１０４とを有することにより、ユーザーは朗読対象情報としての合成目的データの範囲を選択して自分に不要な部分の朗読を省く事ができるので、時間の無駄なく効率的に自分が所望する小説等の朗読を聴くことができる。
【００２８】
（実施の形態２）
本発明の実施の形態２による記憶装置付き携帯端末装置を有する通信システムの構成は実施の形態１と同様、図１の構成である。
【００２９】
図４は、本発明の実施の形態２による記憶装置付き携帯端末装置を示すブロック図であり、図１の携帯端末装置１と記憶装置２を詳細に示す。
【００３０】
図４において、システム制御部１０１、音声合成処理部１０２、記憶装置インタフェース部１０３、音声出力処理部１０４、記憶部１０５、操作部１０６、表示部１０７、スピーカ１０８、電源部１０９、端末装置インタフェース部１２０、音素データベース保存部１２１、合成目的データ保存部１２２は、図２と同様のものなので、同一符号を付し、説明は省略する。１１１は音声出力処理部１０４から出力される合成音声信号に識別音信号を合成する識別音合成処理部である。
【００３１】
このように構成された記憶装置付き携帯端末装置について、その動作を図５を用いて説明する。図５は、図４の記憶装置付き携帯端末装置の動作（システム制御部１０１の動作）を示すフローチャートである。
【００３２】
ユーザが操作部１０６にて装置１の電源をオンすると、システム制御部１０１は、記憶装置インタフェース部１０３に指令を出し、記憶装置２が携帯端末装置１に接続しているか否かを調べる（Ｓ１１）。接続していないと判定した場合は記憶部１０５からフォントデータを取り出し、表示部１０７に「メモリカードを挿入してください。」等のようなメッセージを表示し（Ｓ１２）、ユーザに対して携帯端末装置１に記憶装置２の接続を行うように促す。接続していると判定した場合、システム制御部１０１は操作部１０６から再生の操作が行われるのを待つ（Ｓ１３）。再生の操作が行われると、システム制御部１０１は、記憶装置インタフェース部１０３に対して、記憶装置２内にある合成目的データを読み出すように指令を出す。記憶装置インタフェース部１０３は、記憶装置２内にある端末装置インタフェース部１２０とやり取りをしながら合成目的データを読み出し、携帯端末装置１内の記憶部１０５に記録する（Ｓ１４）。
【００３３】
次に、システム制御部１０１は、音声合成処理部１０２に指令を出し、処理を開始させる。音声合成処理部１０２は、記憶部１０５から合成目的データを順次読み出しながら解析を行い、各文字データに最も適する音素データを記憶部１０５または記憶装置２から読み出して、繋ぎ合わせ、音声出力処理部１０４が処理できるデータに変換を行って、そのデータを記憶部１０５に記憶する（Ｓ１５）。続いて、システム制御部１０１は、ステップＳ１５で処理したデータを記憶部１０５から読み出し、データを音声出力処理部１０４に渡す。そして、識別音合成処理部１１１に指令を出し、処理を開始させる。識別音合成処理部１１１は、音声出力処理部１０４より出力される信号（合成音声信号）に可聴範囲外の識別音信号を合成し（Ｓ１６）、スピーカ１０８またはヘッドフォンに出力する（Ｓ１７）。
【００３４】
このようにして合成音声信号に可聴範囲外の識別音信号を合成させることにより、実際の人物の肉声には無い周波数成分を含めることができるので、音声認証装置において、この装置１の出力音声を使用不可能とすることができ、不正な使用を防ぐことができる。
【００３５】
以上のように本実施の形態によれば、識別音合成処理部１１１は、合成音声信号であること示す可聴範囲外の識別音信号を合成音声信号に含めることにより、携帯端末装置１からは音声合成の出力音声であることを示す可聴範囲外の識別音を含んだ合成音声が出力されるので、仮に出力音声が不正な目的で使用されても、音声認証装置に音声の周波数解析機能を備えることにより、実際の人物の肉声には無い識別音の周波数成分が含まれていることを検知することができるので、合成音声の不正な使用を防止することができる。
【００３６】
（実施の形態３）
本発明の実施の形態３による記憶装置付き携帯端末装置を有する通信システムの構成は実施の形態１と同様、図１の構成である。また、本発明の実施の形態３による記憶装置付き携帯端末装置の構成は実施の形態２と同様、図４の構成である。
【００３７】
このように構成された記憶装置付き携帯端末装置について、その動作を図６を用いて説明する。図６は、図４の記憶装置付き携帯端末装置の動作（システム制御部１０１の動作）を示すフローチャートである。なお、図６のステップＳ２１〜Ｓ２４は図５のステップＳ１１〜Ｓ１４と同様であるので、その説明は省略する。
【００３８】
図６において、システム制御部１０１は、音声合成処理部１０２に指令を出し、処理を開始させる。音声合成処理部１０２は、記憶部１０５から合成目的データを順次読み出しながら解析を行う（Ｓ２５）。そして、読み出した合成目的データの内容が識別符号（識別符号は、ここではテキストデータを意味する合成目的データの任意の場所に手作業により入力される）か否かを判定する（Ｓ２６）。合成目的データの内容が識別符号でないと判定した場合、各文字データに最も適する音素データを記憶部１０５または記憶装置２から読み出し抽出する（Ｓ２７）。識別符号と判定した場合、この処理はない。抽出されたデータ及び識別符号は次々に繋ぎ合わせられ、記憶部１０５に保存される（Ｓ２８）。
【００３９】
次に、システム制御部１０１は、記憶部１０５からステップＳ２８で処理したデータ（合成音声データ）を読み出す。そして、読み出したデータの内容が識別符号か否かをを判定する（Ｓ２９）。識別符号でないと判定した場合、データを音声出力処理部１０４に渡す（Ｓ３０）。音声出力処理部１０４は、受け取ったデータをフォーマット変換し、スピーカ１０８またはヘッドフォンに出力する（Ｓ３３）。識別符号であると判定した場合、システム制御部１０１は、識別音合成処理部１１１をアクティブにする（Ｓ３１）。識別音合成処理部１１１は、音声出力処理部１０４より出力される合成音声信号に可聴範囲外の識別音信号を合成し（Ｓ３２）、スピーカ１０８またはヘッドフォンに出力させる（Ｓ３３）。
【００４０】
このようにして合成音声信号に可聴範囲外の識別音信号を合成させることにより、実際の人物の肉声には無い周波数成分を含めることができるので、音声認証装置において、この装置１の出力音声を使用不可能とすることができ、不正な使用を防ぐことができる。
【００４１】
以上のように本実施の形態によれば、識別音合成処理部１１１は、合成音声信号であること示す可聴範囲外の識別音信号を合成音声信号のすべてに渡り出力するのではなく、合成目的データ中にある識別符号をトリガとして合成音声信号に含めることにより、仮に携帯端末装置１の出力音声が不正な目的で使用されても、音声認証装置に音声の周波数解析機能を備えることにより、識別符号のある合成音声信号の周波数解析のみを行うだけで、実際の人物の肉声には無い識別音の周波数成分が含まれていることを検知することができるので、より少ない信号の周波数解析により、合成音声の不正な使用を防止することができる。
【００４２】
【発明の効果】
以上説明したように本発明の請求項１に記載の記憶装置付き携帯携帯端末装置によれば、携帯端末装置と、携帯端末装置に対して着脱可能な記憶装置とを備えた記憶装置付き携帯端末装置であって、記憶装置は、音素データをデータベース化した音素データベースを保存する音素データベース保存部と、朗読対象となる合成目的データを保存する合成目的データ保存部と、携帯端末装置とデータの授受を行う端末装置インタフェース部とを有し、携帯端末装置は、記憶装置とデータの授受を行う記憶装置インタフェース部と、全体を制御するシステム制御部と、合成目的データの範囲を選択する朗読選択処理部と、記憶装置インタフェース部と端末装置インタフェース部を介して音素データベース保存部から読み出した音素データベースおよび選択した範囲に基づいて合成目的データ保存部から読み出した合成目的データから合成音声データを生成する音声合成処理部と、生成した合成音声データを入力して合成音声信号として出力する音声出力処理部とを有することにより、ユーザーは朗読対象情報としての合成目的データの範囲を選択して自分に不要な部分の朗読を省く事ができるので、時間の無駄なく効率的に自分が所望する小説等の朗読を聴くことができるという有利な効果が得られる。
【００４３】
請求項２に記載の記憶装置付き携帯端末装置によれば、請求項１に記載の記憶装置付き携帯端末装置において、朗読選択処理部に代えて識別音合成処理部を備え、識別音合成処理部は、合成音声信号であること示す可聴範囲外の識別音信号を合成音声信号に含めることにより、携帯端末装置からは音声合成の出力音声であることを示す可聴範囲外の識別音を含んだ合成音声が出力されるので、仮に出力音声が不正な目的で使用されても、音声認証装置に音声の周波数解析機能を備えることにより、実際の人物の肉声には無い識別音の周波数成分が含まれていることを検知することができるので、合成音声の不正な使用を防止することができるという有利な効果が得られる。
【００４４】
請求項３に記載の記憶装置付き携帯端末装置によれば、請求項１に記載の記憶装置付き携帯端末装置朗読選択処理部に代えて識別音合成処理部を備え、識別音合成処理部は、合成音声信号であること示す可聴範囲外の識別音信号を合成音声信号のすべてに渡り出力するのではなく、合成目的データ中にある識別符号をトリガとして合成音声信号に含めることにより、仮に携帯端末装置の出力音声が不正な目的で使用されても、音声認証装置に音声の周波数解析機能を備えることにより、識別符号のある合成音声信号の周波数解析のみを行うだけで、実際の人物の肉声には無い識別音の周波数成分が含まれていることを検知することができるので、より少ない信号の周波数解析により、合成音声の不正な使用を防止することができるという有利な効果が得られる。
【図面の簡単な説明】
【図１】携帯端末装置、サーバ装置、パソコン等を有する通信システムを示す構成図
【図２】本発明の実施の形態１による記憶装置付き携帯端末装置を示すブロック図
【図３】図２の記憶装置付き携帯端末装置の動作を示すフローチャート
【図４】本発明の実施の形態２、３による記憶装置付き携帯端末装置を示すブロック図
【図５】図４の記憶装置付き携帯端末装置の動作を示すフローチャート
【図６】図４の記憶装置付き携帯端末装置の動作を示すフローチャート
【符号の説明】
１　携帯端末装置
２　記憶装置
３　サーバ装置
４　パソコン
５　インターネット
６　公衆回線
１０１　システム制御部
１０２　音声合成処理部
１０３　記憶装置インタフェース部
１０４　音声出力処理部
１０５　記憶部
１０６　操作部
１０７　表示部
１０８　スピーカ
１０９　電源部
１１０　朗読選択処理部
１１１　識別音合成処理部
１２０　端末装置インタフェース部
１２１　音素データベース保存部
１２２　合成目的データ保存部

[0001]
[Technical field of the invention]
The present invention relates to a portable terminal device that stores synthesis target data, that is, the text to be read aloud, in a storage device and converts the synthesis target data read from that storage device into speech.
[0002]
[Prior art]
Conventionally, devices that read e-mail and word-processor text aloud have been implemented on personal computers because of their ample storage capacity, high processing power, and rich network functions. The voice of the output synthesized speech has been a generic male or female voice, but a speech synthesis method that can generate synthesized speech close to human utterance has also been disclosed. In that method, the dictionary holds, together with information such as the kana reading and the accent type, accent command values and/or phoneme duration information prepared in advance. A parameter sequence for the phoneme segment data is generated using the phoneme durations, and a speech waveform is synthesized from these, so that synthesized speech close to human utterance is output (see, for example, Patent Document 1).
[0003]
However, using a personal computer while walking is very inconvenient because of its size and weight, and it is not easy to operate. If the only function needed is converting the synthesis target data to be read aloud into speech, a personal computer also offers poor cost performance. Moreover, the output voice is a generic male or female voice and not necessarily the voice the user wants, so the user finds it hard to enjoy listening.
[0004]
Therefore, as a device for remedying these drawbacks, a portable terminal device has been proposed that includes an input unit for inputting synthesis target data (for example, an input unit that reads from a storage device such as a memory card, optical disk, or magnetic disk, an input unit that receives data from a network, or an input unit that accepts keyboard input), an output unit (a speaker or headphones) that outputs the synthesized sound, and a speech synthesis processing unit that generates synthesized speech data from the synthesis target data. The speech synthesis processing unit of this portable terminal device uses a phoneme database built from sampled recordings of a real person's voice. Also known is a text-to-speech system in the form of an IC card that is easy to attach to and detach from an information processing device, can be built into a small information processing device (such as a small personal computer), is small, light, and portable, and has a text-to-speech function even as a stand-alone unit (see, for example, Patent Document 2).
[0005]
[Patent Document 1]
JP-A-7-140999
[Patent Document 2]
JP-A-6-337774
[0006]
[Problems to be solved by the invention]
However, in the above portable terminal device, the synthesis target data may include supplementary information such as its creation date or the author's biography, and this information is read aloud as well, although the user does not necessarily need to hear it. In addition, because the synthesized speech output from the portable terminal device is close to the real voice of the person who provided the phoneme database, a user of the device could impersonate the phoneme provider and misuse the speech in, for example, voice authentication.
[0007]
Such a portable terminal device with a storage device is therefore required to prevent the reading aloud of unnecessary information and to prevent unauthorized use of the synthesized speech.
[0008]
To meet this requirement, an object of the present invention is to provide a portable terminal device with a storage device that can prevent the reading aloud of unnecessary information and can prevent unauthorized use of the synthesized speech.
[0009]
[Means for Solving the Problems]
To solve the above problem, the portable terminal device with a storage device according to the present invention comprises a portable terminal device and a storage device that is attachable to and detachable from the portable terminal device. The storage device has a phoneme database storage unit that stores a phoneme database, that is, phoneme data organized as a database; a synthesis target data storage unit that stores the synthesis target data to be read aloud; and a terminal device interface unit that exchanges data with the portable terminal device. The portable terminal device has a storage device interface unit that exchanges data with the storage device; a system control unit that controls the whole device; a reading selection processing unit that selects a range of the synthesis target data; a speech synthesis processing unit that generates synthesized speech data from the phoneme database, read from the phoneme database storage unit via the storage device interface unit and the terminal device interface unit, and from the synthesis target data, read from the synthesis target data storage unit according to the selected range; and a speech output processing unit that receives the generated synthesized speech data and outputs it as a synthesized speech signal.
[0010]
As a result, a portable terminal device with a storage device that can prevent the reading aloud of unnecessary information is obtained.
[0011]
[Embodiments of the invention]
The portable terminal device with a storage device according to claim 1 of the present invention comprises a portable terminal device and a storage device that is attachable to and detachable from the portable terminal device. The storage device has a phoneme database storage unit that stores a phoneme database, that is, phoneme data organized as a database; a synthesis target data storage unit that stores the synthesis target data to be read aloud; and a terminal device interface unit that exchanges data with the portable terminal device. The portable terminal device has a storage device interface unit that exchanges data with the storage device; a system control unit that controls the whole device; a reading selection processing unit that selects a range of the synthesis target data; a speech synthesis processing unit that generates synthesized speech data from the phoneme database, read from the phoneme database storage unit via the storage device interface unit and the terminal device interface unit, and from the synthesis target data, read from the synthesis target data storage unit according to the selected range; and a speech output processing unit that receives the generated synthesized speech data and outputs it as a synthesized speech signal.
[0012]
With this configuration, the user can select the range of the synthesis target data to be read aloud and skip the parts he or she does not need, and can therefore listen efficiently, without wasting time, to a reading of a desired novel or the like.
[0013]
The portable terminal device with a storage device according to claim 2 is the portable terminal device with a storage device according to claim 1 in which an identification sound synthesis processing unit is provided in place of the reading selection processing unit, and the identification sound synthesis processing unit includes in the synthesized speech signal an identification sound signal, outside the audible range, indicating that the signal is synthesized speech.
[0014]
With this configuration, the portable terminal device outputs synthesized speech that contains an identification sound outside the audible range, marking it as the output of speech synthesis. Even if the output speech is used for an unauthorized purpose, a voice authentication device equipped with a frequency analysis function can detect that the speech contains a frequency component of the identification sound, which is absent from a real human voice, and unauthorized use of the synthesized speech can thus be prevented.
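The marking and detection idea in this paragraph can be illustrated with a minimal sketch. The patent does not specify the identification frequency, the sample rate, or the analysis method, so the 21 kHz tone, the 48 kHz sample rate, the threshold, and the use of the Goertzel algorithm below are all assumptions chosen for illustration, and every function name is hypothetical.

```python
import math

SAMPLE_RATE = 48_000   # Hz; assumed, must exceed twice the marker frequency
MARKER_HZ = 21_000     # assumed marker frequency, above the ~20 kHz hearing limit


def tone(freq_hz, n, amplitude=1.0):
    """Generate n samples of a sine wave at freq_hz."""
    return [amplitude * math.sin(2 * math.pi * freq_hz * i / SAMPLE_RATE)
            for i in range(n)]


def add_marker(speech, amplitude=0.05):
    """Mix the inaudible identification tone into a synthesized speech signal."""
    marker = tone(MARKER_HZ, len(speech), amplitude)
    return [s + m for s, m in zip(speech, marker)]


def goertzel_power(samples, freq_hz):
    """Return the squared spectral magnitude at freq_hz (Goertzel algorithm)."""
    coeff = 2 * math.cos(2 * math.pi * freq_hz / SAMPLE_RATE)
    s_prev, s_prev2 = 0.0, 0.0
    for x in samples:
        s = x + coeff * s_prev - s_prev2
        s_prev2, s_prev = s_prev, s
    return s_prev2 ** 2 + s_prev ** 2 - coeff * s_prev * s_prev2


def has_marker(samples, threshold=1.0):
    """Authenticator-side check: is there energy at the marker frequency?"""
    return goertzel_power(samples, MARKER_HZ) > threshold
```

A voice authentication device following this sketch would call `has_marker` on the captured audio and reject the input when it returns true. The Goertzel algorithm is used here only because a single known frequency has to be checked; a full FFT would serve equally well.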
[0015]
The portable terminal device with a storage device according to claim 3 is the portable terminal device with a storage device according to claim 1 in which an identification sound synthesis processing unit is provided in place of the reading selection processing unit, and the identification sound synthesis processing unit does not output the identification sound signal, outside the audible range and indicating that the signal is synthesized speech, throughout the entire synthesized speech signal, but instead includes it in the synthesized speech signal when triggered by an identification code in the synthesis target data.
[0016]
With this configuration, even if the output speech of the portable terminal device is used for an unauthorized purpose, a voice authentication device equipped with a frequency analysis function needs to analyze only the portions of the synthesized speech signal that carry the identification code in order to detect the frequency component of the identification sound, which is absent from a real human voice. Unauthorized use of the synthesized speech can thus be prevented while analyzing less of the signal.
[0017]
Hereinafter, embodiments of the present invention will be described with reference to FIGS. 1 to 6.
[0018]
(Embodiment 1)
FIG. 1 is a configuration diagram showing a communication system that includes a portable terminal device, a server device, a personal computer, and the like.
[0019]
In FIG. 1, 1 is a portable terminal device having a display unit, an operation unit, and an audio output unit such as headphones or a speaker, and 2 is a storage device, such as a memory card, that stores the synthesis target data and the phoneme database. The storage device 2 is attachable to and detachable from the portable terminal device 1. The storage device 2 contains a synthesis target data storage unit (described later) that stores synthesis target data such as a novel, and a phoneme database storage unit that stores a phoneme database made up of phoneme data. As noted above, the phoneme database is built by sampling the real voice of an actual person. For each phoneme it holds, labeled and arranged in an arbitrary format, acoustic information such as duration, fundamental frequency, and power, together with the name of the data file the phoneme belongs to and the start and end positions of the phoneme within that file. The phoneme database is an important element that determines the voice quality and intonation of the synthesized speech output from the portable terminal device 1. 3 is a server device on the Internet 5 that provides the synthesis target data a and the phoneme database b. Only one server device is shown here, but the synthesis target data and the phoneme database may be provided separately by multiple server devices. 4 is a personal computer used to download the synthesis target data a and the phoneme database b from the server device on the Internet 5 via the public line 6 and store them in the storage device 2.
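As an illustration of the per-phoneme information listed above, one database entry might be modeled as follows. This is only a sketch: the patent leaves the format arbitrary, so all field names and units are hypothetical, and the nearest-pitch lookup is an assumed stand-in for whatever "most suitable" selection the speech synthesis processing unit actually performs.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PhonemeEntry:
    """One labeled phoneme in the database (field names are hypothetical)."""
    label: str          # phoneme label, e.g. "ka"
    duration_ms: float  # duration of the sound
    f0_hz: float        # fundamental frequency
    power_db: float     # power
    file_name: str      # data file that holds the recorded sample
    start: int          # start position of the phoneme within the file
    end: int            # end position of the phoneme within the file


def build_index(entries):
    """Group entries by label so a synthesizer can look up candidates."""
    index = {}
    for entry in entries:
        index.setdefault(entry.label, []).append(entry)
    return index


def closest_by_pitch(index, label, target_f0_hz):
    """Pick the candidate whose fundamental frequency is nearest the target.

    Assumed selection rule for illustration; the patent does not define one.
    """
    return min(index[label], key=lambda e: abs(e.f0_hz - target_f0_hz))
```

Keeping the file name and start and end positions in each entry means the waveform itself can stay on the memory card and be read only when a phoneme is actually selected.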
[0020]
In FIG. 1, the user first uses the personal computer 4 to download the desired synthesis target data, such as a novel, and the phoneme database of the desired voice character from the server device 3 on the Internet 5, and records that data on the storage device 2, such as a memory card, through the personal computer 4. Next, when the user inserts the storage device 2 into the portable terminal device 1, selects on the portable terminal device 1 the portion of the synthesis target data to be read aloud, and performs a playback operation, the portable terminal device 1 outputs synthesized speech for the selected range in the voice of the character the user chose.
[0021]
FIG. 2 is a block diagram showing a portable terminal device with a storage device according to the first embodiment of the present invention, and shows the portable terminal device 1 and the storage device 2 of FIG. 1 in detail.
[0022]
In FIG. 2, 1 is the same portable terminal device as in FIG. 1, and 2 is the same storage device as in FIG. 1. 101 is a system control unit that exchanges data with each processing unit in the portable terminal device 1 and controls the entire device. 102 is a speech synthesis processing unit that analyzes the synthesis target data, extracts and concatenates the phoneme data best suited to each piece of character data to generate synthesized speech data, and converts that data so that it can be passed to the speech output processing unit 104 described later. 103 is a storage device interface unit (storage device I/F unit) that, on instruction from the system control unit 101, reads data from and writes data to the storage device 2. 104 is a speech output processing unit that receives data from the speech synthesis processing unit 102, converts its format, and outputs it to a speaker 108 or headphones (not shown). 105 is a storage unit used as a storage area for the program that controls the portable terminal device 1 and as a work area during data processing. 106 is an operation unit through which the user gives instructions to the portable terminal device 1. 107 is a display unit that shows the user the operating state of the portable terminal device 1. 109 is a power supply unit that supplies power to the portable terminal device 1. 110 is a reading selection processing unit that, based on the range selection information chosen by the user through the operation unit 106, analyzes the structure of the synthesis target data and extracts the synthesis target data in the selected range. 120 is a terminal device interface unit (terminal device I/F unit) that, together with the storage device interface unit 103, exchanges data with the portable terminal device 1. 121 is a phoneme database storage unit that stores the phoneme database, and 122 is a synthesis target data storage unit that stores the synthesis target data.
[0023]
The operation of the portable terminal device with a storage device configured as described above will be described with reference to FIG. 3. FIG. 3 is a flowchart showing the operation of the portable terminal device with a storage device of FIG. 2 (the operation of the system control unit 101).
[0024]
When the user turns on the power of the device 1 using the operation unit 106, the system control unit 101 issues a command to the storage device interface unit 103 and has it check whether the storage device 2 is connected to the portable terminal device 1 (S1). If the storage device is not connected, the system control unit retrieves font data from the storage unit 105 and displays a message such as "Please insert a memory card." on the display unit 107 (S2), prompting the user to connect the storage device 2 to the portable terminal device 1. If the storage device is connected, the system control unit 101 waits for a playback operation from the operation unit 106 (S3). When the playback operation is performed, a message such as "Please select the data to be read aloud. 1. Full text (with supplementary information) 2. Body only (excluding supplementary information)" is displayed on the display unit (S4), and the user selects the portion of the synthesis target data to be read aloud (S5). Once the selection is made, the range selection information for the synthesis target data is stored in the storage unit 105. Next, the system control unit 101 instructs the storage device interface unit 103 to read the synthesis target data from the storage device 2. The storage device interface unit 103 reads the synthesis target data while communicating with the terminal device interface unit 120 in the storage device 2 and records it in the storage unit 105 of the portable terminal device 1 (S6).
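Steps S1 to S6 can be summarized as a short control-flow sketch. It is an illustration under stated assumptions rather than the device firmware: the function name, its parameters, and the `Scope` enumeration are hypothetical, and `read_card` is a callable standing in for the storage device interface unit 103.

```python
from enum import Enum


class Scope(Enum):
    FULL_TEXT = 1   # menu item 1: full text, with supplementary information
    BODY_ONLY = 2   # menu item 2: body only, supplementary information excluded


def startup_flow(card_connected, play_requested, menu_choice, read_card):
    """Walk steps S1 to S6 once and return (messages, scope, data)."""
    messages = []
    if not card_connected:                                       # S1
        messages.append("Please insert a memory card.")          # S2
        return messages, None, None
    if not play_requested:                                       # S3: keep waiting
        return messages, None, None
    messages.append("Please select the data to be read aloud.")  # S4
    scope = Scope(menu_choice)                                   # S5
    data = read_card()                                           # S6
    return messages, scope, data
```

For example, `startup_flow(True, True, 2, lambda: "text")` returns the selection prompt, `Scope.BODY_ONLY`, and the data read from the card.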
[0025]
Next, the system control unit 101 causes the reading selection processing unit 110 to start processing. The reading selection processing unit 110 reads the synthesis target data from the storage unit 105, analyzes its structure, reduces the data to only the portion selected by the user, and records the result in the storage unit 105 (S7). For example, if the user selected the body only, supplementary information such as the author and the creation date of the synthesis target data is removed, and the processed synthesis target data is stored in the storage unit 105 again. Next, the system control unit 101 issues a command to the speech synthesis processing unit 102 to start processing. The speech synthesis processing unit 102 sequentially reads from the storage unit 105 the synthesis target data processed by the reading selection processing unit 110 and analyzes it, reads the phoneme data best suited to each piece of character data from the storage unit 105 or the storage device 2, concatenates it, and converts the result into data that the speech output processing unit 104 can process, thereby creating synthesized speech data (S8). The speech output processing unit 104 receives the synthesized speech data from the speech synthesis processing unit 102, converts its format, and outputs it to the speaker 108 or headphones (S9).
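The range selection in step S7 can be sketched as a small text filter. The patent does not say how supplementary information is delimited within the synthesis target data, so the layout assumed here (a header block that ends at the first blank line) and the function name are purely hypothetical.

```python
def select_reading_range(document, body_only):
    """Return the text to hand to the speech synthesizer (step S7).

    Assumes supplementary information is a header block that ends at the
    first blank line; the patent does not define the actual layout.
    """
    if not body_only:
        return document
    header, separator, body = document.partition("\n\n")
    # No blank line means no recognizable header, so keep everything.
    return body if separator else document
```

With this assumed layout, a document beginning with lines such as an author and creation date followed by a blank line is reduced to its body when the user chooses "Body only", and is passed through unchanged when the user chooses "Full text".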
[0026]
In this way, the user can listen to the reading of the synthesis target data in the selected range with the desired character voice.
[0027]
As described above, according to the present embodiment, the storage device 2 has a phoneme database storage unit 121 that stores a phoneme database, a synthesis target data storage unit 122 that stores the synthesis target data to be read aloud, and a terminal device interface unit 120 that exchanges data with the portable terminal device 1. The portable terminal device 1 has a storage device interface unit 103 that exchanges data with the storage device 2, a system control unit 101 that controls the whole device, a reading selection processing unit 110 that selects a range of the synthesis target data, a speech synthesis processing unit 102 that generates synthesized speech data from the phoneme database read from the phoneme database storage unit 121 via the storage device interface unit 103 and the terminal device interface unit 120 and from the synthesis target data read from the synthesis target data storage unit 122 according to the selected range, and a speech output processing unit 104 that receives the generated synthesized speech data and outputs it as a synthesized speech signal. The user can therefore select the range of the synthesis target data to be read aloud and skip the parts he or she does not need, and so can listen to a reading of a desired novel or the like efficiently and without wasting time.
[0028]
(Embodiment 2)
The configuration of the communication system having the portable terminal device with a storage device according to the second embodiment of the present invention is the configuration of FIG. 1 as in the first embodiment.
[0029]
FIG. 4 is a block diagram showing a portable terminal device with a storage device according to the second embodiment of the present invention, and shows the portable terminal device 1 and the storage device 2 of FIG. 1 in detail.
[0030]
In FIG. 4, the system control unit 101, the speech synthesis processing unit 102, the storage device interface unit 103, the speech output processing unit 104, the storage unit 105, the operation unit 106, the display unit 107, the speaker 108, the power supply unit 109, the terminal device interface unit 120, the phoneme database storage unit 121, and the synthesis target data storage unit 122 are the same as those of the first embodiment shown in FIG. 2. Reference numeral 111 denotes an identification sound synthesis processing unit that synthesizes an identification sound signal with the synthesized speech signal output from the speech output processing unit 104.
[0031]
The operation of the portable terminal device with a storage device configured as described above will be described with reference to FIG. 5. FIG. 5 is a flowchart illustrating the operation of the portable terminal device with a storage device in FIG. 4 (the operation of the system control unit 101).
[0032]
When the user turns on the power of the portable terminal device 1 using the operation unit 106, the system control unit 101 issues a command to the storage device interface unit 103 and checks whether the storage device 2 is connected to the portable terminal device 1 (S11). If it is determined that the storage device 2 is not connected, font data is read from the storage unit 105 and a message such as "Please insert a memory card." is displayed on the display unit 107 (S12), prompting the user to connect the storage device 2 to the portable terminal device 1. If it is determined that the storage device 2 is connected, the system control unit 101 waits for a reproduction operation from the operation unit 106 (S13). When a reproduction operation is performed, the system control unit 101 issues a command to the storage device interface unit 103 to read out the synthesis target data in the storage device 2. The storage device interface unit 103 reads out the synthesis target data by exchanging data with the terminal device interface unit 120 in the storage device 2 and records it in the storage unit 105 in the portable terminal device 1 (S14).
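Steps S11 to S14 can be sketched as a small state model. This is only an illustration under stated assumptions: the classes `StorageDevice` and `Terminal`, their fields, and the method names are hypothetical, and the display unit is modeled as a simple list of messages.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class StorageDevice:
    """Stand-in for the detachable storage device 2 (assumed structure)."""
    synthesis_target_data: str


@dataclass
class Terminal:
    """Stand-in for the portable terminal device 1 (assumed structure)."""
    storage_device: Optional[StorageDevice] = None
    storage_unit: Dict[str, str] = field(default_factory=dict)
    display: List[str] = field(default_factory=list)

    def check_connection(self) -> bool:
        """S11/S12: check for a storage device; show a prompt if none is connected."""
        if self.storage_device is None:
            self.display.append("Please insert a memory card.")
            return False
        return True

    def on_reproduction(self) -> bool:
        """S13/S14: on a reproduction operation, copy the data into the storage unit."""
        if self.storage_device is None:
            return False
        self.storage_unit["synthesis_target_data"] = (
            self.storage_device.synthesis_target_data
        )
        return True
```

For example, a `Terminal()` with no storage device returns `False` from `check_connection()` and records the prompt message; after a `StorageDevice` is assigned, `on_reproduction()` copies its data into `storage_unit`.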
[0033]
Next, the system control unit 101 issues a command to the speech synthesis processing unit 102 to start processing. The speech synthesis processing unit 102 analyzes the synthesis target data while sequentially reading it out from the storage unit 105, reads out the phoneme data most suitable for each piece of character data from the storage unit 105 or the storage device 2, connects the phoneme data, converts the result into data that can be processed, and stores it in the storage unit 105 (S15). Subsequently, the system control unit 101 reads the data processed in step S15 from the storage unit 105 and passes it to the speech output processing unit 104. It then issues a command to the identification sound synthesis processing unit 111 to start processing. The identification sound synthesis processing unit 111 synthesizes an identification sound signal outside the audible range with the signal (synthesized speech signal) output from the speech output processing unit 104 (S16) and outputs the resulting signal to the speaker 108 or headphones (S17).
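Step S16 can be illustrated by mixing a low-level tone into the synthesized speech samples. The specification does not state the frequency, level, or sampling rate of the identification sound; the values below (a 21 kHz sine at amplitude 0.05 with 48 kHz sampling) and the function name `add_identification_tone` are assumptions chosen only so that the tone lies above the roughly 20 kHz limit of human hearing and below the Nyquist frequency.

```python
import math
from typing import List

SAMPLE_RATE = 48_000       # Hz; assumed, not given in the specification
ID_TONE_HZ = 21_000        # Hz; assumed identification sound frequency
ID_TONE_AMPLITUDE = 0.05   # assumed level relative to full scale (1.0)


def add_identification_tone(speech: List[float],
                            tone_hz: float = ID_TONE_HZ,
                            amplitude: float = ID_TONE_AMPLITUDE,
                            sample_rate: int = SAMPLE_RATE) -> List[float]:
    """Mix a sine tone into speech samples in [-1.0, 1.0] and clip the sum."""
    if tone_hz >= sample_rate / 2:
        raise ValueError("tone frequency must be below the Nyquist frequency")
    marked: List[float] = []
    for n, sample in enumerate(speech):
        tone = amplitude * math.sin(2.0 * math.pi * tone_hz * n / sample_rate)
        # Clip so the mixed signal stays within full scale.
        marked.append(max(-1.0, min(1.0, sample + tone)))
    return marked
```

The output has the same length as the input, so the mixing step can sit between the speech output processing and the speaker without changing timing.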
[0034]
By synthesizing an identification sound signal outside the audible range with the synthesized speech signal in this way, the output can be made to contain a frequency component that is not present in the actual voice of a real person. Authentication using the synthesized speech can thereby be rejected, and unauthorized use can be prevented.
[0035]
As described above, according to the present embodiment, the identification sound synthesis processing unit 111 includes, in the synthesized speech signal, an identification sound signal outside the audible range indicating that the signal is a synthesized speech signal, so that the portable terminal device 1 outputs synthesized speech containing an identification sound outside the audible range that marks it as speech-synthesis output. Therefore, even if the output speech is used for an unauthorized purpose, a voice authentication device provided with a speech frequency analysis function can detect that the speech contains a frequency component of the identification sound that is not present in the actual voice of a real person, and unauthorized use of the synthesized speech can be prevented.
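The frequency analysis on the voice authentication side is not detailed in the specification. One possible sketch, under the assumption that the identification sound is a single sine tone at a known frequency, evaluates a single-bin discrete Fourier transform at that frequency and compares the estimated amplitude with a threshold. The frequency, sampling rate, threshold, and function names below are all hypothetical.

```python
import math
from typing import Sequence

SAMPLE_RATE = 48_000   # Hz; assumed
ID_TONE_HZ = 21_000    # Hz; assumed identification sound frequency
THRESHOLD = 0.01       # assumed detection threshold on the tone amplitude


def tone_amplitude(samples: Sequence[float], freq_hz: float,
                   sample_rate: int = SAMPLE_RATE) -> float:
    """Estimate the amplitude of a sine component at freq_hz (single-bin DFT)."""
    n_samples = len(samples)
    if n_samples == 0:
        return 0.0
    w = 2.0 * math.pi * freq_hz / sample_rate
    re = sum(x * math.cos(w * n) for n, x in enumerate(samples))
    im = sum(x * math.sin(w * n) for n, x in enumerate(samples))
    # For a sine of amplitude A spanning whole cycles, |X| is about A * N / 2.
    return 2.0 * math.hypot(re, im) / n_samples


def contains_identification_tone(samples: Sequence[float],
                                 freq_hz: float = ID_TONE_HZ,
                                 sample_rate: int = SAMPLE_RATE,
                                 threshold: float = THRESHOLD) -> bool:
    """Return True if the component at freq_hz exceeds the threshold."""
    return tone_amplitude(samples, freq_hz, sample_rate) > threshold
```

A device using such a check would reject input for which `contains_identification_tone` returns `True`. In practice the recording path would also have to preserve the tone, which this sketch does not model.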
[0036]
(Embodiment 3)
The configuration of the communication system having the portable terminal device with a storage device according to the third embodiment of the present invention is the configuration of FIG. 1, as in the first embodiment. The configuration of the portable terminal device with a storage device according to the third embodiment of the present invention is the same as that of the second embodiment shown in FIG. 4.
[0037]
The operation of the portable terminal device with a storage device configured as described above will be described with reference to FIG. 6. FIG. 6 is a flowchart illustrating the operation of the portable terminal device with a storage device in FIG. 4 (the operation of the system control unit 101). Steps S21 to S24 in FIG. 6 are the same as steps S11 to S14 in FIG. 5, and their description is omitted.
[0038]
In FIG. 6, the system control unit 101 issues a command to the speech synthesis processing unit 102 to start processing. The speech synthesis processing unit 102 analyzes the synthesis target data while sequentially reading it out from the storage unit 105 (S25). It then determines whether the content of the read synthesis target data is an identification code (S26); here, the synthesis target data is text data, and the identification code is entered manually at an arbitrary position in it. If the content is determined not to be an identification code, the phoneme data most suitable for each piece of character data is read out from the storage unit 105 or the storage device 2 and extracted (S27). This processing is skipped if the content is determined to be an identification code. The extracted data and the identification codes are connected in sequence and stored in the storage unit 105 (S28).
[0039]
Next, the system control unit 101 reads the data (synthesized speech data) processed in step S28 from the storage unit 105 and determines whether the content of the read data is an identification code (S29). If it is not an identification code, the data is passed to the speech output processing unit 104 (S30), which converts the format of the received data and outputs it to the speaker 108 or headphones (S33). If it is an identification code, the system control unit 101 activates the identification sound synthesis processing unit 111 (S31). The identification sound synthesis processing unit 111 synthesizes an identification sound signal outside the audible range with the synthesized speech signal output from the speech output processing unit 104 (S32) and outputs the resulting signal to the speaker 108 or headphones (S33).
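The flow of steps S25 to S33 can be sketched as two passes over a token stream. Several points are assumptions made only for this illustration: the marker string `<ID>` standing in for the identification code, the token-keyed phoneme dictionary, the tone parameters, and in particular the choice to mix the identification sound into the single segment that immediately follows each identification code, which the specification does not state explicitly.

```python
import math
from typing import Dict, List, Union

ID_CODE = "<ID>"           # hypothetical marker for the identification code
SAMPLE_RATE = 48_000       # Hz; assumed
ID_TONE_HZ = 21_000        # Hz; assumed
ID_TONE_AMPLITUDE = 0.05   # assumed

Segment = Union[str, List[float]]


def build_stream(tokens: List[str],
                 phoneme_db: Dict[str, List[float]]) -> List[Segment]:
    """S25-S28: keep identification codes, replace other tokens by phoneme data."""
    stream: List[Segment] = []
    for token in tokens:
        if token == ID_CODE:
            stream.append(ID_CODE)           # S26: code found, skip S27
        else:
            stream.append(list(phoneme_db.get(token, [])))  # S27
    return stream


def render(stream: List[Segment]) -> List[float]:
    """S29-S33: output segments, adding the tone only after an identification code."""
    out: List[float] = []
    mark_next = False
    for segment in stream:
        if isinstance(segment, str):         # S29: identification code
            mark_next = True                 # S31: activate tone synthesis
            continue
        for sample in segment:
            tone = 0.0
            if mark_next:                    # S32: mix in the identification sound
                n = len(out)
                tone = ID_TONE_AMPLITUDE * math.sin(
                    2.0 * math.pi * ID_TONE_HZ * n / SAMPLE_RATE)
            out.append(sample + tone)        # S33: output
        mark_next = False
    return out
```

Segments that are not preceded by an identification code pass through unchanged, so only the marked portions carry the identification sound.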
[0040]
By synthesizing an identification sound signal outside the audible range with the synthesized speech signal in this way, the output can be made to contain a frequency component that is not present in the actual voice of a real person. Authentication using the synthesized speech can thereby be rejected, and unauthorized use can be prevented.
[0041]
As described above, according to the present embodiment, the identification sound synthesis processing unit 111 does not output the identification sound signal outside the audible range, which indicates a synthesized speech signal, over the entire synthesized speech signal; instead, it includes the identification sound signal in the synthesized speech signal using an identification code in the synthesis target data as a trigger. Therefore, even if the output speech of the portable terminal device 1 is used for an unauthorized purpose, a voice authentication device provided with a speech frequency analysis function can, by analyzing the frequency of only the portions of the synthesized speech signal marked by the identification code, detect that the speech contains a frequency component of the identification sound that is not present in the actual voice of a real person. Unauthorized use of the synthesized speech can thus be prevented.
[0042]
[Effects of the Invention]
As described above, the portable terminal device with a storage device according to claim 1 of the present invention comprises a portable terminal device and a storage device that is detachable from the portable terminal device. The storage device includes a phoneme database storage unit that stores a phoneme database in which phoneme data is organized as a database, a synthesis target data storage unit that stores synthesis target data to be read aloud, and a terminal device interface unit that exchanges data with the portable terminal device. The portable terminal device includes a storage device interface unit that exchanges data with the storage device, a system control unit that controls the whole device, a reading selection processing unit that selects a range of the synthesis target data, a speech synthesis processing unit that generates synthesized speech data from the phoneme database read from the phoneme database storage unit via the storage device interface unit and the terminal device interface unit and from the synthesis target data read from the synthesis target data storage unit according to the selected range, and a speech output processing unit that receives the generated synthesized speech data and outputs it as a synthesized speech signal. With this configuration, the user can select a range of the synthesis target data to be read aloud and skip the reading of unnecessary portions, which provides the advantageous effect that the user can listen efficiently to the desired reading, such as a novel, without wasting time.
[0043]
According to the portable terminal device with a storage device according to claim 2, the portable terminal device with a storage device according to claim 1 is provided with an identification sound synthesis processing unit in place of the reading selection processing unit, and the identification sound synthesis processing unit includes, in the synthesized speech signal, an identification sound signal outside the audible range indicating that the signal is a synthesized speech signal. The portable terminal device therefore outputs synthesized speech containing an identification sound outside the audible range that marks it as speech-synthesis output. Consequently, even if the output speech is used for an unauthorized purpose, a voice authentication device provided with a speech frequency analysis function can detect that the speech contains a frequency component of the identification sound that is not present in the actual voice of a real person, which provides the advantageous effect that unauthorized use of the synthesized speech can be prevented.
[0044]
According to the portable terminal device with a storage device according to claim 3, the portable terminal device with a storage device according to claim 1 is provided with an identification sound synthesis processing unit in place of the reading selection processing unit. Rather than outputting the identification sound signal outside the audible range, which indicates a synthesized speech signal, over the entire synthesized speech signal, the identification sound synthesis processing unit includes it in the synthesized speech signal using an identification code in the synthesis target data as a trigger. Consequently, even if the output speech of the portable terminal device is used for an unauthorized purpose, a voice authentication device provided with a speech frequency analysis function can, by analyzing the frequency of only the portions of the synthesized speech signal marked by the identification code, detect that the speech contains a frequency component of the identification sound that is not present in the actual voice of a real person. This provides the advantageous effect that unauthorized use of the synthesized speech can be prevented by frequency analysis of a smaller amount of signal.
[Brief description of the drawings]
FIG. 1 is a configuration diagram illustrating a communication system including a mobile terminal device, a server device, a personal computer, and the like.
FIG. 2 is a block diagram showing a portable terminal device with a storage device according to the first embodiment of the present invention.
FIG. 3 is a flowchart showing the operation of the portable terminal device with a storage device of FIG. 2.
FIG. 4 is a block diagram showing a portable terminal device with a storage device according to the second and third embodiments of the present invention.
FIG. 5 is a flowchart showing the operation of the portable terminal device with a storage device of FIG. 4.
FIG. 6 is a flowchart showing the operation of the portable terminal device with a storage device of FIG. 4.
[Explanation of symbols]
1 Portable terminal device
2 Storage device
3 Server device
4 Personal computer
5 Internet
6 Public line
101 System control unit
102 Speech synthesis processing unit
103 Storage device interface unit
104 Speech output processing unit
105 Storage unit
106 Operation unit
107 Display unit
108 Speaker
109 Power supply unit
110 Reading selection processing unit
111 Identification sound synthesis processing unit
120 Terminal device interface unit
121 Phoneme database storage unit
122 Synthesis target data storage unit

Claims

A portable terminal device with a storage device, comprising a portable terminal device and a storage device that is detachable from the portable terminal device, wherein
the storage device includes a phoneme database storage unit that stores a phoneme database in which phoneme data is organized as a database, a synthesis target data storage unit that stores synthesis target data to be read aloud, and a terminal device interface unit that exchanges data with the portable terminal device, and
the portable terminal device includes a storage device interface unit that exchanges data with the storage device, a system control unit that controls the whole device, a reading selection processing unit that selects a range of the synthesis target data, a speech synthesis processing unit that generates synthesized speech data from the phoneme database read from the phoneme database storage unit via the storage device interface unit and the terminal device interface unit and from the synthesis target data read from the synthesis target data storage unit according to the selected range, and a speech output processing unit that receives the generated synthesized speech data and outputs it as a synthesized speech signal.

The portable terminal device with a storage device according to claim 1, wherein an identification sound synthesis processing unit is provided in place of the reading selection processing unit, and the identification sound synthesis processing unit includes, in the synthesized speech signal, an identification sound signal outside the audible range indicating that the signal is a synthesized speech signal.

The portable terminal device with a storage device according to claim 1, wherein an identification sound synthesis processing unit is provided in place of the reading selection processing unit, and the identification sound synthesis processing unit, rather than outputting an identification sound signal outside the audible range, which indicates a synthesized speech signal, over the entire synthesized speech signal, includes the identification sound signal in the synthesized speech signal using an identification code in the synthesis target data as a trigger.