JP2004212646A

JP2004212646A - Voice data display output control device and voice data display output control processing program

Info

Publication number: JP2004212646A
Application number: JP2002382193A
Authority: JP
Inventors: Takashi Koshiro; 孝湖城
Original assignee: Casio Computer Co Ltd
Current assignee: Casio Computer Co Ltd
Priority date: 2002-12-27
Filing date: 2002-12-27
Publication date: 2004-07-29

Abstract

PROBLEM TO BE SOLVED: To provide a voice data display output control device for easily synchronizing and outputting the voice output classified by parts, text display and image display, when synchronously outputting the text data, the voice data and the image data, and to provide a voice data display output control processing program.

SOLUTION: In the voice data display output control device, the voice data are output by a voice data output means, and the text data divided into a plurality of kinds of parts by a text synchronizing display control means are displayed synchronously with the voice data, output from the voice data output means. When any part of the text data is designated by a part designation means, an image corresponding to the part is displayed, read out from among the images corresponding to each part stored with a part classified image storing means, when the text data are synchronously displayed by the text synchronization display control means, on the basis of the designated part.

COPYRIGHT: (C)2004, JPO & NCIPI

Description

【０００１】
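【Technical Field of the Invention】
The present invention relates to a voice display output control device and a voice display output control processing program for synchronously outputting data such as voice, text, and images.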
【０００２】
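【Prior Art】
As a conventional technique for playing back music, text, images, and the like in parallel, synchronization information for the text and image files to be played in step with an audio file is embedded in the additional data area provided in each frame of an audio file compressed as MP3 (MPEG-1 Audio Layer III). In karaoke, for example, this allows the karaoke audio, its accompanying images, and the lyric text to be played back in synchronization.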
【０００３】
また、音声に対する文字の時間的な対応情報が予め用意されていることを前提に、当該音声信号の特徴量を抽出し対応する文字と関連付けて表示する装置も考えられている。（例えば、特許文献１参照。）
【０００４】
【特許文献１】
特公平０６−０２５９０５号公報
【０００５】
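【Problem to Be Solved by the Invention】
However, in this conventional technique for synchronized playback of multiple file types using the additional data area of an MPEG file, the synchronization information is embedded in the additional data area of each frame of the main MP3 audio file. The synchronization information therefore cannot be extracted unless the MP3 audio file is played, and files of other types can be played in synchronization only with MP3 playback as the axis.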
【０００６】
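Consequently, when synchronization information for a text file and an image file is embedded in an MP3 audio file, the text and images to be synchronized cannot be played back unless audio playback processing continues, as silent data, even during periods in which no audio is to be played.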
【０００７】
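Consider, for example, an English conversation text in which the text is divided into parts according to the speakers, or in which a person for each part appears in the image. When practicing listening or reading for a desired part, it is preferable to display the text of each part distinctively and to change the displayed image of the corresponding person. If such an English conversation text is created with an MP3 audio file, however, synchronization information for the text and image of the corresponding part must be embedded in every frame of every part of that MP3 audio file. Moreover, even if a silent period is set for a particular part for reading practice, the text and image of that part cannot be displayed unless audio playback processing continues through that part, as described above.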
【０００８】
また、前記従来の特許文献１に記載の装置にあっても、音声信号に対応する文字が予め関連付けされて音声及び文字の同期再生が成されるため、あるパート別に選択的に音声出力及び文字を同期表示させたり、対応画像を同期表示させたりすることはできない。
【０００９】
一方、カラオケのモニタなど、音声出力と同期してパート別の文字列や人物画像を表示するものでは、当該文字列の各パートに応じた色分け表示や人物画像の切り替え表示などが行われるが、前述したＭＰ３ファイルのように、音声，文字列，画像それぞれ別々でそのもののデータを予め合わせて組み込んだ同期再生用のファイルを作成しなければならない。
【００１０】
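The present invention has been made in view of the above problems. Its object is to provide a voice display output control device and a voice display output control processing program that make it possible, when synchronously outputting text data, voice data, and image data, to easily synchronize and output the per-part voice output, text display, and image display.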
【００１１】
【課題を解決するための手段】
すなわち、本発明の請求項１に係る音声表示出力制御装置では、音声データ出力手段により音声データが出力されると共に、テキスト同期表示制御手段により複数種類のパートに区分されたテキストデータが前記音声データ出力手段により出力される音声データに同期して表示される。そして、パート指定手段により前記テキストデータの何れかのパートが指定されると、この指定されたパートに基づいて前記テキストデータを前記テキスト同期表示制御手段により同期表示させる際には、当該パートに対応した画像がパート別画像記憶手段により記憶された各パートに対応した画像の中から読み出されて表示される。
【００１２】
これによれば、音声データの出力と同期して、指定されたパートのテキストデータを表示できると共に、当該パートに対応した画像を表示できることになる。
【００１３】
また、本発明の請求項２に係る音声表示出力制御装置では、前記請求項１に係る音声表示出力制御装置にあって、テキスト同期表示制御手段は、予め設定された経過時間に従い前記複数種類のパートに区分されたテキストデータを前記音声データ出力手段により出力される音声データに同期して表示させるための命令コードを記憶する命令コード記憶手段を有し、この命令コード記憶手段により記憶された命令コードに応じて、前記パート区分されたテキストデータが音声データに同期されて表示される。
【００１４】
これによれば、命令コード記憶手段により記憶された命令コードによる設定経過時間に応じた指示に従い、パート区分されたテキストデータおよび当該パートに対応した画像を音声データに同期させて表示できることになる。
【００１５】
また、本発明の請求項３に係る音声表示出力制御装置では、前記請求項２に係る音声表示出力制御装置にあって、パート指定手段は、前記命令コード記憶手段により記憶される命令コードと対応付けられて記憶されるパート指定命令コードであり、このパート指定命令コードにより、複数種類に区分されたテキストデータのパートが前記音声データ出力手段による音声データの出力に合わせて順次指定される。
【００１６】
これによれば、命令コード記憶手段により記憶された命令コードによる設定経過時間に応じた指示、および該命令コードに対応付けられたパート指定命令コードに従い、順次指定されるパートのテキストデータおよび当該パートに対応した画像を音声データに同期させて表示できることになる。
【００１７】
また、本発明の請求項４に係る音声表示出力制御装置では、前記請求項１乃至請求項３に係る音声表示出力制御装置にあって、パート別画像記憶手段には、複数種類の各パート別に当該各パートに対応した人物画像の口の動きを表現した画像が記憶される。
【００１８】
これによれば、音声データの出力と同期して、指定されたパートのテキストデータを表示できると共に、当該パートに対応した人物画像の口の動きを表現した画像を表示できることになる。
【００１９】
また、本発明の請求項５に係る音声表示出力制御装置では、前記請求項１乃至請求項３の何れか１項に係る音声表示出力制御装置にあって、さらに、口の動きを表現した画像を記憶する口画像記憶手段が備えられ、パート別画像記憶手段には、複数種類の各パートに対応した人物画像と各人物画像の口の位置情報が記憶される。そして、パート指定手段により指定されたパートに対応してテキストデータをテキスト同期表示制御手段により同期表示させる際には、前記パート別画像記憶手段により記憶された指定のパートの人物画像にその口の位置情報に応じて前記口画像記憶手段により記憶された口の動きの画像が合成されて表示される。
【００２０】
これによれば、音声データの出力と同期して、指定されたパートのテキストデータを表示できると共に、当該パートに対応した人物画像の口位置に対し口の動きを表現した画像を合成して表示できることになる。
【００２１】
また、本発明の請求項６に係る音声表示出力制御装置では、前記請求項５に係る音声表示出力制御装置にあって、さらに、テキスト同期表示制御手段により複数種類のパートに区分されたテキストデータを音声データ出力手段により出力される音声データに同期して表示させる際に、当該同期表示されるテキスト部分を検知するテキスト部分検知手段が備えられ、口画像記憶手段には、種々の文字の発音にそれぞれ対応して口の動きを表現した画像が記憶される。そして、パート指定手段により指定されたパートに対応してテキストデータを前記テキスト同期表示制御手段により同期表示させる際には、前記パート別画像記憶手段により記憶された指定のパートの人物画像にその口の位置情報に基づき、前記テキスト部分検知手段により検知される同期表示テキスト部分の発音に対応して前記口画像記憶手段により記憶された口の動きの画像が合成されて表示される。
【００２２】
これによれば、音声データの出力と同期して、指定されたパートのテキストデータを表示できると共に、当該パートに対応した人物画像の口位置に対し、同期表示中のテキスト部分の発音に対応する口の動きの画像を合成して表示できることになる。
【００２３】
また、本発明の請求項７に係る音声表示出力制御装置では、音声データ出力手段により音声データが出力されると共に、テキスト同期表示制御手段により複数種類のパートに区分されたテキストデータが前記音声データ出力手段により出力される音声データに同期して表示される。また画像表示制御手段により前記複数種類の各パートに対応した人物画像が表示される。そして、パート指定手段により前記テキストデータの何れかのパートが指定されると、この指定されたパートに対応して前記テキストデータを前記テキスト同期表示制御手段により同期表示させる際には、当該パートに対応した口の画像が前記画像表示制御手段により表示された該当パートに対応した人物画像の口位置に合成されて表示される。
【００２４】
これによれば、音声データの出力と同期して、指定されたパートのテキストデータを表示できると共に、当該パートに対応した人物画像表示の口位置に対し該当パートに対応した口の画像を合成して表示できることになる。
【００２５】
また、本発明の請求項８に係る音声表示出力制御装置では、前記請求項７に係る音声表示出力制御装置にあって、さらに、テキスト同期表示制御手段により複数種類のパートに区分されたテキストデータを音声データ出力手段により出力される音声データに同期して表示させる際に、当該同期表示されるテキスト部分を検知するテキスト部分検知手段と、種々の文字の発音にそれぞれ対応して口の動きを表現した画像を記憶する口画像記憶手段が備えられる。そして、パート指定手段により指定されたパートに対応してテキストデータを前記テキスト同期表示制御手段により同期表示させる際には、画像表示制御手段により表示された該当パートに対応した人物画像の口位置に対して前記テキスト部分検知手段により検知される同期表示テキスト部分の発音に対応して前記口画像記憶手段により記憶された口の動きの画像が合成されて表示される。
【００２６】
これによれば、音声データの出力と同期して、指定されたパートのテキストデータを表示できると共に、当該パートに対応した人物画像表示の口位置に対し、同期表示中のテキスト部分の発音に対応する口の動きの画像を合成して表示できることになる。
【００２７】
さらに、本発明の請求項９に係る音声表示出力制御処理プログラムでは、当該プログラムを電子機器のコンピュータにインストールすることで、この電子機器のコンピュータにおいて、音声データ出力手段により音声データが出力されると共に、テキスト同期表示制御手段により複数種類のパートに区分されたテキストデータが前記音声データ出力手段により出力される音声データに同期して表示される。そして、パート指定手段により前記テキストデータの何れかのパートが指定されると、この指定されたパートに基づいて前記テキストデータを前記テキスト同期表示制御手段により同期表示させる際には、当該パートに対応した画像がパート別画像記憶手段により記憶された各パートに対応した画像の中から読み出されて表示される。
【００２８】
これにより電子機器では、音声データの出力と同期して、指定されたパートのテキストデータを表示できると共に、当該パートに対応した画像を表示できることになる。
【００２９】
さらに、本発明の請求項１０に係る音声表示出力制御処理プログラムでは、当該プログラムを電子機器のコンピュータにインストールすることで、この電子機器のコンピュータにおいて、音声データ出力手段により音声データが出力されると共に、テキスト同期表示制御手段により複数種類のパートに区分されたテキストデータが前記音声データ出力手段により出力される音声データに同期して表示される。また画像表示制御手段により前記複数種類の各パートに対応した人物画像が表示される。そして、パート指定手段により前記テキストデータの何れかのパートが指定されると、この指定されたパートに対応して前記テキストデータを前記テキスト同期表示制御手段により同期表示させる際には、当該パートに対応した口の画像が前記画像表示制御手段により表示された該当パートに対応した人物画像の口位置に合成されて表示される。
【００３０】
これにより電子機器では、音声データの出力と同期して、指定されたパートのテキストデータを表示できると共に、当該パートに対応した人物画像表示の口位置に対し該当パートに対応した口の画像を合成して表示できることになる。
【００３１】
【発明の実施の形態】
以下、図面を参照して本発明の実施の形態について説明する。
【００３２】
図１は本発明の音声表示出力制御装置の実施形態に係る携帯機器１０の電子回路の構成を示すブロック図である。
【００３３】
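This portable device (PDA: personal digital assistant) 10 is constituted by a computer that reads a program recorded on any of various recording media, or a program transmitted by communication, and whose operation is controlled by the program thus read. Its electronic circuit includes a CPU (central processing unit) 11.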
【００３４】
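The CPU 11 controls the operation of each part of the circuit in accordance with a PDA control program stored in advance in the ROM 12A in the memory 12, a PDA control program read into the memory 12 from an external recording medium 13 such as a ROM card via the recording medium reading unit 14, or a PDA control program read into the memory 12 from another computer terminal (30) on a communication network N such as the Internet via the transmission control unit 15. The PDA control program stored in the memory 12 is started in response to any of the following: an input signal corresponding to a user operation on the input unit 17a, which consists of switches and keys, or on the coordinate input device 17b, which consists of a mouse or tablet; a communication signal from another computer terminal (30) on the communication network N received by the transmission control unit 15; or a communication signal from an external communication device (PC: personal computer) 20 received via the communication unit 16 over a short-range Bluetooth (R) wireless connection or a wired connection.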
【００３５】
前記ＣＰＵ１１には、前記メモリ１２、記録媒体読取部１４、電送制御部１５、通信部１６、入力部１７ａ、座標入力装置１７ｂが接続される他に、ＬＣＤからなる表示部１８、マイクを備え音声を入力する音声入力部１９ａ、左右チャンネルのスピーカＬ，Ｒを備え音声を出力するステレオ音声出力部１９ｂなどが接続される。
【００３６】
また、ＣＰＵ１１には、処理時間計時用のタイマが内蔵される。
【００３７】
この携帯機器１０のメモリ１２は、ＲＯＭ１２Ａ、ＦＬＡＳＨメモリ（ＥＥＰ−ＲＯＭ）１２Ｂ、ＲＡＭ１２Ｃを備えて構成される。
【００３８】
ＲＯＭ１２Ａには、当該携帯機器１０の全体の動作を司るシステムプログラムや電送制御部１５を介して通信ネットワークＮ上の各コンピュータ端末（３０）とデータ通信するためのネット通信プログラム、通信部１６を介して外部の通信機器（ＰＣ）２０とデータ通信するための外部機器通信プログラムが記憶される他に、スケジュール管理プログラムやアドレス管理プログラム、そして音声・テキスト・画像などの各種のファイルを同期再生するための再生処理プログラム１２ａ１など、種々のＰＤＡ制御プログラムが記憶される。
【００３９】
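The ROM 12A further stores dictionary data 12a2 and mouth-shape data 12a3 (see FIG. 2). The dictionary data 12a2 holds the data of various dictionaries, such as English-Japanese, Japanese-English, and Japanese-language dictionaries.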
【００４０】
図２は前記携帯機器１０のＲＯＭ１２Ａに記憶される口型データ１２ａ３の内容を示す図である。
【００４１】
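The mouth-shape data 12a3 stores, for example, for each English phonetic symbol and its mouth-shape number, a mouth-shape image seen from the front, an enlarged cross-sectional mouth-shape image seen from the side, and a short explanatory comment.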
【００４２】
ＦＬＡＳＨメモリ（ＥＥＰ−ＲＯＭ）１２Ｂには、前記再生処理プログラム１２ａ１に基づき再生処理の対象となる暗号化された再生用ファイル（ＣＡＳファイル）１２ｂが記憶される他に、前記スケジュール管理プログラムやアドレス管理プログラムに基づき管理されるユーザのスケジュール及び友人・知人のアドレスなどが記憶される。
【００４３】
ここで、前記ＦＬＡＳＨメモリ（ＥＥＰ−ＲＯＭ）１２Ｂ内に記憶される暗号化再生用ファイル１２ｂは、例えば英語や歌の練習をテキスト・音声・画像の同期再生により行うためのファイルであり、所定のアルゴリズムにより圧縮・暗号化されている。
【００４４】
この暗号化再生用ファイル１２ｂは、例えばＣＤ−ＲＯＭに記録されて配布されたり、通信ネットワーク（インターネット）Ｎ上のファイル配信サーバ３０から配信配布されたりするもので、前記ＣＤ−ＲＯＭあるいはネットサーバ（３０）により配布された暗号化再生用ファイル１２ｂは、例えばユーザ自宅ＰＣとしての通信機器（ＰＣ）２０に読み込まれた後、携帯機器（ＰＤＡ）１０の通信部１６を介してＦＬＡＳＨメモリ（ＥＥＰ−ＲＯＭ）１２Ｂに転送格納される。
【００４５】
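The RAM 12C stores the decrypted playback file (CAS file) 12c, obtained by decompressing and decrypting the encrypted playback file 12b, and is provided with an image expansion buffer 12k into which the image files in this decrypted playback file 12c are expanded and stored. The decrypted CAS file 12c consists of header information (12c1), which stores the processing unit time (12c1a) for playback commands, together with the file sequence table (12c2), time code file (12c3), and content data (12c4) described later. The RAM 12C also stores an image-expanded flag 12j indicating the image number of each image file already expanded into the image expansion buffer 12k.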
【００４６】
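The RAM 12C also stores the following: a designated text number 12d for the content selected in the playback file 12b (12c); a designated part number 12e, specified by the user for the English conversation text corresponding to the designated text number; text-corresponding pronunciation mouth-shape data 12f (see FIG. 7), in which all phonetic symbols of each English word in that English conversation text, and their mouth-shape numbers, are stored after being looked up in the dictionary data 12a2 and mouth-shape data 12a3; a text highlight designated character 12g, detected and stored in real time as the currently highlighted character while each character of the English conversation text is highlighted in synchronization with its read-aloud voice and animation image on the basis of the playback file 12b (12c); an image (on/off) flag 12h for setting whether the image files included as synchronized playback files in the playback file 12b (12c) are played back in synchronization; and a voice (on/off) flag 12i for setting whether the voice files included as synchronized playback files in the playback file 12b (12c) are played back in synchronization.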
【００４７】
In addition, the RAM 12C provides a work area for temporarily storing various data input to and output from the CPU 11 in the course of other processing.
【００４８】
図３は前記携帯機器１０のメモリ１２に格納された再生用ファイル１２ｂ（１２ｃ）を構成するタイムコードファイル１２ｃ３を示す図である。
【００４９】
図４は前記携帯機器１０のメモリ１２に格納された再生用ファイル１２ｂ（１２ｃ）を構成するファイルシーケンステーブル１２ｃ２を示す図である。
【００５０】
図５は前記携帯機器１０のメモリ１２に格納される再生用ファイル１２ｂ（１２ｃ）を構成するコンテンツ内容データ１２ｃ４を示す図である。
【００５１】
この携帯機器１０の再生対象ファイルとなる再生用ファイル１２ｂ（１２ｃ）は、図３〜図５で示すように、タイムコードファイル１２ｃ３とファイルシーケンステーブル１２ｃ２とコンテンツ内容データ１２ｃ４との組み合わせにより構成される。
【００５２】
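In the time code file 12c3 shown in FIG. 3, time codes are described and arranged for performing the command processing for synchronized playback of the various files at a fixed (reference) processing unit time (for example, 25 ms) preset for each individual file. Each time code combines a command code, which specifies an instruction, with parameter data. The parameter data is either a reference number into the file sequence table 12c2 (FIG. 4), which associates the command with the relevant file contents (see FIG. 5), or a specified numeric value.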
【００５３】
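The fixed (reference) processing unit time 12c1a, at which command processing proceeds in order according to the time codes, is described and set in the header information 12c1 of the time code file 12c3.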
【００５４】
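For example, when the preset (reference) processing unit time is 25 ms, the playback time of the playback file 12b (12c) comprising the time code file 12c3 shown in FIG. 3 is 60 seconds, reached through playback processing of 2,400 time code steps (2,400 × 25 ms = 60 s).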
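The playback time in this example follows directly from the number of time code steps and the processing unit time. A minimal sketch of that arithmetic is shown here; the function name is an illustrative assumption and does not appear in the specification.

```python
def playback_duration_seconds(step_count: int, unit_time_ms: int) -> float:
    """Total playback time when one time code is processed per unit time."""
    return step_count * unit_time_ms / 1000.0


# Values quoted in the description: 2,400 steps at a 25 ms unit time.
print(playback_duration_seconds(2400, 25))  # 60.0
```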
【００５５】
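The file sequence table 12c2 shown in FIG. 4 is a table that, for each of the multiple file types (HTML, image, text, voice), associates the parameter data of each command described in the time code file 12c3 (see FIG. 3) with the storage location (ID) number of the actual file contents.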
【００５６】
The content data 12c4 shown in FIG. 5 stores the actual file data, such as voice, images, and text, that the file sequence table 12c2 (see FIG. 4) associates with each command code, each under its own ID number.
【００５７】
本実施形態のファイルシーケンステーブル１２ｃ２においてリンク付けされるコンテンツ内容データ１２ｃ４について簡単に説明すると、例えばＩＤ＝５に対応するデータ内容には、３人の会話パートによる英会話テキストの基本画像Ｇ０（図１５参照）が用意され、ＩＤ＝６に対応するデータ内容には、前記基本画像Ｇ０に基づいたパート１の会話中画像Ｇ１（図１２（Ａ）（Ｄ）参照）が用意され、ＩＤ＝７に対応するデータ内容には、前記基本画像Ｇ０に基づいたパート２の会話中画像Ｇ２（図１２（Ｂ）参照）が用意され、ＩＤ＝８に対応するデータ内容には、前記基本画像Ｇ０に基づいたパート３の会話中画像Ｇ３（図１２（Ｃ）参照）が用意される。
【００５８】
なお、前記コンテンツ内容データ１２ｃ４のＩＤ＝５に対応する英会話テキストの基本画像Ｇ０（図１５参照）には、各パート人物画像毎の口エリアの座標データｘ１ｙ１，ｘ２ｙ２（図１５のＭ１〜Ｍ３参照）が対応付けられて記憶される。
【００５９】
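Further, the data for ID = 21, for example, is English conversation text data ② (see FIGS. 12 and 15), with which the in-conversation images G1 to G3 of the three-part English conversation text are synchronized. The data for ID = 32, for example, is English conversation voice data ② (see 19b in FIGS. 12 and 15), with which the in-conversation images G1 to G3 and the English conversation text data ② of the same three-part conversation are synchronized.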
【００６０】
図６は前記携帯機器１０のタイムコードファイル１２ｃ３（図３参照）にて記述される各種コマンドのコマンドコードとそのパラメータデータおよび再生処理プログラム１２ａ１に基づき解析処理される命令内容を対応付けて示す図である。
【００６１】
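The commands used in the time code file 12c3 comprise standard commands and extended commands. The standard commands are LT (load i-th text), VD (display i-th text phrase), BL (reset character counter and designate i-th phrase block), HN (no highlight; count up character counter), HL (highlight up to i-th character; count characters), LS (scroll one line; count up character counter), DH (display i-th HTML file), DI (display i-th image file), PS (play i-th sound file), CS (clear all files), PP (pause for i seconds of basic time), FN (end processing), and NP (no operation). The extended commands include PT (designate i-th part) and PI (display per-part image file).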
【００６２】
図７は前記携帯機器１０のＲＡＭ１２Ｃに記憶されるテキスト対応発音口型データ１２ｆの内容を示す図である。
【００６３】
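That is, suppose the playback processing program 12a1 stored in the ROM 12A of the portable device (PDA) 10 is started, and the decrypted playback file 12c, decrypted from the FLASH memory 12B and stored in the RAM 12C, has the file contents shown in FIGS. 3 to 5. When the third command code "DI" and its parameter data "00" are read in the command processing performed every set processing unit time, the command "DI" is an instruction to display the i-th image file. The parameter data i = 00 is therefore linked through the file sequence table 12c2 (see FIG. 4) to image file ID number = 5, and the English conversation text basic image G0 in the content data 12c4 (see FIG. 5) is read out and displayed.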
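The lookup chain described here (command parameter, then file sequence table, then content ID, then file data) can be sketched as two dictionary lookups. The dictionary layout and every name below are assumptions made for illustration; only the number pairs are taken from the description, and FIGS. 3 to 5 themselves are not reproduced.

```python
# Hypothetical stand-in for the file sequence table 12c2:
# file type -> {command parameter: content ID}. Only pairs quoted in the text are filled in.
FILE_SEQUENCE_TABLE = {
    "html": {1: 2},
    "image": {0: 5, 1: 6, 2: 7, 3: 8},
    "text": {0: 19, 2: 21},
    "sound": {2: 32},
}

# Hypothetical stand-in for the content data 12c4: content ID -> file data (labels only).
CONTENT_DATA = {
    5: "basic image G0",
    6: "in-conversation image G1",
    7: "in-conversation image G2",
    8: "in-conversation image G3",
    21: "English conversation text data 2",
    32: "English conversation voice data 2",
}

# Which table section each file-referencing command uses (an assumption of this sketch).
COMMAND_FILE_TYPE = {"DH": "html", "DI": "image", "LT": "text", "PS": "sound"}


def resolve(command: str, parameter: str):
    """Map a command code and its parameter data to (content ID, file data)."""
    file_type = COMMAND_FILE_TYPE[command]
    content_id = FILE_SEQUENCE_TABLE[file_type][int(parameter)]
    return content_id, CONTENT_DATA.get(content_id)


print(resolve("DI", "00"))  # (5, 'basic image G0')
```

Keeping the indirection in a separate table means a time code only carries a small parameter, while the content data can be reordered without rewriting the time codes.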
【００６４】
また、設定処理単位時間毎のコマンド処理に伴い６番目のコマンドコード“ＶＤ”およびパラメータデータ“００”が読み込まれた場合には、このコマンド“ＶＤ”はｉ番目のテキスト文節表示命令であるため、パラメータデータｉ＝００に従い、テキストの０番目の文節が表示される。
【００６５】
また、設定処理単位時間毎のコマンド処理に伴い８番目のコマンドコード“ＰＩ”およびパラメータデータ“００”が読み込まれた場合には、このコマンド“ＰＩ”はパート別イメージファイル表示命令であるため、これ以降のコマンドコードＰＴ（ｉ番目パート指定）に従い指定されたパートの画像表示が指示される。
【００６６】
また、設定処理単位時間毎のコマンド処理に伴い９番目のコマンドコード“ＰＴ”およびパラメータデータ“０１”が読み込まれた場合には、このコマンド“ＰＴ”はｉ番目パート指定命令であるため、パラメータデータｉ＝０１に従い、再生対象ファイルにおける１番目のパートが指定される。
【００６７】
さらに、設定処理単位時間毎のコマンド処理に伴い１１番目のコマンドコード“ＮＰ”およびパラメータデータ“００”が読み込まれた場合には、このコマンド“ＮＰ”は無効命令であるため、現状のファイル出力状態が維持される。
【００６８】
なお、この図３乃至図５で示したファイル内容の再生用ファイル１２ｂ（１２ｃ）についての詳細な再生動作は、後述にて改めて説明する。
【００６９】
次に、前記構成の携帯機器１０によるファイル再生機能について説明する。
【００７０】
図８は前記携帯機器１０の再生処理プログラム１２ａ１に従った再生処理を示すフローチャートである。
【００７１】
図９は前記携帯機器１０の再生処理に伴う発音口型データ作成処理を示すフローチャートである。
【００７２】
図１０は前記携帯機器１０の再生処理に伴う対応パート別画像表示処理Ａを示すフローチャートである。
【００７３】
図１１は前記携帯機器１０の再生処理に伴う学習内容の選択操作・表示状態を示す図であり、同図（Ａ）は学習内容選択画面Ｇを示す図、同図（Ｂ）（Ｃ）は当該学習内容選択画面Ｇを対象とする選択操作キーを示す図である。
【００７４】
例えば英語の勉強がテキストと画像と音声で行える英語教材再生ファイルを、ＣＤ−ＲＯＭや通信ネットワーク（インターネット）Ｎ上のサーバ３０から自宅ＰＣである通信機器（ＰＣ）２０に取り込み、携帯機器（ＰＤＡ）１０の通信部１６を介して当該再生用ファイル（ＣＡＳファイル）１２ｂがＦＬＡＳＨメモリ１２Ｂに、あるいは解読された再生用ファイル１２ｃとしてＲＡＭ１２Ｃに格納された状態において、入力部１７ａあるいは座標入力装置（マウス／タブレット）１７ｂの操作によりこの再生用ファイル１２ｂ（１２ｃ）の再生が指示されると、まず、図１１（Ａ）に示すように、学習内容をユーザ選択させるための学習内容選択画面Ｇが表示部１８に表示される（ステップＳ１）。
【００７５】
この学習内容選択画面Ｇにあって、図１１（Ｂ）（Ｃ）に示すように、入力部１７ａのカーソルキー１７ａ１および「決定」キー１７ａ２の操作により、英会話の全体を聞く、またはパート別練習における各会話パート（１：２：３）の何れかが選択されると、ＲＡＭ１２Ｃ内の各ワークエリアのクリア処理などのイニシャライズ処理が行われ、前記選択された英会話パートのパート番号が指定パート番号１２ｅとしてＲＡＭ１２Ｃに記憶される（ステップＳ１，Ｓ２）。
【００７６】
そして、前記ＦＬＡＳＨメモリ１２Ｂに格納された再生用ファイル（ＣＡＳファイル）１２ｂが読み込まれ（ステップＳ３）、当該再生用ファイル（ＣＡＳファイル）１２ｂは暗号化ファイルであるか否か判断される（ステップＳ４）。
【００７７】
ここで、暗号化された再生用ファイル（ＣＡＳファイル）１２ｂであると判断された場合には、当該ＣＡＳファイル１２ｂは解読復号化され（ステップＳ４→Ｓ５）、ＲＡＭ１２Ｃに転送されて格納される（ステップＳ６）。
【００７８】
ここで、図９における発音口型データ作成処理が実行される（ステップＳＡ）。
【００７９】
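In this pronunciation mouth-shape data creation process, the English conversation text data stored as content data 12c4 is first read into the text-corresponding pronunciation mouth-shape data 12f in the RAM 12C, as shown for example in FIG. 7 (step A1).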
【００８０】
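Then every word of the English conversation text data read into the text-corresponding pronunciation mouth-shape data 12f is looked up in turn in the dictionary data 12a2 stored in the ROM 12A, and the phonetic symbols of each word are read out (step A2).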
【００８１】
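The mouth-shape number data corresponding to each phonetic symbol read out for all the words of the English conversation text data is then read from the mouth-shape data 12a3 (see FIG. 2) stored in the ROM 12A, and is stored in the text-corresponding pronunciation mouth-shape data 12f in the RAM 12C, associated as text word, phonetic symbols, and mouth-shape numbers for each conversation part.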
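The pronunciation mouth-shape data creation process amounts to two table lookups per word. The sketch below illustrates this under stated assumptions: the sample dictionary entries, the phonetic symbols, and the mouth-shape numbers are all made up, since the actual contents of the dictionary data 12a2 and mouth-shape data 12a3 are not given in the text.

```python
import re

# Stand-in for dictionary data 12a2: word -> phonetic symbols (made-up sample entries).
DICTIONARY = {
    "what": ["w", "ʌ", "t"],
    "do": ["d", "uː"],
}

# Stand-in for mouth-shape data 12a3: phonetic symbol -> mouth-shape number (made up).
MOUTH_SHAPE_NUMBERS = {"w": 1, "ʌ": 2, "t": 3, "d": 3, "uː": 4}


def build_mouth_shape_data(part_texts):
    """Return rows of (part, word, phonetic symbols, mouth-shape numbers)."""
    rows = []
    for part, text in part_texts.items():
        for word in re.findall(r"[A-Za-z']+", text):
            symbols = DICTIONARY.get(word.lower(), [])  # step A2: dictionary lookup
            numbers = [MOUTH_SHAPE_NUMBERS[s] for s in symbols]
            rows.append((part, word, symbols, numbers))
    return rows


rows = build_mouth_shape_data({1: "What do"})
for row in rows:
    print(row)
```

Building this table once, before playback starts, keeps the per-step work during playback down to a simple lookup.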
【００８２】
こうして、発音口型データ作成処理が完了すると、ＣＰＵ１１による再生用ファイル（ＣＡＳファイル）１２ｃの処理単位時間１２ｃ１ａ（例えば２５ｍｓ）がタイムコードファイル１２ｃ３のヘッダ情報１２ｃ１として設定される（ステップＳ７）。
【００８３】
そして、ＲＡＭ１２Ｃに格納された解読済再生用ファイル（ＣＡＳファイル）１２ｃの先頭に読み出しポインタがセットされ（ステップＳ８）、当該再生用ファイル１２ｃの再生処理タイミングを計時するためのタイマがスタートされる（ステップＳ９）。
【００８４】
ここで、先読み処理が当該再生処理に並行して起動される（ステップＳ１０）。
【００８５】
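In this prefetch process, when a "DI" image file display command exists later than the command at the current read pointer position in the time code file 12c3 (see FIG. 3) of the playback file 12c, the image file indicated by the parameter data of that "DI" command is read ahead and expanded into the image expansion buffer 12k. When the read pointer actually reaches the later "DI" command, the specified image file can then be output and displayed immediately, without processing delay.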
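A minimal sketch of the prefetch idea follows. It scans the time codes after the read pointer and expands the images that later "DI" commands will need. The function, its parameters, and the loader callback are hypothetical, and the sketch runs sequentially, whereas the description starts the prefetch in parallel with playback.

```python
def prefetch_images(time_codes, pointer, image_table, load_image, buffer):
    """Expand images needed by later DI commands into the buffer ahead of time."""
    for command, parameter in time_codes[pointer + 1:]:
        if command != "DI":
            continue
        image_id = image_table[int(parameter)]
        if image_id not in buffer:  # plays the role of the image-expanded flag 12j
            buffer[image_id] = load_image(image_id)
    return buffer


codes = [("CS", "00"), ("DH", "01"), ("DI", "00"), ("PS", "02")]
buffer = prefetch_images(codes, 0, {0: 5}, lambda image_id: f"pixels-of-{image_id}", {})
print(buffer)  # {5: 'pixels-of-5'}
```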
【００８６】
前記ステップＳ９において、処理タイマがスタートされると、前記ステップＳ７にて設定された今回の再生対象ファイル１２ｃに応じた処理単位時間（２５ｍｓ）毎に、前記ステップＳ８にて設定された読み出しポインタの位置の当該再生用ファイル１２ｃを構成するタイムコードファイル１２ｃ３（図３参照）のコマンドコードおよびそのパラメータデータが読み出される（ステップＳ１１）。
【００８７】
そして、前記再生用ファイル１２ｃにおけるタイムコードファイル１２ｃ３（図３参照）から読み出されたコマンドコードが、“ＦＮ”か否か判断され（ステップＳ１２）、“ＦＮ”と判断された場合には、その時点で当該ファイル再生処理の停止処理が指示実行される（ステップＳ１２→Ｓ１３）。
【００８８】
一方、前記再生用ファイル１２ｃにおけるタイムコードファイル１２ｃ３（図３参照）から読み出されたコマンドコードが、“ＦＮ”ではないと判断された場合には、当該コマンドコードが、“ＰＴ”か否か判断される（ステップＳ１２→Ｓ１４）。
【００８９】
そして、コマンドコード“ＰＴ”と判断された場合には、図１０における対応パート別画像表示処理Ａが実行される（ステップＳＢ）。
【００９０】
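In this corresponding-part image display process A, the part number p of the English conversation text designated by the command code "PT" and its parameter data is first detected (step Ba1). The value obtained by adding this part number p to the image number of the English conversation text basic image G0, which is designated by the command code "DI" and its parameter data in the time code file 12c3 (see FIG. 3), is set as the image number i of the part to be played (step Ba2). For example, when the basic image number designated by the parameter data of the first command code "DI" in the time code file 12c3 (see FIG. 3) is "00" and the part number designated by the parameter data of the command code "PT" is p = "01", the image number of the part to be played is i = 01 (00 + 01).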
【００９１】
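The instruction corresponding to the command code "DI" (display the i-th image) is then executed. For example, the in-conversation image G1 of part 1, which has ID number = 6 and is stored in the content data 12c4 in association with image number i = 01 of the file sequence table 12c2, is displayed (step Ba3).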
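Steps Ba1 to Ba3 reduce to an offset followed by a table lookup. The sketch below assumes the image section of the file sequence table is a plain mapping from image number to ID number; the function name is illustrative, and the number pairs are those quoted in the description.

```python
# Image number -> content ID, as quoted in the description (G0 to G3).
IMAGE_TABLE = {0: 5, 1: 6, 2: 7, 3: 8}


def part_image_id(base_image_number: int, part_number: int, image_table=IMAGE_TABLE) -> int:
    """Image number i = basic image number + part number p; return its content ID."""
    image_number = base_image_number + part_number  # step Ba2
    return image_table[image_number]                # looked up for display in step Ba3


print(part_image_id(0, 1))  # 6, the in-conversation image G1 of part 1
```

Because the per-part images sit at fixed offsets from the basic image, a "PT" command needs no image parameter of its own.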
【００９２】
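Once the per-part image designated by the command code "PT" has been displayed, it is determined whether the part number p of the English conversation part designated by the parameter data of that command code "PT" matches the designated part number 12e, which the user selected and stored for per-part practice in the learning content selection process (step S1) (step S15).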
【００９３】
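If the part number p designated by the parameter data of the command code "PT" does not match the user-selected designated part number 12e, that is, if the conversation part is judged to differ from the part the user wishes to practice, the voice-on flag 12i is set in the RAM 12C so that voice is output for that conversation part (step S15→S16). The type of highlight used to distinguish the text string of that conversation part in response to the command code "HL" is also changed to underlining (step S17).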
【００９４】
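Conversely, if the part number p designated by the parameter data of the command code "PT" matches the user-selected designated part number 12e, that is, if the conversation part is judged to be the part the user wishes to practice, the voice-off flag 12i is set in the RAM 12C so that voice output stops for that conversation part (step S15→S18). The type of highlight used to distinguish the text string of that conversation part in response to the command code "HL" is also changed to reverse (inverted) display (step S19).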
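The branch in steps S15 to S19 can be sketched as a single comparison. The function name and the returned dictionary are assumptions made for illustration; the behaviour (voice off and reverse display for the practice part, voice on and underlining otherwise) follows the description.

```python
def settings_for_part(part_number, user_part):
    """Return the voice flag and highlight style to use after a PT command."""
    if part_number == user_part:
        # Steps S18 and S19: the user's practice part is muted and shown in reverse display.
        return {"voice_on": False, "highlight": "reverse"}
    # Steps S16 and S17: every other part keeps its voice and is underlined.
    return {"voice_on": True, "highlight": "underline"}


print(settings_for_part(1, user_part=2))  # {'voice_on': True, 'highlight': 'underline'}
print(settings_for_part(2, user_part=2))  # {'voice_on': False, 'highlight': 'reverse'}
```

In this sketch, passing `user_part=None` models the case where no practice part is selected, so no part is ever muted.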
【００９５】
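Then, based on the timing operation of the processing timer, it is again determined whether the time measured by the timer has reached the next processing unit time 12c1a (step S20).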
【００９６】
一方、前記ステップＳ１４において、前記再生用ファイル１２ｃにおけるタイムコードファイル１２ｃ３（図３参照）から読み出されたコマンドコードが、“ＰＴ”ではないと判断された場合には、他のコマンド処理へ移行されて各コマンド内容（図６参照）に対応する処理が実行される（ステップＳＣ）。
【００９７】
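When it is determined in step S20 that the time measured by the timer has reached the next processing unit time 12c1a, the read pointer for the decrypted playback file (CAS file) 12c stored in the RAM 12C is updated to the next position (step S20→S21), and the processing is repeated from step S11, in which the command code and parameter data of the time code file 12c3 (see FIG. 3) at the read pointer position are read out (step S21→S11 to S19 (SC)).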
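The overall loop of steps S11 to S21 (read one time code per processing unit time, dispatch it, stop at "FN") can be sketched as below. This is a simplified illustration under assumptions: time codes are modelled as (command, parameter) tuples, the `handle` callback stands in for both the "PT" handling and the other command processing, and timing uses the standard library rather than the CPU's built-in timer.

```python
import time


def play(time_codes, unit_time_ms, handle, clock=time.monotonic, sleep=time.sleep):
    """Process one time code per unit time until FN; return the number handled."""
    start = clock()
    handled = 0
    for step, (command, parameter) in enumerate(time_codes):
        if command == "FN":                 # steps S12 -> S13: stop playback
            break
        handle(command, parameter)          # PT handling or other commands (SB / SC)
        handled += 1
        deadline = start + (step + 1) * unit_time_ms / 1000.0
        remaining = deadline - clock()      # step S20: wait for the next unit time
        if remaining > 0:
            sleep(remaining)
    return handled


log = []
count = play(
    [("CS", "00"), ("NP", "00"), ("FN", "00"), ("DI", "00")],
    unit_time_ms=25,
    handle=lambda command, parameter: log.append(command),
)
print(count, log)  # 2 ['CS', 'NP']
```

Deadlines are computed from the start time rather than from the previous step, so a slow command handler does not make the schedule drift.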
【００９８】
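In other words, the CPU 11 of the portable device 10 follows the synchronized content playback processing program 12a1 stored in the ROM 12A: at every command processing unit time preset in the playback file 12b (12c), it reads the command code and parameter data arranged in the time code file 12c3 (see FIG. 3) and simply issues the processing corresponding to that command. This alone carries out the synchronized playback output of text, voice, and images according to the commands described in the time code file 12c3.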
【００９９】
そして、このように再生用ファイル１２ｃにおけるタイムコードファイル１２ｃ３（図３参照）からのコマンドコードの読み出しに応じて、音声，テキスト，画像の同期再生出力の指示を行う場合に、前記対応パート別画像表示処理Ａに従い、コマンドコード“ＰＴ”によって指示された次の再生パートに対応する会話中画像Ｇｎの表示が行われるので、簡単に各会話パート別の口の動きを表した画像表示を行うことができ、ユーザはより効果的に会話の練習を行うことができる。
【０１００】
また、コマンドコード“ＰＴ”によって指示された次の再生パートが、ユーザ設定された練習対象のパートであるときには、当該パート部分の音声出力は停止され、テキスト，パート別画像Ｇｎのみの同期再生出力状態となることで、ユーザは該ユーザ自身で設定した練習パートにおいて表示出力されるテキスト，パート別画像Ｇｎを見ながら該テキストを自身で読み上げて会話の練習を行うことができる。
【０１０１】
また、同コマンドコード“ＰＴ”によって指示された次の再生パートが、ユーザ設定された練習対象のパートでないときには、当該パート部分の音声出力は停止されず、テキスト，音声，画像の同期再生出力状態となることで、ユーザは他のパートについて表示出力されるテキスト，画像を見ながら該テキストの音声出力を聞いて会話の練習を行うことができる。
【０１０２】
ここで、図３で示す英語教材再生ファイル１２ｃに基づいた、前記同期コンテンツ再生処理プログラム１２ａ１（図８〜図１０参照）による音声・テキスト・画像ファイルの同期再生出力動作について詳細に説明する。
【０１０３】
図１２は前記図３乃至図５における英語教材再生ファイル１２ｃに基づいた英会話テキスト・音声・画像ファイルの同期再生状態（その１）を示す図である。
【０１０４】
この英語教材ファイル（１２ｃ）は、そのヘッダに予め記述設定された（基準）処理単位時間（例えば２５ｍｓ）１２ｃ１ａ毎にコマンド処理が実行されるもので、まず、タイムコードファイル１２ｃ３（図３参照）の第１コマンドコード“ＣＳ”（クリアオールファイル）およびそのパラメータデータ“００”が読み出されると、全ファイルの出力をクリアする指示が行われ、テキスト・音声・画像ファイルの出力がクリアされる（ステップＳＣ）。
【０１０５】
第２コマンドコード“ＤＨ”（ｉ番目ＨＴＭＬファイル表示）およびそのパラメータデータ“０１”が読み出されると、当該コマンドコードＤＨと共に読み出されたパラメータデータ（ｉ＝１）に応じて、ファイルシーケンステーブル１２ｃ２（図４参照）からＨＴＭＬ番号１のＩＤ番号＝２が読み出される。
【０１０６】
そして、このＩＤ番号＝２に対応付けられてコンテンツ内容データ１２ｃ４（図５参照）から読み出されるＨＴＭＬデータの英会話テキスト・画像フレームデータに応じて、図１２（Ａ）に示すように、表示部１８に対するテキスト表示フレームＸや画像表示フレームＹが設定される（ステップＳＣ）。
【０１０７】
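When the third command code "DI" (display i-th image file) and its parameter data "00" are read, ID number = 5 of image number 0 is read from the file sequence table 12c2 (see FIG. 4) according to the parameter data (i = 0) read together with the command code DI.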
【０１０８】
そして、このＩＤ番号＝５に対応付けられてコンテンツ内容データ１２ｃ４（図５参照）から読み出されて画像展開バッファ１２ｋに展開された画像データ（英会話テキスト基本画像Ｇ０）が、前記ＨＴＭＬファイルで設定された画像表示フレームＹ内に表示される（ステップＳＣ）。
【０１０９】
第４コマンドコード“ＰＳ”（ｉ番目サウンドファイルプレイ）およびそのパラメータデータ“０２”が読み出されると、当該コマンドコードＰＳと共に読み出されたパラメータデータ（ｉ＝２）に応じて、ファイルシーケンステーブル１２ｃ２（図４参照）から音声番号２のＩＤ番号＝３２が読み出される。
【０１１０】
そして、このＩＤ番号＝３２に対応付けられてコンテンツ内容データ１２ｃ４（図５参照）から読み出された英会話音声データ▲２▼がステレオ音声出力部１９ｂから出力開始される（ステップＳＣ）。
【０１１１】
第５コマンドコード“ＬＴ”（ｉ番目テキストロード）およびそのパラメータデータ“０２”が読み出されると、当該コマンドコードＬＴと共に読み出されたパラメータデータ（ｉ＝２）に応じて、ファイルシーケンステーブル１２ｃ２（図４参照）からテキスト番号２のＩＤ番号＝２１が読み出される。
【０１１２】
そして、このＩＤ番号＝２１に対応付けられてコンテンツ内容データ１２ｃ４（図５参照）から読み出された英会話テキストデータ▲２▼がＲＡＭ１２Ｃのワークエリアにロードされる（ステップＳＣ）。
【０１１３】
第６コマンドコード“ＶＤ”（ｉ番目テキスト文節表示）およびそのパラメータデータ“００”が読み出されると、当該コマンドコードＶＤと共に読み出されたパラメータデータ（ｉ＝０）に応じて、ファイルシーケンステーブル１２ｃ２（図４参照）からテキスト番号０のＩＤ番号＝１９が読み出され、これに対応付けられてコンテンツ内容データ１２ｃ４（図５参照）にて指定された英会話タイトル文字の文節が、前記ＲＡＭ１２Ｃにロードされた英会話テキストデータ▲２▼の中から呼び出されて表示画面上のテキスト表示フレームＸ内に表示される（ステップＳＣ）。
【０１１４】
第７コマンドコード“ＢＬ”（文字カウンタリセット・ｉ番目文節ブロック指定）およびそのパラメータデータ“００”が読み出されると、前記テキスト表示フレームＸで表示中の英会話文節の文字カウンタがリセットされ、０番目のブロックが指定される（ステップＳＣ）。
【０１１５】
第８コマンドコード“ＰＩ”（パート別イメージファイル表示）およびそのパラメータデータ“００”が読み出されると、これ以降に指定されたパートの画像を表示する指示が行われる（ステップＳＣ）。
【０１１６】
第９コマンドコード“ＰＴ”（ｉ番目パート指定）およびそのパラメータデータ“０１”が読み出されると、前記英会話テキスト基本画像Ｇ０に基づき、これから同期再生すべき前記英会話音声データ▲２▼および前記英会話テキストデータ▲２▼における会話パート１（Ａさん）が指定される（ステップＳ１４）。
【０１１７】
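Then, in accordance with the corresponding-part image display process A in FIG. 10, the in-conversation image G1, which shows the mouth movement of the person of part 1, is read out for image number i = 1 corresponding to the designated part number p = 1, and is displayed as shown in FIG. 12(A) (step SB).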
【０１１８】
そして、予めユーザ設定された指定パート番号１２ｅが会話パート２（Ｂさん）である場合には、前記第９コマンドコード“ＰＴ”により指定された会話パート１（Ａさん）と一致しないので（ステップＳ１５）、該当パート１の音声出力オンにされ（ステップＳ１６）、また、コマンドコード“ＨＬ”に応じた処理内容のハイライト処理がアンダーライン処理に変更設定される（ステップＳ１７）。
【０１１９】
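When the 10th command code "HL" (highlight up to i-th character; count characters) and its parameter data "07" are read, the text data is underlined up to its 7th character, "A: What" (including the space), as shown in FIG. 12(A), according to the parameter data (i = 7) read together with the command code HL, and the character counter is counted up to the same 7th character (step SC).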
【０１２０】
この際、前記テキストデータのパート１（Ａさん）部分である会話文字列のアンダーライン表示中には、音声出力オンに設定されているので、前記第４コマンドコード“ＰＳ”に応じてステレオ音声出力部１９ｂから出力されている英会話音声データ▲２▼は、同会話パート１（Ａさん）のアンダーライン表示部分を読み上げるところの音声が出力されることになる。
【０１２１】
第１１コマンドコード“ＮＰ”が読み出されると、現在の画像および英会話テキストデータの同期表示画面および英会話音声データの同期出力状態が維持される。
【０１２２】
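Thereafter, in accordance with the 13th, 35th, and 58th command codes "HL", with the part 1 in-conversation image G1 displayed, the text data of conversation part 1 (speaker A) is underlined progressively, up to the 12th character ("high"), the 19th character ("school"), and the 22nd character ("do"). At the same time, the English conversation voice data ②, which has been output from the stereo voice output unit 19b since the 4th command code "PS", outputs in turn the voice reading aloud the underlined portion of conversation part 1 (speaker A) (steps S11 to S14→SC, S20, S21→S11).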
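The "HL" parameter is a cumulative character count from the start of the text, spaces included. The sketch below checks the four counts quoted in the description against the opening words they imply; the text beyond the 22nd character is not given in the source and is not guessed here, and the function name is an illustrative assumption.

```python
# Opening of the English conversation text as implied by the quoted highlight positions.
TEXT = "A: What high school do"


def highlighted(text: str, count: int) -> str:
    """Return the portion of the text highlighted by an HL command with parameter i = count."""
    return text[:count]


for count in (7, 12, 19, 22):
    print(count, repr(highlighted(TEXT, count)))
# 7 'A: What'
# 12 'A: What high'
# 19 'A: What high school'
# 22 'A: What high school do'
```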
【０１２３】
つまり、コマンドコード“ＰＴ”によって指示された次の英会話再生パートが、ユーザ設定された練習対象のパート（例えばパート２（Ｂさん））でないときには、当該他のパート部分の音声出力は停止されず、英会話テキスト，その読み上げ音声，対応パート別画像Ｇｎの同期再生出力状態となることで、ユーザは他のパートについて表示出力されるテキスト，画像を見ながら該テキストの音声出力を聞いて会話の練習を行うことができる。
【０１２４】
そして、第１１９コマンドコード“ＰＴ”（ｉ番目パート指定）およびそのパラメータデータ“０２”が読み出されると、次に同期再生すべき前記英会話音声データ▲２▼および前記英会話テキストデータ▲２▼およびパート別画像Ｇｎにおける会話パート２（Ｂさん）が指定される（ステップＳ１４）。
【０１２５】
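Then, in accordance with the corresponding-part image display process A in FIG. 10, the in-conversation image G2, which shows the mouth movement of the person of part 2, is read out for image number i = 2 corresponding to the designated part number p = 2, and is displayed as shown in FIG. 12(B) (step SB).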
【０１２６】
ここで、予めユーザ設定された指定パート番号１２ｅが会話パート２（Ｂさん）である場合には、前記第１１９コマンドコード“ＰＴ”により指定された会話パート２（Ｂさん）と一致するので（ステップＳ１５）、該当パート２の音声出力オフにされ（ステップＳ１８）、また、コマンドコード“ＨＬ”に応じた処理内容のハイライト処理が反転処理に変更設定される（ステップＳ１９）。
【０１２７】
そして、第１２０コマンドコード“ＨＬ”（ｉ番目文字までハイライト・文字カウント）およびそのパラメータデータ“３７”が読み出されると、当該コマンドコードＨＬと共に読み出されたパラメータデータ（ｉ＝３７）に応じて、図１２（Ｂ）に示すように、テキストデータの３７番目の文字、つまり会話パート２の文字「Ｂ：Ｉ」（スペース含む）まで今度は反転により表示され、文字カウンタが同３７番目の文字までカウントアップされる（ステップＳＣ）。
【０１２８】
この際、前記テキストデータのパート２（Ｂさん）部分である会話文字列の反転表示中には、音声出力オフに設定されているので、前記第４コマンドコード“ＰＳ”に応じてステレオ音声出力部１９ｂから順次出力されていた英会話音声データ▲２▼は停止される。
【０１２９】
この後、第１３２コマンドコード“ＨＬ”、第１４０コマンドコード“ＨＬ”に従い、パート２会話中画像Ｇ２の表示状態において、会話パート２（Ｂさん）部分のテキストデータが、音声出力無しのままで、順次、４０番目の文字「ｇｏ」、４３番目の文字「ｔｏ」というように、反転表示されて行く（ステップＳ１１〜Ｓ１４→ＳＣ，Ｓ２０，Ｓ２１→Ｓ１１）。
【０１３０】
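In short, when the next English conversation part designated by the command code "PT" is the part the user has set for practice (for example, part 2 (speaker B)), voice output for that part stops, and only the English conversation text of part 2 and the corresponding per-part image G2 are played back in synchronization. The user can therefore practice the conversation by reading the text aloud while looking at the text and image displayed for the practice part the user selected.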
【０１３１】
さらに、第１５９コマンドコード“ＰＴ”（ｉ番目パート指定）およびそのパラメータデータ“０３”が読み出されると、次に同期再生すべき前記英会話音声データ▲２▼および前記英会話テキストデータ▲２▼およびパート別画像Ｇｎにおける会話パート３（Ｃさん）が指定される（ステップＳ１４）。
【０１３２】
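Then, in accordance with the corresponding-part image display process A in FIG. 10, the in-conversation image G3, which shows the mouth movement of the person of part 3, is read out for image number i = 3 corresponding to the designated part number p = 3, and is displayed as shown in FIG. 12(C) (step SB).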
【０１３３】
ここで、予めユーザ設定された指定パート番号１２ｅが会話パート２（Ｂさん）である場合には、前記第１５９コマンドコード“ＰＴ”により指定された会話パート３（Ｃさん）と一致しないので（ステップＳ１５）、該当パート３の音声出力オンにされ（ステップＳ１６）、また、コマンドコード“ＨＬ”に応じた処理内容のハイライト処理がアンダーライン処理に変更設定される（ステップＳ１７）。
【０１３４】
そして、第１６０コマンドコード“ＨＬ”（ｉ番目文字までハイライト・文字カウント）およびそのパラメータデータ“７０”が読み出されると、当該コマンドコードＨＬと共に読み出されたパラメータデータ（ｉ＝７０）に応じて、図１２（Ｃ）に示すように、テキストデータの７０番目の文字「Ｃ：Ｍｅ，」（スペース含む）までアンダーライン表示（下線表示）され、文字カウンタが同７０番目の文字までカウントアップされる（ステップＳＣ）。
【０１３５】
この際、前記テキストデータのパート３（Ｃさん）部分である会話文字列のアンダーライン表示中には、音声出力オンに設定されているので、前記第４コマンドコード“ＰＳ”に応じてステレオ音声出力部１９ｂから出力されている英会話音声データ▲２▼は、同会話パート３（Ｃさん）のアンダーライン表示部分を読み上げるところの音声が出力されることになる。
【０１３６】
この後、第１７２コマンドコード“ＨＬ”に従い、パート３会話中画像Ｇ３の表示状態において、会話パート３（Ｃさん）部分のテキストデータが７５番目の文字「ｔｏｏ．」までアンダーライン表示（下線表示）されて行くのと共に、前記第４コマンドコード“ＰＳ”に応じてステレオ音声出力部１９ｂから出力されている英会話音声データ▲２▼も、同会話パート３（Ｃさん）のアンダーライン表示部分を読み上げるところの音声が続けて出力される（ステップＳ１１〜Ｓ１４→ＳＣ，Ｓ２０，Ｓ２１→Ｓ１１）。
【０１３７】
つまり、コマンドコード“ＰＴ”によって指示された次の英会話再生パートが、再び、ユーザ設定された練習対象のパート（例えばパート２（Ｂさん））でなくなったときには、当該他のパート部分の音声出力が再開され、英会話テキスト，その読み上げ音声，対応パート画像Ｇｎの同期再生出力状態となることで、ユーザは再び他のパートについて表示出力されるテキスト，画像を見ながら該テキストの音声出力を聞いて会話の練習を行うことができる。
【０１３８】
このように、前記英会話教材再生ファイル（１２ｃ）におけるタイムコードファイル１２ｃ３（図３参照）・ファイルシーケンステーブル１２ｃ２（図４参照）・コンテンツ内容データ１２ｃ４（図５参照）に従って、当該再生ファイルに予め設定された（基準）処理単位時間（例えば２５ｍｓ）１２ｃ１ａ毎のコマンド処理を行うことで、図１２（Ａ）〜（Ｄ）で示したように、表示画面上のテキスト表示フレームＸ内に英会話テキストデータがそのパート部分を識別表示されながら表示されると共に、画像表示フレームＹ内にそのパート別画像Ｇｎが同期表示され、さらに、ステレオ音声出力部１９ｂから識別表示中の英会話パートのテキストを読み上げる英会話音声データが同期出力されると共に、当該英会話テキストの読み上げ文節が各文字毎に順次同期ハイライト（強調）表示されるようになる。
【０１３９】
この際、対応パート別画像表示処理Ａに従い、コマンドコード“ＰＴ”によって指示された次の再生パートに対応する会話中画像Ｇｎの表示が行われるので、簡単に各会話パート別の口の動きを表した画像表示を行うことができ、ユーザはより効果的に会話の練習を行うことができる。
【０１４０】
また、ユーザが指定した会話パートの再生出力期間においては、その音声データの同期出力が停止されるので、ユーザは該ユーザ自身で設定した練習パートにおいて表示出力されるテキスト，パート別画像Ｇｎを見ながら該テキストを自身で読み上げて会話の練習を行うことができ、他の会話パートにおいては、テキスト，パート別画像Ｇｎを見ながらその読み上げ音声出力を聞いて練習することができる。
【０１４１】
したがって、前記構成の携帯機器１０によるファイル再生機能によれば、ＲＯＭ１２Ａに予め記憶された再生処理プログラム１２ａ１に従って、再生用ファイル１２ｂ（１２ｃ）に予め設定記述されているコマンド処理の基準単位時間（２５ｍｓ／５０ｍｓ）毎に、タイムコードファイル１２ｃ３に配列されたコマンドコードおよびそのパラメータデータを読み出し、そのコマンドに対応する処理を指示するだけで、当該タイムコードファイル１２ｃ３に記述された各コマンドに応じたテキスト・画像・音声ファイルなどの同期再生処理が実行される。
【０１４２】
そして、例えば英会話教材の再生用ファイル１２ｂ（１２ｃ）による複数の会話パートからなるテキスト・音声・画像の同期再生出力が行われる場合に、対応パート別画像表示処理Ａに従い、コマンドコード“ＰＴ”およびそのパラメータデータによって指示された次の再生対象会話パートに対応するパート別の会話中画像Ｇｎの表示が行われるので、簡単に各会話パート別の口の動きを表した画像表示を行うことができ、ユーザはより効果的に会話の練習を行うことができる。
【０１４３】
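When the next conversation part to be output, as designated by the command code and its parameter data, matches the conversation part the user has set for practice, voice output stops for that part, and only the distinguishing display of the per-part text and the synchronized display of the corresponding per-part image Gn are performed. When it does not match, voice output resumes and is played back in synchronization with the distinguishing display of the per-part text and the corresponding per-part image Gn. The user can thus easily mute the read-aloud voice for only the desired conversation part and practice pronouncing it, which makes per-part practice easy and effective.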
【０１４４】
また、前記構成の携帯機器１０によるファイル再生機能によれば、タイムコードファイル１２ｃ３に記述された基準処理単位時間毎のコマンドコードに応じて、音声データの出力指示“ＰＳ”や該音声データに合わせたテキストデータの表示指示“ＶＤ”“ＨＬ”およびパート別画像Ｇｎの表示指示“ＤＩ”“ＰＩ”を行う場合に、各会話パートの開始の指定を“ＰＴ”により行って、対応するパート別画像Ｇｎを表示したり、ユーザ設定された練習パートと一致した場合には当該パートの音声出力をオフにしたりする構成としたので、非常に簡単にパート別の練習を行うことができる。
【０１４５】
また、前記構成の携帯機器１０によるファイル再生機能によれば、タイムコードファイル１２ｃ３のコマンドコード“ＰＴ”によって指定されるパート番号（種類）は、当該コマンドコード“ＰＴ”と対に記述されたパラメータデータによって設定されるので、簡単に各再生パート指定を行ったタイムコードファイル１２ｃ３を作成することができる。
【０１４６】
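In the corresponding-part image display process A (see FIG. 10) of the above embodiment, the per-part in-conversation images Gn, prepared in advance for each conversation part to show the mouth movement of that part's person, are switched according to the part number p designated by the command code PT of the time code file 12c3, so that the state of each part's conversation is also expressed synchronously in the image. Alternatively, as described next with reference to the corresponding-part image display process B and the text-corresponding mouth display process shown in FIGS. 13 to 15, the English conversation text basic image G0, in which no part shows mouth movement, may be used as the base. The pronunciation mouth-shape image of the currently highlighted character of each part's English conversation text, read out by way of the text-corresponding pronunciation mouth-shape data 12f, is then composited onto the mouth area M1 to M3 of the person designated by the command code PT, so that the state of each part's conversation is expressed synchronously and more realistically in the image.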
【０１４７】
図１３は前記携帯機器１０の再生処理に伴う対応パート別画像表示処理Ｂを示すフローチャートである。
【０１４８】
図１４は前記携帯機器１０の再生処理におけるコマンドコードＨＬに応じたパート別テキストのハイライト（強調）表示処理に伴い割り込みで実行されるテキスト対応口表示処理を示すフローチャートである。
【０１４９】
図１５は前記図３乃至図５における英語教材再生ファイル１２ｃに基づいた英会話テキスト・音声・画像ファイルの同期再生状態（その２）を示す図である。
【０１５０】
すなわち、前記図８を参照して説明した再生処理のステップＳ１４において、タイムコードファイル１２ｃ３からコマンドコードＰＴが読み出された場合に、図１３における対応パート別画像表示処理Ｂに移行されると、まず、コンテンツ内容データ１２ｃ４の英会話テキスト基本画像Ｇ０が読み出されて、図１５に示すように、表示画面上の画像表示フレームＹ内に表示される（ステップＢｂ１）。
【０１５１】
すると、前記コマンドコードＰＴのパラメータデータにより指定された次の再生対象パート番号ｐが検知され（ステップＢｂ２）、当該指定のパート番号ｐに対応する前記英会話テキスト基本画像Ｇ０上での対応パート人物画像の口位置の矩形エリアデータ（位置座標）Ｍｎが、当該英会話テキスト基本画像Ｇ０が記憶されたコンテンツ内容データ１２ｃ４から読み出される（ステップＢｂ３）。
【０１５２】
そして、再生処理のステップＳＣにおいて、タイムコードファイル１２ｃ３から読み出されたコマンドコードＨＬに従い、表示中にある英会話テキストの現在の読み上げ文字までがハイライト処理により識別表示されるのに伴い、図１４におけるテキスト対応口表示処理が割り込みで起動されると、現在のテキストハイライト処理位置の文字が、当該コマンドコードＨＬのパラメータデータに基づき検知される（ステップＤ１）。
【０１５３】
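Then the mouth-shape image data for the pronunciation of the character at this text highlight position is read from the mouth-shape data 12a3 (see FIG. 2) in the ROM 12A, according to the mouth-shape number in the text-corresponding pronunciation mouth-shape data 12f (see FIG. 7) created by the pronunciation mouth-shape data creation process in FIG. 9 (step D2).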
【０１５４】
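This mouth-shape image data for the pronunciation of the character at the current text highlight position is then composited onto the displayed English conversation text basic image G0, as shown in FIG. 15, according to the rectangular area data (position coordinates) Mn for the mouth position of the designated part's person image read in step Bb3 of the corresponding-part image display process B (step D3).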
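Compositing a mouth-shape image onto the basic image at the rectangular mouth area can be sketched with plain nested lists standing in for pixel data. The representation, the function name, and the convention that the area is given as inclusive corner coordinates (x1, y1, x2, y2) are all assumptions of this sketch; the description only states that coordinate data x1y1, x2y2 is stored for each mouth area.

```python
def composite_mouth(base, mouth, area):
    """Return a copy of the base image with the mouth image pasted into the area.

    base and mouth are lists of pixel rows; area is (x1, y1, x2, y2), inclusive,
    and the mouth image is assumed to match the size of the area.
    """
    x1, y1, x2, y2 = area
    result = [row[:] for row in base]  # leave the basic image itself untouched
    for dy in range(y2 - y1 + 1):
        for dx in range(x2 - x1 + 1):
            result[y1 + dy][x1 + dx] = mouth[dy][dx]
    return result


base = [[0] * 4 for _ in range(3)]  # tiny stand-in for the basic image G0
mouth = [[7, 7]]                    # tiny stand-in for one mouth-shape image
frame = composite_mouth(base, mouth, (1, 2, 2, 2))
print(frame[2])  # [0, 7, 7, 0]
```

Copying the basic image before pasting means each highlight step starts again from the closed-mouth image, so mouth shapes from earlier characters or other parts do not accumulate.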
【０１５５】
これにより、現在再生中の英会話テキストの各パート別に、その発音に応じた口型を対応パート人物画像の口位置に合成して表示させることができ、各パートの会話の状態を画像の上でよりリアルに同期表現し、さらに効果的な会話の練習を行うことができる。
【０１５６】
そして、入力部１７ａあるいは座標入力装置１７ｂによって、「口内拡大表示」を指示するキー（ボタン）操作が行われると（ステップＤ４）、前記パート別のテキストハイライト表示に対応して英会話テキスト基本画像Ｇ０上で合成表示中の口型画像の拡大画像が、前記ＲＯＭ１２Ａ内の口型データ１２ａ３（図２参照）から読み出され（ステップＤ５）、前記英会話テキスト基本画像Ｇ０とは別の表示エリアに表示される（ステップＤ６）。
【０１５７】
これにより、ユーザは会話練習中の発音の口型をリアルタイムに且つより詳しく知ることができ、最も効果的に英会話の練習を実施することができる。
【０１５８】
なお、このテキストハイライト位置に応じた口型合成表示の実施形態では、実際の発音に対応した口型画像を口型データ１２ａ３から読み出して各パート別人物画像の口エリアＭｎに合成表示する構成としたが、全パートの人物画像が口を閉じている英会話テキスト基本画像Ｇ０に対して、単に開いた口画像をパート別に合成表示する構成としてもよい。
【０１５９】
なお、前記実施形態において記載した携帯機器１０による各処理の手法、すなわち、図８のフローチャートに示すファイル再生処理、図９のフローチャートに示す前記ファイル再生処理に伴う発音口型データ作成処理、図１０のフローチャートに示す前記ファイル再生処理に伴う対応パート別画像表示処理Ａ、図１３のフローチャートに示す前記ファイル再生処理に伴う対応パート別画像表示処理Ｂ、図１４のフローチャートに示す前記ファイル再生処理におけるコマンドコードＨＬに応じたパート別テキストのハイライト（強調）表示処理に伴うテキスト対応口表示処理などの手法は、何れもコンピュータに実行させることができるプログラムとして、メモリカード（ＲＯＭカード、ＲＡＭカード等）、磁気ディスク（フロッピディスク、ハードディスク等）、光ディスク（ＣＤ−ＲＯＭ、ＤＶＤ等）、半導体メモリ等の外部記録媒体１３に格納して配布することができる。そして、通信ネットワーク（インターネット）Ｎとの通信機能を備えた種々のコンピュータ端末は、この外部記録媒体１３に記憶されたプログラムを記録媒体読取部１４によってメモリ１２に読み込み、この読み込んだプログラムによって動作が制御されることにより、前記実施形態において説明した指定パートのパート別画像Ｇｎを切り替え表示可能なテキスト，音声，画像の同期再生機能を実現し、前述した手法による同様の処理を実行することができる。
【０１６０】
また、前記各手法を実現するためのプログラムのデータは、プログラムコードの形態として通信ネットワーク（インターネット）Ｎ上を伝送させることができ、この通信ネットワーク（インターネット）Ｎに接続されたコンピュータ端末から前記のプログラムデータを取り込み、前述した指定パートのパート別画像Ｇｎを切り替え表示可能なテキスト，音声，画像の同期再生機能を実現することもできる。
【０１６１】
なお、本願発明は、前記各実施形態に限定されるものではなく、実施段階ではその要旨を逸脱しない範囲で種々に変形することが可能である。さらに、前記各実施形態には種々の段階の発明が含まれており、開示される複数の構成要件における適宜な組み合わせにより種々の発明が抽出され得る。例えば、各実施形態に示される全構成要件から幾つかの構成要件が削除されたり、幾つかの構成要件が組み合わされても、発明が解決しようとする課題の欄で述べた課題が解決でき、発明の効果の欄で述べられている効果が得られる場合には、この構成要件が削除されたり組み合わされた構成が発明として抽出され得るものである。
【０１６２】
【発明の効果】
以上のように、本発明の請求項１に係る音声表示出力制御装置によれば、音声データ出力手段により音声データが出力されると共に、テキスト同期表示制御手段により複数種類のパートに区分されたテキストデータが前記音声データ出力手段により出力される音声データに同期して表示される。そして、パート指定手段により前記テキストデータの何れかのパートが指定されると、この指定されたパートに基づいて前記テキストデータを前記テキスト同期表示制御手段により同期表示させる際には、当該パートに対応した画像がパート別画像記憶手段により記憶された各パートに対応した画像の中から読み出されて表示されるので、音声データの出力と同期して、指定されたパートのテキストデータを表示できると共に、当該パートに対応した画像を表示できるようになる。
【０１６３】
また、本発明の請求項２に係る音声表示出力制御装置によれば、前記請求項１に係る音声表示出力制御装置にあって、テキスト同期表示制御手段は、予め設定された経過時間に従い前記複数種類のパートに区分されたテキストデータを前記音声データ出力手段により出力される音声データに同期して表示させるための命令コードを記憶する命令コード記憶手段を有し、この命令コード記憶手段により記憶された命令コードに応じて、前記パート区分されたテキストデータが音声データに同期されて表示されるので、命令コード記憶手段により記憶された命令コードによる設定経過時間に応じた指示に従い、パート区分されたテキストデータおよび当該パートに対応した画像を音声データに簡単に同期させて表示できるようになる。
【０１６４】
また、本発明の請求項３に係る音声表示出力制御装置によれば、前記請求項２に係る音声表示出力制御装置にあって、パート指定手段は、前記命令コード記憶手段により記憶される命令コードと対応付けられて記憶されるパート指定命令コードであり、このパート指定命令コードにより、複数種類に区分されたテキストデータのパートが前記音声データ出力手段による音声データの出力に合わせて順次指定されるので、命令コード記憶手段により記憶された命令コードによる設定経過時間に応じた指示、および該命令コードに対応付けられたパート指定命令コードに従い、順次指定されるパートのテキストデータおよび当該パートに対応した画像を音声データに簡単に同期させて表示できるようになる。
【０１６５】
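Further, according to the voice display output control device of claim 4 of the present invention, in the voice display output control device according to claims 1 to 3, the part-by-part image storage means stores, for each of the plurality of types of parts, an image expressing the mouth movement of the person image corresponding to that part. The text data of the designated part can therefore be displayed in synchronization with the output of the audio data, together with an image expressing the mouth movement of the person image corresponding to that part.
[0166]
Further, according to the voice display output control device of claim 5 of the present invention, the voice display output control device according to any one of claims 1 to 3 is further provided with mouth image storage means that stores images expressing mouth movement, and the part-by-part image storage means stores person images corresponding to the respective parts of the plurality of types and mouth position information for each person image. When text data is synchronously displayed by the text synchronous display control means for the part designated by the part designation means, the mouth movement image stored in the mouth image storage means is combined with the person image of the designated part stored in the part-by-part image storage means, at the position given by its mouth position information, and displayed. The text data of the designated part can therefore be displayed in synchronization with the output of the audio data, and an image expressing mouth movement can be combined with and displayed at the mouth position of the person image corresponding to that part.
[0167]
Further, according to the voice display output control device of claim 6 of the present invention, the voice display output control device according to claim 5 is further provided with text portion detection means that detects the text portion being synchronously displayed when the text synchronous display control means displays the text data divided into a plurality of types of parts in synchronization with the audio data output by the audio data output means, and the mouth image storage means stores images expressing mouth movement corresponding to the pronunciation of various characters. When text data is synchronously displayed by the text synchronous display control means for the part designated by the part designation means, the mouth movement image stored in the mouth image storage means that corresponds to the pronunciation of the synchronously displayed text portion detected by the text portion detection means is combined, based on the mouth position information, with the person image of the designated part stored in the part-by-part image storage means, and displayed. The text data of the designated part can therefore be displayed in synchronization with the output of the audio data, and a mouth movement image corresponding to the pronunciation of the text portion being synchronously displayed can be combined with and displayed at the mouth position of the person image corresponding to that part.
[0168]
Further, according to the voice display output control device of claim 7 of the present invention, audio data is output by the audio data output means, and text data divided into a plurality of types of parts is displayed by the text synchronous display control means in synchronization with the audio data output by the audio data output means. In addition, person images corresponding to the respective parts of the plurality of types are displayed by the image display control means. When any part of the text data is designated by the part designation means, then, as the text data is synchronously displayed by the text synchronous display control means for the designated part, a mouth image corresponding to that part is combined with and displayed at the mouth position of the person image of that part displayed by the image display control means. The text data of the designated part can therefore be displayed in synchronization with the output of the audio data, and a mouth image corresponding to the part can be combined with and displayed at the mouth position of the displayed person image corresponding to that part.
[0169]
Further, according to the voice display output control device of claim 8 of the present invention, the voice display output control device according to claim 7 is further provided with text portion detection means that detects the text portion being synchronously displayed when the text synchronous display control means displays the text data divided into a plurality of types of parts in synchronization with the audio data output by the audio data output means, and with mouth image storage means that stores images expressing mouth movement corresponding to the pronunciation of various characters. When text data is synchronously displayed by the text synchronous display control means for the part designated by the part designation means, the mouth movement image stored in the mouth image storage means that corresponds to the pronunciation of the synchronously displayed text portion detected by the text portion detection means is combined with and displayed at the mouth position of the person image of that part displayed by the image display control means. The text data of the designated part can therefore be displayed in synchronization with the output of the audio data, and a mouth movement image corresponding to the pronunciation of the text portion being synchronously displayed can be combined with and displayed at the mouth position of the displayed person image corresponding to that part.
[0170]
Further, according to the voice display output control processing program of claim 9 of the present invention, installing the program in the computer of an electronic device causes, in that computer, audio data to be output by the audio data output means and text data divided into a plurality of types of parts to be displayed by the text synchronous display control means in synchronization with the audio data output by the audio data output means. When any part of the text data is designated by the part designation means, then, as the text data is synchronously displayed by the text synchronous display control means based on the designated part, the image corresponding to that part is read out from among the images corresponding to the respective parts stored in the part-by-part image storage means and is displayed. The electronic device can therefore display the text data of the designated part in synchronization with the output of the audio data, together with the image corresponding to that part.
[0171]
Further, according to the voice display output control processing program of claim 10 of the present invention, installing the program in the computer of an electronic device causes, in that computer, audio data to be output by the audio data output means and text data divided into a plurality of types of parts to be displayed by the text synchronous display control means in synchronization with the audio data output by the audio data output means. In addition, person images corresponding to the respective parts of the plurality of types are displayed by the image display control means. When any part of the text data is designated by the part designation means, then, as the text data is synchronously displayed by the text synchronous display control means for the designated part, a mouth image corresponding to that part is combined with and displayed at the mouth position of the person image of that part displayed by the image display control means. The electronic device can therefore display the text data of the designated part in synchronization with the output of the audio data, and can combine and display a mouth image corresponding to the part at the mouth position of the displayed person image corresponding to that part.
[0172]
Thus, the present invention can provide a voice display output control device and a voice display output control processing program that, when text data, audio data, and image data are output in synchronization, make it easy to synchronize the audio output, text display, and image display for each part.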
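[Brief description of the drawings]
[FIG. 1] A block diagram showing the configuration of the electronic circuit of the portable device 10 according to an embodiment of the voice display output control device of the present invention.
[FIG. 2] A diagram showing the contents of the mouth shape data 12a3 stored in the ROM 12A of the portable device 10.
[FIG. 3] A diagram showing the time code file 12c3 constituting the reproduction file 12b (12c) stored in the memory 12 of the portable device 10.
[FIG. 4] A diagram showing the file sequence table 12c2 constituting the reproduction file 12b (12c) stored in the memory 12 of the portable device 10.
[FIG. 5] A diagram showing the content data 12c4 constituting the reproduction file 12b (12c) stored in the memory 12 of the portable device 10.
[FIG. 6] A diagram showing, in association with one another, the command codes of the various commands described in the time code file 12c3 (see FIG. 3) of the portable device 10, their parameter data, and the instruction contents analyzed based on the reproduction processing program 12a1.
[FIG. 7] A diagram showing the contents of the text-corresponding pronunciation mouth shape data 12f stored in the RAM 12C of the portable device 10.
[FIG. 8] A flowchart showing the reproduction processing of the portable device 10 according to the reproduction processing program 12a1.
[FIG. 9] A flowchart showing the pronunciation mouth shape data creation processing that accompanies the reproduction processing of the portable device 10.
[FIG. 10] A flowchart showing the corresponding part-by-part image display processing A that accompanies the reproduction processing of the portable device 10.
[FIG. 11] Diagrams showing the selection operation and display state of learning content that accompany the reproduction processing of the portable device 10; (A) shows the learning content selection screen G, and (B) and (C) show the selection operation keys for the learning content selection screen G.
[FIG. 12] A diagram showing the synchronous reproduction state (part 1) of the English conversation text, audio, and image files based on the English teaching material reproduction file 12c in FIGS. 3 to 5.
[FIG. 13] A flowchart showing the corresponding part-by-part image display processing B that accompanies the reproduction processing of the portable device 10.
[FIG. 14] A flowchart showing the text-corresponding mouth display processing executed as an interrupt during the highlight (emphasis) display processing of part-by-part text in response to the command code HL in the reproduction processing of the portable device 10.
[FIG. 15] A diagram showing the synchronous reproduction state (part 2) of the English conversation text, audio, and image files based on the English teaching material reproduction file 12c in FIGS. 3 to 5.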
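[Explanation of reference signs]
10 … portable device
11 … CPU
12 … memory
12A … ROM
12a1 … file reproduction processing program
12a2 … dictionary data
12a3 … mouth shape data
12B … FLASH memory
12b … encrypted reproduction file (CAS file)
12C … RAM
12c … decrypted reproduction file (CAS file)
12c1 … header information
12c1a … processing unit time
12c2 … file sequence table
12c3 … time code file
12c4 … content data
12d … designated text number
12e … designated part number
12f … text-corresponding pronunciation mouth shape data
12g … text highlight designated character
12h … image (on/off) flag
12i … audio (on/off) flag
12j … image developed flag
12k … image development buffer
13 … external recording medium
14 … recording medium reading unit
15 … transmission control unit
16 … communication unit
17a … input unit
17b … coordinate input device
18 … display unit
19a … audio input unit
19b … stereo audio output unit
20 … communication device (home PC)
30 … Web server
N … communication network (Internet)
X … text display frame
Y … image display frame
G0 … English conversation text basic image
G1 to G3 … in-conversation images of Part 1 to Part 3
M1 to M3 … mouth areas of Part 1 to Part 3
[0001]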
TECHNICAL FIELD OF THE INVENTION
The present invention relates to a voice display output control device and a voice display output control processing program for synchronously outputting data such as voice, text, and image.
[0002]
[Prior art]
Conventionally, as a technique for reproducing music, text, images, and the like simultaneously in parallel, there is one in which, for each frame of an audio file compressed by MPEG-3, synchronization information for a text file or image file to be reproduced in synchronization with the audio file is embedded in an additional data area provided in that frame. In the case of karaoke, for example, the karaoke audio, its image, and the lyric text are reproduced synchronously in this way.
[0003]
Further, on the premise that temporal correspondence information of a character to a voice is prepared in advance, a device that extracts a feature amount of the voice signal and displays it in association with a corresponding character has been considered. (For example, refer to Patent Document 1.)
[0004]
[Patent Document 1]
Japanese Patent Publication No. 06-025905
[0005]
[Problems to be solved by the invention]
However, in this conventional technique for synchronously reproducing multiple types of files using the additional data area of an MPEG file, the synchronization information is embedded in the additional data area of each frame of the primary MP3 audio file. The synchronization information therefore cannot be extracted unless the MP3 audio file is being reproduced, and other types of files can be reproduced synchronously only with reproduction of the MP3 file as the axis.
[0006]
For this reason, when the synchronization information for a text file and an image file is embedded in an MP3 audio file, for example, the audio reproduction process must be kept running on silent data even when no audio is to be output; otherwise the text and images to be synchronized cannot be reproduced.
[0007]
For example, when text content is divided into parts by speaker, as in English conversation text, and the image contains a person for each part, it is desirable, when practicing listening or reading aloud for a desired part, to display the text of each part or to switch the display of the corresponding person image. However, creating such English conversation text with an MP3 audio file requires embedding the synchronization information for the text and image of the corresponding part in each frame of each part of the MP3 audio file. Moreover, even if a specific part is made silent for reading practice, the text and image of that part cannot be displayed, as described above, unless the audio reproduction process for that part continues.
[0008]
Also, in the device described in Patent Document 1 above, characters are associated with the audio signal in advance and the audio and characters are simply reproduced in synchronization. The audio output and character display cannot be selectively synchronized for each part, nor can a corresponding image be displayed in synchronization.
[0009]
On the other hand, when a character string or person image is displayed part by part in synchronization with audio output, as on a karaoke monitor, with color-coded display of the character string for each part or switching of the person image, it is necessary, as with the MP3 file above, to create in advance a file for synchronous reproduction that incorporates separate, dedicated data for the audio, the character strings, and the images.
[0010]
The present invention has been made in view of the above problems. Its object is to provide a voice display output control device and a voice display output control processing program that, when text data, audio data, and image data are output in synchronization, make it easy to synchronize the audio output, text display, and image display for each part.
[0011]
[Means for Solving the Problems]
That is, in the voice display output control device according to claim 1 of the present invention, audio data is output by the audio data output means, and text data divided into a plurality of types of parts is displayed by the text synchronous display control means in synchronization with the audio data output by the audio data output means. When any part of the text data is designated by the part designation means, then, as the text data is synchronously displayed by the text synchronous display control means based on the designated part, the image corresponding to that part is read out from among the images corresponding to the respective parts stored in the part-by-part image storage means and is displayed.
[0012]
According to this, the text data of the designated part can be displayed in synchronization with the output of the audio data, and the image corresponding to the part can be displayed.
[0013]
Further, in the voice display output control device according to claim 2 of the present invention, in the voice display output control device according to claim 1, the text synchronous display control means has instruction code storage means that stores instruction codes for displaying, according to preset elapsed times, the text data divided into the plurality of types of parts in synchronization with the audio data output by the audio data output means. The text data divided into parts is displayed in synchronization with the audio data in accordance with the instruction codes stored in the instruction code storage means.
[0014]
With this configuration, the text data divided into parts and the image corresponding to each part can be displayed in synchronization with the audio data, following the instructions issued at the set elapsed times by the instruction codes stored in the instruction code storage means.
[0015]
Further, in the voice display output control device according to claim 3 of the present invention, in the voice display output control device according to claim 2, the part designation means is a part designation instruction code stored in association with the instruction codes stored in the instruction code storage means. The part designation instruction code sequentially designates the parts of the text data, which is divided into a plurality of types of parts, as the audio data is output by the audio data output means.
[0016]
With this configuration, following the instructions issued at the set elapsed times by the instruction codes stored in the instruction code storage means and by the part designation instruction code stored in association with them, the text data of each sequentially designated part and the image corresponding to that part can be displayed in synchronization with the audio data.
[0017]
Further, in the voice display output control device according to claim 4 of the present invention, in the voice display output control device according to any one of claims 1 to 3, the part-by-part image storage means stores, for each of the plurality of types of parts, an image expressing the mouth movement of the person image corresponding to that part.
[0018]
According to this, the text data of the designated part can be displayed in synchronization with the output of the audio data, and the image expressing the movement of the mouth of the person image corresponding to the part can be displayed.
[0019]
Further, the voice display output control device according to claim 5 of the present invention is the voice display output control device according to any one of claims 1 to 3, further provided with mouth image storage means that stores images expressing mouth movement, wherein the part-by-part image storage means stores person images corresponding to the respective parts of the plurality of types and mouth position information for each person image. When text data is synchronously displayed by the text synchronous display control means for the part designated by the part designation means, the mouth movement image stored in the mouth image storage means is combined with the person image of the designated part stored in the part-by-part image storage means, at the position given by its mouth position information, and displayed.
[0020]
With this configuration, the text data of the designated part can be displayed in synchronization with the output of the audio data, and an image expressing mouth movement can be combined with and displayed at the mouth position of the person image corresponding to that part.
[0021]
Further, the voice display output control device according to claim 6 of the present invention is the voice display output control device according to claim 5, further provided with text portion detection means that detects the text portion being synchronously displayed when the text synchronous display control means displays the text data divided into a plurality of types of parts in synchronization with the audio data output by the audio data output means, wherein the mouth image storage means stores images expressing mouth movement corresponding to the pronunciation of various characters. When text data is synchronously displayed by the text synchronous display control means for the part designated by the part designation means, the mouth movement image stored in the mouth image storage means that corresponds to the pronunciation of the synchronously displayed text portion detected by the text portion detection means is combined, based on the mouth position information, with the person image of the designated part stored in the part-by-part image storage means, and displayed.
[0022]
With this configuration, the text data of the designated part can be displayed in synchronization with the output of the audio data, and a mouth movement image corresponding to the pronunciation of the text portion being synchronously displayed can be combined with and displayed at the mouth position of the person image corresponding to that part.
[0023]
Further, in the voice display output control device according to claim 7 of the present invention, audio data is output by the audio data output means, and text data divided into a plurality of types of parts is displayed by the text synchronous display control means in synchronization with the audio data output by the audio data output means. In addition, person images corresponding to the respective parts of the plurality of types are displayed by the image display control means. When any part of the text data is designated by the part designation means, then, as the text data is synchronously displayed by the text synchronous display control means for the designated part, a mouth image corresponding to that part is combined with and displayed at the mouth position of the person image of that part displayed by the image display control means.
[0024]
With this configuration, the text data of the designated part can be displayed in synchronization with the output of the audio data, and a mouth image corresponding to the part can be combined with and displayed at the mouth position of the displayed person image corresponding to that part.
[0025]
Further, the voice display output control device according to claim 8 of the present invention is the voice display output control device according to claim 7, further provided with text portion detection means that detects the text portion being synchronously displayed when the text synchronous display control means displays the text data divided into a plurality of types of parts in synchronization with the audio data output by the audio data output means, and with mouth image storage means that stores images expressing mouth movement corresponding to the pronunciation of various characters. When text data is synchronously displayed by the text synchronous display control means for the part designated by the part designation means, the mouth movement image stored in the mouth image storage means that corresponds to the pronunciation of the synchronously displayed text portion detected by the text portion detection means is combined with and displayed at the mouth position of the person image of that part displayed by the image display control means.
[0026]
With this configuration, the text data of the designated part can be displayed in synchronization with the output of the audio data, and a mouth movement image corresponding to the pronunciation of the text portion being synchronously displayed can be combined with and displayed at the mouth position of the displayed person image corresponding to that part.
[0027]
Further, with the voice display output control processing program according to claim 9 of the present invention, installing the program in the computer of an electronic device causes, in that computer, audio data to be output by the audio data output means and text data divided into a plurality of types of parts to be displayed by the text synchronous display control means in synchronization with the audio data output by the audio data output means. When any part of the text data is designated by the part designation means, then, as the text data is synchronously displayed by the text synchronous display control means based on the designated part, the image corresponding to that part is read out from among the images corresponding to the respective parts stored in the part-by-part image storage means and is displayed.
[0028]
As a result, the electronic device can display the text data of the specified part in synchronization with the output of the audio data, and can display the image corresponding to the part.
[0029]
Further, with the voice display output control processing program according to claim 10 of the present invention, installing the program in the computer of an electronic device causes, in that computer, audio data to be output by the audio data output means and text data divided into a plurality of types of parts to be displayed by the text synchronous display control means in synchronization with the audio data output by the audio data output means. In addition, person images corresponding to the respective parts of the plurality of types are displayed by the image display control means. When any part of the text data is designated by the part designation means, then, as the text data is synchronously displayed by the text synchronous display control means for the designated part, a mouth image corresponding to that part is combined with and displayed at the mouth position of the person image of that part displayed by the image display control means.
[0030]
This allows the electronic device to display the text data of the designated part in synchronization with the output of the audio data, and to combine and display a mouth image corresponding to the part at the mouth position of the displayed person image corresponding to that part.
[0031]
BEST MODE FOR CARRYING OUT THE INVENTION
Hereinafter, embodiments of the present invention will be described with reference to the drawings.
[0032]
FIG. 1 is a block diagram showing a configuration of an electronic circuit of a portable device 10 according to an embodiment of a voice display output control device of the present invention.
[0033]
The portable device (PDA: personal digital assistant) 10 is configured as a computer that reads a program recorded on any of various recording media or a program transmitted over a communication link, and whose operation is controlled by the read program. Its electronic circuit includes a CPU (central processing unit) 11.
[0034]
The CPU 11 controls the operation of each circuit unit in accordance with a PDA control program stored in advance in the ROM 12A in the memory 12, a PDA control program read into the memory 12 from an external recording medium 13 such as a ROM card via the recording medium reading unit 14, or a PDA control program read into the memory 12 from another computer terminal (30) on a communication network N such as the Internet via the transmission control unit 15. The PDA control program stored in the memory 12 is activated in response to an input signal corresponding to a user operation from the input unit 17a, which includes switches and keys, or from the coordinate input device 17b, which includes a mouse and a tablet; a communication signal from another computer terminal (30) on the communication network N received by the transmission control unit 15; or a communication signal from an external communication device (PC: personal computer) 20 received via the communication unit 16 over a short-range wireless connection such as Bluetooth (R) or over a wired connection.
[0035]
In addition to the memory 12, the recording medium reading unit 14, the transmission control unit 15, the communication unit 16, the input unit 17a, and the coordinate input device 17b, the CPU 11 is connected to a display unit 18, an audio input unit 19a, and a stereo audio output unit 19b that has left and right channel speakers L and R and outputs sound.
[0036]
The CPU 11 has a built-in timer for measuring the processing time.
[0037]
The memory 12 of the portable device 10 includes a ROM 12A, a flash memory (EEP-ROM) 12B, and a RAM 12C.
[0038]
The ROM 12A stores a system program that controls the overall operation of the portable device 10, a network program for data communication with each computer terminal (30) on the communication network N via the transmission control unit 15, and an external device communication program for data communication with the external communication device (PC) 20. It also stores various PDA control programs, such as a schedule management program, an address management program, and the reproduction processing program 12a1 for synchronously reproducing various files such as audio, text, and images.
[0039]
The ROM 12A further stores dictionary data 12a2 and mouth shape data 12a3 (see FIG. 2). As the dictionary data 12a2, data of various dictionaries such as an English-Japanese dictionary, a Japanese-English dictionary, and a Japanese language dictionary are stored.
[0040]
FIG. 2 is a diagram showing the contents of the mouth shape data 12a3 stored in the ROM 12A of the portable device 10.
[0041]
As the mouth shape data 12a3, for each English phonetic symbol and its mouth shape number, an enlarged mouth shape image viewed from the front, a cross-sectional mouth image viewed from the side, and brief explanatory (comment) data are stored in association with one another.
[0042]
The FLASH memory (EEP-ROM) 12B stores an encrypted reproduction file (CAS file) 12b to be subjected to reproduction processing based on the reproduction processing program 12a1, as well as the schedule management program and the address management. The user's schedule and addresses of friends and acquaintances managed based on the program are stored.
[0043]
Here, the encrypted reproduction file 12b stored in the FLASH memory (EEP-ROM) 12B is, for example, a file for practicing English or singing through synchronized reproduction of text, audio, and images, and is compressed and encrypted by a predetermined algorithm.
[0044]
The encrypted reproduction file 12b is distributed, for example, recorded on a CD-ROM, or is delivered from a file distribution server 30 on the communication network (Internet) N. The encrypted reproduction file 12b distributed on the CD-ROM or from the server (30) is read into, for example, the communication device (PC) 20 serving as the user's home PC, and is then transferred via the communication unit 16 of the portable device (PDA) 10 and stored in the FLASH memory (EEP-ROM) 12B.
[0045]
The RAM 12C stores a decrypted reproduction file (CAS file) 12c obtained by decompressing and decrypting the encrypted reproduction file 12b, and is provided with an image development buffer 12k in which the image files in the decrypted reproduction file 12c are expanded and stored. The decrypted CAS file 12c consists of header information (12c1), which stores the processing unit time (12c1a) of the reproduction commands, together with a file sequence table (12c2), a time code file (12c3), and content data (12c4), which are described later. The RAM 12C also stores an image developed flag 12j that indicates the image number of each image file already expanded and stored in the image development buffer 12k.
[0046]
The RAM 12C further stores the following: the designated text number 12d in the selected content of the reproduction file 12b (12c); the designated part number 12e, which the user designates for the English conversation text corresponding to the designated text number; the text-corresponding pronunciation mouth shape data 12f (see FIG. 7), in which the phonetic symbols and mouth shape numbers of all English words in the English conversation text corresponding to the designated text number are retrieved from the dictionary data 12a2 and the mouth shape data 12a3 and stored; the text highlight designated character 12g, which is detected and stored in real time as the currently highlighted character while each character of the English conversation text is highlighted in synchronization with the reading voice and the reading animation image based on the reproduction file 12b (12c); the image (on/off) flag 12h, which sets whether or not to synchronously reproduce the image file included as a synchronous reproduction file in the reproduction file 12b (12c); and the audio (on/off) flag 12i, which sets whether or not to synchronously reproduce the audio file included as a synchronous reproduction file in the reproduction file 12b (12c).
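As a rough illustration of how the text-corresponding pronunciation mouth shape data 12f might be assembled from the dictionary data 12a2 and the mouth shape data 12a3, the following Python sketch looks up each word of a text in a dictionary table and then maps each phonetic symbol to a mouth shape number. The table contents, the symbol spellings, and the name `build_mouth_data` are all hypothetical; the description does not specify the data layout.

```python
# Hypothetical stand-ins for dictionary data 12a2 (word -> phonetic symbols)
# and mouth shape data 12a3 (phonetic symbol -> mouth shape number).
DICTIONARY = {"hello": ["h", "e", "l", "ou"], "good": ["g", "u", "d"]}
MOUTH_SHAPES = {"h": 1, "e": 2, "l": 3, "ou": 4, "g": 5, "u": 6, "d": 7}


def build_mouth_data(text):
    """Return [(word, [(phonetic_symbol, mouth_shape_number), ...]), ...]."""
    result = []
    for raw in text.lower().split():
        word = raw.strip(".,!?")            # drop surrounding punctuation
        symbols = DICTIONARY.get(word, [])  # unknown words yield an empty list
        pairs = [(s, MOUTH_SHAPES[s]) for s in symbols if s in MOUTH_SHAPES]
        result.append((word, pairs))
    return result


mouth_data = build_mouth_data("Hello, good")
```

Building the table once, before playback starts, would let the mouth image for the currently highlighted character be found by a simple index lookup during reproduction; that design choice is an assumption of this sketch.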
[0047]
Further, the RAM 12C is provided with a work area for temporarily storing various data input to and output from the CPU 11 in accordance with other various processes.
[0048]
FIG. 3 is a diagram showing a time code file 12c3 constituting the reproduction file 12b (12c) stored in the memory 12 of the portable device 10.
[0049]
FIG. 4 is a view showing a file sequence table 12c2 constituting the reproduction file 12b (12c) stored in the memory 12 of the portable device 10.
[0050]
FIG. 5 is a diagram showing content data 12c4 constituting the reproduction file 12b (12c) stored in the memory 12 of the portable device 10.
[0051]
The reproduction file 12b (12c) to be reproduced by the portable device 10 is composed of a combination of a time code file 12c3, a file sequence table 12c2, and content data 12c4, as shown in FIGS. 3 to 5.
[0052]
In the time code file 12c3 shown in FIG. 3, time codes for performing command processing for the synchronous reproduction of various files are described and arranged at fixed time intervals of a reference processing unit time (for example, 25 ms) preset for each file. Each time code consists of a command code that designates an instruction and parameter data, which is either a reference number into the file sequence table 12c2 (FIG. 4) for associating the file content (see FIG. 5) related to the command, or a designated numerical value.
[0053]
The reference processing unit time 12c1a, the fixed time interval at which the command processing for each time code is executed in sequence, is described and set in the header information 12c1 of the time code file 12c3.
[0054]
For example, if the reference processing unit time is 25 ms, the reproduction file 12b (12c) containing the time code file 12c3 shown in FIG. 3 is reproduced by processing 2400 time code steps, so its playback time is 60 seconds (2400 × 25 ms).
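The arithmetic behind this example can be checked with a short Python sketch. The function name `playback_seconds` is hypothetical; only the values 2400 steps and 25 ms come from the description.

```python
def playback_seconds(steps: int, unit_ms: int) -> float:
    """Total playback time when `steps` time codes run at a fixed `unit_ms` interval."""
    return steps * unit_ms / 1000.0


print(playback_seconds(2400, 25))  # 60.0
```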
[0055]
The file sequence table 12c2 shown in FIG. 4 is a table that associates, for each of a plurality of file types (HTML / image / text / audio), the parameter data of each command described in the time code file 12c3 (see FIG. 3) with the storage destination (ID) number of the actual file content.
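One way to picture this table is as a per-type mapping from a command's parameter value to a content ID. The sketch below is an assumption about the layout, not the format of FIG. 4: the names `FILE_SEQUENCE_TABLE` and `resolve_content_id` are hypothetical, and only the image mapping 00 → ID 5 is stated in the description; the other entries are illustrative guesses built from ID numbers that the description mentions.

```python
# File type -> {command parameter value -> content storage (ID) number}.
FILE_SEQUENCE_TABLE = {
    "image": {0: 5, 1: 6, 2: 7, 3: 8},  # 0 -> 5 is from the description; 1..3 are assumed
    "text": {0: 21},                    # parameter index is assumed
    "audio": {0: 32},                   # parameter index is assumed
}


def resolve_content_id(file_type: str, parameter: int) -> int:
    """Map a command's parameter data to the ID number of the actual file content."""
    return FILE_SEQUENCE_TABLE[file_type][parameter]


image_id = resolve_content_id("image", 0)
```

Keeping this indirection separate from the time codes means the same command sequence could, in principle, be pointed at different content by changing only the table.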
[0056]
In the content data 12c4 shown in FIG. 5, the actual file data, such as audio, images, and text, that the file sequence table 12c2 (see FIG. 4) links to each command code is stored in association with its ID number.
[0057]
To briefly describe the content data 12c4 linked by the file sequence table 12c2 in this embodiment: as the data content corresponding to ID = 5, a basic image G0 of an English conversation text with three conversation parts (FIG. 15) is prepared; as the data content corresponding to ID = 6, an in-conversation image G1 of Part 1 (see FIGS. 12A and 12D) based on the basic image G0 is prepared; as the data content corresponding to ID = 7, an in-conversation image G2 of Part 2 (see FIG. 12B) based on the basic image G0 is prepared; and as the data content corresponding to ID = 8, an in-conversation image G3 of Part 3 (see FIG. 12C) based on the basic image G0 is prepared.
[0058]
The basic image G0 (see FIG. 15) of the English conversation text corresponding to ID = 5 of the content data 12c4 is stored in association with coordinate data x1y1, x2y2 of the mouth area of each part's person image (see M1 to M3 in FIG. 15).
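To make the role of the mouth area coordinates concrete, the following Python sketch pastes a small mouth image into a person image at a stored rectangle. Images are modeled as nested lists of pixel values, and the coordinates, image sizes, and names `MOUTH_AREAS` and `composite_mouth` are all hypothetical; the description only states that x1y1, x2y2 are stored per part.

```python
def composite_mouth(base, mouth, x1, y1):
    """Return a copy of `base` with `mouth` pasted so its top-left pixel is at (x1, y1)."""
    out = [row[:] for row in base]  # copy so the stored base image is untouched
    for dy, mouth_row in enumerate(mouth):
        for dx, pixel in enumerate(mouth_row):
            out[y1 + dy][x1 + dx] = pixel
    return out


base = [[0] * 6 for _ in range(4)]  # 6x4 stand-in for the person image
mouth = [[9, 9], [9, 9]]            # 2x2 stand-in for one mouth image
MOUTH_AREAS = {1: (1, 1, 2, 2)}     # part number -> (x1, y1, x2, y2), assumed values

x1, y1, x2, y2 = MOUTH_AREAS[1]
frame = composite_mouth(base, mouth, x1, y1)
```

In this sketch the mouth image is assumed to match the size of the rectangle exactly; a real implementation would need to scale or clip it.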
[0059]
Further, for example, as the data content corresponding to ID = 21, English conversation text data ② (see FIGS. 12 and 15), which is synchronized with the in-conversation images G1 to G3 of the English conversation text with three conversation parts, is prepared. As the data content corresponding to ID = 32, English conversation audio data ② (see 19b in FIGS. 12 and 15), which is synchronized with the in-conversation images G1 to G3 and with the English conversation text data ②, is prepared.
[0060]
FIG. 6 is a diagram showing, in association with one another, the command codes of the various commands described in the time code file 12c3 (see FIG. 3) of the portable device 10, their parameter data, and the instruction contents analyzed based on the reproduction processing program 12a1.
[0061]
Commands used in the time code file 12c3 include standard commands and extended commands. The standard commands are LT (i-th text load), VD (i-th text segment display), BL (character counter reset / i-th phrase block designation), HN (no highlight, character counter count-up), HL (highlight up to the i-th character, character count), LS (one-line scroll, character counter count-up), DH (i-th HTML file display), DI (i-th image file display), PS (i-th sound file play), CS (clear all files), PP (stop for basic time i seconds), FN (processing end), and NP (invalid). The extended commands include PT (i-th part designation) and PI (part-by-part image file display).
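The command set listed above can be written down as a simple lookup table. The following Python sketch is only an illustration; the dictionary names and the `describe` helper are hypothetical, while the two-letter codes and their meanings are taken from the list above.

```python
STANDARD_COMMANDS = {
    "LT": "i-th text load",
    "VD": "i-th text segment display",
    "BL": "character counter reset / i-th phrase block designation",
    "HN": "no highlight, character counter count-up",
    "HL": "highlight up to the i-th character, character count",
    "LS": "one-line scroll, character counter count-up",
    "DH": "i-th HTML file display",
    "DI": "i-th image file display",
    "PS": "i-th sound file play",
    "CS": "clear all files",
    "PP": "stop for basic time i seconds",
    "FN": "processing end",
    "NP": "invalid",
}

EXTENDED_COMMANDS = {
    "PT": "i-th part designation",
    "PI": "part-by-part image file display",
}


def describe(code: str):
    """Return (command class, description) for a two-letter command code."""
    if code in STANDARD_COMMANDS:
        return ("standard", STANDARD_COMMANDS[code])
    if code in EXTENDED_COMMANDS:
        return ("extended", EXTENDED_COMMANDS[code])
    raise KeyError(f"unknown command code: {code}")
```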
[0062]
FIG. 7 is a diagram showing the contents of the text-based pronunciation mouth type data 12f stored in the RAM 12C of the portable device 10.
[0063]
That is, suppose the reproduction processing program 12a1 stored in the ROM 12A of the portable device (PDA) 10 is activated, and the decrypted reproduction file 12c, decrypted from the FLASH memory 12B and stored in the RAM 12C, has the file contents shown, for example, in FIGS. 3 to 5. When the third command code "DI" and the parameter data "00" are read by the command processing performed every set processing unit time, the command "DI" is the i-th image file display instruction. Accordingly, the English conversation text basic image G0 in the content data 12c4 (see FIG. 5) is read out and displayed in accordance with the image file ID number = 5, which is linked from the parameter data i = 00 through the file sequence table 12c2 (see FIG. 4).
[0064]
When the sixth command code "VD" and the parameter data "00" are read by the command processing performed every set processing unit time, the command "VD" is the i-th text segment display instruction, so the 0th phrase of the text is displayed in accordance with the parameter data i = 00.
[0065]
When the eighth command code "PI" and the parameter data "00" are read by the command processing performed every set processing unit time, the command "PI" is the part-by-part image file display instruction, so display of the image for the part designated by the subsequent command code PT (i-th part designation) is instructed.
[0066]
When the ninth command code "PT" and the parameter data "01" are read by the command processing performed every set processing unit time, the command "PT" is the i-th part designation instruction, so the first part in the reproduction target file is designated in accordance with the parameter data i = 01.
[0067]
Further, when the eleventh command code "NP" and the parameter data "00" are read by the command processing performed every set processing unit time, the command "NP" is an invalid command, so the current file output state is maintained.
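The walk-through above can be summarized as a small command dispatcher. The Python sketch below is a simplified illustration under stated assumptions, not the reproduction processing program 12a1 itself: the `PlayerState` fields, the `step` function, and the `IMAGE_TABLE` layout are hypothetical, timing is omitted, and only the five steps named in the description are included (the steps in between are not given and are left out).

```python
from dataclasses import dataclass
from typing import Optional

IMAGE_TABLE = {0: 5}  # image parameter -> content ID (00 -> ID 5, per the description)


@dataclass
class PlayerState:
    image_id: Optional[int] = None      # content ID of the image being displayed
    text_segment: Optional[int] = None  # index of the text segment being displayed
    part_image_mode: bool = False       # set once PI has been processed
    part: Optional[int] = None          # part number designated by PT


def step(state: PlayerState, code: str, param: int) -> PlayerState:
    """Apply one time code (command code + parameter data) to the player state."""
    if code == "DI":
        state.image_id = IMAGE_TABLE[param]
    elif code == "VD":
        state.text_segment = param
    elif code == "PI":
        state.part_image_mode = True
    elif code == "PT":
        state.part = param
    elif code == "NP":
        pass  # invalid command: the current output state is maintained
    else:
        raise ValueError(f"unhandled command code: {code}")
    return state


# Step number -> (command code, parameter), as named in the description.
TIME_CODES = {3: ("DI", 0), 6: ("VD", 0), 8: ("PI", 0), 9: ("PT", 1), 11: ("NP", 0)}

state = PlayerState()
for n in sorted(TIME_CODES):
    step(state, *TIME_CODES[n])
```

After these steps the state holds image ID 5, text segment 0, part-image mode enabled, and part 1 designated, which mirrors the sequence described above.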
[0068]
The detailed reproduction operation of the reproduction file 12b (12c) having the file contents shown in FIGS. 3 to 5 will be described later.
[0069]
Next, a file reproducing function of the portable device 10 having the above configuration will be described.
[0070]
FIG. 8 is a flowchart showing a reproducing process of the portable device 10 according to the reproducing process program 12a1.
[0071]
FIG. 9 is a flowchart showing the pronunciation mouth shape data creation processing that accompanies the reproduction processing of the portable device 10.
[0072]
FIG. 10 is a flowchart showing the part-specific image display processing A that accompanies the reproduction processing of the portable device 10.
[0073]
FIG. 11 is a diagram showing the selection operation and display state of learning contents in the reproduction processing of the portable device 10, in which FIG. 11A shows a learning content selection screen G, and FIGS. 11B and 11C show the selection operation keys for the learning content selection screen G.
[0074]
For example, an English teaching material reproduction file that allows the user to study English with text, images, and voice is taken from a CD-ROM or from a server 30 on a communication network (Internet) N into a communication device (PC) 20 such as a home PC, and is transferred via the communication unit 16 of the portable device (PDA) 10, so that the reproduction file (CAS file) 12b is stored in the FLASH memory 12B or the decrypted reproduction file 12c is stored in the RAM 12C. In this state, when reproduction of the reproduction file 12b (12c) is instructed by operating the input unit 17a or the coordinate input device (mouse / tablet) 17b, a learning content selection screen G for letting the user select the learning content is first displayed on the display unit 18, as shown in FIG. 11A (step S1).
[0075]
On the learning content selection screen G, as shown in FIGS. 11B and 11C, the user operates the cursor key 17a1 and the “Enter” key 17a2 of the input unit 17a to select either listening to the entire English conversation or practicing by part. When one of the conversation parts (1, 2, or 3) is selected, initialization processing such as clearing each work area in the RAM 12C is performed, and the part number of the selected English conversation part is stored in the RAM 12C as the designated part number 12e (steps S1 and S2).
[0076]
Then, the reproduction file (CAS file) 12b stored in the FLASH memory 12B is read (step S3), and it is determined whether or not the reproduction file (CAS file) 12b is an encrypted file (step S4).
[0077]
Here, if it is determined that the file is an encrypted reproduction file (CAS file) 12b, the CAS file 12b is decrypted (step S4 → S5), transferred to the RAM 12C, and stored there (step S6).
[0078]
Here, the pronunciation mouth shape data creation processing in FIG. 9 is executed (step SA).
[0079]
In this pronunciation mouth shape data creation processing, the English conversation text data stored in the content data 12c4 is first read into the text-corresponding pronunciation mouth shape data 12f in the RAM 12C, for example as shown in FIG. 7 (step A1).
[0080]
Then, all the words of the English conversation text data read into the text-corresponding pronunciation mouth shape data 12f are sequentially looked up in the dictionary data 12a2 stored in the ROM 12A, and their respective phonetic symbols are read (step A2).
[0081]
Then, the mouth type number corresponding to each of the phonetic symbols read for all the words of the English conversation text data is read from the mouth shape data 12a3 (see FIG. 2) stored in the ROM 12A, and the text words, phonetic symbols, and mouth type numbers of each conversation part are stored in association with one another in the text-corresponding pronunciation mouth shape data 12f in the RAM 12C.
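The three stages of this data creation processing (read the text, look up phonetic symbols, map them to mouth type numbers) can be sketched as follows. This is an illustration under stated assumptions: `DICTIONARY` and `MOUTH_TYPES` are toy stand-ins for the dictionary data 12a2 and the mouth shape data 12a3, and every phonetic symbol and mouth type number in them is invented for the example.

```python
# Toy stand-ins for dictionary data 12a2 and mouth shape data 12a3.
# All phonetic symbols and mouth type numbers here are invented.
DICTIONARY = {"what": ["w", "ʌ", "t"], "do": ["d", "uː"]}
MOUTH_TYPES = {"w": 3, "ʌ": 1, "t": 5, "d": 5, "uː": 4}


def build_mouth_data(text):
    """Associate each word with its phonetic symbols and mouth type numbers."""
    rows = []
    for word in text.split():
        key = word.strip(".,?!:").lower()              # normalize for lookup
        symbols = DICTIONARY.get(key, [])              # dictionary search
        mouths = [MOUTH_TYPES[s] for s in symbols]     # mouth type lookup
        rows.append((word, symbols, mouths))           # stored in association
    return rows
```

The returned rows play the role of the text-corresponding pronunciation mouth shape data 12f: one entry per word, holding the word, its symbols, and its mouth type numbers together.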
[0082]
When the pronunciation mouth shape data creation processing is completed in this way, the processing unit time 12c1a (for example, 25 ms) of the reproduction file (CAS file) 12c by the CPU 11 is set as the header information 12c1 of the time code file 12c3 (step S7).
[0083]
Then, a read pointer is set at the beginning of the decrypted reproduction file (CAS file) 12c stored in the RAM 12C (step S8), and a timer for measuring the reproduction processing timing of the reproduction file 12c is started (step S9).
[0084]
Here, the pre-reading process is started in parallel with the reproduction process (step S10).
[0085]
In this pre-reading process, if the time code file 12c3 (see FIG. 3) of the reproduction file 12c contains a “DI” command for displaying an image file after the command at the current read pointer position, the image file designated by the parameter data of that “DI” command is read in advance and expanded in the image expansion buffer 12k. As a result, when the read pointer actually reaches the subsequent “DI” command, the designated image file can be output and displayed immediately, without processing delay.
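The pre-reading idea can be illustrated with a short sketch. The function name `preload_images`, the `load_image` callback, and the use of a dictionary as the expansion buffer are all assumptions made for the example; the text itself only states that images for later “DI” commands are expanded ahead of time in the image expansion buffer 12k.

```python
def preload_images(timecode, pointer, load_image, buffer):
    """Expand images for 'DI' commands located after the read pointer.

    timecode   -- list of (command code, parameter) pairs
    pointer    -- index of the command currently being processed
    load_image -- hypothetical callback that expands one image
    buffer     -- dict standing in for the image expansion buffer
    """
    for code, param in timecode[pointer + 1:]:
        if code == "DI" and param not in buffer:
            buffer[param] = load_image(param)
    return buffer
```

When the read pointer later reaches a “DI” command, the display step can take the image from the buffer instead of loading it at that moment.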
[0086]
When the processing timer is started in step S9, the command code and its parameter data in the time code file 12c3 (see FIG. 3) constituting the reproduction file 12c are read at the position of the read pointer set in step S8, once for each processing unit time (25 ms) set in step S7 for the current reproduction target file 12c (step S11).
[0087]
Then, it is determined whether or not the command code read from the time code file 12c3 (see FIG. 3) in the reproduction file 12c is “FN” (step S12). If it is determined that the command code is “FN”, stop processing of the file reproduction is instructed and executed at that point (step S12 → S13).
[0088]
On the other hand, if it is determined that the command code read from the time code file 12c3 (see FIG. 3) in the reproduction file 12c is not “FN”, it is determined whether or not the command code is “PT” (step S12 → S14).
[0089]
When it is determined that the command code is “PT”, the image display processing A for each corresponding part in FIG. 10 is executed (step SB).
[0090]
In the part-specific image display processing A, the part number p of the English conversation text designated by the command code “PT” and its parameter data is first detected (step Ba1). The value obtained by adding this part number p to the image number of the English conversation text basic image G0, which is designated by the command code “DI” and its parameter data in the time code file 12c3 (see FIG. 3), is then set as the image number i of the reproduction target part (step Ba2). For example, when the basic image number designated by the parameter data of the first command code “DI” in the time code file 12c3 (see FIG. 3) is “00” and the part number p designated by the parameter data of the command code “PT” is “01”, the image number i of the reproduction target part is 01 (00 + 01).
[0091]
Then, the instruction corresponding to the command code “DI” (i-th image display instruction) is executed. For example, the in-conversation image G1 of part 1, which is stored in the content data 12c4 under ID number = 6 and is associated with the image number i = 01 in the file sequence table 12c2, is displayed (step Ba3).
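The arithmetic of steps Ba1 to Ba3 is small enough to show directly. In the sketch below, `IMAGE_IDS` holds only the two image entries that the text states (image number 0 with ID 5, image number 1 with ID 6); the function name is a hypothetical choice.

```python
# Image entries of the file sequence table as stated in the text:
# image number 0 -> ID 5 (basic image G0), image number 1 -> ID 6 (image G1).
IMAGE_IDS = {0: 5, 1: 6}


def part_image_number(base_param: str, part_param: str) -> int:
    """Step Ba2: image number i = basic image number + part number p."""
    return int(base_param) + int(part_param)


# 'DI' parameter "00" and 'PT' parameter "01" give image number 1.
i = part_image_number("00", "01")
image_id = IMAGE_IDS[i]  # ID under which image G1 is stored
```

Because the part images follow the basic image in the numbering, no separate per-part lookup table is needed; the part number itself acts as an offset.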
[0092]
When the part-specific image designated by the command code “PT” has been displayed in this manner, it is determined whether or not the part number p of the English conversation part designated by the parameter data of the command code “PT” matches the designated part number 12e that was stored when the user selected part-by-part practice in the learning content selection process (step S1) (step S15).
[0093]
Here, when the part number p of the English conversation part designated by the parameter data of the command code “PT” does not match the designated part number 12e selected by the user, that is, when it is determined that the conversation part differs from the part the user wishes to practice, the voice flag 12i in the RAM 12C is set to on so that voice is output for that conversation part (step S15 → S16), and the type of highlight used to identify and display the text character string of the conversation part in accordance with the command code “HL” is changed to underlining (step S17).
[0094]
On the other hand, when the part number p of the English conversation part designated by the parameter data of the command code “PT” matches the designated part number 12e selected by the user, that is, when it is determined that the part is the conversation part the user wishes to practice, the voice flag 12i in the RAM 12C is set to off so that voice output is stopped for that conversation part (step S15 → S18), and the type of highlight used to identify and display the text character string of the conversation part in accordance with the command code “HL” is changed to inversion (step S19).
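The branch in steps S15 to S19 reduces to a single comparison. The sketch below is an illustration only: the function name `configure_part` and the string labels for the highlight styles are assumptions.

```python
def configure_part(part_number, designated_part):
    """Return (voice_on, highlight_style) for the part designated by 'PT'.

    A part that matches the user's practice part is muted and shown in
    inverted form; any other part is voiced and underlined.
    """
    if part_number == designated_part:
        return False, "inverted"   # steps S18 and S19
    return True, "underline"       # steps S16 and S17
```

With the designated part number set to 2, part 1 yields `(True, "underline")` and part 2 yields `(False, "inverted")`, which matches the behavior described for Mr. A and Mr. B.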
[0095]
Then, based on the time counting operation of the processing timer, it is determined whether the time measured by the timer has reached the next processing unit time 12c1a (step S20).
[0096]
On the other hand, if it is determined in step S14 that the command code read from the time code file 12c3 (see FIG. 3) in the reproduction file 12c is not "PT", the process proceeds to another command process. Then, processing corresponding to each command content (see FIG. 6) is executed (step SC).
[0097]
If it is determined in step S20 that the time measured by the timer has reached the next processing unit time 12c1a, the read pointer into the decrypted reproduction file (CAS file) 12c stored in the RAM 12C is updated to the next position (step S20 → S21), and the processing from step S11, in which the command code and parameter data of the time code file 12c3 (see FIG. 3) at the read pointer position are read, is repeated (steps S21 → S11 to S19 (SC)).
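The overall loop of steps S11 to S21 can be sketched as one command per processing unit time. This is a simplification under stated assumptions: the function `play`, the `handlers` dictionary, and the injectable `sleep` parameter are hypothetical, and a fixed sleep stands in for the timer comparison of step S20.

```python
import time


def play(timecode, handlers, unit_time=0.025, sleep=time.sleep):
    """Process one command per processing unit time until 'FN' is read."""
    pointer = 0                         # step S8: read pointer at the start
    log = []
    while pointer < len(timecode):
        code, param = timecode[pointer]  # step S11: read command and parameter
        if code == "FN":                 # steps S12 and S13: stop reproduction
            break
        handler = handlers.get(code)     # commands without a handler do nothing
        if handler is not None:
            log.append(handler(param))
        sleep(unit_time)                 # stand-in for the timer check (S20)
        pointer += 1                     # step S21: advance the read pointer
    return log
```

Passing `sleep=lambda s: None` runs the loop without waiting, which is convenient for checking the command sequence on its own.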
[0098]
In other words, in accordance with the synchronous content reproduction processing program 12a1 stored in the ROM 12A, the CPU 11 of the portable device 10 simply reads the command code and its parameter data from the time code file 12c3 (see FIG. 3) for each command processing unit time set and described in advance in the reproduction file 12b (12c), and instructs the processing corresponding to the command. In this way, synchronous reproduction output of the text, voice, and image corresponding to each command described in the time code file 12c3 is performed.
[0099]
When synchronous reproduction output of voice, text, and image is instructed in response to the command codes read from the time code file 12c3 (see FIG. 3) in the reproduction file 12c, the in-conversation image Gn corresponding to the next reproduction part designated by the command code “PT” is displayed by the part-specific image display processing A. An image showing the mouth movement for each conversation part can therefore be displayed easily, and the user can practice the conversation more effectively.
[0100]
When the next reproduction part designated by the command code “PT” is the practice part set by the user, the voice output of that part is stopped, and only the text and the part-specific image Gn are synchronously reproduced. The user can thus practice the conversation by reading the text aloud while looking at the text and the part-specific image Gn displayed for the practice part.
[0101]
If the next reproduction part designated by the command code “PT” is not the practice part set by the user, the voice output of that part is not stopped, and the text, voice, and image are synchronously reproduced. The user can thus practice the conversation by listening to the read-aloud voice while viewing the text and image displayed for the other parts.
[0102]
Here, the synchronous reproduction output operation of the audio / text / image file by the synchronous content reproduction processing program 12a1 (see FIGS. 8 to 10) based on the English teaching material reproduction file 12c shown in FIG. 3 will be described in detail.
[0103]
FIG. 12 is a diagram showing a synchronous reproduction state (No. 1) of English conversation text / audio / image files based on the English learning material reproduction file 12c in FIGS.
[0104]
In the English teaching material file (12c), command processing is executed for each (reference) processing unit time 12c1a (for example, 25 ms) described and set in advance in its header. First, when the first command code “CS” (clear all files) and its parameter data “00” are read from the time code file 12c3 (see FIG. 3), an instruction to clear the output of all files is issued, and the output of the text, voice, and image files is cleared (step SC).
[0105]
When the second command code “DH” (i-th HTML file display) and its parameter data “01” are read, the file sequence table 12c2 is read in accordance with the parameter data (i = 1) read together with the command code DH. The ID number = 2 of the HTML number 1 is read from (see FIG. 4).
[0106]
Then, in accordance with the English conversation text / image frame data of the HTML data read from the content data 12c4 (see FIG. 5) in association with the ID number = 2, the text display frame X and the image display frame Y are set on the display screen, as shown in the drawings (step SC).
[0107]
When the third command code “DI” (i-th image file display) and its parameter data “00” are read, the ID number = 5 of the image number 0 is read from the file sequence table 12c2 (see FIG. 4) in accordance with the parameter data (i = 0) read together with the command code DI.
[0108]
Then, the image data (English conversation text basic image G0) read from the content data 12c4 (see FIG. 5) in association with the ID number = 5 and expanded in the image expansion buffer 12k is displayed in the image display frame Y set by the HTML file (step SC).
[0109]
When the fourth command code “PS” (the i-th sound file play) and its parameter data “02” are read, the file sequence table 12c2 according to the parameter data (i = 2) read with the command code PS. The ID number = 32 of the voice number 2 is read from (see FIG. 4).
[0110]
Then, the English conversation voice data (2) read from the content data 12c4 (see FIG. 5) in association with the ID number = 32 is output from the stereo voice output unit 19b (step SC).
[0111]
When the fifth command code “LT” (i-th text load) and its parameter data “02” are read, the ID number = 21 of the text number 2 is read from the file sequence table 12c2 (see FIG. 4) in accordance with the parameter data (i = 2) read together with the command code LT.
[0112]
Then, the English conversation text data (2) read from the content data 12c4 (see FIG. 5) in association with the ID number = 21 is loaded into the work area of the RAM 12C (step SC).
[0113]
When the sixth command code “VD” (i-th text phrase display) and its parameter data “00” are read, the ID number = 19 of the text number 0 is read from the file sequence table 12c2 (see FIG. 4) in accordance with the parameter data (i = 0) read together with the command code VD. The phrase of the English conversation title characters specified by the content data 12c4 (see FIG. 5) in association with the ID number = 19 is then called out of the English conversation text data (2) loaded into the RAM 12C and displayed in the text display frame X on the display screen (step SC).
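The lookups described for the commands “DH”, “DI”, “PS”, “LT”, and “VD” all follow the same pattern: the command selects a file type, the parameter selects a number within that type, and the file sequence table yields an ID number. The sketch below uses only the ID values stated in the text; the key layout, the type names, and the function `resolve_id` are illustrative assumptions rather than the actual table format of FIG. 4.

```python
# ID numbers as stated in the text; the (type, number) key layout is assumed.
FILE_SEQUENCE_TABLE = {
    ("html", 1): 2,
    ("image", 0): 5,
    ("image", 1): 6,
    ("sound", 2): 32,
    ("text", 0): 19,
    ("text", 2): 21,
}

# File type addressed by each command code (assumed mapping).
COMMAND_KIND = {"DH": "html", "DI": "image", "PS": "sound",
                "LT": "text", "VD": "text"}


def resolve_id(code: str, param: str) -> int:
    """Map a command code and its parameter data to a content ID number."""
    return FILE_SEQUENCE_TABLE[(COMMAND_KIND[code], int(param))]
```

The indirection through ID numbers means the time code file never refers to content directly, so the same command sequence could in principle be reused with a different table.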
[0114]
When the seventh command code “BL” (character counter reset / i-th phrase block designation) and its parameter data “00” are read, the character counter of the English conversation phrase being displayed in the text display frame X is reset, and the 0th phrase block is designated (step SC).
[0115]
When the eighth command code “PI” (part-specific image file display) and its parameter data “00” are read, an instruction to display the image of the part specified thereafter is issued (step SC).
[0116]
When the ninth command code “PT” (i-th part designation) and its parameter data “01” are read, the conversation part 1 (Mr. A) in the English conversation voice data (2) and the English conversation text data (2), which are to be synchronously reproduced on the basis of the English conversation text basic image G0, is designated (step S14).
[0117]
Then, in accordance with the part-specific image display processing A in FIG. 10, the conversation image G1 showing the mouth movement of the person of part 1, which has the image number i = 1 corresponding to the designated part number p = 1, is read out and displayed as shown in FIG. 12A (step SB).
[0118]
If the designated part number 12e set in advance by the user is the conversation part 2 (Mr. B), it does not match the conversation part 1 (Mr. A) designated by the ninth command code “PT” (step S15). Therefore, the voice output of part 1 is turned on (step S16), and the highlight processing performed in response to the command code “HL” is changed to underline processing (step S17).
[0119]
When the tenth command code “HL” (highlight / character count up to the i-th character) and its parameter data “07” are read, the text data is underlined up to the seventh character, that is, “A: What” (including the space), in accordance with the parameter data (i = 7) read together with the command code HL, as shown in FIG. 12A, and the character counter is counted up to the seventh character (step SC).
[0120]
At this time, while the conversation character string of part 1 (Mr. A) of the text data is underlined, the voice output is set to on. Therefore, the English conversation voice data (2) output from the stereo voice output unit 19b in accordance with the fourth command code “PS” is the voice reading aloud the underlined portion of the conversation part 1 (Mr. A).
[0121]
When the eleventh command code "NP" is read, the synchronous display screen of the current image and English conversation text data and the synchronous output state of English conversation voice data are maintained.
[0122]
Thereafter, in accordance with the thirteenth, thirty-fifth, and fifty-eighth command codes “HL”, the text data of the conversation part 1 (Mr. A) is sequentially underlined up to the twelfth character “high”, the nineteenth character “school”, and the twenty-second character “do” while the part 1 conversation image G1 is displayed. The English conversation voice data (2) output from the stereo voice output unit 19b in accordance with the fourth command code “PS” is likewise output sequentially as the voice reading aloud the underlined portion of the conversation part 1 (Mr. A) (steps S11 to S14 → SC, S20, S21 → S11).
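The character counts used by the “HL” command can be checked with a short example. The sample string below is assembled from the fragments quoted in the text (“A: What”, “high”, “school”, “do”) and is an assumption about how they join; the function name `split_highlight` is hypothetical.

```python
# Sample text assembled from the fragments quoted above (an assumption).
TEXT = "A: What high school do"


def split_highlight(text: str, count: int):
    """Split text into the highlighted prefix and the remaining characters.

    count is the 'HL' parameter: the number of characters, spaces included,
    that are highlighted so far.
    """
    return text[:count], text[count:]


# Parameters 7, 12, 19, and 22 end exactly on word boundaries.
prefixes = [split_highlight(TEXT, n)[0] for n in (7, 12, 19, 22)]
```

Each successive “HL” parameter therefore extends the highlighted prefix by one word, which is what lets the highlight follow the read-aloud voice.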
[0123]
That is, when the next English conversation reproduction part designated by the command code “PT” is not the practice part set by the user (for example, part 2 (Mr. B)), the voice output of that other part is not stopped, and the English conversation text, its read-aloud voice, and the corresponding part-specific image Gn are synchronously reproduced. The user can thus practice the conversation by listening to the read-aloud voice while viewing the text and image displayed for the other parts.
[0124]
When the 119th command code “PT” (i-th part designation) and its parameter data “02” are read, the conversation part 2 (Mr. B) in the English conversation voice data (2), the English conversation text data (2), and the part-specific image Gn to be synchronously reproduced next is designated (step S14).
[0125]
Then, in accordance with the part-specific image display processing A in FIG. 10, the conversation image G2 showing the mouth movement of the person of part 2, which has the image number i = 2 corresponding to the designated part number p = 2, is read out and displayed as shown in FIG. 12B (step SB).
[0126]
Here, when the designated part number 12e set in advance by the user is the conversation part 2 (Mr. B), it matches the conversation part 2 (Mr. B) designated by the 119th command code “PT” (step S15). Therefore, the voice output of part 2 is turned off (step S18), and the highlight processing performed in response to the command code “HL” is changed to inversion processing (step S19).
[0127]
Then, when the 120th command code “HL” (highlight / character count up to the i-th character) and its parameter data “37” are read, the text data is displayed in inverted form up to the 37th character, that is, “B: I” (including the space) of the conversation part 2, in accordance with the parameter data (i = 37) read together with the command code HL, as shown in FIG. 12B, and the character counter is counted up to the 37th character (step SC).
[0128]
At this time, while the conversation character string of part 2 (Mr. B) of the text data is displayed in inverted form, the voice output is set to off. Therefore, output of the English conversation voice data (2), which has been output sequentially from the stereo voice output unit 19b in accordance with the fourth command code “PS”, is stopped.
[0129]
Thereafter, in accordance with the 132nd command code “HL” and the 140th command code “HL”, the text data of the conversation part 2 (Mr. B) is sequentially displayed in inverted form, without voice output, up to the 40th character “go” and the 43rd character “to” while the part 2 conversation image G2 is displayed (steps S11 to S14 → SC, S20, S21 → S11).
[0130]
That is, when the next English conversation reproduction part designated by the command code “PT” is the practice part set by the user (for example, part 2 (Mr. B)), the voice output of that part is stopped, and only the English conversation text and the corresponding part-specific image G2 are synchronously reproduced. The user can thus practice the conversation by reading the text aloud while looking at the text and image displayed for the practice part.
[0131]
Further, when the 159th command code “PT” (i-th part designation) and its parameter data “03” are read, the conversation part 3 (Mr. C) in the English conversation voice data (2), the English conversation text data (2), and the part-specific image Gn to be synchronously reproduced next is designated (step S14).
[0132]
Then, in accordance with the part-specific image display processing A in FIG. 10, the conversation image G3 showing the mouth movement of the person of part 3, which has the image number i = 3 corresponding to the designated part number p = 3, is read out and displayed as shown in FIG. 12C (step SB).
[0133]
Here, if the designated part number 12e set in advance by the user is the conversation part 2 (Mr. B), it does not match the conversation part 3 (Mr. C) designated by the 159th command code “PT” (step S15). Therefore, the voice output of part 3 is turned on (step S16), and the highlight processing performed in response to the command code “HL” is changed to underline processing (step S17).
[0134]
When the 160th command code “HL” (highlight / character count up to the i-th character) and its parameter data “70” are read, the text data is underlined up to the 70th character, that is, “C: Me,” (including the space), in accordance with the parameter data (i = 70) read together with the command code HL, as shown in FIG. 12C, and the character counter is counted up to the 70th character (step SC).
[0135]
At this time, while the conversation character string of part 3 (Mr. C) of the text data is underlined, the voice output is set to on. Therefore, the English conversation voice data (2) output from the stereo voice output unit 19b in accordance with the fourth command code “PS” is the voice reading aloud the underlined portion of the conversation part 3 (Mr. C).
[0136]
Thereafter, in accordance with the 172nd command code “HL”, the text data of the conversation part 3 (Mr. C) is underlined up to the 75th character “too.” while the part 3 conversation image G3 is displayed, and the English conversation voice data (2) output from the stereo voice output unit 19b in accordance with the fourth command code “PS” continues to be output as the voice reading aloud the underlined portion of the conversation part 3 (Mr. C) (steps S11 to S14 → SC, S20, S21 → S11).
[0137]
That is, when the next English conversation playback part specified by the command code “PT” is no longer the part to be practiced by the user (for example, part 2 (Mr. B)), the audio output of the other part is performed. Is resumed and the English conversation text, its read-out voice, and the corresponding part image Gn are synchronously reproduced and output, so that the user listens again to the voice output of the text while viewing the text and image displayed and output for other parts. Can practice conversation.
[0138]
In this way, command processing is performed for each (reference) processing unit time 12c1a (for example, 25 ms) set in advance in the reproduction file, in accordance with the time code file 12c3 (see FIG. 3), the file sequence table 12c2 (see FIG. 4), and the content data 12c4 (see FIG. 5) of the English conversation teaching material reproduction file (12c). As a result, as shown in FIGS. 12A to 12D, the English conversation text data is displayed in the text display frame X on the display screen with each part identified, the part-specific image Gn is synchronously displayed in the image display frame Y, the English conversation voice data reading aloud the text of the part being identified is synchronously output from the stereo voice output unit 19b, and the phrase of the English conversation text being read aloud is highlighted (emphasized) character by character in synchronization.
[0139]
At this time, the in-conversation image Gn corresponding to the next playback part specified by the command code “PT” is displayed according to the corresponding part-specific image display processing A, so that the mouth movement for each conversation part can be easily displayed. The displayed image can be displayed, and the user can practice the conversation more effectively.
[0140]
In addition, during the reproduction output period of the conversation part designated by the user, the synchronous output of the voice data is stopped. The user can therefore practice by reading the text aloud while looking at the text and the part-specific image Gn displayed for the practice part, and, for the other conversation parts, can practice by listening to the read-aloud voice output while looking at the text and the part-specific image Gn.
[0141]
Therefore, according to the file reproduction function of the portable device 10 having the above configuration, in accordance with the reproduction processing program 12a1 stored in the ROM 12A, the command codes and their parameter data arranged in the time code file 12c3 are read for each reference unit time (25 ms / 50 ms) of command processing set and described in advance in the reproduction file 12b (12c). Simply by instructing the processing corresponding to each command, synchronous reproduction of the text, image, and voice files corresponding to the commands described in the time code file 12c3 is performed.
[0142]
For example, when text, voice, and images composed of a plurality of conversation parts are synchronously reproduced from the English conversation teaching material reproduction file 12b (12c), the part-specific in-conversation image Gn corresponding to the next conversation part to be reproduced, as designated by the command code “PT” and its parameter data, is displayed. An image showing the mouth movement for each conversation part can therefore be displayed easily, and the user can practice the conversation more effectively.
[0143]
Further, when the next conversation part to be output, designated by the command code and its parameter data, matches the practice target conversation part set by the user, the voice output is stopped for that part, and only the part-specific text identification display and the corresponding part image Gn are synchronously displayed. When it does not match the practice target conversation part, the voice output is resumed and synchronously reproduced together with the part-specific text identification display and the corresponding part image Gn. The read-aloud voice output can thus easily be turned off only for the conversation part the user wishes to practice, so that the user can practice pronouncing it himself or herself, and part-by-part practice can be performed easily and effectively.
[0144]
In addition, according to the file reproduction function of the portable device 10 having the above configuration, in accordance with the command codes for each reference processing unit time described in the time code file 12c3, the voice data output instruction “PS”, the text data display instructions “VD” and “HL” synchronized with the voice data, and the part-specific image Gn display instructions “DI” and “PI” are issued, while the start of each conversation part is designated by “PT”. The corresponding part image Gn is then displayed, and the voice output of the part is turned off when the part matches the practice part set by the user. Part-by-part practice can therefore be performed very easily.
[0145]
Further, according to the file reproduction function of the portable device 10 having the above configuration, the part number (type) designated by the command code “PT” of the time code file 12c3 is set by the parameter data described as a pair with the command code “PT”. The time code file 12c3 in which each reproduction part is designated can therefore be created easily.
[0146]
In the part-specific image display processing A accompanying the reproduction processing of the embodiment (see FIG. 10), the conversation image Gn of each part, prepared in advance to show the mouth movement of the person of that part, is switched and displayed in accordance with the part number p designated by the command code PT of the time code file 12c3, so that the conversation state of each part is expressed synchronously on the image. Alternatively, as described below for the part-specific image display processing B and the text-corresponding mouth display processing (see FIG. 15), the configuration may be such that, on the English conversation text basic image G0 in which no part shows mouth movement, the pronunciation mouth shape image for the currently highlighted character of each part's English conversation text is read on the basis of the text-corresponding pronunciation mouth shape data 12f and is composited onto the mouth area M1 to M3 of the person of the part designated by the command code PT, so that the conversation state of each part is expressed synchronously and more realistically on the image.
[0147]
FIG. 13 is a flowchart showing the part-specific image display processing B that accompanies the reproduction processing of the portable device 10.
[0148]
FIG. 14 is a flowchart showing a text-corresponding mouth display process which is executed by interruption in conjunction with a part-by-part text highlight (emphasis) display process corresponding to the command code HL in the reproduction process of the portable device 10.
[0149]
FIG. 15 is a diagram showing a synchronous reproduction state (No. 2) of the English conversation text / sound / image file based on the English teaching material reproduction file 12c in FIGS. 3 to 5.
[0150]
That is, when the command code PT is read from the time code file 12c3 in step S14 of the reproduction process described with reference to FIG. 8 and the process proceeds to the image display processing B for each corresponding part in FIG. 13, the English conversation text basic image G0 in the content data 12c4 is first read out and displayed in the image display frame Y on the display screen, as shown in FIG. 15 (step Bb1).
[0151]
Then, the next reproduction target part number p designated by the parameter data of the command code PT is detected (step Bb2), and the rectangular area data (position coordinates) Mn of the mouth position of the part person image on the English conversation text basic image G0 corresponding to the designated part number p is read from the content data 12c4 in which the English conversation text basic image G0 is stored (step Bb3).
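Steps Bb1 to Bb3 can be pictured with the following minimal sketch. The dictionary standing in for the content data 12c4, the `Rect` type, and all coordinate values are hypothetical and serve only to show the lookup order.

```python
from typing import NamedTuple


class Rect(NamedTuple):
    x: int
    y: int
    width: int
    height: int


# Hypothetical stand-in for content data 12c4: the basic image G0 plus the
# mouth area Mn (rectangular position coordinates) of each part person image.
CONTENT_DATA = {
    "base_image": "G0",
    "mouth_areas": {
        1: Rect(20, 40, 16, 8),   # M1
        2: Rect(80, 42, 16, 8),   # M2
        3: Rect(140, 38, 16, 8),  # M3
    },
}


def image_display_b(content, pt_parameter):
    """Return the basic image and the mouth area of the part designated by PT."""
    base = content["base_image"]      # step Bb1: read the basic image G0
    p = pt_parameter                  # step Bb2: detect the designated part number p
    area = content["mouth_areas"][p]  # step Bb3: read the mouth area Mn for part p
    return base, area
```

The returned mouth area is what the later mouth display step uses to decide where the mouth shape image is drawn.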
[0152]
Then, when, in step SC of the reproduction process, the English conversation text being displayed is highlighted up to the character currently being read aloud in accordance with the command code HL read from the time code file 12c3, and the text-corresponding mouth display processing in FIG. 14 is started by interruption, the character at the current text highlight position is detected based on the parameter data of the command code HL (step D1).
[0153]
Then, the mouth shape image data of the pronunciation corresponding to the character at the text highlight position is read from the mouth shape data 12a3 (see FIG. 2) in the ROM 12A, in accordance with the mouth number in the text-based pronunciation mouth shape data 12f (see FIG. 7) created by the pronunciation mouth shape data creation process in FIG. 9 (step D2).
[0154]
Then, the mouth shape image data of the pronunciation corresponding to the character at the current text highlight position is synthesized and displayed on the displayed English conversation text basic image G0, as shown in FIG. 15, in accordance with the rectangular area data (position coordinates) Mn of the mouth position of the designated part person image read out in step Bb3 of the image display processing B for each corresponding part (step D3).
[0155]
As a result, for each part of the English conversation text currently being reproduced, a mouth shape corresponding to the pronunciation can be synthesized and displayed at the mouth position of the corresponding part person image. The conversation state of each part can thus be expressed synchronously and more realistically on the image, enabling more effective conversation practice.
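Steps D1 to D3 amount to a table lookup followed by pasting a small image at the mouth position. The sketch below illustrates this under stated assumptions: images are modeled as nested lists of single characters, and the contents of `MOUTH_SHAPES` (standing in for the mouth shape data 12a3) and `TEXT_MOUTH_NUMBERS` (standing in for the text-based pronunciation mouth shape data 12f) are invented for the example.

```python
def composite(base, patch, x, y):
    """Return a copy of base with patch pasted so its top-left corner is at (x, y)."""
    out = [row[:] for row in base]  # copy so the basic image itself is untouched
    for dy, patch_row in enumerate(patch):
        for dx, pixel in enumerate(patch_row):
            out[y + dy][x + dx] = pixel
    return out


# Hypothetical mouth shape data 12a3: mouth number -> mouth shape image.
MOUTH_SHAPES = {0: [["-", "-"]], 1: [["o", "o"]], 2: [["=", "="]]}

# Hypothetical text-based pronunciation mouth shape data 12f:
# the mouth number for each character position of the text.
TEXT_MOUTH_NUMBERS = [1, 2, 0, 1]


def mouth_display(base, highlight_pos, mouth_xy):
    """Composite the mouth shape for the highlighted character onto the basic image."""
    mouth_no = TEXT_MOUTH_NUMBERS[highlight_pos]  # D1: character at the highlight position
    patch = MOUTH_SHAPES[mouth_no]                # D2: read the matching mouth shape image
    x, y = mouth_xy
    return composite(base, patch, x, y)           # D3: synthesize at the mouth position Mn


base = [["." for _ in range(6)] for _ in range(3)]
frame = mouth_display(base, highlight_pos=1, mouth_xy=(2, 1))
```

Compositing onto a copy keeps the closed-mouth basic image available, so each new highlight position starts from the same unmodified image.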
[0156]
When a key (button) operation instructing an enlarged display of the mouth is performed on the input unit 17a or the coordinate input device 17b (step D4), an enlarged version of the mouth shape image currently synthesized and displayed on the English conversation text basic image G0 in accordance with the part-by-part text highlight display is read from the mouth shape data 12a3 (see FIG. 2) in the ROM 12A (step D5) and is displayed in a display area separate from the English conversation text basic image G0 (step D6).
[0157]
This allows the user to see the mouth shape of the pronunciation in more detail and in real time during conversation practice, so that English conversation can be practiced more effectively.
[0158]
In the embodiment of the mouth shape composite display according to the text highlight position, a mouth shape image corresponding to the actual pronunciation is read from the mouth shape data 12a3 and is synthesized and displayed in the mouth area Mn of each part person image. However, the configuration may instead be such that an open-mouth image is simply synthesized and displayed for the designated part on the English conversation text basic image G0, in which the mouths of the person images of all parts are closed.
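The simplified variant reduces the per-character lookup to a two-state choice per part. A minimal sketch, in which the function name and the "open" / "closed" labels are hypothetical:

```python
def mouth_states(all_parts, speaking_part):
    """Map each part number to 'open' for the designated part and 'closed' otherwise."""
    return {
        part: ("open" if part == speaking_part else "closed")
        for part in all_parts
    }
```

With parts 1 to 3 and part 2 designated, only part 2 receives the open-mouth image while parts 1 and 3 keep the closed mouths of the basic image.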
[0159]
The methods of the respective processes performed by the portable device 10 described in the embodiment can each be stored, as a computer-executable program, in an external recording medium 13 such as a memory card (ROM card, RAM card, etc.), a magnetic disk (floppy disk, hard disk, etc.), an optical disk (CD-ROM, DVD, etc.), or a semiconductor memory, and distributed. These methods are the file reproduction process shown in the flowchart of FIG. 8, the pronunciation mouth shape data creation process accompanying the file reproduction process shown in the flowchart of FIG. 9, the image display processing A for each corresponding part shown in the flowchart of FIG. 10, the image display processing B for each corresponding part shown in the flowchart of FIG. 13, and the text-corresponding mouth display process shown in the flowchart of FIG. 14, which accompanies the part-by-part text highlight (emphasis) display process corresponding to the command code HL. Various computer terminals having a function of communicating with the communication network (Internet) N read the program stored in the external recording medium 13 into the memory 12 by means of the recording medium reading unit 14, and their operation is controlled by the read program. This realizes the synchronous reproduction function for text, audio, and images described in the embodiment, which can switch and display the part-by-part image Gn of the designated part, so that the same processing as in the above-described methods can be executed.
[0160]
Further, the data of a program for realizing each of the above methods can be transmitted over the communication network (Internet) N in the form of program code. A computer terminal connected to the communication network (Internet) N can take in this program data and thereby realize the above-described synchronous reproduction function for text, audio, and images, which can switch and display the part-by-part image Gn of the designated part.
[0161]
It should be noted that the present invention is not limited to the above-described embodiments and can be variously modified at the implementation stage without departing from the scope of the invention. Furthermore, the embodiments include inventions at various stages, and various inventions can be extracted by appropriately combining a plurality of the disclosed constituent elements. For example, even if some constituent elements are deleted from all the constituent elements shown in each embodiment, or some constituent elements are combined, a configuration in which those constituent elements are deleted or combined can be extracted as an invention, provided that the problem described in the section on the problem to be solved by the invention can be solved and the effects described in the section on the effects of the invention can be obtained.
[0162]
[Effects of the invention]
As described above, according to the audio display output control device according to claim 1 of the present invention, audio data is output by the audio data output means, and text data divided into a plurality of types of parts is displayed by the text synchronous display control means in synchronization with the audio data output by the audio data output means. When any part of the text data is designated by the part designation means and the text data of the designated part is synchronously displayed by the text synchronous display control means, an image corresponding to that part is read out from among the images for the respective parts stored by the part-by-part image storage means and is displayed. The text data of the designated part can therefore be displayed in synchronization with the output of the audio data, together with an image corresponding to that part.
[0163]
Further, according to the audio display output control device according to claim 2 of the present invention, in the audio display output control device according to claim 1, the text synchronous display control means has instruction code storage means for storing instruction codes that cause the text data divided into the plurality of types of parts to be displayed, in accordance with a preset elapsed time, in synchronization with the audio data output by the audio data output means, and displays the part-divided text data in synchronization with the audio data in accordance with the instruction codes stored by the instruction code storage means. The part-divided text data and the image corresponding to each part can therefore be displayed easily in synchronization with the audio data, following the instructions issued at the preset elapsed time by the stored instruction codes.
[0164]
According to the audio display output control device according to claim 3 of the present invention, in the audio display output control device according to claim 2, the part designation means is a part designation instruction code stored in association with the instruction codes stored by the instruction code storage means, and this part designation instruction code sequentially designates the parts of the text data divided into the plurality of types in accordance with the output of the audio data by the audio data output means. The text data of the sequentially designated part and the image corresponding to that part can therefore be displayed easily in synchronization with the audio data, following the instructions issued at the preset elapsed time by the stored instruction codes and the part designation instruction code associated with them.
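The interplay of instruction codes issued at a preset elapsed time and part designation instruction codes can be sketched as below. This is an illustration under assumptions: one code is consumed per fixed processing unit time, the 25 ms value is hypothetical, and only the command codes PT and HL named in the embodiment are modeled.

```python
def run_schedule(commands, unit_time_ms):
    """Walk the instruction codes at a fixed interval and record the display state.

    Returns a list of (elapsed_ms, designated_part, highlight_position) tuples,
    one per instruction code, in the order the codes are processed.
    """
    part = None    # part designated by the most recent PT code
    highlight = 0  # highlight position set by the most recent HL code
    timeline = []
    for step, (code, param) in enumerate(commands):
        elapsed = step * unit_time_ms  # the preset elapsed time for this code
        if code == "PT":
            part = param       # part designation instruction code
        elif code == "HL":
            highlight = param  # text highlight instruction code
        timeline.append((elapsed, part, highlight))
    return timeline


# Hypothetical sequence: part 1 is designated and highlighted, then part 2.
timeline = run_schedule(
    [("PT", 1), ("HL", 3), ("PT", 2), ("HL", 7)],
    unit_time_ms=25,
)
```

Because the timing comes from the position of each code in the sequence rather than from the audio stream, text and image display can follow the same schedule as the audio output without reading synchronization data out of the audio file.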
[0165]
According to the audio display output control device according to claim 4 of the present invention, in the audio display output control device according to any one of claims 1 to 3, the part-by-part image storage means stores, for each of the plurality of types of parts, an image expressing the mouth movement of the person image corresponding to that part. The text data of the designated part can therefore be displayed in synchronization with the output of the audio data, together with an image expressing the mouth movement of the person image corresponding to that part.
[0166]
According to the audio display output control device according to claim 5 of the present invention, in the audio display output control device according to any one of claims 1 to 3, mouth image storage means for storing an image expressing mouth movement is further provided, and the part-by-part image storage means stores a person image corresponding to each of the plurality of types of parts and position information of the mouth of each person image. When text data is synchronously displayed by the text synchronous display control means for the part designated by the part designation means, the image of mouth movement stored by the mouth image storage means is synthesized and displayed on the person image of the designated part stored by the part-by-part image storage means, in accordance with the position information of its mouth. The text data of the designated part can therefore be displayed in synchronization with the output of the audio data, and an image expressing mouth movement can be synthesized and displayed at the mouth position of the person image corresponding to that part.
[0167]
According to the audio display output control device according to claim 6 of the present invention, in the audio display output control device according to claim 5, text part detection means is further provided for detecting the text portion being synchronously displayed when the text data divided into the plurality of types of parts is displayed by the text synchronous display control means in synchronization with the audio data output by the audio data output means, and the mouth image storage means stores images expressing the mouth movements corresponding to the pronunciations of various characters. When text data is synchronously displayed by the text synchronous display control means for the part designated by the part designation means, the image of mouth movement stored by the mouth image storage means that corresponds to the pronunciation of the synchronously displayed text portion detected by the text part detection means is synthesized and displayed on the person image of the designated part stored by the part-by-part image storage means, based on the position information of its mouth. The text data of the designated part can therefore be displayed in synchronization with the output of the audio data, and an image of the mouth movement corresponding to the pronunciation of the text portion being synchronously displayed can be synthesized and displayed at the mouth position of the person image corresponding to that part.
[0168]
Further, according to the audio display output control device according to claim 7 of the present invention, audio data is output by the audio data output means, and text data divided into a plurality of types of parts is displayed by the text synchronous display control means in synchronization with the audio data output by the audio data output means. In addition, the image display control means displays a person image corresponding to each of the plurality of types of parts. When any part of the text data is designated by the part designation means and text data is synchronously displayed by the text synchronous display control means for the designated part, an image of a mouth corresponding to that part is synthesized and displayed at the mouth position of the person image, displayed by the image display control means, that corresponds to that part. The text data of the designated part can therefore be displayed in synchronization with the output of the audio data, and an image of a mouth corresponding to that part can be synthesized and displayed at the mouth position of the displayed person image corresponding to that part.
[0169]
Further, according to the audio display output control device according to claim 8 of the present invention, in the audio display output control device according to claim 7, there are further provided text part detection means for detecting the text portion being synchronously displayed when the text data divided into the plurality of types of parts is displayed by the text synchronous display control means in synchronization with the audio data output by the audio data output means, and mouth image storage means for storing images expressing the mouth movements corresponding to the pronunciations of various characters. When text data is synchronously displayed by the text synchronous display control means for the part designated by the part designation means, the image of mouth movement stored by the mouth image storage means that corresponds to the pronunciation of the synchronously displayed text portion detected by the text part detection means is synthesized and displayed at the mouth position of the person image, displayed by the image display control means, that corresponds to that part. The text data of the designated part can therefore be displayed in synchronization with the output of the audio data, and an image of the mouth movement corresponding to the pronunciation of the text portion being synchronously displayed can be synthesized and displayed at the mouth position of the displayed person image corresponding to that part.
[0170]
Further, according to the audio display output control processing program according to claim 9 of the present invention, when the program is installed in the computer of an electronic device, audio data is output by the audio data output means in the computer of the electronic device, and text data divided into a plurality of types of parts is displayed by the text synchronous display control means in synchronization with the audio data output by the audio data output means. When any part of the text data is designated by the part designation means and the text data of the designated part is synchronously displayed by the text synchronous display control means, an image corresponding to that part is read out from among the images for the respective parts stored by the part-by-part image storage means and is displayed. In this electronic device, the text data of the designated part can therefore be displayed in synchronization with the output of the audio data, together with an image corresponding to that part.
[0171]
Further, according to the audio display output control processing program according to claim 10 of the present invention, when the program is installed in the computer of an electronic device, audio data is output by the audio data output means in the computer of the electronic device, and text data divided into a plurality of types of parts is displayed by the text synchronous display control means in synchronization with the audio data output by the audio data output means. In addition, the image display control means displays a person image corresponding to each of the plurality of types of parts. When any part of the text data is designated by the part designation means and text data is synchronously displayed by the text synchronous display control means for the designated part, an image of a mouth corresponding to that part is synthesized and displayed at the mouth position of the person image, displayed by the image display control means, that corresponds to that part. In this electronic device, the text data of the designated part can therefore be displayed in synchronization with the output of the audio data, and an image of a mouth corresponding to that part can be synthesized and displayed at the mouth position of the displayed person image corresponding to that part.
[0172]
Therefore, according to the present invention, it is possible to provide an audio display output control device and an audio display output control processing program that, in the synchronous output of text data, audio data, and image data, can easily output the audio, the text display, and the image display for each part in a synchronized manner.
[Brief description of the drawings]
FIG. 1 is a block diagram showing a configuration of an electronic circuit of a portable device 10 according to an embodiment of a voice display output control device of the present invention.
FIG. 2 is a diagram showing contents of mouth shape data 12a3 stored in a ROM 12A of the portable device 10.
FIG. 3 is a view showing a time code file 12c3 constituting a reproduction file 12b (12c) stored in a memory 12 of the portable device 10.
FIG. 4 is a view showing a file sequence table 12c2 constituting a reproduction file 12b (12c) stored in a memory 12 of the portable device 10.
FIG. 5 is a view showing content data 12c4 constituting a reproduction file 12b (12c) stored in a memory 12 of the portable device 10.
FIG. 6 is a diagram showing, in association with each other, the command codes of the various commands described in the time code file 12c3 (see FIG. 3) of the portable device 10, their parameter data, and the command contents analyzed based on the reproduction processing program 12a1.
FIG. 7 is a view showing the contents of text-based pronunciation mouth shape data 12f stored in a RAM 12C of the portable device 10.
FIG. 8 is a flowchart showing a reproduction process of the portable device 10 according to a reproduction process program 12a1.
FIG. 9 is a flowchart showing a pronunciation mouth shape data creation process accompanying the reproduction process of the portable device 10;
FIG. 10 is a flowchart showing an image display process A for each corresponding part in the reproduction process of the portable device 10;
FIGS. 11A and 11B are diagrams showing the selection operation and display state of learning content in the reproduction process of the portable device 10, wherein FIG. 11A shows a learning content selection screen G and FIG. 11B shows the selection operation keys for the learning content selection screen G.
FIG. 12 is a view showing a synchronized playback state (No. 1) of an English conversation text / sound / image file based on the English teaching material playback file 12c in FIGS. 3 to 5;
FIG. 13 is a flowchart showing an image display process B for each corresponding part accompanying the reproduction process of the portable device 10.
FIG. 14 is a flowchart showing a text-corresponding mouth display process which is executed by interruption in accordance with a part-by-part text highlight (emphasis) display process corresponding to a command code HL in the reproduction process of the portable device 10;
FIG. 15 is a view showing a synchronized playback state (part 2) of an English conversation text / sound / image file based on the English teaching material playback file 12c in FIGS. 3 to 5;
[Explanation of symbols]
10 ... Portable device
11 ... CPU
12 ... Memory
12A ... ROM
12a1 ... File playback processing program
12a2 ... Dictionary data
12a3 ... Mouth shape data
12B ... FLASH memory
12b ... Encrypted playback file (CAS file)
12C ... RAM
12c ... Decrypted playback file (CAS file)
12c1 ... Header information
12c1a ... Processing unit time
12c2 ... File sequence table
12c3 ... Time code file
12c4 ... Content data
12d ... Designated text number
12e ... Designated part number
12f ... Text-based pronunciation mouth shape data
12g ... Text highlight designation character
12h ... Image (on / off) flag
12i ... Voice (on / off) flag
12j ... Image developed flag
12k ... Image expansion buffer
13 ... External recording medium
14 ... Recording medium reading unit
15 ... Transmission control unit
16 ... Communication unit
17a ... Input unit
17b ... Coordinate input device
18 ... Display unit
19a ... Voice input unit
19b ... Stereo audio output unit
20 ... Communication device (home PC)
30 ... Web server
N ... Communication network (Internet)
X ... Text display frame
Y ... Image display frame
G0 ... English conversation text basic image
G1 to G3 ... Conversation images of part 1 to part 3
M1 to M3 ... Mouth areas of part 1 to part 3

Claims

An audio display output control device comprising:
audio data output means for outputting audio data;
text synchronous display control means for displaying text data divided into a plurality of types of parts in synchronization with the audio data output by the audio data output means;
part-by-part image storage means for storing images corresponding to the plurality of types of parts;
part designation means for designating any of the plurality of types of parts; and
image display control means for, when the text data is synchronously displayed by the text synchronous display control means for the part designated by the part designation means, reading out an image corresponding to that part from among the images for the respective parts stored by the part-by-part image storage means and displaying it.

The audio display output control device according to claim 1, wherein the text synchronous display control means
has instruction code storage means for storing instruction codes that cause the text data divided into the plurality of types of parts to be displayed, in accordance with a preset elapsed time, in synchronization with the audio data output by the audio data output means, and
displays the part-divided text data in synchronization with the audio data in accordance with the instruction codes stored by the instruction code storage means.

The audio display output control device according to claim 2, wherein the part designation means is a part designation instruction code stored in association with the instruction codes stored by the instruction code storage means, and the part designation instruction code sequentially designates the parts of the text data divided into the plurality of types in accordance with the output of the audio data by the audio data output means.

The audio display output control device according to any one of claims 1 to 3, wherein the part-by-part image storage means stores, for each of the plurality of types of parts, an image expressing the mouth movement of a person image corresponding to that part.

The audio display output control device according to any one of claims 1 to 3, further comprising
mouth image storage means for storing an image expressing mouth movement, wherein
the part-by-part image storage means stores a person image corresponding to each of the plurality of types of parts and position information of the mouth of each person image, and
the image display control means, when the text data is synchronously displayed by the text synchronous display control means for the part designated by the part designation means, synthesizes and displays the image of mouth movement stored by the mouth image storage means on the person image of the designated part stored by the part-by-part image storage means, in accordance with the position information of its mouth.

The audio display output control device according to claim 5, further comprising
text part detection means for detecting the text portion being synchronously displayed when the text data divided into the plurality of types of parts is displayed by the text synchronous display control means in synchronization with the audio data output by the audio data output means, wherein
the mouth image storage means stores images expressing the mouth movements corresponding to the pronunciations of various characters, and
the image display control means, when the text data is synchronously displayed by the text synchronous display control means for the part designated by the part designation means, synthesizes and displays, on the person image of the designated part stored by the part-by-part image storage means and based on the position information of its mouth, the image of mouth movement stored by the mouth image storage means that corresponds to the pronunciation of the synchronously displayed text portion detected by the text part detection means.

An audio display output control device comprising:
audio data output means for outputting audio data;
text synchronous display control means for displaying text data divided into a plurality of types of parts in synchronization with the audio data output by the audio data output means;
image display control means for displaying a person image corresponding to each of the plurality of types of parts;
part designation means for designating any of the plurality of types of parts; and
mouth image display control means for, when the text data is synchronously displayed by the text synchronous display control means for the part designated by the part designation means, synthesizing and displaying an image of a mouth corresponding to that part at the mouth position of the person image, displayed by the image display control means, that corresponds to that part.

The audio display output control device according to claim 7, further comprising:
text part detection means for detecting the text portion being synchronously displayed when the text data divided into the plurality of types of parts is displayed by the text synchronous display control means in synchronization with the audio data output by the audio data output means; and
mouth image storage means for storing images expressing the mouth movements corresponding to the pronunciations of various characters, wherein
the mouth image display control means, when the text data is synchronously displayed by the text synchronous display control means for the part designated by the part designation means, synthesizes and displays, at the mouth position of the person image, displayed by the image display control means, that corresponds to that part, the image of mouth movement stored by the mouth image storage means that corresponds to the pronunciation of the synchronously displayed text portion detected by the text part detection means.

A computer-readable audio display output control processing program for controlling a computer of an electronic device to synchronously reproduce audio data, text data, and image data, the program causing the computer to function as:
audio data output means for outputting audio data;
text synchronous display control means for displaying text data divided into a plurality of types of parts in synchronization with the audio data output by the audio data output means;
part-by-part image storage means for storing images corresponding to the plurality of types of parts;
part designation means for designating any of the plurality of types of parts; and
image display control means for, when the text data is synchronously displayed by the text synchronous display control means for the part designated by the part designation means, reading out an image corresponding to that part from among the images for the respective parts stored by the part-by-part image storage means and displaying it.

A computer-readable audio display output control processing program for controlling a computer of an electronic device to synchronously reproduce audio data, text data, and image data, the program causing the computer to function as:
audio data output means for outputting audio data;
text synchronous display control means for displaying text data divided into a plurality of types of parts in synchronization with the audio data output by the audio data output means;
image display control means for displaying a person image corresponding to each of the plurality of types of parts;
part designation means for designating any of the plurality of types of parts; and
mouth image display control means for, when the text data is synchronously displayed by the text synchronous display control means for the part designated by the part designation means, synthesizing and displaying an image of a mouth corresponding to that part at the mouth position of the person image, displayed by the image display control means, that corresponds to that part.