JP2004357092A - Information device, digital camera, and method of data display

Info

Publication number: JP2004357092A
Application number: JP2003153859A
Authority: JP
Inventors: Koichi Saito (斉藤 孝一)
Original assignee: Casio Computer Co Ltd
Current assignee: Casio Computer Co Ltd
Priority date: 2003-05-30
Filing date: 2003-05-30
Publication date: 2004-12-16
Anticipated expiration: 2023-05-30
Also published as: JP4269153B2

Abstract

PROBLEM TO BE SOLVED: To improve visibility when characters are overlaid on a still image or a moving image.

SOLUTION: When a captured still or moving image, or a still or moving image stored in a flash memory 18, is reproduced, a CPU 11 overlays text, converted by a speech recognition processing unit 19 and divided by a text division processing unit 20, on the image being reproduced on a display unit 15. If the text is no longer than the screen size, it is displayed as is, either centered or in vertical writing. If the text is longer than the screen size (that is, it does not fit on the screen), the characters are changed to a smaller font, words in the text are replaced with abbreviated forms, or the text is scrolled.

COPYRIGHT: (C)2005, JPO&NCIPI

Description

【０００１】
【発明の属する技術分野】
本発明は、デジタルカメラ等の情報機器に係り、撮影した静止画や動画に音声データを関連付けて録音して記憶する情報機器、デジタルカメラおよびデータ表示方法に関する。
【０００２】
【従来の技術】
従来より、デジタルカメラには、画像（静止画、動画）の撮影機能に加え、音声を録音する機能を有し、録音した音声データを上記画像データに埋め込んで記憶することが可能となっている。また、携帯電話においては、カメラを備え、撮影した画像データに録音した音声データを埋め込み、該画像データをメールに添付して送受信する機能を備えるものがある。
【０００３】
従来、このようなデジタルカメラや携帯電話にあっては、再生中の画像データに音声データが埋め込まれていた場合、音声データをスピーカで出力していた。しかしながら、周囲の雑音が大きい場合などには、音声が聞き取りにくかったり、周囲が静かな場合などには、周りに迷惑をかけたりするという問題があった。そこで、画像データに埋め込まれた音声データをテキストに変換し、画像データの再生中に文字として表示させることができれば、ユーザにとって利便性が向上する。
【０００４】
例えば、携帯電話においては、カメラを備え、撮影した画像データをメールに添付して送受信し、メールに画像データが添付されていた場合、画像データを表示するとともに、該画像データ上にメール本文を重ねて表示する技術が提案されている（例えば特許文献１、特許文献２、特許文献３参照）。
【０００５】
特許文献１に記載されている技術では、画像データにメール本文を重ねて表示する際に、画像データの明度、コントラストを補正するとともに、文字色を画像データの色度に応じて設定することで文字の視認性を向上させている。また、画像データが表示しきれない場合には、表示エリアに表示されない部分をスクロールして表示させるようになっている。
【０００６】
また、特許文献２に記載されている技術では、メールに画像データが添付されていた場合、画像データを表示するとともに、該画像データ上にメール本文を重ねて表示する。このとき、メール本文が長い場合、画像データを固定した状態でメール本文のみをスクロールすることができるようになっている。
【０００７】
また、特許文献３に記載されている技術では、多くの画像データの中から所望する画像データを検索するために、画像データ毎に、内容が分かるような音声を関連付け、該音声を可変速再生して目的の画像データを探すようになっている。また、上記音声を音声認識によりテキストに変換し、該テキストを画像データのインデックスとして用いて、画像データのソート（並び替え）を行なうようになっている。
【０００８】
また、特許文献４に記載されている技術では、音楽データの再生中に、再生中の音楽に適した被写体（歌詞）を撮影すると、音楽データの再生タイミングに合わせて撮影した被写体の画像を記憶し、次回の音楽データ再生時には、再生タイミングに合わせて対応する画像を表示するようになっている。被写体として歌詞を撮影した場合には、歌詞（イメージ）をテキストデータに変換することで、音楽データの再生に合わせて歌詞（テキスト）を表示するようになっている。
【０００９】
【特許文献１】
特開２００２−１４０２６５号公報
【特許文献２】
特開２００２−２８８０８２号公報
【特許文献３】
特開２００２−４１５２９号公報
【特許文献４】
特開２０００−３５０１４６号公報
【００１０】
【発明が解決しようとする課題】
しかしながら、上述した特許文献１による従来技術では、テキストが長文であった場合、画像データを表示した画面一杯にテキストが表示されてしまい画像データが見にくくなるという問題があった。仮に、上述した特許文献２にあるように、メール本文が長い場合、画像データを固定した状態でメール本文のみを行単位でスクロールしたとしても、それ以外の表示方法はなく、表現方法が乏しいという問題がある。また、上述した特許文献３による従来技術では、音声データをテキストへ変換する音声認識技術などが開示されているが、変換したテキストの表示方法などについては何ら記述されていない。また、特許文献４による従来技術では、変換したテキストの表示方法については、単に画像データに重ねて表示するか、異なる領域に表示するとしており、それ以外の表示方法はなく、表現方法が乏しく、画像データおよびテキストデータの視認性を向上させることができない。
【００１１】
また、従来のデジタルカメラや携帯電話においては、上述した携帯電話のように画像データの再生中に、これら画像データに埋め込まれた音声データをテキストに変換し、該テキストを文字として表示させる技術は提案されていない。仮に、特許文献１ないし４に記載されている従来技術をデジタルカメラや携帯電話に適用したとしても、上述した課題、すなわち画像が見にくくなる、表現方法が乏しいという課題が残ってしまうという問題があった。
【００１２】
そこで本発明は、画像データに付加された音声データを容易に認識可能にすることができ、また、静止画や動画に文字を重ねて表示する際に視認性を向上させることができる情報機器、デジタルカメラおよびデータ表示方法を提供することを目的とする。
【００１３】
【課題を解決するための手段】
上記目的達成のため、請求項１記載の発明による情報機器は、音声データが付加された画像データを再生表示する情報機器において、画像データを表示する表示手段と、画像データに付加された音声データを認識して文字データに変換する音声認識手段と、前記音声認識手段により変換された文字データを、前記表示手段に表示されている画像データに重ねて表示させる文字表示制御手段とを具備することを特徴とする。
【００１４】
また、好ましい態様として、例えば請求項２記載のように、請求項１記載の情報機器において、前記音声認識手段により変換された文字データが前記表示手段の一画面内に表示可能である否かを判断する判断手段を備え、前記文字表示制御手段は、前記判断手段により文字データが一画面上に入りきらないと判断された場合、文字データをスクロール表示させるようにしてもよい。
【００１５】
また、好ましい態様として、例えば請求項３記載のように、請求項１記載の情報機器において、前記文字表示制御手段は、前記判断手段により文字データが一画面上に入りきると判断された場合、文字データの全文を一画面上に表示させるようにしてもよい。
【００１６】
また、好ましい態様として、例えば請求項４記載のように、請求項１記載の情報機器において、前記文字表示制御手段は文字データの文字サイズを縮小する縮小手段を備え、前記文字表示制御手段は、前記判断手段により文字データが１画面上に入りきらないと判断された場合、前記縮小手段により文字サイズが縮小された文字データの全文を一画面上に表示させるようにしてもよい。
【００１７】
また、好ましい態様として、例えば請求項５記載のように、請求項１記載の情報機器において、前記文字表示制御手段は文字データに含まれる語彙を短縮語に置換する短縮語置換手段を備え、前記文字表示制御手段は、前記判断手段により文字データが一画面上に入りきらないと判断された場合、前記短縮語置換手段により語彙が置換された文字データの全文を一画面上に表示させるようにしてもよい。
【００１８】
また、好ましい態様として、例えば請求項６記載のように、請求項１ないし５のいずれかに記載の情報機器において、前記音声認識手段により変換された文字データを所定の長さで分割する分割手段を更に備え、前記文字表示制御手段は、前記分割手段により分割された文字データ毎に所定のタイミングで、前記表示手段に表示されている静止画あるいは動画に重ねて表示させるようにしてもよい。
【００１９】
また、好ましい態様として、例えば請求項７記載のように、請求項６記載の情報機器において、前記文字表示制御手段は、画像データの再生時間に基づいて、前記分割手段により分割された文字データ毎に表示時間を設定する表示時間設定手段を備え、前記分割手段により分割された文字データ毎に、前記表示時間設定手段により設定された表示時間に基づいて表示させるようにしてもよい。
【００２０】
また、好ましい態様として、例えば請求項８記載のように、請求項６または７に記載の情報機器において、前記表示手段に表示されている文字データの切換指示を行なう指示手段を更に備え、前記文字表示制御手段は、前記指示手段から切換指示があると、前記表示手段に表示されている文字データを、前記分割手段により分割された次の文字データに切り換えるようにしてもよい。
【００２１】
また、好ましい態様として、例えば請求項９記載のように、請求項１ないし８のいずれかに記載の情報機器において、画像データを撮影する撮影手段と、音声を記録する音声記録手段と、前記撮影手段により撮影された画像データに前記音声記録手段により記録された音声データを埋め込む埋め込み手段とを具備し、前記音声認識手段は、前記撮影手段により撮影された画像データに埋め込まれている音声データを認識して文字データに変換するようにしてもよい。
【００２２】
また、好ましい態様として、例えば請求項１０記載のように、請求項１ないし８のいずれかに記載の情報機器において、通信回線網を介して機器間でメールを送受信するメール送受信手段を具備し、前記音声認識手段は、前記メール送受信手段により送受信されるメールに添付されている画像データに埋め込まれている音声データを認識して文字データに変換するようにしてもよい。
【００２３】
上記目的達成のため、請求項１１記載の発明によるデータ表示方法は、音声データが埋め込まれた画像データを再生表示するデータ表示方法において、前記画像データに埋め込まれた音声データを認識して文字データに変換し、該文字データを、前記画像データに重ねて表示させることを特徴とする。
【００２４】
また、好ましい態様として、例えば請求項１２記載のように、請求項１１記載のデータ表示方法において、前記文字データが一画面内に表示可能である否かを判断し、文字データが一画面上に入りきらないと判断された場合、文字データをスクロール表示させるようにしてもよい。
【００２５】
また、好ましい態様として、例えば請求項１３記載のように、請求項１１記載のデータ表示方法において、前記文字データが一画面上に入りきると判断された場合、文字データの全文を一画面上に表示させるようにしてもよい。
【００２６】
また、好ましい態様として、例えば請求項１４記載のように、請求項１１記載のデータ表示方法において、前記文字データが１画面上に入りきらないと判断された場合、文字データの文字サイズを縮小し、縮小された文字データの全文を一画面上に表示させるようにしてもよい。
【００２７】
また、好ましい態様として、例えば請求項１５記載のように、請求項１１記載のデータ表示方法において、前記文字データが一画面上に入りきらないと判断された場合、文字データに含まる語彙を短縮語に置換し、短縮語に置換された文字データの全文を一画面上に表示させるようにしてもよい。
【００２８】
また、好ましい態様として、例えば請求項１６記載のように、請求項１１ないし１５のいずれかに記載のデータ表示方法において、前記文字データを所定の長さで分割し、分割された文字データ毎に、前記静止画あるいは動画に重ねて表示させるようにしてもよい。
【００２９】
また、好ましい態様として、例えば請求項１７記載のように、請求項１６記載のデータ表示方法において、前記静止画あるいは動画の再生時間に基づいて、前記分割された文字データ毎に表示時間を設定し、分割された文字データ毎に、前記設定された表示時間に基づいて表示させるようにしてもよい。
【００３０】
また、好ましい態様として、例えば請求項１８記載のように、請求項１６または１７に記載のデータ表示方法において、前記文字データの切換指示があると、画面上に表示されている文字データを、分割された次の文字データに切り換えるようにしてもよい。
【００３１】
また、好ましい態様として、例えば請求項１９記載のように、請求項１１ないし１８のいずれかに記載のデータ表示方法において、前記画像データは、撮影手段により撮影されたものであり、前記音声データは、音声記録手段により記録されたものであってもよい。
【００３２】
また、好ましい態様として、例えば請求項２０記載のように、請求項１１ないし１８のいずれかに記載のデータ表示方法において、前記画像データは、通信回線網を介して機器間で送受信されるメールに添付されたものであってもよい。
【００３３】
上記目的達成のため、請求項２１記載の発明によるデジタルカメラは、画像データを撮影する撮影手段と、音声を記録する音声記録手段と、前記撮影手段により撮影された画像データに前記音声記録手段により記録された音声データを埋め込む埋め込み手段と、前記画像データを表示する表示手段と、前記画像データに埋め込まれた音声データを認識して文字データに変換する音声認識手段と、前記音声認識手段により変換された文字データを、前記表示手段に表示されている画像データに重ねて表示させる文字表示制御手段とを具備したことを特徴とする。
【００３４】
上記目的達成のため、請求項２２記載の発明による情報機器は、通信回線網を介して機器間でメールを送受信するメール送受信手段と、前記メール送受信手段により送受信されるメールに添付されている画像データに埋め込まれている音声データを認識して文字データに変換する音声認識手段と、前記音声認識手段により変換された文字データを、前記メールに添付されている画像データに重ねて表示させる表示制御手段とを具備したことを特徴とする。
【００３５】
また、好ましい態様として、例えば請求項２３記載のように、請求項２２記載の情報機器において、前記情報機器は、携帯電話機であってもよい。
【００３６】
また、好ましい態様として、例えば請求項２４記載のように、請求項２２または２３のいずれかに記載の情報機器において、前記音声認識手段は、前記メールに添付されている画像データに埋め込まれている音声データを可聴音に変換せずに認識するようにしてもよい。
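[0037]
[Embodiments of the invention]
Embodiments of the present invention, applied to a digital camera as one example, are described below with reference to the drawings.
[0038]
A. First embodiment
A-1. Configuration of the first embodiment
FIG. 1 is a block diagram showing the configuration of a digital camera according to the first embodiment of the present invention. In the figure, a lens 1 is a photographic lens that optically captures a subject and forms its image on a CCD 2. The CCD 2 is a device with a MOS (metal-oxide semiconductor) structure that transfers charge in an array; it is driven by a timing generator (TG) 3 and a vertical driver 4 and outputs one screen of photoelectric conversion output at fixed intervals. The timing generator 3 and the vertical driver 4 generate the timing signals needed to read out the CCD 2. A sample-and-hold circuit (S/H) 5 samples the time-series analog signal read from the CCD 2 at a frequency matched to the resolution of the CCD 2. An A/D converter 6 converts the sampled signal into a digital signal (Bayer data).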
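[0039]
A color process circuit 7 performs color processing to generate a luminance/color-difference multiplexed signal (hereinafter, YUV signal) from the output of the A/D converter 6. In the color processing, the Bayer data is converted into R, G, B data and then into a digital luminance/color-difference multiplexed signal (Y, Cb, Cr data).
[0040]
A DMA controller 8 transfers data between the color process circuit 7 and a DRAM 10 (more precisely, a DRAM interface 9) without intervention by a CPU 11; that is, it performs direct memory access (DMA) transfer. The DMA controller 8 first writes the Y, Cb, Cr data output of the color process circuit 7 into a buffer inside the DMA controller 8, using the synchronization signal, memory write enable, and clock output of the same color process circuit 7, and then DMA-transfers the data to the DRAM 10 via the DRAM interface (DRAM I/F) 9. The DRAM interface 9 provides the signal interface between the DRAM 10 and the DMA controller 8 and between the DRAM 10 and the bus. The DRAM 10 stores the image data (Y, Cb, Cr data) DMA-transferred from the DMA controller 8 via the DRAM interface 9.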
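[0041]
The CPU 11 executes predetermined programs recorded in a program ROM 21 to centrally control the operation of the camera, and is connected to an operation unit 16 that includes operation buttons such as a main switch, a record/playback mode selector switch, function selection keys, and a shutter key. In recording mode the program for that mode, and in playback mode the program for that mode, is loaded from the program ROM 21 into the internal RAM of the CPU 11 and executed. After the DMA transfer of the image data (Y, Cb, Cr data) to the DRAM 10 is complete, the CPU 11 reads the image data from the DRAM 10 via the DRAM interface 9 and writes it to a VRAM 13 via a VRAM controller 12.
[0042]
In the record-and-save state in which the shutter key has been pressed, the CPU 11 reads the one frame of Y, Cb, Cr data written in the DRAM 10 via the DRAM interface 9, component by component (Y, Cb, Cr), in units of MCU blocks: one frame is divided into 80 × 60 blocks ("0" to "4799"), each an MCU block of 16 × 16 pixels. The CPU 11 further inserts the MCU blocks of any image to be added and sends the data to a JPEG processing unit 17. The image data sent to the JPEG processing unit 17 is compressed through DCT, quantization, and encoding. The CPU 11 adds header information to the compressed image data and writes it to a flash memory 18, which is a nonvolatile memory. The header information includes information about the image and the like.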
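The block numbering in this paragraph implies a frame of 1280 × 960 pixels (80 × 16 by 60 × 16). The following is a minimal sketch of how a block index could map to a pixel position. The function names are hypothetical, and the raster (row-by-row) ordering of indices "0" to "4799" is an assumption; the text states only the index range.

```python
MCU_SIZE = 16                  # pixels per MCU block side, from the text
BLOCKS_X, BLOCKS_Y = 80, 60    # blocks per frame, from the text


def frame_size():
    """Return the (width, height) in pixels implied by the block grid."""
    return BLOCKS_X * MCU_SIZE, BLOCKS_Y * MCU_SIZE


def mcu_origin(index):
    """Return the top-left pixel (x, y) of MCU block `index`.

    Assumes raster order: index 0 is top-left, index 4799 is bottom-right.
    """
    if not 0 <= index < BLOCKS_X * BLOCKS_Y:
        raise ValueError("MCU index out of range")
    row, col = divmod(index, BLOCKS_X)
    return col * MCU_SIZE, row * MCU_SIZE
```

Under these assumptions, block 4799 starts at pixel (1264, 944), the last 16 × 16 tile of the frame.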
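[0043]
In the DCT, the data of each MCU block (hereinafter, MCU data) is converted, block by block, into DCT coefficients indicating the magnitude of the frequency components, with a block of luminance component data of a predetermined number of pixels and a block of color-difference component data of a predetermined number of pixels treated as one set. When the compression of the one frame of Y, Cb, Cr data and the writing of all the compressed data to the flash memory 18 are complete, the CPU 11 again activates the path from the CCD 2 to the DRAM 10.
[0044]
The VRAM controller 12 controls data transfer between the VRAM 13 and the bus and between the VRAM 13 and a digital video encoder 14; it controls writing of the display image (preview image) to the VRAM 13 and reading of that image from the VRAM 13.
[0045]
The VRAM 13 is a video RAM; when a preview image is written to it, the preview image is sent via the digital video encoder 14 to a display device 15 and displayed. The video RAM has two ports, one for writing and one for reading, so that images can be written and read concurrently.
[0046]
The digital video encoder (hereinafter, video encoder) 14 periodically reads the image data (Y, Cb, Cr data) from the VRAM 13 via the VRAM controller 12, generates a video signal from it, and outputs the signal to the display device 15. As a result, in recording mode the display device 15 shows a through image based on the image information currently being captured from the CCD 2. The display device 15 is a small liquid crystal panel of a few inches, with for example 279 × 220 pixels, mounted on the back of the camera body.
[0047]
The JPEG processing unit 17 performs JPEG compression and decompression. The JPEG compression parameters are supplied by the CPU 11 each time compression is performed. The JPEG processing unit 17 is preferably implemented in dedicated hardware for processing speed, but the processing can also be performed in software by the CPU 11.
[0048]
The flash memory 18 is a rewritable read-only memory (PROM: programmable read-only memory) whose contents can be electrically erased, either all bits at once or in block units, and rewritten. The flash memory 18 may be fixed in the camera body or removable, such as a card or package type.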
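[0049]
A speech recognition processing unit 19 recognizes audio embedded in or added to captured image data (still or moving images) or recorded image data (still images, moving images, and so on) and converts it into text. The audio data embedded in or added to the image data is encoded in a predetermined format (for example, WAV, MP3, or PCM). A text division processing unit 20 analyzes the text converted by the speech recognition processing unit 19, detects full stops (。), commas (、), or the end of a sentence, and divides the text.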
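The division performed by the text division processing unit 20 can be sketched as below. This is only an illustration under stated assumptions: `split_text` is a hypothetical name, and keeping each punctuation mark attached to the segment it closes is a choice made here, not something the description specifies.

```python
import re


def split_text(text):
    """Split recognized text at Japanese commas and full stops.

    Each segment keeps its closing punctuation mark; any text after the
    last mark (the "end of sentence" case) becomes the final segment.
    """
    return re.findall(r"[^、。]+[、。]?", text)
```

For example, `split_text("今日は沖縄の海に来ました。波が高く、少し寒いです")` yields three segments, one per clause.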
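[0050]
A program ROM 21 stores the operating programs of the CPU 11 (a shooting program, a still image/moving image playback program, a text display program, and so on) and an abbreviation synonym database. The database is used, when text converted by the speech recognition processing unit 19 is displayed on the still image or moving image being played back, to convert predetermined vocabulary into abbreviated vocabulary (for example, "ワールドカップ" (World Cup) → "Ｗ杯"). The JPEG processing unit 17 compresses the original image data at a predetermined compression ratio and saves it in the flash memory 18.
[0051]
The CPU 11 controls each unit according to the programs in the program ROM 21. In particular, in this embodiment, when playing back a captured still or moving image, or a still or moving image recorded in the flash memory 18, the CPU 11 displays the text converted by the speech recognition processing unit 19 and divided by the text division processing unit 20 on the still or moving image being played back on the display device 15. If the text length is smaller than the screen size, the CPU 11 displays it as is, centered or in vertical writing. If the text length is larger than the screen size (the text does not fit on the screen), the CPU 11 displays it by changing the characters to a smaller font, converting vocabulary in the text into abbreviated vocabulary, scrolling, or the like. The display method, that is, whether to shrink or abbreviate, can be set by the user. In the first embodiment, one line of text display area is reserved so that both the still or moving image being played back and the converted text are easy to see, but depending on the size of the display area of the display device 15, a text display area of multiple lines, such as two or three, may be reserved.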
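[0052]
A-2. Operation of the first embodiment
Next, the operation of the digital camera according to the first embodiment is described. FIG. 2 is a flowchart for explaining the operation of the first embodiment. First, when playing back a captured still image, a still image recorded in the flash memory 18, or the like, the CPU 11 determines in step S10 whether there is audio data. If the still image has no audio data, the process ends and normal playback is performed.
[0053]
If the still image has audio data, in step S12 the speech recognition processing unit 19 recognizes the audio data of the still image and converts it into text data. Next, in step S14, the text division processing unit 20 recognizes full stops, commas, or sentence ends in the text data and divides the text, and in step S16 a variable N is incremented by 1. The variable N counts the divided texts, and its initial value is 0.
[0054]
Next, step S18 determines whether vertical writing display mode is set. Whether to use vertical writing is assumed to have been set by the user in advance. If vertical writing is set, in step S20 the display text is converted to a character font for vertical writing, and the process proceeds to step S22. If not, the process proceeds directly to step S22.
[0055]
Step S22 determines whether the length of the text obtained by speech recognition is greater than the screen size. If the text length is equal to or smaller than the screen size, in step S24 the text is displayed centered at the bottom of the still image being played back, as shown in FIG. 4. In the illustrated example, the audio data is "おきなわのうみ" and the speech recognition result is the text "沖縄の海" ("the sea of Okinawa"). Because the text "沖縄の海" is smaller than the screen size, it is displayed centered at the bottom of the playback screen.
[0056]
If the text length is greater than the screen size, step S26 determines whether shrink mode is set. If shrink mode is set, step S28 determines whether the text can be shrunk, that is, whether the minimum character size has already been reached. If it can be shrunk, in step S30 the character size of the text is reduced, and the process returns to step S22 and repeats. The character size is thus reduced stepwise in step S30, and once the text length is equal to or smaller than the screen size, in step S24 the text is displayed centered at the bottom of the still image being played back, as shown in FIG. 5.
[0057]
If, during shrinking, the text is still not smaller than the screen size even at the size beyond which further reduction would make it hard to read, the process proceeds to the abbreviation processing described later (step S34).
[0058]
If shrink mode is not set, step S32 determines whether abbreviation mode is set. If abbreviation mode is not set, in step S40 the text is scrolled along the bottom of the playback screen, flowing from right to left, as shown in order in FIGS. 6(a) to 6(c).
[0059]
If abbreviation mode is set, in step S34 the abbreviation synonym database is searched, and step S36 determines whether there is a match. If there is matching vocabulary, in step S38 that vocabulary is converted into its abbreviated form, and the process returns to step S22 and repeats. Vocabulary in the text is thus converted into abbreviated vocabulary in step S38, and once the text length is equal to or smaller than the screen size, in step S24 the text is displayed centered at the bottom of the still or moving image being played back. FIG. 7 shows the vocabulary "ワールドカップ" in the text being converted into the abbreviated vocabulary "Ｗ杯".
[0060]
If vocabulary in the text has been converted, no vocabulary remains to be abbreviated, and the text length is still greater than the screen size, as with the text shown in FIG. 7, then in step S40 the text is scrolled as shown in FIG. 8.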
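The decision flow of steps S22 to S40 can be sketched as a single function. This is a minimal illustration, not the patented implementation: the function names, the mode strings, the fixed-width character model (every character is `char_px` pixels wide), and the list of font sizes are all assumptions introduced here. Only the step numbers and the order of the fallbacks come from the description.

```python
def fits(text, char_px, screen_px):
    """S22: treat text as fitting when its pixel length does not exceed the screen."""
    return len(text) * char_px <= screen_px


def choose_display(text, screen_px, mode, sizes=(16, 12, 8), abbreviations=None):
    """Return (method, text, char_px) for one divided text.

    mode is "shrink", "abbreviate" or "scroll" (the user setting).
    sizes lists the allowed character sizes, largest first.
    """
    abbreviations = abbreviations or {}
    char_px = sizes[0]
    if fits(text, char_px, screen_px):                    # S22 -> S24
        return ("center", text, char_px)
    if mode == "shrink":                                  # S26
        for char_px in sizes[1:]:                         # S28, S30: step down
            if fits(text, char_px, screen_px):
                return ("center", text, char_px)
        mode = "abbreviate"                               # minimum size reached -> S34
    if mode == "abbreviate":                              # S32
        for long_form, short_form in abbreviations.items():   # S34, S36
            if long_form in text:
                text = text.replace(long_form, short_form)    # S38
                if fits(text, char_px, screen_px):            # back to S22
                    return ("center", text, char_px)
    return ("scroll", text, char_px)                      # S40
```

With a 279-pixel-wide screen (the example panel width given earlier), a 20-character text in shrink mode is centered at 12 pixels per character, while the same text with neither shrink nor abbreviation mode set is scrolled.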
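[0061]
Next, step S42 determines whether a predetermined key operation has been performed; the process waits until it is performed, keeping the text on the screen. In scroll display, even after one sentence has finished scrolling, the scrolling repeats.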
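One way to picture the right-to-left scroll is as a sequence of fixed-width windows over the text. The sketch below is an assumption-laden illustration: `scroll_frames` is a hypothetical name, the display is modeled as `width` character cells, and ASCII spaces stand in for blank cells. Repeating the returned list until a key is pressed would correspond to the repeated scrolling described above.

```python
def scroll_frames(text, width):
    """Return the successive visible windows for one right-to-left pass.

    The text enters from the right edge and leaves at the left edge.
    """
    padded = " " * width + text + " " * width
    return [padded[i:i + width] for i in range(len(padded) - width + 1)]
```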
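[0062]
When the predetermined key operation is performed with the text displayed on the screen, step S44 determines whether the text has ended. If text remains to be displayed, the process returns to step S16, increments the variable N by 1, and displays the next divided text according to the processing described above. When all the divided texts have been displayed, the process ends.
[0063]
FIG. 9 shows the display text being converted to a character font for vertical writing, and FIG. 10 shows text displayed in vertical writing on the playback screen. Centering, shrinking, abbreviation, and scroll display are all equally applicable to vertically written text.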
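[0064]
B. Second embodiment
Next, a second embodiment of the present invention is described.
B-1. Configuration of the second embodiment
The configuration of the digital camera is the same as that of the first embodiment shown in FIG. 1, so its description is omitted. The second embodiment is characterized in that, in addition to the operation of the first embodiment, it controls the display of audio attached to a moving image. For a still image, the text can simply be displayed while the still image is shown, without regard to playback time; for a moving image, it is desirable to display all of the text within the playback time of the moving image wherever possible. When the playback time is short, however, the display time of each text sentence may become short, and if it is too short the text becomes hard to read, so a means of keeping it easily readable is needed. The second embodiment therefore displays all of the text within the moving image playback time while giving priority to readability for any text whose display time would be short, providing a means of displaying the audio attached to a moving image as text smoothly and easily overall.
[0065]
B-2. Operation of the second embodiment
FIGS. 11 and 12 are flowcharts for explaining the operation of the second embodiment of the present invention. First, when playing back a captured moving image, a moving image recorded in the flash memory 18, or the like, the CPU 11 determines in step S50 whether there is audio data attached to the moving image. If there is none, the process ends and normal moving image playback is performed.
[0066]
If there is audio data attached to the moving image, in step S52 the speech recognition processing unit 19 recognizes the audio data of the still or moving image and converts it into text data. Next, in step S54, the playback time T of the moving image is obtained. In step S56, the text division processing unit 20 recognizes full stops, commas, or sentence ends in the text data and divides the text, and in step S58 the display time of each divided text is calculated automatically from the number of divisions and the moving image playback time. In the second embodiment, the display time of each divided text is calculated by a simple operation: for example, if there are three divisions, the moving image playback time is divided into three. Alternatively, the display time may be lengthened or shortened according to the length (number of characters) of each divided text.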
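The two ways of setting display times mentioned in step S58 can be sketched as follows. The function names are hypothetical. The first function is the equal division the description gives as its example; the second is one possible reading of "according to the number of characters", with time allotted in proportion to length, which the description does not spell out.

```python
def equal_display_times(total_seconds, n_segments):
    """Divide the playback time T equally among the divided texts."""
    if n_segments <= 0:
        raise ValueError("need at least one segment")
    return [total_seconds / n_segments] * n_segments


def proportional_display_times(total_seconds, segments):
    """Allot playback time in proportion to each segment's character count."""
    total_chars = sum(len(s) for s in segments)
    if total_chars == 0:
        raise ValueError("segments contain no characters")
    return [total_seconds * len(s) / total_chars for s in segments]
```

Either way the display times sum to the playback time, so all of the text is shown within the moving image.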
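[0067]
Next, step S60 determines whether playback of the moving image has ended; if so, the process ends. If not, in step S62 the variable N is incremented by 1. The variable N counts the divided texts, and its initial value is 0. Next, in step S64, a display counter that times the display of the text is started.
[0068]
Step S66 determines whether the length (number of characters) of the divided text indicated by the variable N is greater than the screen size. If the text length is equal to or smaller than the screen size, in step S68 the text is displayed centered at the bottom of the moving image being played back.
[0069]
If the text length is greater than the screen size, step S70 determines whether shrink mode is set. If shrink mode is set, step S72 determines whether the text can be shrunk, that is, whether the minimum character size has already been reached. If it can be shrunk, in step S74 the character size of the text is reduced, and the process returns to step S66 and repeats. The character size is thus reduced stepwise in step S74, and once the text length is equal to or smaller than the screen size, in step S68 the text is displayed centered at the bottom of the moving image being played back.
[0070]
If, during shrinking, the text is still not smaller than the screen size even at the size beyond which further reduction would make it hard to read, the process proceeds to the abbreviation processing (step S78).
[0071]
If shrink mode is not set, step S76 determines whether abbreviation mode is set. If abbreviation mode is not set, in step S84 the text is scrolled along the bottom of the playback screen, flowing from right to left.
[0072]
If abbreviation mode is set, in step S78 the abbreviation synonym database is searched, and step S80 determines whether there is a match. If there is matching vocabulary, in step S82 that vocabulary is converted into its abbreviated form, and the process returns to step S66 and repeats. Vocabulary in the text is thus converted into abbreviated vocabulary in step S82, and once the text length is equal to or smaller than the screen size, in step S68 the text is displayed centered at the bottom of the moving image being played back.
[0073]
If vocabulary in the text has been converted, no vocabulary remains to be abbreviated, and the text length is still greater than the screen size, then in step S84 the text is scrolled.
[0074]
Next, step S86 determines whether the display time has ended. If not, step S88 determines whether a predetermined key operation has been performed; if not, display of the divided text continues. If the predetermined key operation is performed within the display time, the process returns to step S60 and, if moving image playback has not ended, increments the variable N by 1 and displays the next divided text according to the processing described above. This is because, when the moving image playback time is sufficiently long relative to the length of the text, the display time of each divided text becomes too long and tedious; in that case the user can move on to the next divided text by performing the predetermined key operation.
[0075]
When the display time ends, step S90 determines whether the display time for that divided text is shorter than a predetermined time (for example, 2 seconds). This is because text is hard to read if its display time is extremely short. If the display time is at least the predetermined time, step S92 determines whether the text has ended. If text remains to be displayed, the process returns to step S60 and, if moving image playback has not ended, increments the variable N by 1 and displays the next divided text according to the processing described above. When all the divided texts have been displayed, the process ends.
[0076]
If the display time for the divided text is shorter than the predetermined time, the text may be hard to read, so in step S94 moving image playback is paused. Step S96 then determines whether a predetermined key operation has been performed; the process waits until it is performed, keeping moving image playback paused and the divided text on the screen. In scroll display, moving image playback is paused and the scrolling of the divided text repeats.
[0077]
When the predetermined key operation is performed with the text displayed on the paused moving image, step S98 determines whether the text has ended. If text remains to be displayed, in step S100 the pause of moving image playback is released (playback resumes), and the process returns to step S60; if moving image playback has not ended, the variable N is incremented by 1 and the next divided text is displayed according to the processing described above. When all the divided texts have been displayed, the process ends.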
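The readability rule of steps S90 to S100 can be summarized in a small planning function. This is a simplified sketch under stated assumptions: `playback_plan` and `MIN_READABLE_SECONDS` are hypothetical names, display times come from the equal division example, and key handling is reduced to a flag saying whether playback would pause and wait for the user after the segment's time runs out.

```python
MIN_READABLE_SECONDS = 2.0   # the example threshold given for step S90


def playback_plan(segments, total_seconds, min_seconds=MIN_READABLE_SECONDS):
    """Return (segment, display_seconds, pause_after) for each divided text.

    pause_after is True when the display time is shorter than the threshold,
    in which case playback pauses (S94) until a key is pressed (S96).
    """
    if not segments:
        return []
    per_segment = total_seconds / len(segments)
    return [(s, per_segment, per_segment < min_seconds) for s in segments]
```

For a 3-second clip with three divided texts, each text gets 1 second and every segment is followed by a pause; for a 9-second clip none is.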
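[0078]
Vertical writing was not described for the second embodiment above, but as in the first embodiment, centering, shrinking, abbreviation, and scroll display are all equally applicable to vertically written text.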
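[0079]
C. Third embodiment
Next, a third embodiment of the present invention, applied to a camera-equipped mobile phone as one example, is described with reference to the drawings.
[0080]
C-1. Configuration of the third embodiment
FIGS. 13(a) and 13(b) are schematic diagrams showing the external appearance of a mobile phone 1. The mobile phone 1 in the third embodiment has a folding structure consisting of a lid and a main body; FIG. 13(a) is a front view of the mobile phone 1 in the open state, and FIG. 13(b) is a rear view in the open state. An antenna 11 is provided on the back of the lid and is extendable. A speaker 12 is provided on the front of the lid and outputs audio. A display unit (main display unit) 13 is a color LCD (liquid crystal display) of 120 dots (width) × 160 dots (height). The display unit 13 can display the image (photograph) and the body of a mail with an image (with a background picture) at the same time.
[0081]
A key operation unit 14 is provided on the front of the main body and consists of a mail key 141, an address key 142, a camera key 143, a submenu key 144, a cross key (four-direction cursor key) 145, a center key (enter key) 146, an off-hook key 147, an on-hook key 148, a numeric keypad 149, and so on. The mail key 141 starts the mail function and displays the mail menu. The address key 142 opens the address book used to select a destination mail address, a telephone number to call, and the like. The camera key 143 starts the camera function and displays the camera menu. When the mail body and a background image are displayed superimposed in mail creation mode or mail display mode, the camera key 143 also serves as a display switching key that changes the display to the mail body only or the background image only.
[0082]
The submenu key 144 displays the submenu of each function. The cross key 145 is used to select a desired item from the menu displayed in each function and to shift the data entry position when entering data with the numeric keypad. The center key (enter key) 146 confirms the item selected with the cross key 145. The off-hook key 147 starts a call, and the on-hook key 148 ends a call. The numeric keypad 149 is used to enter telephone numbers and characters. A microphone 15 is provided at the bottom of the main body and takes audio input.
[0083]
A sub display unit 16 is provided on the back of the lid. A back key 17 is made of a transparent or translucent member and contains an LED 171 that lights up on an incoming call. An imaging lens 18 is provided on the back of the lid, below the sub display unit 16. A notification speaker 19 announces incoming calls and the like, and is placed on the back of the main body so that the notification sound can be heard even when the lid is closed against the main body.
[0084]
FIG. 14 is a block diagram showing the circuit configuration of the mobile phone 1. A wireless transceiver unit 20 sends and receives audio and data (mail data) wirelessly via the antenna 11. A wireless signal processing unit 21 performs processing needed for wireless communication, such as demodulating audio and data (mail data) received by the wireless transceiver unit 20 and modulating audio and data to be sent to it. A control unit 22 controls the individual operations and the overall operation.
[0085]
An image memory 23 stores image files captured by the imaging unit (the imaging lens 18, an imaging module 181, and a DSP 182) and compression-encoded by a program stored in an image processing program area 2413. A ROM 24 is a rewritable flash ROM and stores the various programs, described later, that characterize the present invention.
[0086]
A driver 25 drives the display unit 13. A driver 26 drives the sub display unit 16. A subscriber information storage unit 27 stores profile data such as the telephone number for calling the mobile phone 1 and the ID of the operator (subscriber). A ROM 28 stores various programs for controlling the control unit 22. A RAM 29 stores various data needed as a wireless communication terminal and data needed for the control unit 22 to operate, and also stores mail data, image data attached to mail, and so on.
[0087]
An audio signal processing unit 200 encodes the audio signal input from the microphone 15 and outputs it to the wireless signal processing unit 21, and decodes the audio signal from the other party input from the wireless signal processing unit 21 and drives the speaker 12 to output audio. The imaging module 181 consists of a CCD or CMOS sensor and captures color images. The DSP 182 encodes the image captured by the imaging module 181. A notification driver 192 drives the notification speaker 19, a vibrator 191, and the LED 171.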
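[0088]
C-2. Operation of the third embodiment
Next, the operation of the mobile phone according to the third embodiment is described. FIG. 15 is a flowchart for explaining the operation of the mobile phone according to the third embodiment. FIGS. 16 and 17 are schematic diagrams showing display examples on the mobile phone.
The mobile phone determines in step S110 whether a mail has been received, or in step S112 whether selection of the mail inbox folder has been detected. When a mail is received or the inbox folder is selected, the received mail list is displayed in step S114.
[0089]
FIG. 16 is a schematic diagram showing a display example of the received mail list on the display unit 13. An icon 1301 indicates the battery charge level, and an icon 1302 indicates the radio reception state. In the received mail list, icons 1303 to 1306 distinguish read and unread mail: the icon 1303 indicates an unread mail, the icon 1304 an unread mail with an image, the icon 1305 a read mail, and the icon 1306 a read mail with an image.
[0090]
In step S116, the control unit 22 determines whether a mail with an image has been selected. If not, the process proceeds to normal mail processing in step S118. If a mail with an image has been selected, in step S120 the mail body is displayed on the display unit 13 with the image attached to the mail as the background, as shown in FIG. 17(a). At this time, the brightness of the image or the character color of the mail body may be changed so that the mail body is easy to see.
[0091]
While a mail with an image is selected and the mail body is displayed on the display unit 13 with the attached image as the background, step S122 determines whether a cancel operation has been detected, and step S124 determines whether the camera key 143 has been operated.
[0092]
If neither operation is detected, the process returns to step S120 and continues displaying the image with the mail body superimposed, as shown in FIG. 17(a). If a cancel operation is detected, the process returns to step S114 and switches to the received mail list.
[0093]
If the camera key 143 is operated, in step S126 the mail body is gradually faded out, leaving only the background image as shown in FIG. 17(b). The process then proceeds to step S10 shown in FIG. 2, described in the first embodiment, and determines whether audio data is attached to the image data. If so, the audio data is converted into text by speech recognition and the text is displayed superimposed on the image data, as shown in FIG. 17(c). As in the first embodiment, centering, shrinking, abbreviation, or scroll display is applied according to the length of the text. In the third embodiment as well, the same processing is possible for moving images, not only still images.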
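The screen transitions of steps S114 to S126 can be sketched as a small lookup table. The state and event names below are hypothetical labels introduced for illustration; only the step numbers in the comments come from the description.

```python
TRANSITIONS = {
    ("inbox_list", "select_plain_mail"): "normal_mail",         # S116 -> S118
    ("inbox_list", "select_image_mail"): "mail_over_image",     # S116 -> S120
    ("mail_over_image", "cancel"): "inbox_list",                # S122 -> S114
    ("mail_over_image", "camera_key"): "image_with_caption",    # S124 -> S126 -> S10
}


def next_state(state, event):
    """Return the next screen; an unrecognized event keeps the current screen."""
    return TRANSITIONS.get((state, event), state)
```

Keeping the current screen on an unrecognized event mirrors the loop back to step S120 when neither a cancel nor a camera key operation is detected.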
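[0094]
In the first to third embodiments described above, playback of the original audio data was not described for the case where the audio data added to the image data is converted into text and displayed with centering, shrinking, abbreviation, or scrolling; the user may be allowed to choose, according to the situation, whether or not to output the original audio data as sound.
[0095]
The first to third embodiments were described for a digital camera and a camera-equipped mobile phone, but the invention is not limited to these and is applicable to any device that can play back images (still or moving) to which audio data has been added. For example, information devices for so-called backup have recently come into practical use: they have a large-capacity storage medium such as a hard disk and a display device, and when a recording medium holding image data (still or moving images) captured with a digital camera or camera-equipped mobile phone is inserted, they transfer the image data from the recording medium to the large-capacity storage medium.
[0096]
Such information devices can not only back up image data to the large-capacity storage medium but also show the backed-up image data on the display device (slide shows and the like are also possible). By adding the speech recognition function and text display function described as embodiments of the present invention, such an information device can likewise, when displaying backed-up image data to which audio data has been added, convert the audio data into text and display it superimposed on the image data.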
【００９７】
【発明の効果】
請求項１記載の発明によれば、表示手段に画像データを表示する際、音声認識手段により、画像データに付加された音声データを認識して文字データに変換し、文字表示制御手段により、変換された文字データを、前記表示手段に表示されている画像データに重ねて表示させるようにしたので、画像データに付加された音声データを容易に認識可能にすることができるという利点が得られる。
【００９８】
また、請求項２記載の発明によれば、判断手段により、前記音声認識手段により変換された文字データが前記表示手段の一画面内に表示可能である否かを判断し、前記文字表示制御手段により、前記判断手段により文字データが一画面上に入りきらないと判断された場合、文字データをスクロール表示させるようにしたので、画像データに付加された音声データを容易に認識可能にすることができ、また、静止画や動画に文字を重ねて表示する際に視認性を向上させることができるという利点が得られる。
【００９９】
また、請求項３記載の発明によれば、前記文字表示制御手段により、前記判断手段により文字データが一画面上に入りきると判断された場合、文字データの全文を一画面上に表示させるようにしたので、画像データに付加された音声データを容易に認識可能にすることができ、また、静止画や動画に文字を重ねて表示する際に視認性を向上させることができるという利点が得られる。
【０１００】
また、請求項４記載の発明によれば、前記判断手段により文字データが１画面上に入りきらないと判断された場合、前記文字表示制御手段により、前記縮小手段により文字サイズが縮小された文字データの全文を一画面上に表示させるようにしたので、画像データに付加された音声データを容易に認識可能にすることができ、また、静止画や動画に文字を重ねて表示する際に視認性を向上させることができるという利点が得られる。
【０１０１】
また、請求項５記載の発明によれば、前記判断手段により文字データが一画面上に入りきらないと判断された場合、前記文字表示制御手段により、前記短縮語置換手段により語彙が置換された文字データの全文を一画面上に表示させるようにしたので、画像データに付加された音声データを容易に認識可能にすることができ、また、静止画や動画に文字を重ねて表示する際に視認性を向上させることができるという利点が得られる。
【０１０２】
また、請求項６記載の発明によれば、分割手段により、前記音声認識手段により変換された文字データを所定の長さで分割し、前記文字表示制御手段により、前記分割手段により分割された文字データ毎に、所定のタイミングで、前記表示手段に表示されている静止画あるいは動画に重ねて表示させるようにしたので、画像データに付加された音声データを容易に認識可能にすることができ、また、静止画や動画に文字を重ねて表示する際に視認性を向上させることができるという利点が得られる。
【０１０３】
また、請求項７記載の発明によれば、表示時間設定手段により、画像の再生時間に基づいて、前記分割手段により分割された文字データ毎に表示時間を設定し、前記文字表示制御手段により、前記分割手段により分割された文字データ毎に、前記表示時間設定手段により設定された表示時間に基づいて表示させるようにしたので、画像データに付加された音声データを容易に認識可能にすることができ、また、静止画や動画に文字を重ねて表示する際に視認性を向上させることができるという利点が得られる。
【０１０４】
また、請求項８記載の発明によれば、指示手段により、前記表示手段に表示されている文字データの切換指示を行ない、前記文字表示制御手段により、前記指示手段から切換指示があると、前記表示手段に表示されている文字データを、前記分割手段により分割された次の文字データに切り換えるようにしたので、画像データに付加された音声データを容易に認識可能にすることができ、また、静止画や動画に文字を重ねて表示する際に視認性を向上させることができるという利点が得られる。
【０１０５】
また、請求項９記載の発明によれば、埋め込み手段により、音声記録手段により記録した音声データを、撮影手段により画像データに埋め込み、前記音声認識手段により、前記撮影手段により撮影された画像データに埋め込まれている音声データを認識して文字データに変換するようにしたので、デジタルカメラや携帯電話などが備えるカメラによって撮影した画像データに埋め込まれた音声データを容易に認識可能にすることができるという利点が得られる。
【０１０６】
また、請求項１０記載の発明によれば、前記音声認識手段により、前記メール送受信手段により通信回線網を介して機器間で送受信されるメールに添付されている画像データに埋め込まれている音声データを認識して文字データに変換するようにしたので、携帯電話などの通信機器によって受信されたメールに添付されている画像データに埋め込まれた音声データを容易に認識可能にすることができるという利点が得られる。
【０１０７】
また、請求項１１記載の発明によれば、画像データに埋め込まれた音声データを認識して文字データに変換し、該文字データを、前記画像データに重ねて表示させるようにしたので、画像データに埋め込まれた音声データを容易に認識可能にすることができるという利点が得られる。
【０１０８】
また、請求項１２記載の発明によれば、前記文字データが一画面内に表示可能である否かを判断し、文字データが一画面上に入りきらないと判断された場合、文字データをスクロール表示させるようにしたので、画像データに埋め込まれた音声データを容易に認識可能にすることができ、また、静止画や動画に文字を重ねて表示する際に視認性を向上させることができるという利点が得られる。
【０１０９】
また、請求項１３記載の発明によれば、前記文字データが一画面上に入りきると判断された場合、文字データの全文を一画面上に表示させるようにしたので、画像データに埋め込まれた音声データを容易に認識可能にすることができ、また、静止画や動画に文字を重ねて表示する際に視認性を向上させることができるという利点が得られる。
【０１１０】
また、請求項１４記載の発明によれば、前記文字データが１画面上に入りきらないと判断された場合、文字データの文字サイズを縮小し、縮小された文字データの全文を一画面上に表示させるようにしたので、画像データに埋め込まれた音声データを容易に認識可能にすることができ、また、静止画や動画に文字を重ねて表示する際に視認性を向上させることができるという利点が得られる。
【０１１１】
また、請求項１５記載の発明によれば、前記文字データが一画面上に入りきらないと判断された場合、文字データに含まる語彙を短縮語に置換し、短縮語に置換された文字データの全文を一画面上に表示させるようにしたので、画像データに埋め込まれた音声データを容易に認識可能にすることができ、また、静止画や動画に文字を重ねて表示する際に視認性を向上させることができるという利点が得られる。
【０１１２】
また、請求項１６記載の発明によれば、前記文字データを所定の長さで分割し、分割された文字データ毎に、前記静止画あるいは動画に重ねて表示させるようにしたので、画像データに埋め込まれた音声データを容易に認識可能にすることができ、また、静止画や動画に文字を重ねて表示する際に視認性を向上させることができるという利点が得られる。
【０１１３】
また、請求項１７記載の発明によれば、前記静止画あるいは動画の再生時間に基づいて、前記分割された文字データ毎に表示時間を設定し、分割された文字データ毎に、前記設定された表示時間に基づいて表示させるようにしたので、画像データに埋め込まれた音声データを容易に認識可能にすることができ、また、静止画や動画に文字を重ねて表示する際に視認性を向上させることができるという利点が得られる。
【０１１４】
また、請求項１８記載の発明によれば、前記文字データの切換指示があると、画面上に表示されている文字データを、分割された次の文字データに切り換えるようにしたので、画像データに埋め込まれた音声データを容易に認識可能にすることができ、また、静止画や動画に文字を重ねて表示する際に視認性を向上させることができるという利点が得られる。
【０１１５】
また、請求項１９記載の発明によれば、前記画像データを、撮影手段により撮影されたものとし、前記音声データを、音声記録手段により記録されたものとするようにしたので、デジタルカメラや携帯電話などが備えるカメラによって撮影した画像データに埋め込まれた音声データを容易に認識可能にすることができるという利点が得られる。
【０１１６】
また、請求項２０記載の発明によれば、前記画像データを、通信回線網を介して機器間で送受信されるメールに添付されたものとしたので、携帯電話などの通信機器によって受信されたメールに添付されている画像データに埋め込まれた音声データを容易に認識可能にすることができるという利点が得られる。
【０１１７】
また、請求項２１記載の発明によれば、埋め込み手段により、前記撮影手段により撮影された画像データに前記音声記録手段により記録された音声データを埋め込み、前記画像データを表示手段で表示する際、音声認識手段により、前記画像データに埋め込まれた音声データを認識して文字データに変換し、文字表示制御手段により、変換された文字データを、前記表示手段に表示されている画像データに重ねて表示させるようにしたので、デジタルカメラなどが備えるカメラによって撮影した画像データに埋め込まれた音声データを容易に認識可能にすることができるという利点が得られる。
【０１１８】
また、請求項２２記載の発明によれば、音声認識手段により、メール送受信手段により送受信されるメールに添付されている画像データに埋め込まれている音声データを認識して文字データに変換し、表示制御手段により、前記音声認識手段により変換された文字データを、前記メールに添付されている画像データに重ねて表示させるようにしたので、携帯電話などの通信機器によって受信されたメールに添付されている画像データに埋め込まれた音声データを容易に認識可能にすることができるという利点が得られる。
【０１１９】
また、請求項２３記載の発明によれば、情報機器を携帯電話としたので、携帯電話によって受信されたメールに添付されている画像データに埋め込まれた音声データを容易に認識可能にすることができるという利点が得られる。
【０１２０】
また、請求項２４記載の発明によれば、前記音声認識手段により、前記メールに添付されている画像データに埋め込まれている音声データを可聴音に変換せずに認識するようにしたので、携帯電話などの通信機器によって受信されたメールに添付されている画像データに埋め込まれた音声データを容易に認識可能にすることができるという利点が得られる。
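[Brief description of the drawings]
[FIG. 1] Block diagram showing the configuration of the digital camera according to the first embodiment of the present invention.
[FIG. 2] Flowchart for explaining the operation of the first embodiment.
[FIG. 3] Flowchart for explaining the operation of the first embodiment.
[FIG. 4] Schematic diagram showing text displayed centered.
[FIG. 5] Schematic diagram showing text displayed centered at a reduced character size.
[FIG. 6] Schematic diagram showing text displayed by scrolling.
[FIG. 7] Conceptual diagram showing vocabulary in the text being replaced with abbreviated words.
[FIG. 8] Schematic diagram showing text with abbreviated words displayed by scrolling.
[FIG. 9] Conceptual diagram showing text being converted to vertical writing.
[FIG. 10] Schematic diagram showing text displayed in vertical writing.
[FIG. 11] Flowchart for explaining the operation of the second embodiment of the present invention.
[FIG. 12] Flowchart for explaining the operation of the second embodiment of the present invention.
[FIG. 13] Schematic diagram showing the external appearance of the mobile phone according to the third embodiment of the present invention.
[FIG. 14] Block diagram showing the circuit configuration of the mobile phone.
[FIG. 15] Flowchart for explaining the operation of the mobile phone according to the third embodiment.
[FIG. 16] Schematic diagram showing a display example of the mobile phone according to the third embodiment.
[FIG. 17] Schematic diagram showing a display example of the mobile phone according to the third embodiment.
[Explanation of reference numerals]
1 Lens
2 CCD
3 Timing generator
4 Vertical driver
5 Sample-and-hold circuit
6 A/D converter
7 Color process circuit
8 DMA controller
9 DRAM interface
10 DRAM
11 CPU (image display control means, character display control means, determination means, reduction means, abbreviated word replacement means, division means, display time setting means)
12 VRAM controller
13 VRAM
14 Digital video encoder
15 Display device
16 Operation unit (instruction means)
17 JPEG processing unit
18 Flash memory
19 Speech recognition processing unit (speech recognition means)
20 Text division processing unit
21 Program ROM
[0001]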
[Technical field of the invention]
The present invention relates to information devices such as digital cameras, and more particularly to an information device, a digital camera, and a data display method that record and store audio data in association with a captured still image or moving image.
[0002]
[Prior art]
Conventionally, digital cameras have, in addition to a function for capturing images (still images and moving images), a function for recording audio, and can embed the recorded audio data in the image data and store it. Some mobile phones are equipped with a camera and can embed recorded audio data in captured image data and send and receive that image data as an e-mail attachment.
[0003]
Conventionally, in such digital cameras and mobile phones, when audio data is embedded in the image data being reproduced, the audio data is output from a speaker. However, the audio is hard to hear when ambient noise is loud, and it disturbs people nearby when the surroundings are quiet. If the audio data embedded in the image data could instead be converted into text and displayed as characters while the image data is reproduced, convenience for the user would improve.
[0004]
For example, techniques have been proposed for camera-equipped mobile phones that send and receive captured image data as e-mail attachments and, when image data is attached to an e-mail, display the image data with the mail body superimposed on it (see, for example, Patent Document 1, Patent Document 2, and Patent Document 3).
[0005]
In the technique described in Patent Document 1, when the mail body is superimposed on the image data, the brightness and contrast of the image data are corrected and the character color is set according to the chromaticity of the image data, which improves the visibility of the characters. When the image data cannot be displayed in full, the portion outside the display area is shown by scrolling.
[0006]
Further, in the technique described in Patent Document 2, when image data is attached to a mail, the image data is displayed and the text of the mail is displayed over the image data. At this time, if the mail text is long, only the mail text can be scrolled with the image data fixed.
[0007]
Further, in the technique described in Patent Document 3, to find a desired item among many items of image data, audio that describes the content is associated with each item of image data, and the audio is played back at variable speed to locate the target image data. The audio is also converted into text by speech recognition, and the text is used as an index for sorting the image data.
[0008]
Further, in the technique described in Patent Document 4, when a subject (lyrics) suitable for the music being reproduced is photographed during the reproduction of music data, an image of the photographed subject is stored in synchronization with the reproduction timing of the music data. Then, at the next music data reproduction, a corresponding image is displayed in accordance with the reproduction timing. When lyrics are photographed as a subject, the lyrics (image) are converted into text data, so that the lyrics (text) are displayed according to the reproduction of the music data.
[0009]
[Patent Document 1]
JP-A-2002-140265
[Patent Document 2]
JP-A-2002-288082
[Patent Document 3]
JP-A-2002-41529
[Patent Document 4]
JP-A-2000-350146
[0010]
[Problems to be solved by the invention]
However, in the conventional technique of Patent Document 1 described above, when the text is long, the text fills the entire screen on which the image data is displayed, making the image data hard to see. Even if, as in Patent Document 2 described above, only the mail body is scrolled line by line with the image data fixed when the mail body is long, no other display method is available, so the range of presentation is limited. The conventional technique of Patent Document 3 described above discloses speech recognition for converting audio data into text, but says nothing about how the converted text is displayed. In the conventional technique of Patent Document 4, the converted text is simply superimposed on the image data or displayed in a separate area; no other display method is available, the range of presentation is limited, and the visibility of the image data and the text data cannot be improved.
[0011]
Moreover, for conventional digital cameras and mobile phones, no technique has been proposed that, while image data is being reproduced as in the mobile phones described above, converts the audio data embedded in that image data into text and displays the text as characters. Even if the conventional techniques described in Patent Documents 1 to 4 were applied to digital cameras and mobile phones, the problems described above would remain: the image becomes hard to see and the range of presentation is limited.
[0012]
Accordingly, an object of the present invention is to provide an information device, a digital camera, and a data display method that make audio data added to image data easy to recognize and that improve visibility when characters are superimposed on a still image or a moving image.
[0013]
[Means for Solving the Problems]
To achieve the above object, the information device according to the invention of claim 1 is an information device that reproduces and displays image data to which audio data has been added, comprising: display means for displaying image data; speech recognition means for recognizing the audio data added to the image data and converting it into character data; and character display control means for displaying the character data converted by the speech recognition means superimposed on the image data displayed on the display means.
[0014]
As a preferred mode, for example, in the information device according to claim 1, the character display control means may include determination means for determining whether or not the character data converted by the voice recognition means fits on one screen of the display means, and may scroll-display the character data when the determination means determines that the character data does not fit on one screen.
[0015]
As a preferred mode, for example as in claim 3, in the information device according to claim 1, the character display control means may display the full text of the character data on one screen when the determination means determines that the character data fits on one screen.
[0016]
As a preferred mode, for example, in the information device according to claim 1, the character display control means may include reduction means for reducing the character size of the character data and, when the determination means determines that the character data does not fit on one screen, may display on one screen the full text of the character data whose character size has been reduced by the reduction means.
[0017]
As a preferred mode, for example as in claim 5, in the information device according to claim 1, the character display control means may include abbreviation replacement means for replacing vocabulary included in the character data with abbreviated words and, when the determination means determines that the character data does not fit on one screen, may display on one screen the full text of the character data whose vocabulary has been replaced with abbreviated words by the abbreviation replacement means.
[0018]
As a preferred mode, the information device according to any one of claims 1 to 5 may further include division means for dividing the character data converted by the voice recognition means into pieces of a predetermined length, and the character display control means may display the character data divided by the division means, at a predetermined timing, on a still image or a moving image displayed on the display means.
[0019]
As a preferred mode, for example, in the information device according to claim 6, the character display control means may include display time setting means for setting, based on the reproduction time of the image data, a display time for each piece of character data divided by the division means, and may display each piece of divided character data based on the display time set by the display time setting means.
[0020]
As a preferred mode, the information device according to claim 6 or 7 may further include instruction means for instructing switching of the character data displayed on the display means, and the character display control means may switch the character data displayed on the display means to the next piece of character data divided by the division means when a switching instruction is received from the instruction means.
[0021]
As a preferred mode, for example, the information device according to any one of claims 1 to 8 may further include photographing means for photographing image data, sound recording means for recording sound, and embedding means for embedding the sound data recorded by the sound recording means in the image data photographed by the photographing means, and the voice recognition means may recognize the sound data embedded in the image data photographed by the photographing means and convert it into character data.
[0022]
As a preferred mode, the information device according to any one of claims 1 to 8 may further include mail transmission/reception means for transmitting and receiving mail between devices via a communication line network, and the voice recognition means may recognize voice data embedded in image data attached to mail transmitted or received by the mail transmission/reception means and convert the voice data into character data.
[0023]
To achieve the above object, a data display method according to claim 11 is a data display method for reproducing and displaying image data in which audio data is embedded, wherein the audio data embedded in the image data is recognized and converted into character data, and the character data is superimposed and displayed on the image data.
[0024]
Further, as a preferred mode, for example, in the data display method according to claim 11, it may be determined whether or not the character data fits on one screen, and the character data may be scroll-displayed when it is determined that the character data does not fit on one screen.
[0025]
Further, as a preferred mode, in the data display method according to claim 11, the full text of the character data may be displayed on one screen when it is determined that the character data fits on one screen.
[0026]
Also, as a preferred mode, in the data display method according to claim 11, when it is determined that the character data does not fit on one screen, the character size of the character data may be reduced, and the full text of the reduced character data may be displayed on one screen.
[0027]
As a preferred mode, in the data display method according to claim 11, when it is determined that the character data does not fit on one screen, vocabulary included in the character data may be replaced with abbreviated words, and the full text of the character data after replacement may be displayed on one screen.
[0028]
Further, as a preferred mode, for example, in the data display method according to any one of claims 11 to 15, the character data may be divided into pieces of a predetermined length, and each piece of divided character data may be displayed superimposed on the still image or the moving image.
[0029]
As a preferred mode, for example, in the data display method according to claim 16, a display time may be set for each piece of divided character data based on the reproduction time of the still image or the moving image, and each piece of divided character data may be displayed based on the set display time.
[0030]
Also, as a preferred mode, for example, in the data display method according to claim 16 or 17, when an instruction to switch the character data is issued, the character data displayed on the screen may be switched to the next piece of divided character data.
[0031]
As a preferred mode, for example as in claim 19, in the data display method according to any one of claims 11 to 18, the image data may be captured by photographing means, and the audio data may be recorded by sound recording means.
[0032]
As a preferred mode, for example, in the data display method according to any one of claims 11 to 18, the image data may be attached to mail transmitted and received between devices via a communication line network.
[0033]
In order to achieve the above object, a digital camera according to claim 21 comprises: photographing means for photographing image data; sound recording means for recording sound; embedding means for embedding the sound data recorded by the sound recording means in the image data photographed by the photographing means; display means for displaying the image data; voice recognition means for recognizing the sound data embedded in the image data and converting it into character data; and character display control means for displaying the character data converted by the voice recognition means superimposed on the image data displayed on the display means.
[0034]
In order to achieve the above object, an information device according to the present invention comprises: mail transmission/reception means for transmitting and receiving mail between devices via a communication line network; voice recognition means for recognizing voice data embedded in image data attached to mail transmitted or received by the mail transmission/reception means and converting the voice data into character data; and display control means for displaying the character data converted by the voice recognition means superimposed on the image data attached to the mail.
[0035]
In a preferred embodiment, for example, in the information device according to claim 22, the information device may be a mobile phone.
[0036]
As a preferred mode, in the information device according to claim 22 or 23, the voice recognition means may recognize the voice data embedded in the image data attached to the mail without converting it into audible sound.
[0037]
[Best Mode for Carrying Out the Invention]
Hereinafter, an embodiment of the present invention will be described as an example in which the present invention is applied to a digital camera with reference to the drawings.
[0038]
A. First embodiment
A-1. Configuration of the first embodiment
FIG. 1 is a block diagram showing the configuration of the digital camera according to the first embodiment of the present invention. In the figure, a lens 1 is a so-called photographic lens, which forms an optical image of a subject on a CCD 2. The CCD 2 is a device having a MOS (metal-oxide semiconductor) structure that transfers electric charges in an array; it is driven by a timing generator (TG) 3 and a vertical driver 4, and outputs the photoelectric conversion output for one screen at regular intervals. The timing generator 3 and the vertical driver 4 generate the timing signals required for reading the CCD 2. The sample hold circuit (S/H) 5 samples the time-series analog signal read from the CCD 2 at a frequency suitable for the resolution of the CCD 2. The A/D converter 6 converts the sampled signal into a digital signal (Bayer data).
[0039]
The color process circuit 7 performs a color process for generating a luminance / color difference multiplex signal (hereinafter, referred to as a YUV signal) from the output of the A / D converter 6. In the color process, the Bayer data is converted into R, G, B data, and further converted into digital luminance and color difference multiplex signals (Y, Cb, Cr data).
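As an illustration of the last stage of the color process, the following is a minimal sketch of converting one R, G, B pixel into Y, Cb, Cr data. The description does not state which coefficients the color process circuit 7 uses; the full-range coefficients commonly used with JPEG (JFIF) are assumed here, and the function name `rgb_to_ycbcr` is hypothetical.

```python
def rgb_to_ycbcr(r, g, b):
    """Convert one 8-bit RGB pixel to 8-bit Y, Cb, Cr.

    Assumption: full-range JFIF-style coefficients; the actual circuit
    may use different constants.
    """
    y = 0.299 * r + 0.587 * g + 0.114 * b
    cb = 128 - 0.168736 * r - 0.331264 * g + 0.5 * b
    cr = 128 + 0.5 * r - 0.418688 * g - 0.081312 * b

    def clamp(value):
        # Round to the nearest integer and keep the result in 0..255.
        return max(0, min(255, int(round(value))))

    return clamp(y), clamp(cb), clamp(cr)
```

With these assumed coefficients, a neutral gray or white pixel yields Cb and Cr values at the midpoint of 128, which is what makes the luminance and color difference components separable.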
[0040]
The DMA controller 8 performs so-called direct memory access (DMA), transferring data between the color process circuit 7 and the DRAM 10 (more precisely, the DRAM interface 9) without the intervention of the CPU 11. Using the synchronization signal, memory write enable, and clock output of the color process circuit 7, the DMA controller 8 first writes the Y, Cb, and Cr data output from the color process circuit 7 to a buffer inside the DMA controller 8, and then DMA-transfers the data to the DRAM 10 via the DRAM interface (DRAM I/F) 9. The DRAM interface 9 provides the signal interface between the DRAM 10 and the DMA controller 8 and the signal interface between the DRAM 10 and the bus. The DRAM 10 stores the image data (Y, Cb, Cr data) DMA-transferred from the DMA controller 8 via the DRAM interface 9.
[0041]
The CPU 11 executes predetermined programs recorded in the program ROM 21 to centrally control the operation of the camera, and is connected to the operation unit 16, which includes a main switch, a recording/playback mode switch, function selection keys, a shutter key, and other operation buttons. The program for the recording mode or for the reproduction mode, as applicable, is loaded from the program ROM 21 into the RAM inside the CPU 11 and executed. After the DMA transfer of the image data (Y, Cb, Cr data) to the DRAM 10 is completed, the CPU 11 reads the image data (Y, Cb, Cr data) from the DRAM 10 via the DRAM interface 9 and writes it to the VRAM 13 via the VRAM controller 12.
[0042]
When the shutter key is depressed, the CPU 11 enters the recording and saving state and reads out the Y, Cb, and Cr data for one frame written in the DRAM 10 via the DRAM interface 9, component by component, in units of MCU blocks of 16 × 16 pixels obtained by dividing one frame into 80 × 60 blocks ("0" to "4799"). The CPU 11 sends these blocks, with the MCU blocks of any image to be added inserted, to the JPEG processing unit 17. The image data sent to the JPEG processing unit 17 is compressed through processes such as DCT conversion, quantization, and encoding. The CPU 11 adds header information to the compressed image data and writes the image data to the flash memory 18, which is a nonvolatile memory. The header information includes information on the image and the like.
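The block numbering above implies a frame of 1280 × 960 pixels (80 × 16 by 60 × 16). The following minimal sketch maps a block number to the pixel position of that block's top-left corner. The description gives only the range "0" to "4799"; raster order (left to right, then top to bottom) is an assumption, and `mcu_origin` is a hypothetical helper name.

```python
MCU_SIZE = 16                  # pixels per MCU side, from the description
BLOCKS_X, BLOCKS_Y = 80, 60    # MCU blocks per frame, from the description


def mcu_origin(index):
    """Return the (x, y) pixel position of the top-left corner of an MCU block.

    Assumption: blocks are numbered in raster order, which the description
    does not state explicitly.
    """
    if not 0 <= index < BLOCKS_X * BLOCKS_Y:
        raise ValueError("MCU index out of range")
    row, col = divmod(index, BLOCKS_X)
    return col * MCU_SIZE, row * MCU_SIZE
```

Under this assumption, block 80 is the first block of the second row, and block 4799 is the bottom-right block of the frame.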
[0043]
In the DCT transform, the data of each MCU block (hereinafter simply referred to as MCU data), defined as a set of luminance component data of a block composed of predetermined pixels and color difference component data of a block composed of predetermined pixels, is converted into DCT coefficients indicating the magnitude of the frequency components. When the compression of the Y, Cb, and Cr data for one frame and the writing of all the compressed data to the flash memory 18 are completed, the CPU 11 activates the path from the CCD 2 to the DRAM 10 again.
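To make the notion of DCT coefficients concrete, the following is a minimal, unoptimized sketch of a two-dimensional DCT. It assumes the 8 × 8 block size and scaling used in baseline JPEG; the description itself does not say how the JPEG processing unit 17 subdivides a 16 × 16 MCU, and `dct_8x8` is a hypothetical name.

```python
import math


def dct_8x8(block):
    """Compute the 2-D DCT-II of an 8x8 block given as 8 rows of 8 numbers.

    Assumption: baseline-JPEG block size and scaling. This direct form is
    for illustration only; real encoders use fast factorizations.
    """
    def scale(k):
        return 1 / math.sqrt(2) if k == 0 else 1.0

    coeffs = [[0.0] * 8 for _ in range(8)]
    for u in range(8):
        for v in range(8):
            total = 0.0
            for x in range(8):
                for y in range(8):
                    total += (block[x][y]
                              * math.cos((2 * x + 1) * u * math.pi / 16)
                              * math.cos((2 * y + 1) * v * math.pi / 16))
            coeffs[u][v] = 0.25 * scale(u) * scale(v) * total
    return coeffs
```

A block of uniform value has only a DC coefficient (eight times the pixel value under this scaling); all higher-frequency coefficients are zero, which is why smooth image regions compress well after quantization.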
[0044]
The VRAM controller 12 controls data transfer between the VRAM 13 and the bus and between the VRAM 13 and the digital video encoder 14; that is, it controls writing to and reading from the VRAM 13.
[0045]
The VRAM 13 is a so-called video RAM, and when a preview image is written, the preview image is sent to the display device 15 via the digital video encoder 14 and displayed. The video RAM has two ports for writing and reading, so that writing and reading of an image can be performed simultaneously and in parallel.
[0046]
A digital video encoder (hereinafter simply referred to as a video encoder) 14 periodically reads the image data (Y, Cb, Cr data) from the VRAM 13 via the VRAM controller 12, generates a video signal based on the image data, and outputs it to the display device 15. As a result, in the recording mode, a through image based on the image information currently being taken in from the CCD 2 is displayed on the display device 15. The display device 15 is a small liquid crystal panel of about several inches, having, for example, 279 × 220 pixels, attached to the back of the camera body.
[0047]
The JPEG processing unit 17 is a part that performs JPEG compression and decompression. JPEG compression parameters are provided from the CPU 11 each time compression processing is performed. The JPEG processing unit 17 is preferably implemented by dedicated hardware in terms of processing speed, but can be implemented by the CPU 11 as software.
[0048]
The flash memory 18 is a type of programmable read-only memory (PROM) whose contents can be electrically erased, all bits at once or block by block, and then rewritten. The flash memory 18 may be a fixed type that cannot be removed from the camera body, or may be a removable type such as a card type or a package type.
[0049]
The voice recognition processing unit 19 recognizes the sound embedded in or added to captured image data (still image or moving image) or recorded image data (still image, moving image, or the like) and converts it into text. The audio data embedded in or added to the image data is encoded in a predetermined format (for example, WAV, MP3, or PCM). The text division processing unit 20 analyzes the text converted by the voice recognition processing unit 19 and divides it by detecting a period, a comma, or the end of a sentence.
[0050]
In addition, the program ROM 21 stores the programs for operating the CPU 11 (a photographing program, a still image/moving image reproducing program, a text displaying program, and so on) and an abbreviation synonym database that is used, when the text converted by the voice recognition processing unit 19 is displayed over a still image or a moving image being reproduced, to convert predetermined vocabulary into shortened vocabulary (for example, "World Cup" → "W Cup"). The JPEG processing unit 17 compresses the original image data at a predetermined compression ratio and stores it in the flash memory 18.
[0051]
The CPU 11 controls each unit according to the various programs in the program ROM 21. In particular, in the present embodiment, when a captured still image or moving image, or a still image or moving image recorded in the flash memory 18, is reproduced, the CPU 11 causes the text converted by the voice recognition processing unit 19 and divided by the text division processing unit 20 to be displayed over the still image or moving image being reproduced on the display device 15. At this time, if the text length is smaller than the screen size, the CPU 11 displays the text as it is, either centered or in vertical writing. If the text length is larger than the screen size (the text does not fit on the screen), the CPU 11 changes the character font to a smaller one, converts vocabulary in the text to shortened vocabulary, or scrolls the text. The display method, that is, whether the text is reduced or abbreviated, can be set by the user. In the first embodiment, a text display area of one line is secured so that both the still image or moving image being reproduced and the converted text are easy to see. Depending on the size of the display area of the display device 15, a text display area of a plurality of lines, such as two or three lines, may be secured.
[0052]
A-2. Operation of the first embodiment
Next, the operation of the digital camera according to the first embodiment will be described. Here, FIG. 2 is a flowchart for explaining the operation of the first embodiment. First, when reproducing a photographed still image, a still image recorded in the flash memory 18, or the like, the CPU 11 determines whether or not there is audio data in step S10. If there is no audio data in the still image, the process ends and normal reproduction is performed.
[0053]
On the other hand, when there is audio data in the still image, in step S12 the voice recognition processing unit 19 recognizes the audio data and converts it into text data. Next, in step S14, the text division processing unit 20 detects a period, a comma, or the end of a sentence in the text data and divides the text, and in step S16 the variable N is incremented by "1". The variable N counts the divided texts, and its initial value is "0".
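The division in step S14 can be pictured with the following minimal sketch, which cuts the recognized text after each delimiter and keeps any trailing remainder as a final segment. The delimiter set and the function name `split_text` are assumptions made for illustration; the description does not specify how the text division processing unit 20 is implemented.

```python
def split_text(text, delimiters="。、.,"):
    """Split text into segments that each end at a delimiter.

    Assumption: the delimiter set shown here (Japanese and ASCII period
    and comma) stands in for whatever the actual unit detects.
    """
    segments = []
    current = ""
    for ch in text:
        current += ch
        if ch in delimiters:
            if current.strip():
                segments.append(current.strip())
            current = ""
    if current.strip():
        # Text after the last delimiter (end of sentence without a mark).
        segments.append(current.strip())
    return segments
```

The variable N in the flowchart would then simply index into the returned list, one segment per display step.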
[0054]
Next, in a step S18, it is determined whether or not the vertical writing mode is set. It is assumed that whether or not to display vertically is set in advance by the user. Here, if the display mode is set to vertical writing, the display text is converted into a character font for vertical writing in step S20, and then the process proceeds to step S22. On the other hand, if the display is not set to the vertical writing mode, the process directly proceeds to step S22.
[0055]
Next, in step S22, it is determined whether or not the text length converted by the voice recognition is larger than the screen size. If the text length is the same as or smaller than the screen size, a centering display is performed at the lower center of the still image being reproduced in step S24 as shown in FIG. In the illustrated example, the voice data is “Okinawa no Umi”, and as a result of the voice recognition, the text is “Okinawa Sea”. In this case, since the text "Okinawa no Umi" is smaller than the screen size, it is centered and displayed at the bottom of the playback screen.
[0056]
On the other hand, if the text length is larger than the screen size, it is determined in step S26 whether or not the reduction mode is set. If the reduction mode is set, it is determined in step S28 whether or not reduction is possible, that is, whether or not the minimum character size has already been reached. If reduction is possible, the character size of the text is reduced in step S30, the process returns to step S22, and the above-described processing is repeated. The character size is thus reduced stepwise in step S30, and once the text length is the same as or smaller than the screen size, the text is centered at the lower center of the still image being reproduced in step S24, as shown in FIG.
[0057]
On the other hand, if any further reduction would make the characters too small to read and the text still does not fit within the screen size, the process proceeds to the abbreviation process (step S34) described later.
[0058]
On the other hand, if the reduction mode is not set, it is determined in step S32 whether or not the abbreviation mode is set. If the abbreviation mode is not set, in step S40 the text is scroll-displayed at the bottom of the playback screen so as to flow from right to left, as shown in FIGS. 6A to 6C.
[0059]
On the other hand, if the abbreviation mode is set, the abbreviation synonym database is searched in step S34, and it is determined in step S36 whether or not there is a matching vocabulary. If there is a matching vocabulary, it is converted into the shortened vocabulary in step S38, the process returns to step S22, and the above processing is repeated. Once the vocabulary in the text has been converted into shortened vocabulary in step S38 and the text length is equal to or smaller than the screen size, the text is centered at the lower center of the still image or moving image being reproduced in step S24. FIG. 7 shows how the vocabulary "World Cup" included in the text is converted to the shortened vocabulary "W Cup".
[0060]
On the other hand, if, after the vocabulary in the text has been converted to shortened vocabulary, there is no further vocabulary to be shortened and the text length is still larger than the screen size, the text is scroll-displayed, for example as shown in FIG.
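The decision flow of steps S22 to S40 can be summarized in the following minimal sketch. Several points are assumptions made only for illustration: the text width is approximated as the number of characters times the font size, the font sizes are arbitrary, the abbreviation database is a plain dictionary, and the name `choose_display` is hypothetical.

```python
def choose_display(text, screen_width_px, font_sizes=(16, 12, 8),
                   reduce_mode=False, abbreviate_mode=False,
                   abbreviations=None):
    """Return (kind, text, font_size), where kind is "centered" or "scroll".

    Assumption: width is approximated as len(text) * font_size, and
    font_sizes lists the available sizes from largest to smallest.
    """
    abbreviations = abbreviations or {}
    size = font_sizes[0]

    def fits(candidate, font_size):
        return len(candidate) * font_size <= screen_width_px

    if fits(text, size):                       # S22 -> S24
        return "centered", text, size

    if reduce_mode:                            # S26
        for smaller in font_sizes[1:]:         # S28/S30: step down in size
            size = smaller
            if fits(text, size):
                return "centered", text, size
        try_abbreviation = True                # minimum size reached -> S34
    else:
        try_abbreviation = abbreviate_mode     # S32

    if try_abbreviation:
        for long_form, short_form in abbreviations.items():   # S34 to S38
            if long_form in text:
                text = text.replace(long_form, short_form)
                if fits(text, size):
                    return "centered", text, size

    return "scroll", text, size                # S40
```

For example, with a 200-pixel-wide area and the abbreviation mode set, "World Cup final" does not fit at size 16, but "W Cup final" does, so the shortened text is centered instead of scrolled.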
[0061]
Next, in step S42, it is determined whether or not a predetermined key operation has been performed, and the process waits, with the text displayed on the screen, until the predetermined key operation is performed. In the case of scroll display, each time the scroll of one sentence is completed, the scroll is repeated from the beginning.
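The right-to-left scroll can be sketched as a sequence of fixed-width windows over the text. This is a character-based simplification under the assumption of a monospaced one-line display area; `scroll_frames` is a hypothetical name, and an actual display would move the text in pixel steps.

```python
def scroll_frames(text, width):
    """Yield the visible window for each step of one right-to-left scroll pass.

    The text enters from the right edge and leaves at the left edge.
    Assumption: one character per display cell.
    """
    padded = " " * width + text + " " * width
    for start in range(len(padded) - width + 1):
        yield padded[start:start + width]
```

Repeating the scroll while waiting for the key operation then amounts to iterating over the same frames again once a pass finishes.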
[0062]
When the predetermined key operation is performed while the text is displayed on the screen, it is determined in step S44 whether or not all the text has been displayed. If there is still text to be displayed, the process returns to step S16, the variable N is incremented by "1", and the next divided text is displayed according to the above-described processing. On the other hand, when all the divided texts have been displayed, the process ends.
[0063]
FIG. 9 shows a state in which the display text is converted into a character font for vertical writing, and FIG. 10 shows a state in which the text is displayed in vertical writing on a reproduction screen. Each of the above-described centering, reduction, abbreviation, and scroll displays is also effective for vertically written text.
[0064]
B. Second embodiment
Next, a second embodiment of the present invention will be described.
B-1. Configuration of the second embodiment
The configuration of the digital camera is the same as that of the digital camera according to the first embodiment shown in FIG. The second embodiment is characterized in that, in addition to the operation of the first embodiment, the display of the sound attached to a moving image is controlled. For a still image, the text can be displayed while the still image is displayed without regard to playback time, but for a moving image it is desirable to display all the text within the video playback time as far as possible. However, if the playback time is short, the display time of each sentence of text may become short, and if it is too short the text becomes difficult to read, so a means of keeping the text readable is required. Therefore, the second embodiment displays all the text within the video playback time where possible, gives priority to legibility for text whose display time would be short, and as a whole provides a means for smoothly and readably displaying the sound attached to a moving image as text.
[0065]
B-2. Operation of the second embodiment
Here, FIGS. 11 and 12 are flowcharts for explaining the operation of the second embodiment of the present invention. First, when reproducing a captured moving image, a moving image recorded in the flash memory 18, or the like, the CPU 11 determines in step S50 whether or not there is audio data attached to the moving image. If there is no audio data attached to the moving image, the process ends, and normal moving image reproduction is performed.
[0066]
On the other hand, if there is audio data attached to the moving image, in step S52 the voice recognition processing unit 19 recognizes the audio data and converts it into text data. Next, in step S54, the reproduction time T of the moving image is obtained. In step S56, the text division processing unit 20 detects a period, a comma, or the end of a sentence in the text data and divides the text. In step S58, the display time of each divided text is calculated automatically from the number of divisions and the video playback time. In the second embodiment, for example, if the number of divisions is three, the display time of each divided text is calculated by the simple operation of dividing the moving image reproduction time by three. Alternatively, the display time may be lengthened or shortened according to the length (number of characters) of each divided text.
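The calculation in step S58 can be sketched as follows, covering both the equal split described for the second embodiment and the alternative that scales each display time by the number of characters. The function name `display_times` and the `weighted` flag are hypothetical and used only for illustration.

```python
def display_times(segments, playback_seconds, weighted=False):
    """Return one display time in seconds per divided text.

    weighted=False: equal split of the playback time (the simple case).
    weighted=True:  time proportional to each segment's character count.
    """
    if not segments:
        return []
    total_chars = sum(len(segment) for segment in segments)
    if not weighted or total_chars == 0:
        return [playback_seconds / len(segments)] * len(segments)
    return [playback_seconds * len(segment) / total_chars
            for segment in segments]
```

For a 12-second moving image with three divided texts, the equal split gives 4 seconds each; with weighting, a longer sentence receives a proportionally longer share of the same total.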
[0067]
Next, in step S60, it is determined whether or not the reproduction of the moving image has been completed. If the reproduction has been completed, the process ends. On the other hand, if the reproduction has not been completed, the variable N is incremented by “1” in step S62. The variable N counts the divided text, and its initial value is “0”. Next, in step S64, a display counter for measuring the display time of the text is started.
[0068]
Next, in step S66, it is determined whether or not the length (number of characters) of the divided text indicated by the variable N is larger than the screen size. If the text length is equal to or smaller than the screen size, in step S68 the text is centered and displayed at the lower center of the moving image being reproduced.
[0069]
On the other hand, if the text length is larger than the screen size, it is determined in step S70 whether or not the reduction mode is set. If the reduction mode is set, it is determined in step S72 whether or not reduction is possible, that is, whether or not the minimum character size has already been reached. If reduction is possible, the character size of the text is reduced in step S74, the process returns to step S66, and the above processing is repeated. The character size is thus reduced stepwise in step S74, and once the text length is the same as or smaller than the screen size, the text is centered at the lower center of the moving image being reproduced in step S68.
[0070]
On the other hand, if any further reduction would make the characters too small to read and the text still does not fit within the screen size, the process proceeds to the abbreviation process (step S78).
[0071]
On the other hand, if the reduction mode is not set, it is determined in step S76 whether or not the abbreviation mode is set. If the abbreviation mode is not set, in step S84 the text is scroll-displayed at the lower part of the playback screen so as to flow from right to left.
[0072]
On the other hand, if the abbreviation mode is set, the abbreviation synonym database is searched in step S78, and it is determined in step S80 whether or not there is a matching vocabulary. If there is a matching vocabulary, it is converted into the shortened vocabulary in step S82, the process returns to step S66, and the above processing is repeated. Once the vocabulary in the text has been converted into shortened vocabulary in step S82 and the text length is equal to or smaller than the screen size, the text is centered at the lower center of the moving image being reproduced in step S68.
[0073]
On the other hand, if, after the vocabulary in the text has been converted to shortened vocabulary, there is no further vocabulary to be shortened and the text length is still larger than the screen size, the text is scroll-displayed in step S84.
[0074]
Next, in step S86, it is determined whether or not the display time has ended. If it has not, it is determined in step S88 whether or not a predetermined key operation has been performed; if not, the display of the divided text is continued. On the other hand, if the predetermined key operation is performed within the display time, the process returns to step S60, and if the reproduction of the moving image has not ended, the variable N is incremented by "1" and the next divided text is displayed according to the above-described processing. This is because, if the video playback time is much longer than the text requires, the display time of each divided text becomes unnecessarily long; a key operation lets the user move on to the next divided text.
[0075]
On the other hand, when the display time ends, it is determined in step S90 whether or not the display time for the divided text is shorter than a predetermined time (for example, 2 seconds). This is because the text becomes difficult to read if the display time of the divided text is extremely short. If the display time is equal to or longer than the predetermined time, it is determined in step S92 whether or not all the text has been displayed. If there is still text to be displayed, the process returns to step S60, and if the reproduction of the moving image has not been completed, the variable N is incremented by "1" and the next divided text is displayed according to the above-described processing. On the other hand, when all the divided texts have been displayed, the process ends.
[0076]
On the other hand, if the display time for the divided text is shorter than the predetermined time, the text may be difficult to read, so the reproduction of the moving image is paused in step S94. Then, in step S96, it is determined whether or not a predetermined key operation has been performed, and the process waits, with the reproduction of the moving image paused and the divided text displayed on the screen, until the predetermined key operation is performed. In the case of scroll display, the scroll of the divided text is repeated while the reproduction of the moving image is paused.
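The branch taken when the display time ends (steps S90 to S96) can be sketched as a small decision function. The 2-second threshold is the example value given in the description; the returned labels and the name `after_display_time` are hypothetical.

```python
MIN_READABLE_SECONDS = 2.0  # example threshold from the description


def after_display_time(display_seconds, has_more_text,
                       min_seconds=MIN_READABLE_SECONDS):
    """Decide what happens once a divided text's display time has ended.

    Assumption: the three outcomes are reduced to labels for illustration.
    """
    if display_seconds < min_seconds:
        # S94/S96: pause the moving image and keep the text on screen
        # until the predetermined key operation is performed.
        return "pause_and_wait_for_key"
    # S92: otherwise continue with the next divided text, or finish.
    return "show_next" if has_more_text else "finish"
```

This makes the priority explicit: legibility of a briefly displayed text takes precedence over uninterrupted playback.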
[0077]
When the predetermined key operation is performed while the text is displayed on the paused moving image, it is determined in step S98 whether or not all the text has been displayed. If there is still text to be displayed, the pause of the moving image reproduction is released (reproduction is resumed) in step S100, and the process returns to step S60. If the reproduction of the moving image has not been completed, the variable N is incremented by "1" and the next divided text is displayed according to the above-described processing. On the other hand, when all the divided texts have been displayed, the process ends.
[0078]
Although vertical writing is not described in the second embodiment above, each of the above-described centering, reduction, abbreviation, and scroll displays is also effective for vertically written text, as in the first embodiment.
[0079]
C. Third embodiment
Next, a third embodiment of the present invention will be described as an example applied to a camera-equipped mobile phone with reference to the drawings.
[0080]
C-1. Configuration of Third Embodiment
FIGS. 13A and 13B are schematic diagrams illustrating the appearance of the mobile phone 1. The mobile phone 1 in the third embodiment has a two-fold structure including a lid and a main body. FIG. 13A shows a front view of the mobile phone 1 in an open state, and FIG. 13B shows a rear view of the mobile phone 1 in the open state. The antenna 11 is provided on the back surface of the lid and is extendable. The loudspeaker 12 is provided on the front side of the lid and outputs audio. The display unit (main display unit) 13 is a color LCD (liquid crystal display) with a size of 120 dots (width) × 160 dots (height). The display unit 13 can simultaneously display the image (photo) and the text of a mail with an image (with a background image).
[0081]
The key operation unit 14 is provided on the front of the main body and includes a mail key 141, an address key 142, a camera key 143, a submenu key 144, a cross key (four-way cursor key) 145, a center key (enter key) 146, an off-hook key 147, an on-hook key 148, a numeric keypad 149, and the like. The mail key 141 is for starting the mail function and displaying a mail menu. The address key 142 is used to open an address book used to select a destination mail address, a call destination telephone number, and the like. The camera key 143 is for activating the camera function and displaying a camera menu. The camera key 143 also functions as a display switching key for displaying only the mail text or only the background image when the mail text and the background image are displayed superimposed in the mail creation mode or the mail display mode.
[0082]
The submenu key 144 is for displaying a submenu for each function. The cross key 145 is used to select a desired menu from the menus displayed in each function, and to shift the data input position when data is input using the numeric keypad. The center key (enter key) 146 is for confirming a menu or the like selected by operating the cross key 145. The off-hook key 147 is used to start a call, and the on-hook key 148 is used to end a call. The numeric keypad 149 is used for inputting telephone numbers and characters. The microphone 15 is provided at a lower portion of the main body and performs voice input.
[0083]
The sub display unit 16 is provided on the back of the lid. The back key 17 is made of a transparent or translucent member, and has a built-in LED 171 that emits light when an incoming call is received. The imaging lens 18 is provided on the back of the lid 2 and below the sub display unit 16. The notification speaker 19 is for notifying an incoming call or the like, and is arranged on the back surface of the main body so that a notification sound can be heard even when the lid is closed to the main body.
[0084]
Next, FIG. 14 is a block diagram illustrating the circuit configuration of the mobile phone 1. The wireless transmission / reception unit 20 wirelessly transmits and receives voice and data (mail data) via the antenna 11. The wireless signal processing unit 21 performs the processing necessary for wireless communication, such as demodulating voice or data (mail data) received by the wireless transmission / reception unit 20 and modulating voice or data to be sent to the wireless transmission / reception unit 20. The control unit 22 controls the various operations and the operation of the device as a whole.
[0085]
The image memory 23 is a memory for storing an image file imaged by the imaging unit (the imaging lens 18, the imaging module 181, and the DSP 182) and compression-encoded by a program stored in the image processing program area 2413. The ROM 24 is composed of a rewritable Flash ROM, and stores various programs described later, which are features of the present invention.
[0086]
The driver 25 drives the display unit 13. The driver 26 drives the sub display unit 16. The subscriber information storage unit 27 stores profile data such as a telephone number for calling the mobile phone 1 and an ID of the operator (subscriber). The ROM 28 stores various programs for controlling the control unit 22 and the like. The RAM 29 stores various data necessary for the wireless communication terminal, stores data necessary for the operation of the control unit 22, and stores mail data and image data attached to mail.
[0087]
The audio signal processing unit 200 encodes the audio signal input from the microphone 15 and outputs the encoded signal to the wireless signal processing unit 21, or decodes the audio signal of the communication partner input from the wireless signal processing unit 21 and drives the speaker 12 to output sound. The imaging module 181 is configured by a CCD or a CMOS sensor and captures a color image. The DSP 182 performs an encoding process on the image captured by the imaging module 181. The notification driver 192 is a driver for driving the notification speaker 19, the vibrator 191, and the LED 171.
[0088]
C-2. Operation of the third embodiment
Next, the operation of the mobile phone according to the third embodiment will be described. Here, FIG. 15 is a flowchart for explaining the operation of the mobile phone according to the third embodiment. FIGS. 16 and 17 are schematic diagrams showing display examples of a mobile phone.
In step S110, the mobile phone determines whether or not a mail has been received, and in step S112, whether or not selection of the mail reception folder has been detected. When a mail is received or the reception folder is selected, a reception list is displayed in step S114.
[0089]
FIG. 16 is a schematic diagram illustrating a display example of a reception list displayed on the display unit 13. Icon 1301 indicates the amount of charge of the battery. An icon 1302 indicates a radio wave reception state. In the reception list, the read / unread status of the mail can be determined by the icons 1303 to 1306. That is, icon 1303 represents an unread mail. An icon 1304 represents an unread image-attached mail. An icon 1305 indicates a read mail. The icon 1306 indicates a read mail with an image.
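The correspondence between mail status and the icons 1303 to 1306 can be written as a small lookup table. The sketch below is purely illustrative: the name `icon_for` and the use of the reference numerals as return values are assumptions made for this example.

```python
from typing import Dict, Tuple

# Key: (is_read, has_image) -> reference numeral of the icon in FIG. 16.
ICON_BY_STATUS: Dict[Tuple[bool, bool], int] = {
    (False, False): 1303,  # unread mail
    (False, True): 1304,   # unread mail with an image
    (True, False): 1305,   # read mail
    (True, True): 1306,    # read mail with an image
}


def icon_for(is_read: bool, has_image: bool) -> int:
    """Return the reference numeral of the icon shown in the reception list."""
    return ICON_BY_STATUS[(is_read, has_image)]
```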
[0090]
Then, in step S116, the control unit 22 determines whether or not the mail with the image is selected. If the mail with the image is not selected, the process proceeds to the normal mail processing in step S118. On the other hand, when the mail with the image is selected, in step S120, as shown in FIG. 17A, the mail text is displayed on the display unit 13 with the image attached to the mail as the background. At this time, the brightness of the image may be changed or the character color of the mail body may be changed so that the mail body can be seen clearly.
[0091]
In a state where the mail with the image is selected and the mail text is displayed on the display unit 13 with the image attached to the mail as a background, it is next determined in step S122 whether a cancel operation is detected. At the same time, it is determined in step S124 whether the camera key 143 has been operated.
[0092]
If no operation is detected in these determinations, the process returns to step S120 to continue displaying the image with the mail text superimposed as shown in FIG. On the other hand, if a cancel operation is detected, the process returns to step S114 to shift to the display of the reception list.
[0093]
On the other hand, when the camera key 143 is operated, the display of the mail body is gradually erased in step S126, and only the background image is displayed as shown in FIG. Thereafter, the process proceeds to step S10 shown in FIG. 2 described in the first embodiment, and it is determined whether or not audio data is attached to the image data. If audio data is attached, it is converted to text by voice recognition, and the text is displayed over the image data as shown in FIG. At this time, centering, reduction, shortening, and scroll display are performed according to the length of the text, as in the first embodiment. In the third embodiment, the same processing can be performed not only for a still image but also for a moving image.
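The screen transitions of steps S114 to S126 can be summarized as a small state machine. The following is a minimal sketch, not the control program of the embodiment: the state names, the event names, and the function `next_state` are all hypothetical labels introduced for this illustration.

```python
def next_state(state: str, event: str) -> str:
    """Return the next display state for a given state and key event."""
    if state == "reception_list":              # S114: reception list shown
        if event == "select_image_mail":       # S116: mail with image selected
            return "text_over_image"           # S120
        if event == "select_plain_mail":
            return "normal_mail"               # S118: normal mail processing
        return state
    if state == "text_over_image":             # S120: text over background image
        if event == "cancel":                  # S122: cancel operation detected
            return "reception_list"            # back to S114
        if event == "camera_key":              # S124: camera key 143 operated
            return "image_only"                # S126: mail text erased
        return state                           # no operation: keep displaying
    return state                               # other states are not modeled
```

Starting from `"reception_list"`, the events `"select_image_mail"` and then `"camera_key"` lead to `"image_only"`, the point at which the first embodiment's processing from step S10 would take over.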
[0094]
In the first to third embodiments described above, reproduction and output of the original audio data has not been described for the case where audio data added to image data is converted to text and displayed with centering, reduction, shortening, or scrolling. However, the user may be allowed to select, depending on the situation, whether or not to output the original audio data as sound.
[0095]
In the first to third embodiments described above, a digital camera and a camera-equipped mobile phone have been described. However, the present invention is not limited to these and is applicable to any device having a function of reproducing an image (still image, moving image) to which audio data is added. For example, so-called backup information devices have been put to practical use in recent years. Such a device includes a large-capacity storage medium such as a hard disk and a display device, and when a recording medium storing image data (still images and moving images) captured by a digital camera or a camera-equipped mobile phone is mounted, the image data is transferred to the large-capacity storage medium.
[0096]
In such an information device, the image data is not only backed up on the large-capacity storage medium; the backed-up image data can also be displayed on the display device (a slide show or the like is also possible). In this case, by adding the voice recognition function and the text display function described in the embodiments of the present invention, when the backed-up image data is displayed on the information device and audio data is added to that image data, the audio data can be converted to text and displayed over the image data.
[0097]
[Effects of the invention]
According to the first aspect of the present invention, when the image data is displayed on the display means, the voice recognition means recognizes the voice data added to the image data and converts it into character data, and the character display control means displays the converted character data superimposed on the image data displayed on the display means. Therefore, the advantage is obtained that the voice data added to the image data can be easily recognized.
[0098]
According to the second aspect of the present invention, the determining means determines whether or not the character data converted by the voice recognizing means can be displayed on one screen of the display means. Thus, when it is determined by the determining means that the character data does not fit on one screen, the character data is scroll-displayed, so that the voice data added to the image data can be easily recognized. Also, there is an advantage that visibility can be improved when characters are superimposed and displayed on a still image or a moving image.
[0099]
According to the third aspect of the present invention, when the determining means determines that the character data fits on one screen, the character display control means displays the entire text of the character data on one screen. Therefore, the voice data added to the image data can be easily recognized, and visibility can be improved when characters are superimposed and displayed on a still image or a moving image.
[0100]
According to the fourth aspect of the present invention, when the determination means determines that the character data does not fit on one screen, the character display control means displays on one screen the entire text of the character data whose character size has been reduced by the reduction means. Therefore, the voice data added to the image data can be easily recognized, and visibility can be improved when characters are superimposed and displayed on a still image or a moving image.
[0101]
According to the fifth aspect of the present invention, when the determination means determines that the character data does not fit on one screen, the character display control means displays on one screen the entire text of the character data whose vocabulary has been replaced by the abbreviation word replacement means. Therefore, the voice data added to the image data can be easily recognized, and visibility can be improved when characters are superimposed and displayed on a still image or a moving image.
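The choice among the display modes described in the second to fifth aspects can be sketched as follows. This is a simplified illustration under several assumptions that are not taken from the specification: text is measured as a single line of fixed-width characters, the character widths `char_px` and `small_char_px` are arbitrary, and the order in which reduction, abbreviation, and scrolling are tried is chosen only for the example. The function name `choose_display` and the abbreviation table are hypothetical.

```python
from typing import Dict, Optional, Tuple


def choose_display(text: str,
                   screen_width_px: int,
                   char_px: int = 12,
                   small_char_px: int = 8,
                   abbreviations: Optional[Dict[str, str]] = None
                   ) -> Tuple[str, str, int]:
    """Return (mode, text_to_show, character_width_px)."""
    abbreviations = abbreviations or {}

    def fits(s: str, px: int) -> bool:
        # Determination step: does the whole text fit on one screen line?
        return len(s) * px <= screen_width_px

    if fits(text, char_px):
        return ("center", text, char_px)          # show the entire text as is
    if fits(text, small_char_px):
        return ("reduce", text, small_char_px)    # reduced character size
    shortened = text
    for word, short in abbreviations.items():
        shortened = shortened.replace(word, short)
    if fits(shortened, char_px):
        return ("abbreviate", shortened, char_px)  # vocabulary replaced
    return ("scroll", text, char_px)               # fall back to scrolling
```

With a 120-dot-wide screen such as that of the display unit 13, a five-character text is centered, while a text that still overflows after reduction and abbreviation is scrolled.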
[0102]
According to the invention described in claim 6, the dividing means divides the character data converted by the voice recognition means by a predetermined length, and the character display control means displays each piece of character data divided by the dividing means, at a predetermined timing, superimposed on the still image or the moving image displayed on the display means. Therefore, the audio data added to the image data can be easily recognized, and visibility can be improved when characters are superimposed and displayed on a still image or a moving image.
[0103]
According to the invention of claim 7, the display time is set by the display time setting means for each of the character data divided by the dividing means based on the reproduction time of the image. Since each character data divided by the dividing means is displayed based on the display time set by the display time setting means, the audio data added to the image data can be easily recognized. Also, there is an advantage that visibility can be improved when characters are superimposed and displayed on a still image or a moving image.
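Dividing the character data by a predetermined length and assigning each piece a display time based on the reproduction time can be sketched as below. The specification text here does not give the formula for the display time, so distributing the reproduction time in proportion to segment length is an assumption made for this example; the function names `divide_text` and `display_times` are likewise hypothetical.

```python
from typing import List


def divide_text(text: str, max_len: int) -> List[str]:
    """Split text into consecutive pieces of at most max_len characters."""
    if max_len <= 0:
        raise ValueError("max_len must be positive")
    return [text[i:i + max_len] for i in range(0, len(text), max_len)]


def display_times(segments: List[str], playback_seconds: float) -> List[float]:
    """Share the reproduction time among segments in proportion to length."""
    total_chars = sum(len(s) for s in segments)
    if total_chars == 0:
        return []
    return [playback_seconds * len(s) / total_chars for s in segments]
```

For a ten-character text divided every four characters and a ten-second moving image, the three pieces would be shown for 4, 4, and 2 seconds respectively.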
[0104]
According to the invention described in claim 8, the instruction means instructs switching of the character data displayed on the display means, and when a switching instruction is received from the instruction means, the character display control means switches the character data displayed on the display means to the next character data divided by the dividing means. Therefore, the audio data added to the image data can be easily recognized, and visibility can be improved when characters are superimposed and displayed on a still image or a moving image.
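Switching to the next divided character data on an instruction can be sketched with a small pager object. The class name `SegmentPager` and its methods are hypothetical, and staying on the last piece when no further piece exists is an assumption; the specification text here does not state what happens after the final piece.

```python
from typing import List, Sequence


class SegmentPager:
    """Holds the divided character data and the index of the piece on screen."""

    def __init__(self, segments: Sequence[str]) -> None:
        if not segments:
            raise ValueError("at least one segment is required")
        self.segments: List[str] = list(segments)
        self.index = 0

    def current(self) -> str:
        """Return the piece currently displayed."""
        return self.segments[self.index]

    def on_switch(self) -> str:
        """Handle a switching instruction and return the piece now displayed."""
        if self.index < len(self.segments) - 1:
            self.index += 1
        return self.current()
```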
[0105]
According to the ninth aspect of the present invention, the embedding means embeds the audio data recorded by the audio recording means in the image data photographed by the photographing means, and the voice recognition means recognizes the audio data embedded in that image data and converts it into character data. Therefore, the advantage is obtained that audio data embedded in image data captured by a camera provided in a digital camera, a mobile phone, or the like can be easily recognized.
[0106]
According to the tenth aspect of the present invention, the voice recognition means recognizes voice data embedded in image data attached to a mail transmitted and received between devices by the mail transmitting / receiving means via a communication network, and converts it into character data. Therefore, the advantage is obtained that voice data embedded in image data attached to a mail received by a communication device such as a mobile phone can be easily recognized.
[0107]
According to the eleventh aspect of the present invention, the voice data embedded in the image data is recognized and converted into character data, and the character data is displayed so as to overlap the image data. Therefore, there is an advantage that the voice data embedded in the image data can be easily recognized.
[0108]
According to the twelfth aspect of the present invention, it is determined whether or not the character data can be displayed on one screen, and if it is determined that the character data does not fit on one screen, the character data is scroll-displayed. Therefore, the voice data embedded in the image data can be easily recognized, and visibility can be improved when characters are superimposed and displayed on a still image or a moving image.
[0109]
According to the thirteenth aspect, when it is determined that the character data can fit on one screen, the entire text of the character data is displayed on one screen. Therefore, the voice data embedded in the image data can be easily recognized, and visibility can be improved when characters are superimposed and displayed on a still image or a moving image.
[0110]
According to the fourteenth aspect, when it is determined that the character data does not fit on one screen, the character size of the character data is reduced, and the entire text of the reduced character data is displayed on one screen. Therefore, the voice data embedded in the image data can be easily recognized, and visibility can be improved when characters are superimposed and displayed on a still image or a moving image.
[0111]
According to the fifteenth aspect, when it is determined that the character data does not fit on one screen, the vocabulary included in the character data is replaced with abbreviated words, and the entire text of the character data after replacement is displayed on one screen. Therefore, the voice data embedded in the image data can be easily recognized, and visibility can be improved when characters are superimposed and displayed on a still image or a moving image.
[0112]
According to the invention of claim 16, the character data is divided by a predetermined length, and each piece of divided character data is displayed so as to be superimposed on the still image or the moving image. Therefore, there is an advantage that the audio data embedded in the image data can be easily recognized, and the visibility can be improved when characters are superimposed and displayed on a still image or a moving image.
[0113]
According to the seventeenth aspect, a display time is set for each piece of divided character data based on the reproduction time of the still image or the moving image, and each piece of divided character data is displayed based on the set display time. Therefore, the audio data embedded in the image data can be easily recognized, and visibility can be improved when characters are superimposed and displayed on a still image or a moving image.
[0114]
According to the eighteenth aspect of the invention, when there is an instruction to switch the character data, the character data displayed on the screen is switched to the next divided character data. Therefore, there is an advantage that the audio data embedded in the image data can be easily recognized, and the visibility can be improved when characters are superimposed and displayed on a still image or a moving image.
[0115]
According to the nineteenth aspect of the present invention, the image data is photographed by photographing means, and the audio data is recorded by audio recording means. Therefore, an advantage is obtained in that audio data embedded in image data captured by a camera provided in a digital camera, a mobile phone, or the like can be easily recognized.
[0116]
According to the twentieth aspect of the present invention, the image data is attached to a mail transmitted and received between devices via a communication network. Therefore, there is an advantage that the audio data embedded in the image data attached to a mail received by a communication device such as a mobile phone can be easily recognized.
[0117]
According to the invention of claim 21, the embedding means embeds the audio data recorded by the audio recording means in the image data photographed by the photographing means. When the image data is displayed on the display means, the voice recognition means recognizes the audio data embedded in the image data and converts it into character data, and the character display control means displays the converted character data superimposed on the image data displayed on the display means. Therefore, there is obtained an advantage that audio data embedded in image data captured by a camera provided in a digital camera or the like can be easily recognized.
[0118]
According to the invention of claim 22, the voice recognition means recognizes voice data embedded in the image data attached to a mail transmitted and received by the mail transmission / reception means and converts it into character data, and the display control means displays the character data converted by the voice recognition means superimposed on the image data attached to the mail. Therefore, there is an advantage that audio data embedded in image data attached to a mail received by a communication device such as a mobile phone can be easily recognized.
[0119]
According to the twenty-third aspect of the present invention, since the information device is a mobile phone, the advantage is obtained that voice data embedded in image data attached to a mail received by the mobile phone can be easily recognized.
[0120]
According to the invention described in claim 24, the voice recognition means recognizes the voice data embedded in the image data attached to the mail without converting it into audible sound. Therefore, an advantage is obtained in that audio data embedded in image data attached to a mail received by a communication device such as a mobile phone can be easily recognized.
[Brief description of the drawings]
FIG. 1 is a block diagram illustrating a configuration of a digital camera according to a first embodiment of the present invention.
FIG. 2 is a flowchart for explaining the operation of the first embodiment.
FIG. 3 is a flowchart for explaining the operation of the first embodiment.
FIG. 4 is a schematic diagram showing a state in which text is centered and displayed.
FIG. 5 is a schematic diagram showing a state in which the character size is reduced and centered and displayed.
FIG. 6 is a schematic diagram showing a state in which text is scroll-displayed.
FIG. 7 is a conceptual diagram showing how vocabulary included in text is replaced with abbreviated words.
FIG. 8 is a schematic diagram showing a state in which text replaced with abbreviated words is scroll-displayed.
FIG. 9 is a conceptual diagram showing how text is converted to vertical writing.
FIG. 10 is a schematic diagram showing a state in which text is displayed vertically.
FIG. 11 is a flowchart for explaining the operation of the second embodiment of the present invention.
FIG. 12 is a flowchart for explaining the operation of the second embodiment of the present invention.
FIG. 13 is a schematic view illustrating an appearance of a mobile phone according to a third embodiment of the present invention.
FIG. 14 is a block diagram illustrating a circuit configuration of a mobile phone.
FIG. 15 is a flowchart for explaining the operation of the mobile phone according to the third embodiment.
FIG. 16 is a schematic diagram showing a display example of a mobile phone according to the third embodiment.
FIG. 17 is a schematic diagram showing a display example of the mobile phone according to the third embodiment.
[Explanation of symbols]
1 Lens
2 CCD
3 Timing generator
4 Vertical driver
5 Sample hold circuit
6 A / D converter
7 Color process circuit
8 DMA controller
9 DRAM interface
10 DRAM
11 CPU (image display control means, character display control means, determination means, reduction means, abbreviation word replacement means, division means, display time setting means)
12 VRAM controller
13 VRAM
14 Digital Video Encoder
15 Display device
16 Operation unit (instruction means)
17 JPEG processing unit
18 Flash memory
19 Voice recognition processing unit (voice recognition means)
20 Text segmentation processing unit
21 Program ROM

Claims

In an information device that reproduces and displays image data to which audio data has been added,
Display means for displaying image data;
Voice recognition means for recognizing voice data added to the image data and converting it to character data;
An information device comprising: character display control means for displaying character data converted by the voice recognition means so as to be superimposed on image data displayed on the display means.

Determining means for determining whether the character data converted by the voice recognition means can be displayed in one screen of the display means,
The character display control means includes:
2. The information device according to claim 1, wherein when the determination unit determines that the character data does not fit on one screen, the character data is scroll-displayed.

The character display control means includes:
2. The information device according to claim 1, wherein when the determination unit determines that the character data can fit on one screen, the entire text data is displayed on one screen.

The information device according to claim 1, wherein the character display control means includes reduction means for reducing the character size of character data, and, when the determination means determines that the character data does not fit on one screen, displays on one screen the full text of the character data whose character size has been reduced by the reduction means.

The character display control means includes abbreviated word replacement means for replacing vocabulary included in character data with abbreviated words,
The character display control means, when it is determined by the determination means that the character data does not fit on one screen,
2. The information device according to claim 1, wherein the whole sentence of the character data whose vocabulary has been replaced by the shortened word replacement means is displayed on one screen.

The apparatus further includes a dividing unit that divides the character data converted by the voice recognition unit by a predetermined length,
The character display control means includes:
6. The information device according to claim 1, wherein the character data divided by the dividing unit is displayed at a predetermined timing so as to overlap the image data displayed on the display unit.

The character display control means includes a display time setting means for setting a display time for each of the character data divided by the dividing means based on a reproduction time of the image data, and for each of the character data divided by the dividing means 7. The information device according to claim 6, wherein the information is displayed based on the display time set by the display time setting means.

Further comprising instruction means for instructing switching of character data displayed on the display means,
The character display control means includes:
8. The information device according to claim 6, wherein, when a switching instruction is issued from the instruction means, the character data displayed on the display means is switched to the next character data divided by the dividing means.

Photographing means for photographing image data;
Voice recording means for recording voice;
Embedding means for embedding audio data recorded by the audio recording means in the image data photographed by the imaging means,
9. The information device according to claim 1, wherein the voice recognition unit recognizes voice data embedded in the image data captured by the capturing unit and converts the voice data into character data.

E-mail transmission / reception means for transmitting / receiving e-mail between devices via a communication network,
9. The information device according to claim 1, wherein the voice recognition means recognizes voice data embedded in image data attached to a mail transmitted / received by the mail transmission / reception means and converts the voice data into character data.

In a data display method for reproducing and displaying image data in which audio data is embedded,
A data display method comprising recognizing voice data embedded in the image data, converting the voice data into character data, and displaying the character data so as to overlap the image data.

12. The data display method according to claim 11, wherein it is determined whether or not the character data can be displayed on one screen, and if it is determined that the character data does not fit on one screen, the character data is scroll-displayed.

12. The data display method according to claim 11, wherein when it is determined that the character data can fit on one screen, the whole text of the character data is displayed on one screen.

12. The data display method according to claim 11, wherein when it is determined that the character data does not fit on one screen, the character size of the character data is reduced, and the entire text of the reduced character data is displayed on one screen.

The data display method according to claim 11, wherein when it is determined that the character data does not fit on one screen, the vocabulary included in the character data is replaced with abbreviated words, and the full text of the character data replaced with the abbreviated words is displayed on one screen.

16. The data display method according to claim 11, wherein the character data is divided by a predetermined length, and each divided character data is displayed so as to be superimposed on the still image or the moving image.

The data display method according to claim 16, wherein a display time is set for each of the divided character data based on the reproduction time of the still image or the moving image, and display is performed for each of the divided character data based on the set display time.

18. The data display method according to claim 16, wherein when there is an instruction to switch the character data, the character data displayed on the screen is switched to the next divided character data.

19. The data display method according to claim 11, wherein the image data is photographed by photographing means, and the sound data is recorded by sound recording means.

19. The data display method according to claim 11, wherein the image data is attached to a mail transmitted and received between devices via a communication line network.

Photographing means for photographing image data;
Voice recording means for recording voice;
Embedding means for embedding audio data recorded by the audio recording means in image data photographed by the imaging means,
Display means for displaying the image data;
Voice recognition means for recognizing voice data embedded in the image data and converting it to character data,
A digital camera, comprising: character display control means for displaying the character data converted by the voice recognition means so as to overlap the image data displayed on the display means.

A mail sending / receiving means for sending / receiving mail between devices via a communication network,
Voice recognition means for recognizing voice data embedded in the image data attached to the mail transmitted and received by the mail transmitting and receiving means and converting it to character data,
An information device, comprising: display control means for superimposing and displaying the character data converted by the voice recognition means on the image data attached to the mail.

The information device according to claim 22, wherein the information device is a mobile phone.

24. The information device according to claim 22, wherein the voice recognition means recognizes voice data embedded in the image data attached to the mail without converting the voice data into an audible sound.