JP3962820B2

JP3962820B2 - Mobile phone, document display method with voice, call voice display method, and mobile phone display method

Info

Publication number: JP3962820B2
Application number: JP2003187864A
Authority: JP
Inventors: 明弘塚本
Original assignee: Casio Computer Co Ltd
Current assignee: Casio Computer Co Ltd
Priority date: 2003-06-30
Filing date: 2003-06-30
Publication date: 2007-08-22
Anticipated expiration: 2023-06-30
Also published as: JP2005026822A

Description

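[0001]
[Technical field to which the invention belongs]
The present invention relates to a mobile phone, a method for displaying a document with voice, a method for displaying call voice, and a display method for a mobile phone.
[0002]
[Prior art]
Conventionally, some mobile phones have a voice recording and playback function: the sender attaches a recorded voice file to a mail and transmits it, and the receiver plays back the voice attached to the mail. In recent years, improved data transfer speeds have also allowed mobile phones to exchange moving images during a call. Normally the user listens to the call voice from the speaker with the phone held to the ear, but the moving image cannot be seen in that position. The user therefore holds the phone facing them, listens to the call voice through an earphone or the like, and watches the moving image on the display unit.
[0003]
When voice attached to a mail is played back, or when call voice is output while a moving image is played during a call, outputting the voice through the speaker causes problems: the voice is hard to hear when ambient noise is loud, and it disturbs people nearby when the surroundings are quiet. Listening through an earphone is conceivable, but taking out and connecting the earphone is troublesome. If the voice could instead be recognized, converted into text, and displayed as characters, convenience for the user would improve.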
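[0004]
As a conventional technique involving speech recognition, a technique has been proposed in which, in order to find desired image data among many items, a voice describing the content is recorded for each item and associated with it, and the voice is played back at variable speed to locate the target image data (see, for example, Patent Document 1). In Patent Document 1, the voice can also be converted into text by speech recognition, and the text can be used as an index for sorting the image data.
[0005]
[Patent Document 1]
JP 2002-41529 A
[0006]
[Problems to be solved by the invention]
The prior art of Patent Document 1 discloses speech recognition that converts voice data into text and sorting based on the recognized text. However, even if that prior art were used on a mobile phone to convert voice attached to a mail, or call voice accompanying a moving image during a call, into text, it says nothing about how to display the result on a display unit of limited area (resolution) such as that of a mobile phone.
[0007]
Accordingly, an object of the present invention is to provide a mobile phone, a method for displaying a document with voice, a method for displaying call voice, and a display method for a mobile phone that make it easier to check voice data attached to a mail, and easier to check both the call voice and the moving image transmitted during a call.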
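[0008]
[Means for solving the problems]
To achieve the above object, the invention of claim 1 is a mobile phone that displays text superimposed on the screen while displaying a moving image, comprising: first determining means for determining whether the brightness of a moving image transmitted from the other party is brighter than an intermediate value (M); second determining means for further determining, when the first determining means determines that the brightness of the moving image is brighter than the intermediate value (M), whether the brightness lies between the intermediate value (M) and a threshold (TH1) (where M < TH1); third determining means for further determining, when the second determining means determines that the brightness of the moving image is darker than the intermediate value (M), whether the brightness lies between the intermediate value (M) and a threshold (TH2) (where M > TH2); brightness adjusting means for increasing the brightness of the moving image by a predetermined amount when the second determining means determines that the brightness lies between the intermediate value (M) and the threshold (TH1), and for decreasing the brightness of the moving image by a predetermined amount when the third determining means determines that the brightness lies between the intermediate value (M) and the threshold (TH2); and display control means for displaying the text superimposed on the moving image whose brightness has been adjusted by the brightness adjusting means.
[0023]
As a preferred aspect, for example as recited in claim 2, in the mobile phone of claim 1, the first determining means, the second determining means, the third determining means, and the brightness adjusting means may perform the brightness determination and the increase or decrease of brightness for each frame, so that the brightness of the moving image is adjusted in real time.
[0024]
As a preferred aspect, for example as recited in claim 3, the mobile phone of claim 1 or 2 may further comprise character color setting means for setting the character color of the text superimposed on the screen on the basis of the brightness determination result of the first determining means, the second determining means, or the third determining means.
[0025]
As a preferred aspect, for example as recited in claim 4, in the mobile phone of claim 3, the character color setting means may set the character color of the text to white when the brightness of the moving image has been decreased.
[0026]
To achieve the above object, the display method for a mobile phone according to the invention of claim 5 is a display method for a mobile phone that displays text superimposed on the screen while displaying a moving image, wherein, with the brightness of the moving image denoted B, the intermediate brightness value M, a predetermined bright-side threshold TH1, and a predetermined dark-side threshold TH2, the method comprises, for each frame of the moving image: a step of increasing the brightness M when TH1 > B > M; a step of decreasing the brightness M when M > B > TH2; and a step of displaying the text superimposed on the moving image whose brightness M has been adjusted.
[0027]
As a preferred aspect, for example as recited in claim 6, the display method of claim 5 may further comprise a step of setting the character color of the text to white when M > B > TH2.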
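[0029]
[Embodiments of the invention]
Embodiments of the present invention are described below, with reference to the drawings, as an example applied to a mobile phone.
[0030]
A. First embodiment
A-1. Configuration of the first embodiment
FIG. 1 is a block diagram showing the configuration of a mobile phone system according to the first embodiment of the present invention. In the figure, the mobile phones (wireless communication terminals / data communication devices) 1a and 1b have a voice recording function, a function for sending a mail with recorded voice data attached to the system (in particular, the mail servers 33 and 63 described later), and a function for receiving a mail with voice. When creating a mail with voice, the mobile phones 1a and 1b recognize the voice to be attached to the mail body, convert it into text data, and display the mail body and the text data so that both can be checked on one screen. In addition, when a moving image is transmitted from the other party during a call, the mobile phones 1a and 1b recognize the call voice, convert it into text data, and display the moving image and the text data so that both can be checked on the same screen.
[0031]
During moving-image playback, a 96×80-pixel moving image can be enlarged to the maximum display area of 128×96 pixels. Possible enlargement methods include simply enlarging the periphery of the moving image, increasing the magnification linearly toward the periphery, and increasing the magnification exponentially from the center toward the periphery. In every case the center is kept at 1× so that enlargement does not reduce the visibility of the moving image. The enlargement algorithm itself uses well-known techniques, and its description is omitted.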
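[0032]
The radio base stations 2, 2 connect, via the public line network 4, to the communication service provider (including an Internet provider) 3 to which the users of the mobile phones 1a and 1b subscribe. In addition to the exchange 34 required for the mobile phone service that it offers as its main service, the communication service provider 3 has a system for connecting to the WWW 5 described later (Web server 32, router 35) and a mail system (mail server 33). It also has a function for connecting the mobile phones 1a and 1b to the WWW 5 with the radio base station 2 acting as an AP (access point).
[0033]
The public line network 4 is an analog or digital telephone network. The WWW 5 is the so-called Internet. The Internet service provider (hereinafter ISP) 6 has the same configuration as the communication service provider 3 except for the exchange, and has a system for connecting to the WWW 5 (for convenience, Web server 62 and router 65) and a mail system (mail server 63). The personal computer 7 can connect to the WWW 5 and send and receive mail via the public line network 4 and the ISP 6.
[0034]
As a feature of this embodiment, between the mobile phones 1a and 1b a mail with voice is displayed with the mail body and the text converted from the voice on one screen (details are given later), whereas when a mail with voice is sent from the mobile phone 1a or 1b to the personal computer 7, the personal computer 7 handles the voice as an attached file. In other words, the "mail-with-voice software program" of this embodiment is compatible with commonly known mail software.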
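[0035]
FIGS. 2(a) and 2(b) are external views (open state: front view and rear view) of the mobile phones 1a and 1b. The mobile phones 1a and 1b of this embodiment have a folding structure consisting of a lid and a main body. The antenna 11 is provided on the back of the lid and is extendable. The speaker 12 is provided on the front side of the lid and outputs voice. The display unit (main display unit) 13 is a color liquid crystal display of, for example, 120 dots (width) × 160 dots (height). The display unit 13 can show, on the same screen, the voice (as text) of a mail with voice together with the mail body, or the call voice (as text) together with a moving image.
[0036]
The key operation unit 14 is provided on the front of the main body and consists of various function keys (mail key 141, address key 142, function key 143), a numeric keypad 144, a shutter key 145, and so on. The mail key 141 starts the mail function and displays the mail menu. The address key 142 opens the address book used when selecting a destination mail address. The function key 143 is used to instruct playback, pause, and the like of a moving image when a mail with voice is created or checked. The numeric keypad 144 is used to enter telephone numbers and characters. In imaging mode, the shutter key 145 captures a still image when a press is detected, and captures a moving image when a press held for a predetermined time (for example, 2 seconds) is detected.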
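The shutter key behavior (short press for a still image, sustained press for a moving image) can be sketched as a simple classification on press duration. This is only an illustrative sketch: the function name `shutter_action` and the idea of measuring a single press duration are assumptions, and the 2-second value is the example given in the text rather than a fixed requirement.

```python
# Hypothetical sketch of the shutter key 145 decision in imaging mode.
# The "predetermined time" is an assumption parameter; 2 s is the text's example.
HOLD_THRESHOLD_S = 2.0


def shutter_action(press_duration_s, hold_threshold_s=HOLD_THRESHOLD_S):
    """Return 'movie' for a sustained press and 'still' for a short press."""
    if press_duration_s < 0:
        raise ValueError("press duration must be non-negative")
    if press_duration_s >= hold_threshold_s:
        return "movie"
    return "still"
```

For example, `shutter_action(0.2)` yields `"still"`, while a press held for the full threshold yields `"movie"`.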
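[0037]
The microphone 15 is provided at the lower part of the main body and receives voice input during calls and during recording. The sub display unit 16 is provided on the back of the lid. The back key 17 is made of a transparent or translucent member and contains an LED 171 that lights up on an incoming call. The imaging lens 18 is provided on the back of the lid, below the sub display unit 16. The notification speaker 19 announces incoming calls and the like, and is placed on the rear surface of the main body so that the notification sound can be heard even when the lid is closed against the main body.
[0038]
FIG. 3 is a block diagram showing the configuration of the mobile phones 1a and 1b. The wireless transceiver unit 20 transmits and receives voice and data (mail data) by radio via the antenna 11. The wireless signal processing unit 21 performs the processing required for wireless communication, such as demodulating voice and data (mail data) received by the wireless transceiver unit 20 and modulating voice and data to be sent to the wireless transceiver unit 20. The control unit 22 controls the individual operations and the overall operation.
[0039]
The image memory 23 stores image files that have been captured by the imaging unit (imaging lens 18, imaging module 181, DSP 182) and compression-encoded by the program stored in the image processing program area 2415, as well as image files downloaded via the WWW 5. The ROM 24 is a rewritable flash ROM and stores the various programs, described later, that characterize the present invention.
[0040]
The driver 25 drives the display unit 13. The driver 26 drives the sub display unit 16. The subscriber information storage unit 27 stores profile data such as the telephone number used to call the mobile phone 1 and the ID of the operator (subscriber). The ROM 28 stores the various programs that control the control unit 22. The RAM 29 stores the various data required for a wireless communication terminal and the data the control unit 22 needs in order to operate, and also stores mail data. In particular, in this embodiment the RAM 29 has a storage area for temporarily buffering the moving image being captured in shooting mode.
[0041]
The audio signal processing unit 200 encodes the audio signal input from the microphone 15, and decodes the signal output from the wireless signal processing unit 21 to drive the speaker 12 and output voice. The imaging module 181 consists of a CCD or CMOS sensor and captures color images. The DSP 182 encodes the images captured by the imaging module 181. The notification device 192 is a driver that drives the notification speaker 19, the vibrator 191, and the LED 171. The voice recognition processing unit 30 recognizes voice data attached to a mail, or the other party's voice during a call, and converts it into text data.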
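[0042]
FIG. 4 is a conceptual diagram showing the configuration of the memory area of the ROM 24 of the mobile phones 1a and 1b according to the first embodiment. The ROM 24 stores the software programs that characterize this embodiment. The ROM 24 consists of a mail software program area 2411, a mail-with-voice software program area 2412, a voice recognition program area 2413, and an other program area 2414.
[0043]
The mail software program area 2411 stores a known mail software program. The mail-with-voice software program area 2412 stores the mail-with-voice software program used to create mails with voice and to view received mails with voice. The mail-with-voice software program may also be downloaded from the WWW 5 via the ISP 6.
[0044]
The voice recognition program area 2413 stores a voice recognition program that recognizes voice attached to a mail and converts it into text. The voice data is encoded in a predetermined format (for example WAV, MP3, or PCM). The other program area 2414 stores application programs other than the above.
[0045]
FIG. 5 is a conceptual diagram showing the configuration of the memory area of the RAM 29 of the mobile phones 1a and 1b according to the first embodiment. The RAM 29 consists of an address book data storage area 291, a mail data storage area 292, an attached file storage area 293, an other/work memory 294, and a voice data buffer area 295. The address book data storage area 291 stores multiple records, each a set of name, telephone number, mail address, and so on. The mail data storage area 292 stores mail data created with the mail software and received mail data. The attached file storage area 293 stores the file when a file is attached to a mail. The other/work memory 294 is used as general work memory, for example to store various data. The voice data buffer area 295 is a storage area for temporarily buffering voice that is being recorded.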
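[0046]
A-2. Operation of the first embodiment
The operation of the mobile phone according to the first embodiment is described next.
(1) Mail creation
The operation during mail creation according to the first embodiment is as follows. FIGS. 6 to 9 are flowcharts explaining the operation of the mobile phone of this embodiment during mail creation. FIGS. 10(a) and 10(b) are schematic diagrams showing an example of playing back the voice file to be attached to a mail and an example of displaying the mail body.
[0047]
When the user selects mail creation mode, the mail menu is displayed (step S40). Next, it is determined whether "new" has been selected from the mail menu (step S42). If "new" has not been selected, it is determined whether the inbox has been selected (step S44); if the inbox has not been selected, processing proceeds to other processing (step S46). The case where the inbox is selected, that is, the processing for displaying received mail, is described later.
[0048]
If "new" is selected from the mail menu, the creation menu is displayed (step S48). Next, it is determined whether "mail with voice data" has been selected from the creation menu (step S50). If it has not been selected, processing proceeds to normal mail processing (step S54).
[0049]
If "mail with voice data" has been selected, the mail-with-voice program and the voice recognition program are loaded (step S56), and the creation screen is displayed (step S60).
[0050]
Next, it is determined which of the items "select voice" and "create (edit) mail" has been selected on the creation screen (step S62). If "select voice" has been selected, voice files are first searched for (step S64). It is then determined whether any voice file exists (step S66). If there is none, voice is input and recorded (step S68), the recorded voice is saved as a voice file (step S70), and processing returns to the voice file search of step S64 and repeats the above.
[0051]
If a voice file exists, the voice files are displayed as a list (step S72). Next, it is determined whether any voice file has been selected from the list (step S74). If none has been selected, processing returns to step S72 and the list display continues.
[0052]
When a voice file is selected, as shown in FIG. 10(a), the selected voice is played back (step S76), the voice recognition processing unit 30 recognizes the voice data and converts it into text data (step S78), and the recognized text data is displayed on the display unit 13 (step S80). Processing then returns to the creation screen of step S60 and repeats the above.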
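[0053]
If "create/edit mail" is selected on the creation screen, a screen for editing and displaying each item is first displayed (step S82 in FIG. 8). On this screen the user enters the mail title, the sender's name, the destination, the mail body, and so on. Next, it is determined whether the function key 143 instructing "play" has been operated (step S84), and then whether "decide" has been detected (step S86). If neither the "play" function key 143 nor "decide" has been operated, processing returns to step S82 and the item edit/display screen remains displayed.
[0054]
When the function key 143 instructing "play" is operated, it is determined whether a voice to be attached to the mail has been selected (step S88). If a voice has been selected, as shown in FIG. 10(b), the selected voice is played back (step S90), the voice recognition processing unit 30 recognizes the voice data and converts it into text data (step S92), and the mail body and the recognized text data are displayed on the display unit 13 (step S94). Processing then returns to step S82 and repeats the above.
[0055]
If, while the item edit/display screen of step S82 is displayed, the function key is operated in step S84 with no voice selected, the message "No voice has been selected" is displayed (step S106 in FIG. 9). Next, it is determined whether a predetermined time has elapsed (step S108); if not, processing returns to step S106 and the message remains displayed. Once the message has been displayed for the predetermined time, it is erased (step S110). Processing then returns to step S60 of FIG. 6 and repeats the processing described above.
[0056]
If a "decide" operation is detected while the item edit/display screen of step S82 is displayed, it is determined whether any item is blank or no voice has been selected (step S110). If an item is blank or no voice file has been selected, a confirmation is displayed to the user (step S112), and it is determined whether "OK" has been selected (step S114). If "OK" is not selected, processing returns to step S60 shown in FIG. 6 and the creation screen is displayed. The user then completes the mail by filling in the blank items or selecting a voice file to attach.
[0057]
If "OK" is selected in the confirmation, the user has accepted that items may be blank or that no voice file is selected, so the mail is stored in the outbox (step S116), and processing returns to the mail menu display of step S40 shown in FIG. 6 and repeats the processing described above. If no item is blank and a voice file has been selected, the mail with the voice file attached is stored in the outbox without confirmation (step S116), and processing likewise returns to the mail menu display of step S40. A mail with a voice file stored in the outbox is sent at a predetermined timing.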
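The "decide" branch of the mail creation flow (the completeness check, the confirmation prompt, and storing in the outbox) can be sketched as a small decision function. This is a minimal sketch under assumptions: the function name `on_decide`, the string return values, and the `user_confirms` callback are hypothetical, and the no-confirmation case is taken to be the complement of the "blank item or no voice selected" condition.

```python
def on_decide(has_blank_fields, voice_selected, user_confirms):
    """Sketch of the 'decide' handling during mail creation.

    user_confirms is a hypothetical callable that shows the confirmation
    prompt and returns True when the user selects OK.
    Returns 'outbox' (mail stored for sending) or 'creation_screen'.
    """
    incomplete = has_blank_fields or not voice_selected
    if incomplete and not user_confirms():
        # User declined: go back so the mail can be completed.
        return "creation_screen"
    # Either the mail was complete, or the user accepted it as-is.
    return "outbox"
```

Passing the prompt as a callable keeps the sketch testable and makes it visible that a complete mail never triggers the confirmation.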
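[0058]
(2) Received mail display processing
FIG. 11 is a flowchart explaining the received mail display processing according to the first embodiment. The flowchart of FIG. 11 is the "YES" branch from step S44 shown in FIG. 6.
[0059]
When the inbox is selected, a reception list is displayed to show the received mails (step S200). FIG. 12 is a schematic diagram showing an example of the reception list displayed on the display unit 13. Icon 1301 indicates the battery charge level, and icon 1302 indicates the radio reception state. In the reception list, icons 1303 to 1306 distinguish read and unread mail: icon 1303 indicates an unread mail, icon 1304 an unread mail with voice, icon 1305 a read mail, and icon 1306 a read mail with voice. The user selects the mail to display from the reception list.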
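The relationship between a mail's state and its reception-list icon can be written as a lookup table. The sketch below only restates the four icon assignments given in the text; the function name `icon_for` and the boolean parameters are illustrative assumptions.

```python
# Reference numerals 1303-1306 come from the description of FIG. 12;
# the (is_read, has_voice) key layout is an assumption for illustration.
ICONS = {
    (False, False): 1303,  # unread mail
    (False, True): 1304,   # unread mail with voice
    (True, False): 1305,   # read mail
    (True, True): 1306,    # read mail with voice
}


def icon_for(is_read, has_voice):
    """Return the reference numeral of the icon shown for a mail."""
    return ICONS[(bool(is_read), bool(has_voice))]
```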
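[0060]
Next, it is determined whether a mail with voice has been selected (step S202). If so, the mail-with-voice software program and the voice recognition program are loaded (step S204), the voice recognition processing unit 30 recognizes the voice data and converts it into text data (step S206), and, as shown in FIG. 13, the mail body and the recognized text data are displayed on the display unit 13 (step S208). Next, it is determined whether a cancel operation has been detected (step S210); if so, processing returns to step S200 and the reception list is displayed.
[0061]
If no cancel operation has been detected, it is determined whether the function key 143 has been operated (step S212). If the function key 143 has not been operated, processing returns to step S206 and the mail body and the recognized text remain displayed.
[0062]
When the function key 143 is operated, the voice file attached to the mail is played back, as shown in FIG. 13 (step S214). Next, it is determined whether the function key 143 has been operated again (step S216). If not, processing returns to step S212 and playback of the voice is repeated. If the function key 143 is operated after the voice has been played back, processing returns to step S208 and the mail body and the recognized text are displayed as shown in FIG. 13.
[0063]
In the first embodiment described above, when a voice file is attached to a mail, the voice is converted into text data by speech recognition and displayed on the display unit 13 together with the mail body, so the content of the attached voice file can be grasped easily.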
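[0064]
B. Second embodiment
The second embodiment of the present invention is described next. In the second embodiment, when a moving image (or still image) is transmitted during a call on a camera-equipped mobile phone, the call voice (of the other party) is converted into text data by speech recognition and displayed together with the moving image being played back. The moving image can either be played back at its original size (96×80 pixels) with the text data displayed alongside it, or be enlarged to 128×96 pixels with the text data superimposed on it.
[0065]
B-1. Configuration of the second embodiment
The configuration of the mobile phone according to the second embodiment is the same as that of FIG. 3 described for the first embodiment, so its description is omitted.
[0066]
FIG. 14 is a conceptual diagram showing the configuration of the memory area of the ROM 24 of the mobile phones 1a and 1b according to the second embodiment. The ROM 24 stores the software programs that characterize this embodiment. The ROM 24 consists of a moving image playback program area 2411, a voice recognition program area 2413, an image processing program area 2415, and an other program area 2414.
[0067]
The moving image playback program area 2411 stores the moving image playback program for playing back moving images transmitted during a call. The voice recognition program area 2413 stores a voice recognition program that recognizes voice and converts it into text. The voice data is encoded in a predetermined format (for example WAV, MP3, or PCM).
[0068]
The image processing program area 2415 stores an image processing program that compresses, by MPEG-4 compliant encoding, the moving image data captured and digitally encoded by the imaging unit (imaging lens 18, imaging module 181, DSP 182) and buffered in the RAM 29, and turns it into a compressed file. During playback, the image processing program also enlarges the periphery of a moving image received from outside or filed on the phone itself, so that the whole is displayed at a size of 128×96 pixels. The other program area 2414 stores application programs other than the above.
[0069]
FIG. 15 is a conceptual diagram showing the configuration of the memory area of the RAM 29 of the mobile phones 1a and 1b according to the second embodiment. The RAM 29 consists of an other/work memory 294, a voice data buffer area 295, an enlarged data area 296, and a display buffer 297. The other/work memory 294 is used as general work memory, for example as the frame memory (three frames) for the decoding described later, or to store various data. The voice data buffer area 295 is a storage area for temporarily buffering voice that is being recorded. The enlarged data area 296 stores partially enlarged moving image data. The display buffer 297 is used as a buffer when displaying a moving image.
[0070]
FIG. 16 is a conceptual diagram explaining the moving image enlargement method according to the second embodiment. In the second embodiment, the display size of a still image is 128×96 pixels and the display size of a moving image is normally 96×80 pixels. To improve the visibility of the moving image, however, it can be enlarged to 128×96 pixels by a predetermined enlargement method. More specifically, as illustrated, the periphery (hatched in the figure) of the 96×80-pixel moving image is partially enlarged according to a predetermined enlargement method (described below) so that the whole can be played back in a 128×96-pixel display area. When the moving image is stopped (paused), the whole image is enlarged uniformly and displayed as a still image at 128×96 pixels (Sub-QCIF size).
[0071]
FIG. 17 is a conceptual diagram explaining examples of the moving image enlargement method according to the second embodiment. FIG. 17(a) shows simple enlargement with the center at 1× and the periphery at 3×. FIG. 17(b) shows a method in which the magnification increases linearly toward the periphery. FIG. 17(c) shows a method in which the magnification increases exponentially from the center toward the periphery. In every case the magnification of the center is kept at 1× so that enlargement does not reduce the visibility of the moving image. The method to use may be fixed in advance or made selectable by the user.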
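The simple scheme of FIG. 17(a), with the center at 1× and the periphery at 3×, can be illustrated with a coordinate map. If a border of s source pixels on each side is tripled, each axis grows by 4s pixels, so going from 96 to 128 pixels in width gives s = 8 (a central band of 80 pixels at 1×), and going from 80 to 96 pixels in height gives s = 4 (a central band of 72 pixels). The sketch below is not the algorithm of the source, which defers to well-known techniques; the function names and the nearest-neighbor replication are assumptions for illustration.

```python
def border_map(src_len, dst_len, border_scale=3):
    """Map each output index to a source index.

    The central band is copied at 1x; a border of s source pixels on each
    side is replicated border_scale times (nearest-neighbor).
    """
    extra = dst_len - src_len
    s, rem = divmod(extra, 2 * (border_scale - 1))
    if extra < 0 or rem or 2 * s > src_len:
        raise ValueError("sizes do not fit this border scale exactly")
    left = [i // border_scale for i in range(border_scale * s)]
    center = list(range(s, src_len - s))
    right = [src_len - s + i // border_scale for i in range(border_scale * s)]
    return left + center + right


def enlarge(frame, dst_w=128, dst_h=96, border_scale=3):
    """Enlarge a frame (list of rows) by stretching only its periphery."""
    xs = border_map(len(frame[0]), dst_w, border_scale)
    ys = border_map(len(frame), dst_h, border_scale)
    return [[frame[y][x] for x in xs] for y in ys]
```

The linear and exponential variants of FIG. 17(b) and (c) would replace `border_map` with a map whose step size varies with distance from the center; they are not sketched here.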
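[0072]
FIGS. 18 and 19 are conceptual diagrams explaining an example of the moving image enlargement method. When the moving image is MPEG-4, decoding requires reference to the preceding and following frames, as shown in FIG. 18, so the buffer normally needs memory for three frames. For partial enlargement, an interpolation method that fills adjacent pixels with the data of the pixel of interest, as shown in FIG. 19, may be used as one example. Filter processing is normally applied to the interpolated data as additional image processing.
[0073]
B-2. Operation of the second embodiment
The operation of the mobile phone according to the second embodiment is described next.
(1) Incoming call operation
FIG. 20 is a flowchart explaining the operation of the mobile phone according to the second embodiment when a call arrives. First, in the standby state, it is determined whether a call has arrived (step S300). When a call arrives, it is determined whether a moving image has been received from the other party (step S302). If no moving image has been received, processing proceeds to normal call processing (step S304).
[0074]
If a moving image has been received, the moving image playback program, the voice recognition program, and the image processing program are loaded (step S306). The received moving image is then played back by the moving image playback program (step S308). The details of moving image playback are described later. During playback, depending on operation of a predetermined function key, the moving image is either played back at 96×80 pixels or partially enlarged to 128×96 pixels according to one of the enlargement methods shown in FIGS. 17(a) to (c). The flag F is "1" during partially enlarged playback at 128×96 pixels and "0" during normal playback at 96×80 pixels.
[0075]
Next, the call voice from the other party is recognized and converted into text data (step S310), and it is determined whether the flag F is "1" (step S312). If the flag F is "1", that is, if partially enlarged playback is selected, the recognized text is displayed superimposed on the moving image partially enlarged to 128×96 pixels, which serves as the background, as shown in FIG. 21(a) (step S314). If the flag F is "0", that is, if normal playback is selected, the recognized text is displayed alongside the 96×80-pixel moving image, as shown in FIG. 21(b) (step S316).
[0076]
In either case, it is next determined whether the call has been ended (on-hook) (step S318). If the call continues, processing returns to step S308 and continues playing the moving image and displaying the text recognized from the call voice. If the call has been ended, the line is disconnected and the processing ends (step S320).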
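[0077]
(2) Moving image playback
The moving image playback processing mentioned above is described next. FIG. 22 is a flowchart explaining the playback of a moving image transmitted from the other party during a call on the mobile phone according to the second embodiment. When a moving image is received during a call in the incoming call operation described above, the playback in step S308 of FIG. 20 is carried out as follows.
[0078]
First, the received moving image data is buffered (step S160) and decoded (step S162), and it is determined whether the flag F is "1", that is, whether partially enlarged display is to be used (step S164). If the flag F is "1", partial enlargement is performed according to one of the enlargement methods shown in FIGS. 17(a) to (c) (step S166). The partially enlarged 128×96-pixel moving image is then stored in the display buffer of the LCD (step S168). In this case, as shown in FIG. 16, the periphery of the 96×80-pixel image (hatched portion) is enlarged and the image is played back as a 128×96-pixel moving image, and the recognized text is superimposed on the partially enlarged moving image as shown in FIG. 21(a).
[0079]
If the flag F is not "1", the moving image is stored in the display buffer of the LCD at its 96×80-pixel size without partial enlargement (step S168). In this case, it is played back as a 96×80-pixel moving image, and the recognized text is displayed alongside it as shown in FIG. 21(b).
[0080]
Next, it is determined whether the predetermined function key (the key that switches between partial enlargement and normal playback) has been pressed (step S170). If it has not been pressed, processing returns to FIG. 20 and proceeds to step S310 and after.
[0081]
If the predetermined function key has been pressed, it is determined whether the flag F is "1" (step S172). If the flag F is "1" it is set to "0" (step S174), and if it is "0" it is set to "1" (step S176). In other words, each time the predetermined function key is pressed during moving image playback (that is, during the call), playback alternates between partially enlarged and normal size. Processing then returns to FIG. 20 and proceeds to step S310 and after.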
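The role of the flag F (toggled by the function key, and selecting both the playback size and how the recognized text is placed) can be sketched as a small state object. The class name `PlaybackState` and the dictionary returned by `layout` are hypothetical; only the flag values, the two sizes, and the overlay versus side-by-side choice come from the description.

```python
class PlaybackState:
    """Sketch of the flag F handling during a call with a moving image."""

    def __init__(self):
        self.flag_f = 0  # 0: normal 96x80 playback, 1: partially enlarged

    def on_function_key(self):
        # Each press flips F, so playback alternates between the two modes.
        self.flag_f = 0 if self.flag_f == 1 else 1

    def layout(self):
        """Return the frame size and text placement implied by F."""
        if self.flag_f == 1:
            return {"size": (128, 96), "text": "overlay"}
        return {"size": (96, 80), "text": "side_by_side"}
```

Pressing the key twice returns the state to normal playback, which matches the alternating behavior described above.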
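[0082]
(3) Superimposing speech-recognized text on the moving image
The operation of superimposing speech-recognized text on the moving image according to the second embodiment is described next. FIG. 23 is a flowchart explaining this operation. During the call described above, the superimposed display of recognized text on the played-back moving image in step S314 is carried out as follows.
[0083]
First, the brightness of the moving image (frame) transmitted from the other party is determined (step S180). It is then determined whether the brightness of the moving image equals the intermediate value M (step S182). If it equals the intermediate value M, the brightness of the moving image is increased by a predetermined amount (step S188).
[0084]
If the brightness does not equal the intermediate value M, it is determined whether it is brighter than M (step S184). If it is brighter than M, it is further determined whether it lies between the intermediate value M and the threshold TH1 (M < TH1) (step S186). If the brightness lies between M and TH1, the brightness of the moving image is increased by a predetermined amount, that is, the image is made brighter (step S188).
[0085]
If the brightness is darker than the intermediate value M, it is further determined whether it lies between the intermediate value M and the threshold TH2 (M > TH2) (step S190). If the brightness lies between M and TH2, the brightness of the moving image is decreased by a predetermined amount, that is, the image is made darker (step S192), and the color of the displayed characters (the speech-recognized text) is set to white (step S194). If the brightness is darker than the threshold TH2, the brightness is left unadjusted and the character color is simply set to white (step S194).
[0086]
After the brightness has been adjusted (or the character color set to white) as described above, the recognized text and the moving image are displayed superimposed in a preview on the display unit 13 (step S196). Processing then returns to FIG. 20 and proceeds to step S318 and after.
[0087]
Thus, the second embodiment is characterized in that the brightness of the moving image displayed as the background of the text recognized from the call voice is adjusted on the basis of a brightness determination. FIG. 24 is a conceptual diagram explaining the brightness adjustment of the moving image. FIG. 25 schematically shows two cases: text superimposed on a moving image whose brightness was adjusted because it equaled the intermediate value M or lay between M and the threshold TH1 (70% brightness), and text superimposed on a moving image whose brightness was adjusted because it lay between M and the threshold TH2 (30% brightness).
[0088]
When the brightness of the moving image equals the intermediate value M or lies between M and the threshold TH1 (70% brightness), as shown in FIG. 24, the moving image is processed to raise its brightness and the recognized text is superimposed on it as shown in FIG. 25(a). When the brightness is below the intermediate value M and lies between M and the threshold TH2 (30% brightness), as shown in FIG. 24, the moving image is processed to lower its brightness and the recognized text is superimposed on it as shown in FIG. 25(b).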
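The per-frame brightness decision of steps S180 to S194 can be sketched as a single function on a brightness percentage. The thresholds TH1 = 70% and TH2 = 30% are the values given in the description; the intermediate value M = 50%, the adjustment step of 15 points, the clamping to the 0 to 100 range, the treatment of a brightness exactly equal to a threshold, and the name `adjust_for_overlay` are all assumptions made for illustration. The description specifies white text only for the dark cases, so the sketch returns `None` where the text color is left at its default.

```python
def adjust_for_overlay(brightness, m=50.0, th1=70.0, th2=30.0, step=15.0):
    """Return (adjusted_brightness, text_color) for one frame.

    brightness is a percentage in [0, 100]. text_color is 'white' or
    None (None meaning the default color is kept).
    """
    if m <= brightness < th1:
        # Mid-to-bright frame: push it brighter so text stands out.
        return min(100.0, brightness + step), None
    if brightness >= th1:
        # Already bright enough: leave the frame alone.
        return brightness, None
    if th2 < brightness < m:
        # Mid-to-dark frame: push it darker and use white text.
        return max(0.0, brightness - step), "white"
    # At or below TH2: already dark, only switch the text to white.
    return brightness, "white"
```

Calling this once per decoded frame would correspond to the real-time, frame-by-frame adjustment mentioned in the preferred aspects; how the frame's brightness percentage is measured (for example, mean luminance) is not specified in the source.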
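[0089]
In the second embodiment described above, when a moving image is sent during a call, the moving image is played back, the other party's call voice is converted into text by speech recognition, and the text is displayed either superimposed on or alongside the moving image, so the call voice and the moving image can both be checked easily at the same time.
[0090]
[Effects of the invention]
According to the invention of claim 1, the first determining means determines whether the brightness of the moving image transmitted from the other party is brighter than the intermediate value (M); the second determining means further determines, when the first determining means determines that the brightness is brighter than the intermediate value (M), whether the brightness lies between the intermediate value (M) and the threshold (TH1) (where M < TH1); the third determining means further determines, when the second determining means determines that the brightness is darker than the intermediate value (M), whether the brightness lies between the intermediate value (M) and the threshold (TH2) (where M > TH2); the brightness adjusting means increases the brightness of the moving image by a predetermined amount when the second determining means determines that the brightness lies between the intermediate value (M) and the threshold (TH1), and decreases it by a predetermined amount when the third determining means determines that the brightness lies between the intermediate value (M) and the threshold (TH2); and the display control means displays the text superimposed on the moving image whose brightness has been adjusted by the brightness adjusting means. This provides the advantage that the moving image and the text can be checked more easily.
[0105]
According to the invention of claim 2, the first determining means, the second determining means, the third determining means, and the brightness adjusting means perform the brightness determination and the increase or decrease of brightness for each frame and adjust the brightness of the moving image in real time, which provides the advantage that the moving image and the text can be checked more easily.
[0106]
According to the invention of claim 3, the character color setting means sets the character color of the text superimposed on the screen on the basis of the brightness determination result of the first determining means, the second determining means, or the third determining means, which provides the advantage that the moving image and the text can be checked more easily.
[0107]
According to the invention of claim 4, the character color setting means sets the character color of the text to white when the brightness of the moving image has been decreased, which provides the advantage that the moving image and the text can be checked more easily.
[0108]
According to the invention of claim 5, with the brightness of the moving image denoted B, the intermediate brightness value M, the predetermined bright-side threshold TH1, and the predetermined dark-side threshold TH2, for each frame of the moving image the brightness M is increased when TH1 > B > M and decreased when M > B > TH2, and the text is displayed superimposed on the moving image whose brightness M has been adjusted, which provides the advantage that the moving image and the text can be checked more easily.
[0109]
According to the invention of claim 6, the character color of the text is additionally set to white when M > B > TH2, which provides the advantage that the moving image and the text can be checked more easily.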
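[Brief description of the drawings]
[FIG. 1] A block diagram showing the configuration of the mobile phone system according to the first embodiment of the present invention.
[FIG. 2] External views (open state: front view and rear view) of the mobile phones 1a and 1b.
[FIG. 3] A block diagram showing the configuration of the mobile phones 1a and 1b.
[FIG. 4] A conceptual diagram showing the configuration of the memory area of the ROM 24 of the mobile phones 1a and 1b according to the first embodiment.
[FIG. 5] A conceptual diagram showing the configuration of the memory area of the RAM 29 of the mobile phones 1a and 1b according to the first embodiment.
[FIG. 6] A flowchart explaining the operation of the mobile phone according to the first embodiment during mail creation.
[FIG. 7] A flowchart explaining the operation of the mobile phone according to the first embodiment during mail creation.
[FIG. 8] A flowchart explaining the operation of the mobile phone according to the first embodiment during mail creation.
[FIG. 9] A flowchart explaining the operation of the mobile phone according to the first embodiment during mail creation.
[FIG. 10] Schematic diagrams showing an example of playing back a voice file attached to a mail and an example of displaying the mail body.
[FIG. 11] A flowchart explaining the received mail display processing according to the first embodiment.
[FIG. 12] A schematic diagram showing an example of the reception list displayed on the display unit 13.
[FIG. 13] A schematic diagram showing a display example of a mail with a voice file attached.
[FIG. 14] A conceptual diagram showing the configuration of the memory area of the ROM 24 of the mobile phones 1a and 1b according to the second embodiment.
[FIG. 15] A conceptual diagram showing the configuration of the memory area of the RAM 29 of the mobile phones 1a and 1b according to the second embodiment.
[FIG. 16] A conceptual diagram explaining the moving image enlargement method according to the second embodiment.
[FIG. 17] A conceptual diagram explaining examples of the moving image enlargement method according to the second embodiment.
[FIG. 18] A conceptual diagram explaining an example of the moving image enlargement method.
[FIG. 19] A conceptual diagram explaining an example of the moving image enlargement method.
[FIG. 20] A flowchart explaining the operation of the mobile phone according to the second embodiment when a call arrives.
[FIG. 21] Schematic diagrams showing display examples of the moving image and the text of the recognized voice during a call on the mobile phone according to the second embodiment.
[FIG. 22] A flowchart explaining the playback of a moving image transmitted from the other party during a call on the mobile phone according to the second embodiment.
[FIG. 23] A flowchart explaining the operation of superimposing speech-recognized text on the moving image according to the second embodiment.
[FIG. 24] A conceptual diagram explaining the brightness adjustment of the moving image.
[FIG. 25] A schematic diagram showing speech-recognized text superimposed on a brightness-adjusted moving image.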
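[Explanation of reference numerals]
1a, 1b Mobile phone
2 Radio base station
3 Communication service provider
32 Web server
33 Mail server
34 Exchange
35 Router
4 Public line network
5 WWW
6 Internet service provider
62 Web server
63 Mail server
65 Router
7 Personal computer
11 Antenna
12 Speaker (voice output means)
13 Display unit (display means)
138 Function key
139 Function key
140 Function key
14 Key operation unit
141 Mail key
142 Address key
143 Function key
144 Numeric keypad
145 Shutter key
15 Microphone
16 Sub display unit
17 Back key
171 LED
18 Imaging lens
19 Notification speaker
20 Wireless transceiver unit
21 Wireless signal processing unit
22 Control unit (mail creating means, enlarging means, image processing means)
23 Image memory
24 Flash ROM
25, 26 Driver
27 Subscriber information storage unit
28 System ROM
29 RAM (storage means, moving image storage means)
30 Voice recognition unit (voice recognition means)
181 Imaging module
182 DSP
192 Driver
200 Audio signal processing unit
[0001]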
BACKGROUND OF THE INVENTION
The present invention relates to a mobile phone, a voice-added document display method, a call voice display method, and a mobile phone display method.
[0002]
[Prior art]
Conventionally, some mobile phones have a voice recording and playback function: the sending side attaches a recorded voice file to a mail and transmits it, and the receiving side plays back the voice attached to the mail. In recent years, improved data transfer speeds have also allowed mobile phones to exchange moving images during a call. Normally the user listens to the call voice from the speaker with the mobile phone held to the ear, but in that position the user cannot see the moving image. The user therefore holds the mobile phone facing them, listens to the call voice through an earphone or the like, and watches the moving image on the display unit.
[0003]
By the way, when playing and outputting the voice attached to an email, or when outputting a call voice while playing a video during a call, if the voice is output by a speaker, the ambient noise is high, etc. When it is difficult to hear, or when the surroundings are quiet, there is a problem of causing trouble around. Although it is conceivable to listen to the sound using an earphone or the like, there is a problem that it takes time and effort to take out and connect the earphone. Therefore, if the voice can be recognized and converted into text and displayed as text characters, it is considered that convenience for the user is improved.
[0004]
For example, as a conventional technique for recognizing sound, in order to search for desired image data from a lot of image data, sound that can be understood is recorded for each image data and associated with each image data. A technique for searching for desired image data by reproducing the sound at a variable speed has been proposed (see, for example, Patent Document 1). In Patent Document 1, it is possible to convert the voice into text by voice recognition and sort the image data using the text as an index of the image data.
[0005]
[Patent Document 1]
JP 2002-41529 A
[0006]
[Problems to be solved by the invention]
The prior art of Patent Document 1 discloses speech recognition technology that converts voice data into text and a technique for sorting based on the recognized text. However, even if that prior art were used on a mobile phone to convert voice attached to a mail, or call voice accompanying a moving image during a call, into text, it does not describe how to display the result on a display unit of limited area (resolution) such as that of a mobile phone.
[0007]
Therefore, the present invention can more easily confirm the voice data attached to the e-mail, and can more easily confirm the call voice during the call and the video transmitted during the call. It is an object of the present invention to provide a mobile phone, a voice-attached document display method, a call voice display method, and a mobile phone display method.
[0008]
[Means for Solving the Problems]
To achieve the above object, the invention of claim 1 is a mobile phone that displays text superimposed on the screen while displaying a moving image, comprising: first determining means for determining whether the brightness of a moving image transmitted from the other party is brighter than an intermediate value (M); second determining means for further determining, when the first determining means determines that the brightness of the moving image is brighter than the intermediate value (M), whether the brightness lies between the intermediate value (M) and a threshold (TH1) (where M < TH1); third determining means for further determining, when the second determining means determines that the brightness of the moving image is darker than the intermediate value (M), whether the brightness lies between the intermediate value (M) and a threshold (TH2) (where M > TH2); brightness adjusting means for increasing the brightness of the moving image by a predetermined amount when the second determining means determines that the brightness lies between the intermediate value (M) and the threshold (TH1), and for decreasing the brightness of the moving image by a predetermined amount when the third determining means determines that the brightness lies between the intermediate value (M) and the threshold (TH2); and display control means for displaying the text superimposed on the moving image whose brightness has been adjusted by the brightness adjusting means.
[0023]
As a preferred aspect, for example as recited in claim 2, in the mobile phone of claim 1, the first determining means, the second determining means, the third determining means, and the brightness adjusting means may perform the brightness determination and the increase or decrease of brightness for each frame, so that the brightness of the moving image is adjusted in real time.
[0024]
As a preferred aspect, for example as recited in claim 3, the mobile phone of claim 1 or 2 may further comprise character color setting means for setting the character color of the text superimposed on the screen on the basis of the brightness determination result of the first determining means, the second determining means, or the third determining means.
[0025]
As a preferred aspect, for example as recited in claim 4, in the mobile phone of claim 3, the character color setting means may set the character color of the text to white when the brightness of the moving image has been decreased.
[0026]
To achieve the above object, the mobile phone display method according to the invention of claim 5 is a display method for a mobile phone that displays text superimposed on the screen while displaying a moving image, wherein, with the brightness of the moving image denoted B, the intermediate brightness value M, a predetermined bright-side threshold TH1, and a predetermined dark-side threshold TH2, the method comprises, for each frame of the moving image: a step of increasing the brightness M when TH1 > B > M; a step of decreasing the brightness M when M > B > TH2; and a step of displaying the text superimposed on the moving image whose brightness M has been adjusted.
[0027]
As a preferred aspect, for example as recited in claim 6, the mobile phone display method of claim 5 may further comprise a step of setting the character color of the text to white when M > B > TH2.
[0029]
DETAILED DESCRIPTION OF THE INVENTION
Hereinafter, an embodiment of the present invention will be described as an example applied to a mobile phone with reference to the drawings.
[0030]
A. First embodiment
A-1. Configuration of the first embodiment
FIG. 1 is a block diagram showing the configuration of the mobile phone system according to the first embodiment of the present invention. In the figure, the mobile phones (wireless communication terminals / data communication devices) 1a and 1b have a voice recording function, a function of transmitting a mail with recorded voice data attached to the system (in particular, the mail servers 33 and 63 described later), and a function of receiving a mail with voice. When creating a mail with voice, the mobile phones 1a and 1b recognize the voice to be attached to the mail body, convert it into text data, and display the mail body and the text data so that both can be checked on a single screen. In addition, when a moving image is transmitted from the other party during a call, the mobile phones 1a and 1b recognize the call voice, convert it into text data, and display the moving image and the text data so that both can be checked on the same screen.
[0031]
Further, at the time of moving image reproduction, a 96 × 80 pixel moving image can be expanded to a maximum display area of 128 × 96 pixels. The enlargement method includes a method of simply enlarging the periphery of the movie, a method of increasing the enlargement factor linearly as it goes around the movie, and a method of increasing the enlargement factor exponentially as it goes from the center to the periphery. However, in any case, since the visibility of the moving image is not deteriorated by enlargement, the central portion remains at 1 ×. The moving image enlargement algorithm uses a well-known technique and will not be described.
[0032]
The radio base stations 2 and 2 are connected via a public network 4 to a communication service provider (including an Internet provider) 3 to which a user of the mobile phone 1a or 1b subscribes. A communication service provider (including an Internet provider) 3 is connected to a WWW 5 described later (Web server 32, router 35) in addition to the exchange 34 required for the mobile phone service provided as the main service. And a mail system (mail server 33). A function for connecting the mobile phones 1a and 1b to the WWW 5 using the wireless base station 2 as an AP (access point) is also provided.
[0033]
The public line network 4 is an analog or digital telephone line network. WWW5 is the so-called Internet. An Internet service provider (hereinafter referred to as ISP) 6 has the same configuration as that of the communication service provider 3 except for an exchange, and is a system (Web server 62, router 65 for convenience), a mail system (for convenience) A mail server 63). The personal computer 7 has a function of connecting to the WWW 5 via the public line 4 and ISP 6 and transmitting / receiving mails.
[0034]
As a feature of the present embodiment, the mobile phones 1a and 1b display the mail with voice and the mail text and the text converted from the voice on a single screen (details will be described later), but from the mobile phones 1a and 1b. When a mail with sound is transmitted to the personal computer 7, the personal computer 7 handles the sound as an attached file. That is, the “mail software program with voice” according to the present embodiment is compatible with generally known mail software.
[0035]
Next, FIGS. 2A and 2B are external views (open state: front view and rear view) of the mobile phones 1a and 1b. The cellular phones 1a and 1b in the present embodiment have a two-fold structure including a lid portion and a main body portion. The antenna 11 is provided on the back surface of the lid, and is extendable. The speaker 12 is provided on the front side of the lid, and performs audio output. The display unit (main display unit) 13 is a color liquid crystal and has, for example, 120 dots (width) × 160 dots (height). The display unit 13 can display the voice (text) of the mail with voice and the mail body, the call voice (text) and the moving image on the same screen.
[0036]
The key operation unit 14 is provided on the front surface of the main body, and includes various function keys (email key 141, address key 142, function key 143), numeric key 144, shutter key 145, and the like. The mail key 141 is for starting a mail function and displaying a mail menu. The address key 142 is for opening an address book used when selecting a destination mail address. The function key 143 is for instructing to play or pause a moving image when creating a voice mail or checking a voice mail. The numeric keypad 144 is used when inputting a telephone number or characters. The shutter key 145 shoots a still image by pressing detection in the imaging mode, or shoots a moving image by pressing maintenance detection for a predetermined second (for example, 2 seconds).
[0037]
The microphone 15 is provided at the lower part of the main body, and performs voice input during a call and voice input during recording. The sub display unit 16 is provided on the back surface of the lid. The back key 17 is made of a transparent or translucent member, and incorporates an LED 171 that emits light when an incoming call is received. The imaging lens 18 is provided on the back surface of the lid portion 2 and below the sub display portion 16. The notification speaker 19 notifies an incoming call or the like, and is arranged on the back surface of the main body so that a notification sound can be heard even when the lid is closed to the main body.
[0038]
Next, FIG. 3 is a block diagram showing the configuration of the mobile phones 1a and 1b. The wireless transmission / reception unit 20 wirelessly transmits and receives voice and data (mail data) via the antenna 11 to modulate / demodulate. The wireless signal processor 21 demodulates the voice and data (mail data) received by the wireless transmitter / receiver 20 or processes necessary for wireless communication such as modulating voice and data to be transmitted to the wireless transmitter / receiver 20. To do. The control unit 22 controls various operations and overall operations.
[0039]
The image memory 23 stores image files captured by the imaging unit (imaging lens 18, imaging module 181, DSP 182) and compression-encoded by a program stored in the image processing program area 2413, as well as image files downloaded via the WWW 5. The ROM 24 is a rewritable flash ROM and stores the various programs, described later, that characterize the present invention.
[0040]
The driver 25 drives the display unit 13. The driver 26 drives the sub display unit 16. The subscriber information storage unit 27 stores profile data such as the telephone number used to call the mobile phone 1 and the operator (subscriber) ID. The ROM 28 stores various programs for controlling the control unit 22. The RAM 29 stores various data required for operation as a wireless communication terminal, data required for the operation of the control unit 22, and mail data. In particular, in the present embodiment, the RAM 29 has a storage area for temporarily buffering a moving image being shot in the shooting mode.
[0041]
The audio signal processing unit 200 encodes the audio signal input from the microphone 15, and decodes the signal output from the wireless signal processing unit 21 to drive the speaker 12 and output audio. The imaging module 181 is a CCD or CMOS sensor and captures color images. The DSP 182 encodes the image captured by the imaging module 181. The notification device 192 is a driver for driving the notification speaker 19, the vibrator 191, and the LED 171. The voice recognition processing unit 30 recognizes voice data attached to a mail, or the voice of the other party during a call, and converts it into text data.
[0042]
Next, FIG. 4 is a conceptual diagram showing the configuration of the memory area of the ROM 24 of the mobile phones 1a and 1b according to the first embodiment. The ROM 24 stores the software programs that characterize the present embodiment. The ROM 24 includes a mail software program area 2411, a voice-attached mail software program area 2412, a voice recognition program area 2413, and an other program area 2414.
[0043]
The mail software program area 2411 stores a known mail software program. The voice-attached mail software program area 2412 stores a voice-attached mail software program for creating mail with voice and browsing received mail with voice. Note that the voice-attached mail software program may be downloaded from the WWW 5 via the ISP 6.
[0044]
The voice recognition program area 2413 stores a voice recognition program that recognizes the voice attached to a mail and converts it into text. The audio data is encoded in a predetermined format (for example, WAV, MP3, or PCM). The other program area 2414 stores application programs other than the above.
[0045]
Next, FIG. 5 is a conceptual diagram showing the configuration of the memory area of the RAM 29 of the mobile phones 1a and 1b according to the first embodiment. The RAM 29 includes an address book data storage area 291, a mail data storage area 292, an attached file storage area 293, an other work memory 294, and an audio data buffer area 295. The address book data storage area 291 stores multiple records, each consisting of a name, telephone number, mail address, and the like. The mail data storage area 292 stores mail data created with the mail software and received mail data. When a file is attached to a mail, the attached file storage area 293 stores that file. The other work memory 294 is used as general working memory and stores various data. The audio data buffer area 295 is a storage area for temporarily buffering audio being recorded.
[0046]
A-2. Operation of the first embodiment
Next, the operation of the mobile phone according to the first embodiment described above will be described.
(1) Mail creation
Next, the operation at the time of mail creation according to the first embodiment will be described. Here, FIG. 6 to FIG. 9 are flowcharts for explaining the operation at the time of mail creation of the mobile phone according to the present embodiment. FIGS. 10A and 10B are schematic diagrams showing an example of reproducing an audio file attached to an e-mail and an example of displaying an e-mail text.
[0047]
When the mail creation mode is selected by the user, a mail menu is displayed (step S40). Next, it is determined whether new creation is selected from the mail menu (step S42). If new creation is not selected, it is determined whether the inbox is selected (step S44). If the inbox is not selected either, the process proceeds to other processing (step S46). The processing performed when the inbox is selected, that is, the display of received mail, will be described later.
[0048]
On the other hand, when new creation is selected from the mail menu, the creation menu is displayed (step S48). Next, it is determined whether or not a mail with voice data is selected from the mail menu (step S50). If the mail with voice data is not selected, the process proceeds to normal mail processing (step S54).
[0049]
On the other hand, when the mail with voice data is selected from the mail menu, the mail program with voice data and the voice recognition program are loaded (step S56), and the creation screen is displayed (step S60).
[0050]
Next, it is determined whether “voice selection” or “mail creation / editing” has been selected on the creation screen (step S62). When “voice selection” is selected, audio files are first searched for (step S64), and it is determined whether any audio file exists (step S66). If there is no audio file, voice is input and recorded (step S68), the recorded voice is saved as an audio file (step S70), and the process returns to the audio file search of step S64 to repeat the above-described processing.
[0051]
On the other hand, if there is an audio file, the audio file is displayed as a list (step S72). Next, it is determined whether any audio file is selected from the list display (step S74). If not selected, the process returns to step S72 and the list display is continued.
[0052]
On the other hand, when an audio file is selected, the selected audio is reproduced and output as shown in FIG. 10A (step S76), the voice recognition processing unit 30 recognizes the audio data and converts it into text data (step S78), and the recognized text data is displayed on the display unit 13 (step S80). Thereafter, the process returns to the creation screen of step S60 and the above-described processing is repeated.
[0053]
If “mail creation / editing” is selected on the creation screen, a screen for entering, editing, and displaying each item is first displayed (step S82 in FIG. 8). On this screen, the user inputs the mail title, sender name, destination, mail text, and the like. Next, it is determined whether the function key 143 instructing “play” has been operated (step S84), and further whether a “decision” operation has been detected (step S86). If neither the function key 143 instructing “play” nor the “decision” operation is detected, the process returns to step S82, and the item entry/edit/display screen remains displayed.
[0054]
On the other hand, when the function key 143 instructing “play” is operated, it is determined whether the voice to be attached to the mail has been selected (step S88). If a voice has been selected, as shown in FIG. 10B, the selected voice is reproduced and output (step S90), the voice recognition processing unit 30 recognizes the voice data and converts it into text data (step S92), and the recognized text data is displayed on the display unit 13 as the mail body (step S94). The process then returns to step S82 and the above-described processing is repeated.
[0055]
If, while the item entry/edit/display screen of step S82 is displayed, the function key is operated in step S84 with no voice selected, the message “No voice is selected.” is displayed (step S106 in FIG. 9). Next, it is determined whether a predetermined time has elapsed (step S108). If the predetermined time has not elapsed, the process returns to step S106 and the message remains displayed. Once the predetermined time has elapsed, the message is erased (step S110). Thereafter, the process returns to step S60 in FIG. 6, and the above-described processing is repeated.
[0056]
If a “decision” operation is detected while the item entry/edit/display screen of step S82 is displayed, it is determined whether there is an unfilled item or no voice has been selected (step S110). If there is an unfilled item or no audio file is selected, a confirmation prompt is displayed to the user (step S112), and it is determined whether “OK” is selected (step S114). If “OK” is not selected, the process returns to step S60 shown in FIG. 6 and the creation screen is displayed. The user then completes the mail by filling in the missing item or selecting the audio file to attach.
[0057]
On the other hand, when “OK” is selected in the confirmation prompt, the user has accepted that an item may be unfilled or that no voice file may be selected, so the mail is stored in the transmission box as it is (step S116), the display returns to the mail menu of step S40 shown in FIG. 6, and the above-described processing is repeated. If there is no unfilled item and a voice file has been selected, the mail with the voice file attached is stored in the transmission box without confirmation (step S116), and the process likewise returns to the mail menu display of step S40 shown in FIG. 6. The voice-attached mail stored in the transmission box is transmitted at a predetermined timing.
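The branch taken on a “decision” operation (steps S112 to S116) can be sketched roughly as follows. This is only an illustrative sketch: the function name `on_decision`, the dictionary of mail fields, the `confirm` callback, and the two result strings are assumptions introduced here and do not appear in the specification.

```python
from typing import Callable, Mapping

# Hypothetical result labels (not from the specification).
STORE = "store_in_transmission_box"   # corresponds to step S116
BACK = "return_to_creation_screen"    # corresponds to the return to step S60


def on_decision(fields: Mapping[str, str],
                voice_selected: bool,
                confirm: Callable[[], bool]) -> str:
    """Decide what happens after the "decision" operation.

    fields         -- mail items entered by the user (title, destination, ...)
    voice_selected -- True if an audio file has been chosen for attachment
    confirm        -- shows the confirmation prompt (S112) and returns True
                      when the user selects "OK" (S114)
    """
    # The mail is incomplete if any item is empty or no voice is selected.
    incomplete = any(not value.strip() for value in fields.values()) or not voice_selected
    if not incomplete:
        # Complete mail: stored without asking the user.
        return STORE
    # Incomplete mail: stored only if the user explicitly accepts it.
    return STORE if confirm() else BACK
```

With this shape, the confirmation prompt is only invoked for incomplete mail, which matches the description that complete mail is stored without confirmation.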
[0058]
(2) Received mail display processing
Next, FIG. 11 is a flowchart for explaining the received mail display process according to the first embodiment. The flowchart shown in FIG. 11 is a branch from “YES” in step S44 shown in FIG.
[0059]
When the reception folder is selected, a reception list showing the received mails is displayed (step S200). Here, FIG. 12 is a schematic diagram illustrating a display example of the reception list displayed on the display unit 13. An icon 1301 represents the remaining battery charge. An icon 1302 represents the radio wave reception state. In the reception list, the icons 1303 to 1306 indicate whether each mail has been read. That is, the icon 1303 represents an unread mail, the icon 1304 an unread mail with voice, the icon 1305 a read mail, and the icon 1306 a read mail with voice. The user selects the mail to be displayed from the reception list.
[0060]
Next, it is determined whether or not a mail with voice is selected (step S202). If the mail with voice is selected, the mail software program with voice and the voice recognition program are loaded (step S204), the voice recognition processing unit 30 recognizes the voice data and converts it into text data (step S206). As shown in FIG. 13, the mail text and the recognized text data are displayed on the display unit 13 (step S208). Next, it is determined whether or not a cancel operation has been detected (step S210). If a cancel operation has been detected, the process returns to step S200 and shifts to display of a reception list.
[0061]
On the other hand, if the cancel operation is not detected, it is determined whether or not the function key 143 has been operated (step S212). If the function key 143 is not operated, the process returns to step S206 to continue displaying the mail body and the speech-recognized text.
[0062]
On the other hand, when the function key 143 is operated, as shown in FIG. 13, the audio file attached to the mail is reproduced and output (step S214). Next, it is determined again whether or not the function key 143 has been operated (step S216). If the function key 143 is not operated, the process returns to step S212 to repeat the audio reproduction output. On the other hand, if the function key 143 is operated after the voice is reproduced and output, the process returns to step S208 to display the mail body and the voice-recognized text as shown in FIG.
[0063]
In the first embodiment described above, when an audio file is attached to a mail, the voice is converted into text data by voice recognition and displayed on the display unit 13 together with the mail body, so the content of the audio file can be confirmed easily.
[0064]
B. Second embodiment
Next, a second embodiment of the present invention will be described. In the second embodiment, when a moving image (or still image) is transmitted during a call on a camera-equipped mobile phone, the call voice of the other party is converted into text data by voice recognition, and the text is displayed together with the reproduced moving image. The moving image can either be reproduced at its original size (96 × 80 pixels) with the text data displayed alongside it, or be enlarged to 128 × 96 pixels with the text data superimposed on it.
[0065]
B-1. Configuration of the second embodiment
Since the configuration of the mobile phone according to the second embodiment is the same as that of FIG. 3 described in the first embodiment, the description thereof is omitted.
[0066]
FIG. 14 is a conceptual diagram showing the configuration of the memory area of the ROM 24 of the cellular phones 1a and 1b according to the second embodiment. The ROM 24 stores a software program that is a feature of the present embodiment. The ROM 24 includes a moving image reproduction program area 2411, a voice recognition program area 2413, an image processing program area 2415, and other program areas 2414.
[0067]
The moving image reproduction program area 2411 stores a moving image reproduction program for reproducing a moving image transmitted during a call. The voice recognition program area 2413 stores a voice recognition program that recognizes voice and converts it into text. The audio data is encoded in a predetermined format (for example, WAV, MP3, or PCM).
[0068]
The image processing program area 2415 stores an image processing program that takes the moving image data captured by the imaging unit (imaging lens 18, imaging module 181, DSP 182), digitally encoded, and buffered in the RAM 29, compresses it using MPEG-4 based compression encoding, and saves the result as a file. At playback, the image processing program also enlarges the peripheral portion of a moving image received from outside or filed on the device itself, so that the moving image is displayed at an overall size of 128 × 96 pixels. The other program area 2414 stores application programs other than the above.
[0069]
FIG. 15 is a conceptual diagram showing the configuration of the memory area of the RAM 29 of the mobile phones 1a and 1b according to the second embodiment. The RAM 29 includes an other work memory 294, an audio data buffer area 295, an enlarged data area 296, and a display buffer 297. The other work memory 294 is used as general working memory, for example as a frame memory (for three frames) in the decoding process, and also stores various data. The audio data buffer area 295 is a storage area for temporarily buffering audio being recorded. The enlarged data area 296 stores partially enlarged moving image data. The display buffer 297 is used as a buffer when displaying a moving image.
[0070]
Next, FIG. 16 is a conceptual diagram for explaining the moving image enlargement method according to the second embodiment. In the second embodiment, the display size of still images is 128 × 96 pixels, and the display size of moving images is normally 96 × 80 pixels. To improve the visibility of a moving image, however, it can be displayed enlarged to 128 × 96 pixels by a predetermined enlargement method. More specifically, as shown in the figure, the peripheral part (hatched in the figure) of the 96 × 80 pixel moving image is partially enlarged according to a predetermined moving image enlargement method (described later), so that the moving image can be played back with an overall display area of 128 × 96 pixels. When the moving image is stopped (paused), the entire image is uniformly enlarged and displayed as a still image of 128 × 96 pixels (Sub-QCIF size).
[0071]
Next, FIG. 17 is a conceptual diagram for explaining examples of the moving image enlargement method according to the second embodiment. In FIG. 17(a), the image is enlarged in a simple stepwise manner, with a magnification of 1 in the central part and 3 at the periphery. In FIG. 17(b), the magnification increases linearly toward the periphery of the moving image. In FIG. 17(c), the magnification increases exponentially from the center toward the periphery. In every case, the magnification in the central portion remains 1 so that the enlargement does not degrade the visibility of the moving image. Which enlargement method is used may be determined in advance or may be selectable by the user.
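The three magnification profiles of FIG. 17 can be illustrated with a small sketch. The specification gives only the qualitative shapes plus the values 1 (center) and 3 (periphery) for the stepwise case; the function name `magnification`, the normalized distance `r`, the `core` radius, and the use of 3 as the edge value for the linear and exponential cases are assumptions made here for illustration.

```python
def magnification(r: float, method: str, edge: float = 3.0, core: float = 0.5) -> float:
    """Return the local magnification at normalized distance r from the center.

    r      -- 0.0 at the image center, 1.0 at the outer edge
    method -- "step" (FIG. 17(a)), "linear" (FIG. 17(b)), "exponential" (FIG. 17(c))
    edge   -- magnification reached at the periphery (assumed value)
    core   -- radius of the unmagnified central region for "step" (assumed value)
    """
    if not 0.0 <= r <= 1.0:
        raise ValueError("r must be within [0.0, 1.0]")
    if method == "step":
        # Central part stays at 1x, periphery jumps to the edge value.
        return 1.0 if r < core else edge
    if method == "linear":
        # Magnification grows linearly from 1x at the center to the edge value.
        return 1.0 + (edge - 1.0) * r
    if method == "exponential":
        # Magnification grows exponentially: edge**0 == 1 at the center.
        return edge ** r
    raise ValueError(f"unknown method: {method}")
```

All three profiles return 1.0 at `r = 0`, reflecting the requirement that the central portion is left unmagnified. The sketch describes only the shape of each profile; it does not by itself guarantee an output of exactly 128 × 96 pixels.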
[0072]
Next, FIGS. 18 and 19 are conceptual diagrams for explaining an example of the moving image enlargement method. When the moving image is MPEG-4, decoding requires reference to the preceding and following frames as shown in FIG. 18, so the buffer normally needs a memory area for three frames. As an example of the partial enlargement method, as shown in FIG. 19, interpolation may be performed by copying the data of the pixel of interest into adjacent pixels. Usually, a filtering step is then applied to the interpolated data.
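The interpolation of FIG. 19, in which the pixel of interest is copied into adjacent pixels, can be sketched for a single row of pixels as follows. The helper names `replicate_row` and `smooth_row`, the per-pixel replication factors, and the 3-tap averaging filter are assumptions for illustration; the specification does not state which filter is applied or how many peripheral pixels are duplicated.

```python
from typing import List, Sequence


def replicate_row(row: Sequence[int], factors: Sequence[int]) -> List[int]:
    """Enlarge one row by writing each pixel factors[i] times (pixel replication)."""
    if len(row) != len(factors):
        raise ValueError("row and factors must have the same length")
    out: List[int] = []
    for pixel, count in zip(row, factors):
        out.extend([pixel] * count)
    return out


def smooth_row(row: Sequence[int]) -> List[int]:
    """Apply a 3-tap averaging filter, repeating the edge pixels at the borders."""
    last = len(row) - 1
    return [
        (row[max(i - 1, 0)] + row[i] + row[min(i + 1, last)]) // 3
        for i in range(len(row))
    ]


# Hypothetical factor table: duplicate the outer 16 pixels on each side of a
# 96-pixel row and leave the central 64 untouched, giving 32 + 64 + 32 = 128.
WIDTH_FACTORS = [2] * 16 + [1] * 64 + [2] * 16

if __name__ == "__main__":
    enlarged = replicate_row(list(range(96)), WIDTH_FACTORS)
    print(len(enlarged))  # 128
```

The same two helpers could be applied column-wise with a factor table such as `[2] * 8 + [1] * 64 + [2] * 8` to go from 80 to 96 rows; that table is likewise only an assumed example.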
[0073]
B-2. Operation of the second embodiment
Next, the operation of the mobile phone according to the second embodiment will be described.
(1) Incoming call operation
FIG. 20 is a flowchart for explaining the operation at the time of an incoming call in the mobile phone according to the second embodiment. First, it is determined whether or not an incoming call is received in the standby state (step S300). When an incoming call is received, it is determined whether a moving image has been received from the other party (step S302). If no moving image has been received, the process proceeds to normal call processing (step S304).
[0074]
On the other hand, when a moving image is received, the moving image reproduction program, the voice recognition program, and the image processing program are loaded (step S306). Next, the received moving image is reproduced by the moving image reproduction program (step S308). Details of the moving image reproduction will be described later. Depending on the operation of a predetermined function key, the moving image is either reproduced at 96 × 80 pixels or partially enlarged to 128 × 96 pixels according to one of the enlargement methods described above. The flag F is “1” when the moving image is partially enlarged and reproduced at 128 × 96 pixels, and “0” when it is reproduced normally at 96 × 80 pixels.
[0075]
Next, the call voice from the other party is recognized and converted into text data (step S310), and it is determined whether the flag F is “1” (step S312). When the flag F is “1”, that is, when partially enlarged playback is selected, the voice-recognized text is superimposed on the moving image partially enlarged to 128 × 96 pixels, which serves as the background, as shown in FIG. 21A (step S314). On the other hand, when the flag F is “0”, that is, when normal playback is selected, the voice-recognized text is displayed alongside the 96 × 80 pixel moving image, as shown in FIG. 21B (step S316).
[0076]
In either case, it is next determined whether the call has been ended (step S318). If the call continues, the process returns to step S308, and reproduction of the moving image and display of the voice-recognized text continue. If the call has been ended, the line is disconnected and the process is terminated (step S320).
[0077]
(2) Movie playback
Next, the above-described moving image reproduction process will be described. Here, FIG. 22 is a flowchart for explaining the reproduction operation of a moving image transmitted from the other party during a call in the mobile phone according to the second embodiment. In the above-described incoming operation, when a moving image is received during a call, the processing is executed as follows in the reproduction of the moving image in step S308 of FIG.
[0078]
First, the received moving image data is buffered (step S160) and decoded (step S162), and it is determined whether the flag F is “1”, that is, whether partially enlarged display is to be performed (step S164). When the flag F is “1”, the partial enlargement process is executed according to one of the enlargement methods shown in FIGS. 17A to 17C (step S166). The 128 × 96 pixel moving image produced by the partial enlargement process is then stored in the LCD display buffer (step S168). In this case, as shown in FIG. 16, the peripheral part (shaded portion) of the 96 × 80 pixel image is enlarged and the result is reproduced as a 128 × 96 pixel moving image, and, as described above, the voice-recognized text is superimposed on the reproduced moving image.
[0079]
On the other hand, when the flag F is not “1”, the moving image is stored in the display buffer of the LCD with the size of 96 × 80 pixels without being partially enlarged (step S168). In this case, it is reproduced as a moving image of 96 × 80 pixels, and as shown in FIG. 21 (b) described above, text that has been voice-recognized in parallel with the moving image is displayed.
[0080]
Next, it is determined whether or not a predetermined function key (partial enlargement ← → normal selection key) has been pressed (step S170). If the predetermined function key is not pressed, the process returns to FIG. 20 and proceeds to step S310 and subsequent steps.
[0081]
On the other hand, if the predetermined function key is pressed, it is determined whether the flag F is “1” (step S172). If the flag F is “1”, it is set to “0” (step S174); if the flag F is “0”, it is set to “1” (step S176). That is, each time the predetermined function key (partial enlargement ← → normal selection key) is pressed during moving image playback (that is, during a call), playback switches between partially enlarged playback and normal size playback. Then, returning to FIG. 20, the process proceeds to step S310 and subsequent steps.
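The handling of the flag F (steps S172 to S176) and its effect on the display (steps S312 to S316) can be summarized in a short sketch. The names `toggle_flag`, `playback_mode`, and the layout labels are hypothetical; only the flag values, the two frame sizes, and the pairing of each size with a text layout come from the description.

```python
from typing import Tuple

NORMAL_SIZE = (96, 80)      # width x height when F == 0
ENLARGED_SIZE = (128, 96)   # width x height when F == 1


def toggle_flag(flag: int) -> int:
    """Flip F on each press of the enlargement/normal selection key (S172 to S176)."""
    return 0 if flag == 1 else 1


def playback_mode(flag: int) -> Tuple[Tuple[int, int], str]:
    """Return the frame size and the text layout selected by F (S312 to S316)."""
    if flag == 1:
        # Partially enlarged playback: text is drawn over the moving image.
        return ENLARGED_SIZE, "superimposed"
    # Normal playback: text is drawn next to the moving image.
    return NORMAL_SIZE, "side_by_side"
```

Keeping the flag as a plain 0/1 integer mirrors the flowchart; a boolean or an enumeration would serve equally well in practice.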
[0082]
(3) Text overlay display processing by voice recognition on video
Next, an operation at the time of displaying text superimposed by voice recognition on a moving image according to the second embodiment will be described. Here, FIG. 23 is a flowchart for explaining the operation at the time of displaying text superimposed by voice recognition on a moving image according to the second embodiment. At the time of the above-mentioned call, in the superimposed display of the text by the voice recognition on the reproduced moving image in step S314, the process is executed as follows.
[0083]
First, the brightness of a moving image (frame) transmitted from the other party is determined (step S180). Then, it is determined whether or not the brightness of the moving image is the intermediate value M (step S182). If the brightness of the moving image is the intermediate value M, the brightness of the moving image is increased by a predetermined amount (step S188).
[0084]
On the other hand, if the brightness of the moving image is not the intermediate value M, it is determined whether it is brighter than the intermediate value M (step S184). If it is brighter, it is further determined whether it lies between the intermediate value M and the threshold value TH1 (intermediate value M < threshold value TH1) (step S186). If the brightness of the moving image lies between the intermediate value M and the threshold value TH1, the brightness of the moving image is increased by a predetermined amount, that is, the image is brightened (step S188).
[0085]
On the other hand, if the brightness of the moving image is darker than the intermediate value M, it is further determined whether it lies between the intermediate value M and the threshold value TH2 (intermediate value M > threshold value TH2) (step S190). If the brightness of the moving image lies between the intermediate value M and the threshold value TH2, the brightness of the moving image is reduced by a predetermined amount, that is, the image is darkened (step S192). Then, the color of the characters to be displayed (the voice-recognized text) is set to white (step S194). If the brightness of the moving image is darker than the threshold value TH2, the brightness is left unadjusted and the color of the characters to be displayed (the voice-recognized text) is simply set to white (step S194).
[0086]
As described above, after adjusting the brightness of the moving image (or after setting the character color to white), the voice-recognized text and the moving image are superimposed and displayed on the display unit 13 (step S196). Then, returning to FIG. 20, the process proceeds to step S318 and subsequent steps.
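The brightness decision of steps S180 to S194 can be sketched as a single function. This is a sketch under stated assumptions, not the implementation of the embodiment: brightness is expressed here as an integer percentage, M = 50 and the adjustment amount of 10 are assumed values, and the treatment of a brightness exactly equal to TH1 or TH2 is also an assumption. Only TH1 = 70% and TH2 = 30% are taken from the description. The names `OverlayDecision` and `adjust_for_overlay` are hypothetical.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class OverlayDecision:
    brightness: int              # brightness after adjustment, in percent
    text_color: Optional[str]    # "white" when set in S194, None when not specified


def adjust_for_overlay(b: int, m: int = 50, th1: int = 70, th2: int = 30,
                       delta: int = 10) -> OverlayDecision:
    """Decide the brightness adjustment and text color for one frame.

    b     -- measured brightness of the frame (S180), 0 to 100
    m     -- intermediate value M (assumed to be 50)
    th1   -- bright-side threshold TH1, with m < th1
    th2   -- dark-side threshold TH2, with m > th2
    delta -- predetermined adjustment amount (assumed to be 10)
    """
    if b == m or m < b < th1:
        # S182 / S186 -> S188: brighten; the text color is not specified here.
        return OverlayDecision(min(100, b + delta), None)
    if th2 < b < m:
        # S190 -> S192 -> S194: darken, then use white characters.
        return OverlayDecision(max(0, b - delta), "white")
    if b <= th2:
        # Already dark: leave the brightness alone, use white characters (S194).
        return OverlayDecision(b, "white")
    # b >= th1: already bright; no adjustment is described for this case.
    return OverlayDecision(b, None)
```

The effect is to push mid-range frames away from the intermediate value so that the superimposed text contrasts with its background, while frames that are already clearly bright or dark are left as they are.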
[0087]
As described above, the second embodiment is characterized in that the brightness of the moving image displayed as the background of the text obtained by recognizing the call voice is adjusted according to a brightness determination. Here, FIG. 24 is a conceptual diagram for explaining the brightness adjustment of a moving image. FIG. 25 is a schematic diagram showing two cases: one in which the brightness of the moving image is equal to the intermediate value M or lies between the intermediate value M and the threshold value TH1 (lightness 70%), and the voice-recognized text is superimposed on the brightness-adjusted moving image; and one in which the brightness of the moving image lies between the intermediate value M and the threshold value TH2 (lightness 30%), and the voice-recognized text is superimposed on the brightness-adjusted moving image.
[0088]
As shown in FIG. 24, when the brightness of the moving image is equal to the intermediate value M or lies between the intermediate value M and the threshold value TH1 (lightness 70%), the brightness of the moving image is increased, and the voice-recognized text is then superimposed as shown in FIG. 25A. On the other hand, as shown in FIG. 24, when the brightness of the moving image is darker than the intermediate value M and lies between the intermediate value M and the threshold value TH2 (lightness 30%), the brightness of the moving image is lowered, and the voice-recognized text is then superimposed as shown in FIG. 25(b).
[0089]
In the second embodiment described above, when a moving image is sent during a call, the moving image is reproduced, the other party's call voice is converted into text by voice recognition, and the text is either superimposed on the reproduced moving image or displayed alongside it. The visibility of both the call voice and the moving image can therefore be improved at the same time.
[0090]
[Effect of the invention]
According to the invention of claim 1, the first determining means determines whether the brightness of the moving image transmitted from the other party is brighter than the intermediate value (M). When the first determining means determines that the brightness is brighter than the intermediate value (M), the second determining means further determines whether the brightness lies between the intermediate value (M) and the threshold value (TH1) (where M < TH1). When the brightness is determined to be darker than the intermediate value (M), the third determining means further determines whether the brightness lies between the intermediate value (M) and the threshold value (TH2) (where M > TH2). The brightness adjusting means increases the brightness of the moving image by a predetermined amount when the second determining means determines that the brightness lies between the intermediate value (M) and the threshold value (TH1), and decreases it by a predetermined amount when the third determining means determines that the brightness lies between the intermediate value (M) and the threshold value (TH2). The display control means then displays the text superimposed on the moving image whose brightness has been adjusted by the brightness adjusting means. This provides the advantage that the moving image and the text can be confirmed more easily.
[0105]
According to the invention of claim 2, the first determining means, the second determining means, the third determining means, and the brightness adjusting means perform the brightness determination and the brightness increase or decrease for each frame and adjust the brightness of the moving image in real time. This provides the advantage that the moving image and the text can be confirmed more easily.
[0106]
According to the invention of claim 3, the character color setting means sets the character color of the text to be superimposed on the screen based on the brightness determination result of the first determining means, the second determining means, or the third determining means. This provides the advantage that the moving image and the text can be confirmed more easily.
[0107]
According to the invention of claim 4, the character color setting means sets the character color of the text to white when the brightness of the moving image is reduced. This provides the advantage that the moving image and the text can be confirmed more easily.
[0108]
According to the invention of claim 5, where B is the brightness of the moving image, M is the intermediate value of the brightness, TH1 is a predetermined bright-side threshold, and TH2 is a predetermined dark-side threshold, the brightness is increased when TH1 > B > M and decreased when M > B > TH2, for each frame of the moving image, and the text is superimposed on the moving image whose brightness has been adjusted. This provides the advantage that the moving image and the text can be confirmed more easily.
[0109]
According to the invention of claim 6, when M > B > TH2, the character color of the text is additionally set to white. This provides the advantage that the moving image and the text can be confirmed more easily.
[Brief description of the drawings]
FIG. 1 is a block diagram showing a configuration of a mobile phone system according to a first embodiment of the present invention.
FIG. 2 is an external view (open state: front view and rear view) of the mobile phones 1a and 1b.
FIG. 3 is a block diagram showing a configuration of mobile phones 1a and 1b.
FIG. 4 is a conceptual diagram showing a configuration of a memory area of a ROM 24 of the cellular phones 1a and 1b according to the first embodiment.
FIG. 5 is a conceptual diagram showing a configuration of a memory area of a RAM 29 of the mobile phones 1a and 1b according to the first embodiment.
FIG. 6 is a flowchart for explaining an operation at the time of mail creation of the mobile phone according to the first embodiment.
FIG. 7 is a flowchart for explaining an operation at the time of mail creation of the mobile phone according to the first embodiment.
FIG. 8 is a flowchart for explaining an operation at the time of mail creation of the mobile phone according to the first embodiment.
FIG. 9 is a flowchart for explaining an operation at the time of mail creation of the mobile phone according to the first embodiment.
FIG. 10 is a schematic diagram showing an example of reproducing an audio file attached to an e-mail and an example of displaying an e-mail text.
FIG. 11 is a flowchart for explaining received mail display processing according to the first embodiment;
12 is a schematic diagram illustrating a display example of a reception list displayed on the display unit 13. FIG.
FIG. 13 is a schematic diagram showing a display example of a mail with an attached audio file.
FIG. 14 is a conceptual diagram showing a configuration of a memory area of a ROM 24 of mobile phones 1a and 1b according to the second embodiment.
FIG. 15 is a conceptual diagram showing a configuration of a memory area of a RAM 29 of the mobile phones 1a and 1b according to the second embodiment.
FIG. 16 is a conceptual diagram for explaining a moving image enlargement method according to the second embodiment.
FIG. 17 is a conceptual diagram for explaining an example of a moving image enlargement method according to the second embodiment.
FIG. 18 is a conceptual diagram for explaining an example of a moving image enlargement method.
FIG. 19 is a schematic diagram illustrating a display example of a reception list displayed on the display unit 13.
FIG. 20 is a flowchart for explaining an operation at the time of an incoming call in the mobile phone according to the second embodiment.
FIG. 21 is a schematic diagram showing a display example of a moving image during a call and voice-recognized voice text in the mobile phone according to the second embodiment.
FIG. 22 is a flowchart for explaining a reproduction operation of a moving image transmitted from the other party during a call in the mobile phone according to the second embodiment.
FIG. 23 is a flowchart for explaining an operation of superimposing voice-recognized text on a moving image according to the second embodiment.
FIG. 24 is a conceptual diagram for explaining brightness adjustment of a moving image.
FIG. 25 is a schematic diagram showing a state in which text by voice recognition is superimposed and displayed on a moving image whose brightness has been adjusted.
[Explanation of symbols]
1a, 1b Mobile phone
2 Radio base station
3 Communication service provider
32 Web server
33 Mail server
34 Exchange
35 Router
4 Public network
5 WWW
6 Internet provider
62 Web server
63 Mail server
65 Router
7 Personal computer
11 Antenna
12 Speaker (audio output means)
13 Display unit (display means)
138 Function key
139 Function key
140 Function key
14 Key input section
141 Mail key
142 Address key
143 Function key
144 Numeric keypad
145 Shutter key
15 Microphone
16 Sub display section
17 Back key
171 LED
18 Imaging lens
19 Information speaker
20 Wireless transceiver
21 Radio signal processor
22 Control unit (mail creation means, enlargement means, image processing means)
23 Image memory
24 Flash ROM
25, 26 Drivers
27 Subscriber information storage
28 System ROM
29 RAM (storage means, moving image storage means)
30 Voice recognition unit (voice recognition means)
181 Imaging module
182 DSP
192 Driver
200 Audio signal processor

Claims

1. A mobile phone that displays text superimposed on the screen while displaying a moving image, comprising:
first determining means for determining whether the brightness of the moving image transmitted from the other party is brighter than an intermediate value (M);
second determining means for further determining, when the first determining means determines that the brightness of the moving image is brighter than the intermediate value (M), whether the brightness of the moving image is between the intermediate value (M) and a threshold value (TH1) (where M < TH1);
third determining means for further determining, when the second determining means determines that the brightness of the moving image is darker than the intermediate value (M), whether the brightness of the moving image is between the intermediate value (M) and a threshold value (TH2) (where M > TH2);
brightness adjusting means for increasing the brightness of the moving image by a predetermined amount when the second determining means determines that the brightness of the moving image is between the intermediate value (M) and the threshold value (TH1), and for decreasing the brightness of the moving image by a predetermined amount when the third determining means determines that the brightness of the moving image is between the intermediate value (M) and the threshold value (TH2); and
display control means for displaying the text superimposed on the moving image whose brightness has been adjusted by the brightness adjusting means.

2. The mobile phone according to claim 1, wherein the first determining means, the second determining means, the third determining means, and the brightness adjusting means perform the determination of the brightness of the moving image and the increase or decrease of the brightness for each frame, thereby adjusting the brightness.

3. The mobile phone according to claim 1 or 2, further comprising character color setting means for setting the character color of the text to be displayed on the screen based on the result of the determination of the brightness of the moving image by the first determining means, the second determining means, or the third determining means.

4. The mobile phone according to claim 3, wherein the character color setting means sets the character color of the text to white when the brightness of the moving image is reduced.

5. A display method for a mobile phone that displays text on the screen while displaying a moving image, wherein,
where B is the brightness of the moving image,
M is the intermediate value of the brightness,
TH1 is a predetermined bright-side threshold, and
TH2 is a predetermined dark-side threshold,
the method comprises, for each frame of the moving image:
a step of increasing the brightness M when TH1 > B > M;
a step of decreasing the brightness M when M > B > TH2; and
a step of displaying the text superimposed on the moving image with the brightness M adjusted.

6. The mobile phone display method according to claim 5, further comprising a step of setting the character color of the text to white when M > B > TH2.
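As a rough illustration of how the claimed determinations might be applied frame by frame, the following Python sketch treats a frame as a flat list of 8-bit luminance values and uses the mean luminance as the frame brightness. The claims do not state how the brightness of a frame is measured, so the mean, the names `shift_pixels`, `process_frame` and `process_video`, the values of M, TH1, TH2 and the step size are all assumptions. The string "default" is likewise a hypothetical placeholder for whatever character color is used when the frame is not darkened.

```python
from statistics import mean

M, TH1, TH2 = 128, 192, 64   # assumed values with TH2 < M < TH1
DELTA = 32                   # hypothetical "predetermined amount"


def shift_pixels(frame, amount):
    """Shift every luminance value by `amount`, clamped to 0-255."""
    return [max(0, min(255, p + amount)) for p in frame]


def process_frame(frame):
    """Apply the three determinations of claim 1 to a single frame.

    Returns the (possibly adjusted) frame and the character color for
    the overlaid text: "white" if the frame was darkened, otherwise
    "default" (an assumed placeholder).
    """
    b = mean(frame)                      # assumed brightness measure
    if b > M:                            # first determination: brighter than M?
        if b < TH1:                      # second determination: between M and TH1?
            return shift_pixels(frame, DELTA), "default"
    elif TH2 < b < M:                    # third determination: between TH2 and M?
        return shift_pixels(frame, -DELTA), "white"
    return frame, "default"              # outside both ranges: unchanged


def process_video(frames):
    """Process each frame independently, as in claim 2."""
    return [process_frame(f) for f in frames]
```

Each returned pair could then be handed to a display routine that draws the text in the given color over the adjusted frame; that drawing step is outside the scope of this sketch.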