JP2005039451A

JP2005039451A - Imaging device, imaging method and program

Info

Publication number: JP2005039451A
Application number: JP2003199071A
Authority: JP
Inventors: Keiichi Kobayashi; 圭一小林
Original assignee: Casio Computer Co Ltd
Current assignee: Casio Computer Co Ltd
Priority date: 2003-07-18
Filing date: 2003-07-18
Publication date: 2005-02-10
Anticipated expiration: 2023-07-18
Also published as: JP4366481B2

Abstract

PROBLEM TO BE SOLVED: To realize diversified use environments by effectively using a plurality of camera units. SOLUTION: The imaging device is provided with two camera units 16 and 19; microphones 18 and 20 provided corresponding to the camera units 16 and 19; and a control unit 35 which drives them simultaneously to shoot moving pictures, compares the sound signals obtained by the microphones 18 and 20 with each other, composites the images obtained by the camera units 16 and 19 according to the comparison result, and outputs the composite image. COPYRIGHT: (C)2005,JPO&NCIPI

Description

【０００１】
【発明の属する技術分野】
本発明は、例えばカメラ機能付きの携帯電話機等に好適な撮像装置、撮像方法及びプログラムに関する。
【０００２】
【従来の技術】
近時、カメラ機能付きの携帯電話機が広く一般に普及するに連れて、カメラ部の撮影方向が回転できるようにしたもの、筐体の異なる方向に向けて２つのカメラ部を設けたもの、カメラ部が折り畳み式ケーシングのヒンジ部軸方向に形成されて光学ズーム機構を有したものなど、様々な機種が製品化されている。
【０００３】
しかるに、２つのカメラ部を設けたものはいずれも、撮影対象に応じてその一方のみを選択的に出力するようになるもので、撮像素子を１つしか用いないものの、２つの撮影光路をプリズムやハーフミラー等の光学部材により選択的に切り換えることで、異なる方向の画像を撮影できるようにしたものも考えられている。（例えば、特許文献１参照。）
【０００４】
【特許文献１】
特開２００１−２２３９２４号公報
【０００５】
【発明が解決しようとする課題】
しかしながら、上記２つのカメラ部、あるいは１つの撮影部に２つの撮影光路を持つものはいずれも、それらを撮影対象に応じて選択的に用いることで、１方向の画像を撮影するようにしたものであった。
【０００６】
本発明は上記のような実情に鑑みてなされたもので、その目的とするところは、複数のカメラ部をより有効に活用して多彩な使用環境を実現することが可能な撮像装置、撮像方法及びプログラムを提供することにある。
【０００７】
【課題を解決するための手段】
請求項１記載の発明は、複数の撮像手段と、これら複数の撮像手段それぞれに対応して設ける複数の音声入力手段と、上記複数の撮像手段及び複数の音声入力手段を同時駆動して動画を撮影する撮影駆動手段と、この撮影駆動手段で複数の音声入力手段の駆動により得た複数の音声信号を比較する比較手段と、この比較手段で得た比較結果に基づいて上記複数の撮像手段で得た画像信号を合成する画像合成手段と、この画像合成手段で得た画像信号を出力する出力手段とを具備したことを特徴とする。
【０００８】
このような構成とすれば、例えばその時点で音声信号の音圧レベルに応じて対応する画像の大きさを変えて合成するなど、自然な形の合成画像を生成して出力することができる。
【０００９】
請求項２記載の発明は、上記請求項１記載の発明において、複数の撮像手段と、これら複数の撮像手段それぞれに対応して設ける複数の音声入力手段と、上記複数の撮像手段及び複数の音声入力手段を同時駆動して動画を撮影する撮影駆動手段と、この撮影駆動手段で複数の音声入力手段の駆動により得た複数の音声信号を比較する比較手段と、この比較手段で得た比較結果に基づいて上記複数の撮像手段で得た画像信号から１つを選択する画像選択手段と、この画像選択手段で得た画像信号を出力する出力手段とを具備したことを特徴とする。
【００１０】
このような構成とすれば、例えばその時点で音声信号の音圧レベルの大きい側に対応した画像を選択して出力することにより、複数の人物を撮影している状態でその中の発言者の画像をその時々に応じて切り換えるなど、自然な形の合成画像を生成して出力することができる。
【００１１】
請求項３記載の発明は、複数の撮像手段と、これら複数の撮像手段それぞれに対応して設ける複数の音声入力手段と、上記複数の撮像手段及び複数の音声入力手段を同時駆動して動画を撮影する撮影駆動手段と、上記複数の撮像手段で得た画像信号を複数の撮像手段の位置関係を反映して合成する画像合成手段と、この画像合成手段で得た画像信号を出力する出力手段とを具備したことを特徴とする。
【００１２】
このような構成とすれば、例えば複数の人物を撮影している状態でその位置関係に対応して各人物を並べた合成画像を生成して出力するなど、自然な形の合成画像を生成して出力することができる。
【００１３】
請求項４記載の発明は、上記請求項１記載の発明において、上記合成手段は、より音圧レベルの高い音声信号に対応した画像中に、より音圧レベルの低い音声信号に対応した画像を嵌込み合成した画像信号を生成することを特徴とする。
【００１４】
このような構成とすれば、上記請求項１記載の発明の作用に加えて、その時点で音声信号の音圧レベルの大きい側に対応した画像を大きく、音圧レベルの小さい側に対応した画像を小さくしたピクチャ・イン・ピクチャの合成画像を生成して出力することにより、複数の人物を撮影している状態でその中の発言者の画像を他より大きくするように、自然な形の合成画像を生成して出力することができる。
【００１５】
請求項５記載の発明は、上記請求項３記載の発明において、上記出力手段は、上記画像信号と共に上記複数の音声入力手段で得た複数の音声信号を、それぞれ対応する複数の撮像手段の位置関係を反映して分離した状態で出力することを特徴とする。
【００１６】
このような構成とすれば、上記請求項３記載の発明の作用に加えて、複数の画像信号の位置関係に対応して音声信号もステレオ化して出力するなど、画像の合成に対応する音声も出力することで、より自然な動画撮影内容を出力することができる。
【００１７】
請求項６記載の発明は、上記請求項１乃至３いずれかに記載の発明において、上記出力手段は、画像信号と共に上記複数の音声入力手段で得た複数の音声信号を出力することを特徴とする。
【００１８】
このような構成とすれば、上記請求項１乃至３いずれか記載の発明の作用に加えて、音声も合わせたきわめて自然な動画データを生成して出力することができる。
【００１９】
請求項７記載の発明は、上記請求項６記載の発明において、上記出力手段は、上記複数の音声信号をそれぞれ分離した状態で出力することを特徴とする。
【００２０】
このような構成とすれば、上記請求項６記載の発明の作用に加えて、画像に対応したステレオ音声を出力できるため、音像の定位をも明確とした、きわめて自然で品位の高い動画データを生成して出力することができる。
【００２１】
請求項８記載の発明は、請求項１または２記載の発明において、上記出力手段は、一定のタイムラグを考慮して時間的に遡った画像信号を出力することを特徴とする。
【００２２】
このような構成とすれば、上記請求項１または２記載の発明の作用に加えて、複数の音声信号の比較に応じた画像の出力に際してタイムラグを考慮して一定の時間を遡った画像を出力することができるため、簡単な処理ながら例えば撮影対象となる人物の発言当初の画像をより自然に生成して出力することができる。
【００２３】
請求項９記載の発明は、上記請求項１乃至８いずれかに記載の発明において、上記出力手段は、記録媒体に少なくとも画像信号を記録することを特徴とする。
【００２４】
このような構成とすれば、上記請求項１乃至８いずれかに記載の発明の作用に加えて、撮影により得た画像を記録媒体に記録するため、該記録媒体を用いて後の再生処理などに活用できる。
【００２５】
請求項１０記載の発明は、上記請求項１乃至８いずれかに記載の発明において、上記出力手段は、通信媒体に少なくとも画像信号を送信することを特徴とする。
【００２６】
このような構成とすれば、上記請求項１乃至８いずれかに記載の発明の作用に加えて、例えば画像を添付した電子メールの発信等に適用可能となるばかりでなく、同一機能を有する他の装置との通信によりテレビ電話などのようにリアルタイムで相互の画像を用いた通信にも適用することが可能となる。
【００２７】
請求項１１記載の発明は、複数の撮像部及びこれら複数の撮像部それぞれに対応して設けられた複数の音声入力部を同時駆動して動画を撮影する撮影駆動工程と、この撮影駆動工程で複数の音声入力部の駆動により得た複数の音声信号を比較する比較工程と、この比較工程で得た比較結果に基づいて上記複数の撮像部で得た画像信号を合成する画像合成工程と、この画像合成工程で得た画像信号を出力する出力工程とを有したことを特徴とする。
【００２８】
このような方法とすれば、例えばその時点で音声信号の音圧レベルに応じて対応する画像の大きさを変えて合成するなど、自然な形の合成画像を生成して出力させることができる。
【００２９】
請求項１２記載の発明は、複数の撮像部及びこれら複数の撮像部それぞれに対応して設けられた複数の音声入力部を設けた撮像装置が内蔵するコンピュータが実行するプログラムであって、上記複数の撮像部及び複数の音声入力部を同時駆動して動画を撮影する撮影駆動ステップと、この撮影駆動ステップで複数の音声入力部の駆動により得た複数の音声信号を比較する比較ステップと、この比較ステップで得た比較結果に基づいて上記複数の撮像部で得た画像信号を合成する画像合成ステップと、この画像合成ステップで得た画像信号を出力する出力ステップとをコンピュータに実行させることを特徴とする。
【００３０】
このようなプログラム内容とすれば、例えばその時点で音声信号の音圧レベルに応じて対応する画像の大きさを変えて合成するなど、自然な形の合成画像を生成して出力させることができる。
【００３１】
【発明の実施の形態】
（第１の実施の形態）
以下本発明をカメラ機能付きのＣＤＭＡ（ＣｏｄｅＤｉｖｉｓｉｏｎＭｕｌｔｉｐｌｅＡｃｃｅｓｓ：符号分割多元接続）方式の携帯電話機に適用した場合の第１の実施の形態について図面を参照して説明する。
【００３２】
図１（Ａ），（Ｂ）は、この第１の実施の形態に係る携帯電話機１０の外観構成を示すもので、ヒンジ部１１を介在して２つの筐体１２，１３が一体に構成された折りたたみ式となっており、図１（Ａ）が最大限に開いた状態の内面を、図１（Ｂ）が折りたたんだ状態の主として上部筐体１２の外面を示す。
【００３３】
図１（Ａ）に示すように上部筐体１２の内面には、受話器となるスピーカ１４、メイン表示部１５、及び第１カメラ部１６が備えられる。
【００３４】
一方、下部筐体１３の内面には、ダイヤルキー等を含む各種キー１７及び送話器となる第１マイクロホン１８が備えられる。
【００３５】
また、図１（Ｂ）に示すように、上部筐体１２の外面には、第２カメラ部１９、第２マイクロホン２０、高輝度ＬＥＤでなる撮影ライト２１、及びサブ表示部２２が備えられる。
【００３６】
さらに、下部筐体１３内にも延在されているアンテナ２３がヒンジ部１１外面側より突出形成される。
【００３７】
なお、ここでは図示しないが、下部筐体１３の外面側には、着信時のビープ音やメロディ等を拡声放音するための、上記スピーカ１４より大型の外面スピーカ２４を設ける。
【００３８】
図２は、上記携帯電話機１０の回路構成を示すものである。同図で、上記アンテナ２３は最寄りの基地局とＣＤＭＡ方式の通信を行ない、このアンテナ２３にＲＦ部３１を接続している。
【００３９】
このＲＦ部３１は、受信時にはアンテナ２３から入力された信号をデュプレクサで周波数軸上から分離し、ＰＬＬシンセサイザから出力される所定周波数の局部発振信号と混合することによりＩＦ信号に周波数変換し、さらに広帯域ＢＰＦで受信周波数チャネルのみを抽出し、ＡＧＣ増幅器で希望受信波の信号レベルを一定にしてから次段の変復調部３２へ出力する。
【００４０】
一方、ＲＦ部３１は送信時に、変復調部３２から送られてくるＯＱＰＳＫ（ＯｆｆｓｅｔＱｕａｄｒｉ−ＰｈａｓｅＳｈｉｆｔＫｅｙｉｎｇ）の変調信号を、後述する制御部３５からの制御に基づいてＡＧＣ増幅器で送信電力制御した後にＰＬＬシンセサイザから出力される所定周波数の局部発振信号と混合してＲＦ帯に周波数変換し、ＰＡ（ＰｏｗｅｒＡｍｐｌｉｆｉｅｒ）で大電力に増幅して、上記デュプレクサを介してアンテナ２３より輻射送信させる。
【００４１】
変復調部３２は、受信時にＲＦ部３１からのＩＦ信号を直交検波器でベースバンドＩ・Ｑ（Ｉｎ−ｐｈａｓｅ・Ｑｕａｄｒａｔｕｒｅ−ｐｈａｓｅ）信号に分離し、デジタル化してＣＤＭＡ部３３に出力する。
【００４２】
一方、変復調部３２は送信時に、ＣＤＭＡ部３３から送られてくるデジタル値のＩ・Ｑ信号をアナログ化した後に直交変調器でＯＱＰＳＫ変調してＲＦ部３１に送出する。
【００４３】
ＣＤＭＡ部３３は、受信時に変復調部３２からのデジタル信号をＰＮ（ＰｓｅｕｄｏＮｏｉｓｅ：擬似雑音）符号のタイミング抽出回路及びそのタイミング回路の指示に従って逆拡散・復調を行なう複数の復調回路に入力し、そこから出力される複数の復調シンボルの同期をとって合成器で合成して音声処理部３４に出力する。
【００４４】
一方、ＣＤＭＡ部３３は送信時に、音声処理部３４からの出力シンボルを拡散処理した後にデジタルフィルタで帯域制限をかけてＩ・Ｑ信号とし、変復調部３２に送出する。
【００４５】
音声処理部３４は、受信時にＣＤＭＡ部３３からの出力シンボルをデインタリーブし、それからビタビ復調器で誤り訂正処理を施した後に、音声処理ＤＳＰ（ＤｉｇｉｔａｌＳｉｇｎａｌＰｒｏｃｅｓｓｏｒ）で圧縮されたデジタル信号から通常のデジタル音声信号へと伸長し、これをアナログ化して上記スピーカ１４または必要により上記外面スピーカ２４を拡声駆動させる。
【００４６】
一方、音声処理部３４は送信時に、上記第１マイクロホン１８及び第２マイクロホン２０から入力されるアナログの音声信号をデジタル化した後に音声処理ＤＳＰでデータ量を圧縮し、それから畳込み符号器で誤り訂正符号化してからインタリーブし、その出力シンボルをＣＤＭＡ部３３へ送出する。
【００４７】
しかして、上記ＲＦ部３１、変復調部３２、ＣＤＭＡ部３３、及び音声処理部３４に対して制御部３５を接続し、この制御部３５にＧＰＳレシーバ３６、画像撮影部３７、動画処理部３８、上記メイン表示部１５、上記サブ表示部２２、メモリカード３９、バイブレータ部４０、及びＬＥＤ駆動部４１を接続している。
【００４８】
ここで制御部３５は、ＣＰＵと後述する動画通信動作等を含むその動作プログラムを固定的に記憶したＲＯＭ、及びワークメモリとして使用されるＲＡＭ等で構成され、この携帯電話機１０全体の動作を制御する。
【００４９】
ＧＰＳレシーバ３６は、ＧＰＳアンテナ４２が受信する複数のＧＰＳ衛星からの測位情報により現在位置の緯度、経度、及び高度と正確な現在時刻とを算出し、制御部３５へ出力する。
【００５０】
画像撮影部３７は、制御部３５の制御の下に、上記第１カメラ部１６を構成する光学レンズ系４３の撮影光軸後方に配置されたＣＣＤ４４での撮影動作、及び上記第２カメラ部１９を構成する光学レンズ系４５の撮影光軸後方に配置されたＣＣＤ４６での撮影動作を制御し、これらの撮影により得られる２つの画像データをデジタル化して出力する。
【００５１】
動画処理部３８は、画像撮影部３７より得られる画像データと上記第１マイクロホン１８及び第２マイクロホン２０で得られる音声データとを例えばＭＰＥＧ４（ＭｏｖｉｎｇＰｉｃｔｕｒｅＥｘｐｅｒｔｓＧｒｏｕｐ４）方式に基づいてデータ圧縮し、動画データを生成する一方で、受信した動画データのデータ圧縮を解いて伸長し、元のビットマップ状の画像データと音声データとを得る。
【００５２】
メモリカード３９は、この携帯電話機１０に着脱自在に備えられるもので、自機で撮影した動画データや受信により得た動画データ等を記憶しておく。
【００５３】
バイブレータ部４０は、着信時等に予め設定された振動パターン及び振動強度で振動する。
【００５４】
ＬＥＤ駆動部４１は、上記撮影ライト２１を構成する高輝度白色ＬＥＤとその駆動回路とでなるものであり、必要により第２カメラ部１９の撮影対象となる被写体方向に向けて補助光を照射する。
【００５５】
なお、上記メイン表示部１５及びサブ表示部２２は、いずれもバックライト付きの反射／透過型カラー液晶パネルで構成され、バックライトを点灯して透過型液晶としての表示が可能である一方、見やすさは多少落ちるものの、バックライトを消灯して外光を用いた反射型液晶としての表示も可能であるものとする。
【００５６】
また、図示はしないが、上記ヒンジ部１１には上部筐体１２と下部筐体１３の展開状態及び折りたたみ状態を検出するための機構を有するものとする。この検出機構からの情報により制御部３５は、第１カメラ部１６と第２カメラ部１９により画像を撮影して通話しようとすべく各種キー１７のカメラキー１７ａを操作して移行するテレビ電話モードにおいて、上記図１（Ａ）に示したように上部筐体１２と下部筐体１３を開いた展開状態で、この携帯電話機１０のユーザが第１カメラ部１６により自分を、第２カメラ部１９により他者を同時に撮影しようとしているものと判断し、サブ表示部２２での表示を停止してメイン表示部１５で第１カメラ部１６と第２カメラ部１９での撮影に基づくモニタ画像を表示させる。
【００５７】
次に上記実施の形態の動作について説明する。
図３は、基本的に制御部３５が予め固定記憶された動作プログラムに基づいて実行するテレビ電話モード時の通話処理内容を示すもので、同様の機能を有する通話相手先の携帯電話機から受信した動画データの再生に関しては本処理と平行して実行するものとして、ここではその説明を省略し、動画データの取得から送信に至る過程のみを述べるものとする。
【００５８】
その当初には、第１マイクロホン１８と第２マイクロホン２０を用いて上部筐体１２の内面と外面両方向の音声（図では「音声Ａ，音声Ｂ」と称する）を録音しながら（ステップＡ０１）、動画像を構成する個々の画像の撮影タイミングとなるのを待機する（ステップＡ０２）。
【００５９】
この場合、例えば動画のフレームレートが１５［フレーム／秒］、解像度が横１６０ドット×縦１２０ドット、音声のサンプリング周波数が１２［ＫＨｚ］であるとすると、１２［ＫＨｚ］で音声信号のサンプリングを実行しながら、１／１５［秒］毎に撮影タイミングとなってステップＡ０２でこれを判断し、第１カメラ部１６と第２カメラ部１９による上部筐体１２の内面と外面両方向の画像（図では「画像Ａ，画像Ｂ」と称する）を撮影する（ステップＡ０３）。
【００６０】
これとともに、上記録音により得た２つの音声データの音圧レベルを比較し、そのいずれが大きいかを判断する（ステップＡ０４）。
【００６１】
第１マイクロホン１８で得た音声データの音圧レベルの方が第２マイクロホン２０で得た音声データの音圧レベル以上であった場合には、音声に対応する画像として第１カメラ部１６で得た画像データ内の一部、例えば右下に第２カメラ部１９で得た画像データを嵌め込んだピクチャ・イン・ピクチャの合成画像を作成する（ステップＡ０５）。
【００６２】
この合成画像としては、上述した如く第２カメラ部１９で得た横１６０ドット×縦１２０ドットの解像度を有する画像データを縦横共に１ドット毎に間引いて横８０ドット×縦６０ドットの画像データを生成し、これを第１カメラ部１６で得た横１６０ドット×縦１２０ドットの解像度を有する画像データの右下１／４に渡る範囲の部分に置換することで、容易に作成できる。
【００６３】
また、上記ステップＡ０４で第１マイクロホン１８で得た音声データの音圧レベルに比して第２マイクロホン２０で得た音声データの音圧レベルの方が大きいと判断した場合には、音声に対応する画像として第２カメラ部１９で得た画像データ内の一部、例えば右下に第１カメラ部１６で得た画像データを嵌め込んだピクチャ・イン・ピクチャの合成画像を作成する（ステップＡ０６）。
【００６４】
こうしてステップＡ０５またはＡ０６で得た合成画像データを用い、併せて前回の画像撮影から今回の画像撮影の間に取得していた１／１５［秒］分の２つの音声データを重畳して所定のフォーマット化し（ステップＡ０７）、通話相手先に送信して（ステップＡ０８）、以上で画像データ単位の一連の処理を終了し、再び上記ステップＡ０１からの処理に戻って、以後このテレビ電話モードでの通話が終わるまで上記処理を繰返し実行する。
【００６５】
図４は、上記ステップＡ０５での処理を経てステップＡ０７で所定のフォーマット化した画像データと対応する音声データの概念を例示するものである。画像データとしては、第１カメラ部１６で得た画像データ内の一部、例えば右下に第２カメラ部１９で得た画像データを嵌め込んだピクチャ・イン・ピクチャの合成画像が配される一方で、音声データとしては第１マイクロホン１８で得た音声データと第２マイクロホン２０で得た音声データとが重畳された状態で配される。
【００６６】
なお、実際に動画データを例えばＭＰＥＧ４の規格に則ってデータ圧縮した後に送信するものとした場合には、動画処理部３８で複数フレームの画像データに対するＧＯＰ（ＧｒｏｕｐＯｆＰｉｃｔｕｒｅｓ）、動き補償等の処理を施した後に送信することとなるので、上記図４に示した如く１フレームの画像データにその時間分の音声データが付加されたデータフォーマットとは概念が異なるものとなる。
【００６７】
このように、例えばその時点で音声信号の音圧レベルの大きい側に対応した画像を大きく、音圧レベルの小さい側に対応した画像を小さくしたピクチャ・イン・ピクチャの合成画像を生成することにより、１台の携帯電話機１０で二人の人物を同時に撮影している状態でそのうちの発言している側の画像を他より大きくするなど、自然な形の合成画像を生成することができる。
【００６８】
（第２の実施の形態）
以下本発明をカメラ機能付きのＣＤＭＡ方式の携帯電話機に適用した場合の第２の実施の形態について図面を参照して説明する。
【００６９】
なお、その外観構成に関しては上記図１と、回路構成に関しては上記図２とそれぞれ基本的に同様であるので、同一部分には同一符号を用いるものとして、それらの図示及び説明は省略する。
【００７０】
次に上記実施の形態の動作について説明する。
図５は、基本的に制御部３５が予め固定記憶された動作プログラムに基づいて実行するテレビ電話モード時の通話処理内容を示すもので、同様の機能を有する通話相手先の携帯電話機から受信した動画データの再生に関しては本処理と平行して実行するものとして、ここではその説明を省略し、動画データの取得から送信に至る過程のみを述べるものとする。
【００７１】
その当初には、第１マイクロホン１８と第２マイクロホン２０を用いて上部筐体１２の内面と外面両方向の音声（図では「音声Ａ，音声Ｂ」と称する）を録音しながら（ステップＢ０１）、動画像を構成する個々の画像の撮影タイミングとなるのを待機する（ステップＢ０２）。
【００７２】
この場合、例えば動画のフレームレートが１５［フレーム／秒］、解像度が横１６０ドット×縦１２０ドット、音声のサンプリング周波数が１２［ＫＨｚ］であるとすると、１２［ＫＨｚ］で音声信号のサンプリングを実行しながら、１／１５［秒］毎に撮影タイミングとなってステップＢ０２でこれを判断し、第１カメラ部１６と第２カメラ部１９による上部筐体１２の内面と外面両方向の画像（図では「画像Ａ，画像Ｂ」と称する）を撮影する（ステップＢ０３）。
【００７３】
これとともに、上記録音により得た２つの音声データの音圧レベルを比較し、そのいずれが大きいかを判断する（ステップＢ０４）。
【００７４】
第１マイクロホン１８で得た音声データの音圧レベルの方が第２マイクロホン２０で得た音声データの音圧レベル以上であった場合には、音声に対応する画像として第１カメラ部１６で得た画像データを選択する（ステップＢ０５）。
【００７５】
この選択画像としては、上述した如く第１カメラ部１６で得た横１６０ドット×縦１２０ドットの解像度を有する画像データをそのまま利用する。
【００７６】
また、上記ステップＢ０４で第１マイクロホン１８で得た音声データの音圧レベルに比して第２マイクロホン２０で得た音声データの音圧レベルの方が大きいと判断した場合には、音声に対応する画像として第２カメラ部１９で得た画像データを選択する（ステップＢ０６）。
【００７７】
こうしてステップＢ０５またはＢ０６で選択した画像データを用い、併せて前回の画像撮影から今回の画像撮影の間に取得していた１／１５［秒］分の２つの音声データを重畳して所定のフォーマット化し（ステップＢ０７）、通話相手先に送信して（ステップＢ０８）、以上で画像データ単位の一連の処理を終了し、再び上記ステップＢ０１からの処理に戻って、以後このテレビ電話モードでの通話が終わるまで上記処理を繰返し実行する。
【００７８】
図６は、上記ステップＢ０５での処理を経てステップＢ０７で所定のフォーマット化した画像データと対応する音声データの概念を例示するものである。画像データとしては、第１カメラ部１６で得た画像データのみを用いている一方で、音声データとしては第１マイクロホン１８で得た音声データと第２マイクロホン２０で得た音声データとが重畳された状態で配される。
【００７９】
なお、実際に動画データを例えばＭＰＥＧ４の規格に則ってデータ圧縮した後に送信するものとした場合には、動画処理部３８で複数フレームの画像データに対するＧＯＰ（ＧｒｏｕｐＯｆＰｉｃｔｕｒｅｓ）、動き補償等の処理を施した後に送信することとなるので、上記図６に示した如く１フレームの画像データにその時間分の音声データが付加されたデータフォーマットとは概念が異なるものとなる。
【００８０】
このように、例えばその時点で音声信号の音圧レベルの大きい側に対応した画像の側を選択することにより、１台の携帯電話機１０で二人の人物を同時に撮影している状態でそのうちの発言している側に切り換えるなど、自然な形の画像を生成することができる。
【００８１】
（第３の実施の形態）
以下本発明をカメラ機能付きのＣＤＭＡ方式の携帯電話機に適用した場合の第３の実施の形態について図面を参照して説明する。
【００８２】
なお、その外観構成に関しては上記図１と、回路構成に関しては上記図２とそれぞれ基本的に同様であるので、同一部分には同一符号を用いるものとして、それらの図示及び説明は省略する。
【００８３】
次に上記実施の形態の動作について説明する。
図７は、基本的に制御部３５が予め固定記憶された動作プログラムに基づいて実行するテレビ電話モード時の通話処理内容を示すもので、同様の機能を有する通話相手先の携帯電話機から受信した動画データの再生に関しては本処理と平行して実行するものとして、ここではその説明を省略し、動画データの取得から送信に至る過程のみを述べるものとする。
【００８４】
その当初には、第１マイクロホン１８と第２マイクロホン２０を用いて上部筐体１２の内面と外面両方向の音声（図では「音声Ａ，音声Ｂ」と称する）を録音しながら（ステップＣ０１）、動画像を構成する個々の画像の撮影タイミングとなるのを待機する（ステップＣ０２）。
【００８５】
この場合、例えば動画のフレームレートが１５［フレーム／秒］、解像度が横１６０ドット×縦１２０ドット、音声のサンプリング周波数が１２［ＫＨｚ］であるとすると、１２［ＫＨｚ］で音声信号のサンプリングを実行しながら、１／１５［秒］毎に撮影タイミングとなってステップＣ０２でこれを判断し、第１カメラ部１６と第２カメラ部１９による上部筐体１２の内面と外面両方向の画像（図では「画像Ａ，画像Ｂ」と称する）を撮影する（ステップＣ０３）。
【００８６】
そして、第１カメラ部１６で得た画像データと第２カメラ部１９で得た画像データとを用いてこれらを左右に配置した合成画像を作成する（ステップＣ０４）。
【００８７】
この合成画像としては、上述した如く第１カメラ部１６及び第２カメラ部１９で得た横１６０ドット×縦１２０ドットの解像度を有する画像データをそれぞれ横方向のみ１ドット毎に間引いて横８０ドット×縦１２０ドットの画像データを生成し、第１カメラ部１６で得た画像データを左、第２カメラ部１９で得た画像データを右となるように２つの画像を単純に合成して横１６０ドット×縦１２０ドットの解像度を有する画像データとする。
【００８８】
こうしてステップＣ０４で得た合成画像データを用い、併せて前回の画像撮影から今回の画像撮影の間に取得していた１／１５［秒］分の２つの音声データをそれぞれ左チャンネル（Ｌｃｈ）と右チャンネル（Ｒｃｈ）で分離した状態として所定のフォーマット化し（ステップＣ０５）、通話相手先に送信して（ステップＣ０６）、以上で画像データ単位の一連の処理を終了し、再び上記ステップＣ０１からの処理に戻って、以後このテレビ電話モードでの通話が終わるまで上記処理を繰返し実行する。
【００８９】
図８は、上記ステップＣ０４での処理を経てステップＣ０５で所定のフォーマット化した画像データと対応する音声データの概念を例示するものである。
【００９０】
画像データとしては、第１カメラ部１６で得た画像データが左側に、第２カメラ部１９で得た画像データが右側に配置されて一枚の合成画像として配される一方で、これに対応して音声データとしては第１マイクロホン１８で得た音声データが左チャンネル、第２マイクロホン２０で得た音声データが右チャンネルとなるようにステレオ音声とした状態で配される。
【００９１】
従って、通話相手先も同等の携帯電話機１０を用いてこれを再生するものとすれば、メイン表示部１５に上記二人の人物が写った合成画像が表示されると共に、例えば主として第１マイクロホン１８で得た音声データが外面スピーカ２４により、第２マイクロホン２０で得た音声データがスピーカ１４により拡声放音される、というように、音声データを得た人物の位置関係に対応して、スピーカ１４と外面スピーカ２４を別々に駆動して異なる音声を拡声放音させることができる。
【００９２】
なお、実際に動画データを例えばＭＰＥＧ４の規格に則ってデータ圧縮した後に送信するものとした場合には、動画処理部３８で複数フレームの画像データに対するＧＯＰ（ＧｒｏｕｐＯｆＰｉｃｔｕｒｅｓ）、動き補償等の処理を施した後に送信することとなるので、上記図８に示した如く１フレームの画像データにその時間分の音声データが付加されたデータフォーマットとは概念が異なるものとなる。
【００９３】
このように、例えば複数の人物を撮影している状態でその位置関係に対応して各人物を並べた合成画像を生成することにより、自然な形の合成画像を生成することができる。
【００９４】
加えて、第１マイクロホン１８と第２マイクロホン２０で得た音声データをステレオ信号として分離したままの状態で送信するものとしたため、相手先にも２つのスピーカが備えられていれば、各音声を別々に拡声放音することができ、発言する人物の違いを音声の出力位置からも類推できるような、音像の定位を明確とした、きわめて自然で品位の高い動画の出力が可能となる。
【００９５】
なお、上記第１及び第２の実施の形態では、２つの音声信号の音圧レベルの大小により即時画像の状態を切り換えるものとなるが、あえてその切換えに対しては前回に切換えてから経過した時間と、２つの音声信号の音圧レベルの差をも考慮して切り換えるものとしてもよい。
【００９６】
その場合、前回の画像切換え直後には、ある一定以上の閾値を越えるような音圧レベルの差がないと画像の切換えを行なわず、且つ切換から時間が経過するに連れてその閾値の内容を段階的に減少させるようにする。
【００９７】
こうすることで、不自然に短い周期で合成画像の内容が切換えられてしまい、見づらいものとなってしまうのを避けることができる。
【００９８】
また、上記第１及び第２の実施の形態における合成画像の切換えに際しては、バッファメモリによりその時点で得られる画像をある程度の時間分保持しておき、切換時に音声との違和感を感じない程度の一定の時間タイムラグ分、例えば０．２［秒］だけ遡った画像を取出して出力に用いるものとしてもよい。
【００９９】
これは、音声の状態の変化に伴う画像の切換えに際して、音声の状態の変化を検出したその時点ではなく、時間的に少し前に遡った画像を対応するものとして用いることにより、例えば新たに発言を開始した人物の、その発言を行なっている状態からではなく、発言をする直前の状態からの画像を出力することにより、視覚的にきわめて自然な画像を提供できるようにしたものである。
【０１００】
また、上記第１乃至第３の実施の形態はいずれも、２つの画像信号及びこれに対応する２つの音声信号から生成した信号を通話相手に即時送信する場合について説明したが、テレビ電話モードでの通話で送信する場合に限らず、一旦メモリカード３９等の記録媒体に記録し、後に編集や再生等の処理を施すものとしてもよく、さらには電子メールの添付ファイルとして使用するものとしてもよい。
【０１０１】
さらに、上記第１乃至第３の実施の形態はいずれも、本発明をカメラ機能付きの携帯電話機に適用した場合について説明したものであるが、本発明はこれに限らず、動画撮影が可能なデジタルカメラや、カメラ機能付きのＰＤＡ、ビデオカメラ装置、会議記録システム等にも多々適用可能となる。
【０１０２】
その他、本発明は上記実施の形態に限らず、その要旨を逸脱しない範囲内で種々変形して実施することが可能であるものとする。
【０１０３】
さらに、上記実施の形態には種々の段階の発明が含まれており、開示される複数の構成要件における適宜な組合わせにより種々の発明が抽出され得る。例えば、実施の形態に示される全構成要件からいくつかの構成要件が削除されても、発明が解決しようとする課題の欄で述べた課題の少なくとも１つが解決でき、発明の効果の欄で述べられている効果の少なくとも１つが得られる場合には、この構成要件が削除された構成が発明として抽出され得る。
【０１０４】
【発明の効果】
請求項１記載の発明によれば、例えばその時点で音声信号の音圧レベルに応じて対応する画像の大きさを変えて合成するなど、自然な形の合成画像を生成して出力することができる。
【０１０５】
請求項２記載の発明によれば、例えばその時点で音声信号の音圧レベルの大きい側に対応した画像を選択して出力することにより、複数の人物を撮影している状態でその中の発言者の画像をその時々に応じて切り換えるなど、自然な形の合成画像を生成して出力することができる。
【０１０６】
請求項３記載の発明によれば、例えば複数の人物を撮影している状態でその位置関係に対応して各人物を並べた合成画像を生成して出力するなど、自然な形の合成画像を生成して出力することができる。
【０１０７】
請求項４記載の発明によれば、上記請求項１記載の発明の効果に加えて、その時点で音声信号の音圧レベルの大きい側に対応した画像を大きく、音圧レベルの小さい側に対応した画像を小さくしたピクチャ・イン・ピクチャの合成画像を生成して出力することにより、複数の人物を撮影している状態でその中の発言者の画像を他より大きくするように、自然な形の合成画像を生成して出力することができる。
【０１０８】
請求項５記載の発明によれば、上記請求項３記載の発明の効果に加えて、複数の画像信号の位置関係に対応して音声信号もステレオ化して出力するなど、画像の合成に対応する音声も出力することで、より自然な動画撮影内容を出力することができる。
【０１０９】
請求項６記載の発明によれば、上記請求項１乃至３いずれか記載の発明の効果に加えて、音声も合わせたきわめて自然な動画データを生成して出力することができる。
【０１１０】
請求項７記載の発明によれば、上記請求項６記載の発明の効果に加えて、画像に対応したステレオ音声を出力できるため、音像の定位をも明確とした、きわめて自然で品位の高い動画データを生成して出力することができる。
【０１１１】
請求項８記載の発明によれば、上記請求項１または２記載の発明の効果に加えて、複数の音声信号の比較に応じた画像の出力に際してタイムラグを考慮して一定の時間を遡った画像を出力することができるため、簡単な処理ながら例えば撮影対象となる人物の発言当初の画像をより自然に生成して出力することができる。
【０１１２】
請求項９記載の発明によれば、上記請求項１乃至８いずれかに記載の発明の効果に加えて、撮影により得た画像を記録媒体に記録するため、該記録媒体を用いて後の再生処理などに活用できる。
【０１１３】
請求項１０記載の発明によれば、上記請求項１乃至８いずれかに記載の発明の効果に加えて、例えば画像を添付した電子メールの発信等に適用可能となるばかりでなく、同一機能を有する他の装置との通信によりテレビ電話などのようにリアルタイムで相互の画像を用いた通信にも適用することが可能となる。
【０１１４】
請求項１１記載の発明によれば、例えばその時点で音声信号の音圧レベルに応じて対応する画像の大きさを変えて合成するなど、自然な形の合成画像を生成して出力させることができる。
【０１１５】
請求項１２記載の発明によれば、例えばその時点で音声信号の音圧レベルに応じて対応する画像の大きさを変えて合成するなど、自然な形の合成画像を生成して出力させることができる。
【図面の簡単な説明】
【図１】本発明の第１の実施の形態に係る携帯電話機の外観構成を示す図。
【図２】同実施の形態に係る携帯電話機の電子回路の機能構成を示すブロック図。
【図３】同実施の形態に係る動画データ取得時の処理内容を示すフローチャート。
【図４】同実施の形態に係る単位時間当たりに生成されるデータ構成を例示する図。
【図５】本発明の第２の実施の形態に係る動画データ取得時の処理内容を示すフローチャート。
【図６】同実施の形態に係る単位時間当たりに生成されるデータ構成を例示する図。
【図７】本発明の第３の実施の形態に係る動画データ取得時の処理内容を示すフローチャート。
【図８】同実施の形態に係る単位時間当たりに生成されるデータ構成を例示する図。
【符号の説明】
１０…携帯電話機、１１…ヒンジ部、１２…上部筐体、１３…下部筐体、１４…スピーカ、１５…メイン表示部、１６…第１カメラ部、１７…各種キー、１８…第１マイクロホン、１９…第２カメラ部、２０…第２マイクロホン、２１…撮影ライト、２２…サブ表示部、２３…アンテナ、２４…外面スピーカ、３１…ＲＦ部、３２…変復調部、３３…ＣＤＭＡ部、３４…音声処理部、３５…制御部、３６…ＧＰＳレシーバ、３７…画像撮影部、３８…動画処理部、３９…メモリカード、４０…バイブレータ部、４１…ＬＥＤ駆動部、４２…ＧＰＳアンテナ、４３…光学レンズ系、４４…ＣＣＤ、４５…光学レンズ系、４６…ＣＣＤ。
[0001]
BACKGROUND OF THE INVENTION
The present invention relates to an imaging apparatus, an imaging method, and a program suitable for a mobile phone with a camera function, for example.
[0002]
[Prior art]
Recently, as mobile phones with camera functions have become widespread, various models have been commercialized: models in which the shooting direction of the camera unit can be rotated, models with two camera units facing different directions of the housing, and models in which the camera unit is formed along the hinge axis of a folding casing and has an optical zoom mechanism.
[0003]
However, every model provided with two camera units selectively outputs only one of them according to the subject. A design has also been proposed that uses only a single image sensor but can capture images in different directions by selectively switching between two imaging optical paths with an optical member such as a prism or a half mirror (see, for example, Patent Document 1).
[0004]
[Patent Document 1]
JP 2001-223924 A
[0005]
[Problems to be solved by the invention]
However, both the models with two camera units and the design with two imaging optical paths for a single imaging unit use these selectively according to the subject, and so capture an image in only one direction.
[0006]
The present invention has been made in view of the above circumstances, and its object is to provide an imaging apparatus, an imaging method, and a program capable of realizing diverse usage environments by making more effective use of a plurality of camera units.
[0007]
[Means for Solving the Problems]
The invention according to claim 1 comprises: a plurality of imaging means; a plurality of audio input means provided corresponding to each of the plurality of imaging means; imaging drive means for capturing a moving image by simultaneously driving the plurality of imaging means and the plurality of audio input means; comparison means for comparing the plurality of audio signals obtained when the imaging drive means drives the plurality of audio input means; image synthesizing means for synthesizing the image signals obtained by the plurality of imaging means based on the comparison result obtained by the comparison means; and output means for outputting the image signal obtained by the image synthesizing means.
[0008]
With such a configuration, it is possible to generate and output a combined image in a natural form, for example, by changing the size of the corresponding image according to the sound pressure level of the audio signal at that time and combining the images.
[0009]
The invention according to claim 2 is, in the invention according to claim 1, characterized by comprising: a plurality of imaging means; a plurality of audio input means provided corresponding to each of the plurality of imaging means; imaging drive means for capturing a moving image by simultaneously driving the plurality of imaging means and the plurality of audio input means; comparison means for comparing the plurality of audio signals obtained when the imaging drive means drives the plurality of audio input means; image selecting means for selecting one of the image signals obtained by the plurality of imaging means based on the comparison result obtained by the comparison means; and output means for outputting the image signal obtained by the image selecting means.
[0010]
With such a configuration, for example, by selecting and outputting the image corresponding to the side whose audio signal has the higher sound pressure level at that time, it is possible to generate and output a natural-looking composite image, such as switching from moment to moment to the image of whoever is speaking while a plurality of persons are being photographed.
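The selection described here can be sketched in a few lines. This is only an illustration under stated assumptions: the function names are hypothetical, the RMS of one block of PCM samples stands in for "sound pressure level", and the tie rule (equal levels select the first camera) follows the second embodiment ([0074]), which selects the first camera unit 16 when its level is greater than or equal to the other.

```python
import math


def rms_level(samples):
    """RMS of one block of PCM samples (assumed stand-in for sound pressure level)."""
    if not samples:
        return 0.0
    return math.sqrt(sum(s * s for s in samples) / len(samples))


def select_image(image_a, image_b, audio_a, audio_b):
    """Return the frame whose microphone was louder; equal levels select image A."""
    if rms_level(audio_a) >= rms_level(audio_b):
        return image_a
    return image_b
```

In use, `audio_a` and `audio_b` would be the samples recorded by the two microphones since the previous frame, and the returned frame is the one passed on to the output stage.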
[0011]
The invention according to claim 3 comprises: a plurality of imaging means; a plurality of audio input means provided corresponding to each of the plurality of imaging means; imaging drive means for capturing a moving image by simultaneously driving the plurality of imaging means and the plurality of audio input means; image synthesizing means for synthesizing the image signals obtained by the plurality of imaging means so as to reflect the positional relationship of the plurality of imaging means; and output means for outputting the image signal obtained by the image synthesizing means.
[0012]
With such a configuration, it is possible to generate and output a natural-looking composite image, for example one in which, while a plurality of persons are being photographed, the persons are arranged according to their positional relationship.
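As a rough sketch of such a positional composite, the third embodiment ([0087]) thins each 160 x 120 image to 80 x 120 by dropping every other column and places the first camera's image on the left and the second camera's on the right. The function name and the list-of-rows image representation below are assumptions made for illustration.

```python
def side_by_side(image_a, image_b):
    """Keep every other column of each image, then put A on the left and B on the right.

    Images are lists of rows; both are assumed to have the same size.
    """
    if len(image_a) != len(image_b):
        raise ValueError("images must have the same height")
    return [row_a[::2] + row_b[::2] for row_a, row_b in zip(image_a, image_b)]
```

With two 160 x 120 inputs the result is again 160 x 120, so the composite fits the same frame format as a single camera image.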
[0013]
The invention according to claim 4 is, in the invention according to claim 1, characterized in that the synthesizing means generates an image signal in which the image corresponding to the audio signal with the lower sound pressure level is inset into the image corresponding to the audio signal with the higher sound pressure level.
[0014]
With such a configuration, in addition to the operation of the invention according to claim 1, a picture-in-picture composite image is generated and output in which the image corresponding to the side with the higher sound pressure level at that time is large and the image corresponding to the side with the lower sound pressure level is small. A natural-looking composite image can thus be generated and output, in which the image of the speaker is larger than the others while a plurality of persons are being photographed.
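A minimal sketch of this picture-in-picture step follows, based on the first embodiment ([0062]): the quieter side's 160 x 120 image is thinned to 80 x 60 by dropping every other dot in both directions and written over the lower-right quarter of the louder side's image. The function names, the list-of-rows image representation, and the numeric level arguments are illustrative assumptions.

```python
def picture_in_picture(main, inset):
    """Overlay a half-size copy of `inset` on the lower-right quarter of `main`.

    Both images are lists of rows and are assumed to have the same size.
    """
    small = [row[::2] for row in inset[::2]]      # drop every other row and column
    top = len(main) - len(small)
    left = len(main[0]) - len(small[0])
    out = [list(row) for row in main]             # copy so the input is not modified
    for dy, small_row in enumerate(small):
        out[top + dy][left:left + len(small_row)] = small_row
    return out


def compose(image_a, image_b, level_a, level_b):
    """Make the louder side the large image; equal levels favour image A."""
    if level_a >= level_b:
        return picture_in_picture(image_a, image_b)
    return picture_in_picture(image_b, image_a)
```

Simple thinning is what the embodiment describes; a real implementation might low-pass filter before downsampling to reduce aliasing, which this sketch does not attempt.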
[0015]
The invention according to claim 5 is, in the invention according to claim 3, characterized in that the output means outputs, together with the image signal, the plurality of audio signals obtained by the plurality of audio input means in a separated state that reflects the positional relationship of the corresponding imaging means.
[0016]
With such a configuration, in addition to the operation of the invention according to claim 3, audio that matches the image composition is also output, for example by outputting the audio signals in stereo according to the positional relationship of the plurality of image signals, so that more natural moving-image content can be output.
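The per-frame packaging can be illustrated as follows. The numbers come from the embodiments (12 kHz audio sampling and 15 frames per second, so 12000 / 15 = 800 samples per channel per frame); the dictionary layout and function name are hypothetical and are not the format the patent transmits.

```python
SAMPLE_RATE_HZ = 12_000                              # audio sampling rate in the embodiments
FRAME_RATE = 15                                      # frames per second in the embodiments
SAMPLES_PER_FRAME = SAMPLE_RATE_HZ // FRAME_RATE     # 800 samples per channel per frame


def pack_stereo_frame(image, audio_a, audio_b):
    """Bundle one composite image with its audio kept as separate left/right channels.

    audio_a (first microphone) becomes the left channel and audio_b (second
    microphone) the right channel, mirroring the left/right image placement.
    """
    if len(audio_a) != SAMPLES_PER_FRAME or len(audio_b) != SAMPLES_PER_FRAME:
        raise ValueError("each channel must hold exactly one frame period of samples")
    return {"image": image, "audio": list(zip(audio_a, audio_b))}
```

Keeping the channels as (left, right) pairs rather than summing them is what lets a receiver with two speakers reproduce each voice separately.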
[0017]
The invention according to claim 6 is, in the invention according to any one of claims 1 to 3, characterized in that the output means outputs the plurality of audio signals obtained by the plurality of audio input means together with the image signal.
[0018]
With such a configuration, in addition to the operation of the invention according to any one of the first to third aspects, it is possible to generate and output very natural moving image data combined with sound.
[0019]
A seventh aspect of the invention is characterized in that, in the sixth aspect of the invention, the output means outputs the plurality of audio signals in a separated state.
[0020]
With such a configuration, in addition to the operation of the invention according to claim 6, stereo sound corresponding to the image can be output, so that very natural, high-quality moving image data with clear sound-image localization can be generated and output.
[0021]
The invention described in claim 8 is characterized in that, in the invention described in claim 1 or 2, the output means outputs an image signal that goes back in time in consideration of a fixed time lag.
[0022]
With such a configuration, in addition to the operation of the invention according to claim 1 or 2, an image from a fixed time earlier can be output, allowing for the time lag, when outputting an image according to the comparison of the plurality of audio signals. With simple processing, it is therefore possible to generate and output more naturally, for example, the image of a person being photographed at the moment that person begins to speak.
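One way to picture the look-back is a small per-camera ring buffer. The description ([0098]) gives 0.2 seconds as an example lag, which is 3 frames at the 15 frames per second used in the embodiments. The class below is a sketch under those assumptions; its name and interface are hypothetical, and it delays every frame rather than only the frames around a switch.

```python
from collections import deque


class DelayedFrameBuffer:
    """Holds recent frames from one camera and returns the frame from `lag_seconds` ago."""

    def __init__(self, frame_rate=15, lag_seconds=0.2):
        self.lag_frames = round(frame_rate * lag_seconds)    # 3 frames for 15 fps and 0.2 s
        self._frames = deque(maxlen=self.lag_frames + 1)

    def push(self, frame):
        """Store the newest frame and return the oldest one still held.

        Until the buffer has filled, the oldest available frame is returned.
        """
        self._frames.append(frame)
        return self._frames[0]
```

With one such buffer per camera, the frame handed to the selection or compositing step shows each person slightly before the audio level changed, so the output starts just before the person begins speaking.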
[0023]
According to a ninth aspect of the present invention, in the invention according to any one of the first to eighth aspects, the output means records at least an image signal on a recording medium.
[0024]
With such a configuration, in addition to the operation of the invention according to any one of claims 1 to 8, the image obtained by shooting is recorded on a recording medium, so that the recording medium can be used for later reproduction processing and the like.
[0025]
According to a tenth aspect of the present invention, in the invention according to any one of the first to eighth aspects, the output means transmits at least an image signal to a communication medium.
[0026]
With such a configuration, in addition to the operation of the invention according to any one of claims 1 to 8, the invention can be applied not only to, for example, sending an e-mail with an image attached, but also to real-time communication using each party's images, such as a videophone call, by communicating with another apparatus having the same function.
[0027]
The invention according to claim 11 comprises: an imaging drive step of capturing a moving image by simultaneously driving a plurality of imaging units and a plurality of audio input units provided corresponding to each of the plurality of imaging units; a comparison step of comparing the plurality of audio signals obtained by driving the plurality of audio input units in the imaging drive step; an image synthesizing step of synthesizing the image signals obtained by the plurality of imaging units based on the comparison result obtained in the comparison step; and an output step of outputting the image signal obtained in the image synthesizing step.
[0028]
With such a method, it is possible to generate and output a synthesized image having a natural shape, for example, by changing the size of the corresponding image in accordance with the sound pressure level of the audio signal at that time.
[0029]
The invention according to claim 12 is a program executed by a computer built into an imaging apparatus provided with a plurality of imaging units and a plurality of audio input units provided corresponding to each of the plurality of imaging units, the program causing the computer to execute: an imaging drive step of capturing a moving image by simultaneously driving the plurality of imaging units and the plurality of audio input units; a comparison step of comparing the plurality of audio signals obtained by driving the plurality of audio input units in the imaging drive step; an image synthesizing step of synthesizing the image signals obtained by the plurality of imaging units based on the comparison result obtained in the comparison step; and an output step of outputting the image signal obtained in the image synthesizing step.
[0030]
With such program content, it is possible to generate and output a natural-looking composite image, for example by changing the size of each image according to the sound pressure level of the corresponding audio signal at that time before combining them.
[0031]
DETAILED DESCRIPTION OF THE INVENTION
(First embodiment)
A first embodiment in the case where the present invention is applied to a CDMA (Code Division Multiple Access) mobile phone with a camera function will be described below with reference to the drawings.
[0032]
FIGS. 1A and 1B show the external configuration of the mobile phone 10 according to the first embodiment, which is a folding type in which two housings 12 and 13 are joined by a hinge portion 11. FIG. 1A shows the inner surface in the fully opened state, and FIG. 1B mainly shows the outer surface of the upper housing 12 in the folded state.
[0033]
As shown in FIG. 1A, a speaker 14 serving as a receiver, a main display unit 15, and a first camera unit 16 are provided on the inner surface of the upper housing 12.
[0034]
On the other hand, various keys 17 including dial keys and a first microphone 18 serving as a transmitter are provided on the inner surface of the lower housing 13.
[0035]
As shown in FIG. 1B, the outer surface of the upper housing 12 is provided with a second camera unit 19, a second microphone 20, a photographing light 21 made of a high-intensity LED, and a sub display unit 22.
[0036]
Further, an antenna 23 extending also into the lower housing 13 is formed so as to protrude from the outer surface side of the hinge portion 11.
[0037]
Although not shown here, an outer surface speaker 24 larger than the speaker 14 is provided on the outer surface side of the lower housing 13 to emit a beep sound, a melody, or the like at the time of incoming call.
[0038]
FIG. 2 shows a circuit configuration of the mobile phone 10. In the figure, the antenna 23 performs CDMA communication with the nearest base station, and an RF unit 31 is connected to the antenna 23.
[0039]
At the time of reception, the RF unit 31 separates the signal input from the antenna 23 on the frequency axis with a duplexer, converts it to an IF signal by mixing it with a local oscillation signal of a predetermined frequency output from a PLL synthesizer, extracts only the reception frequency channel with a wideband BPF, makes the signal level of the desired received wave constant with an AGC amplifier, and outputs the result to the modem unit 32 at the next stage.
[0040]
On the other hand, at the time of transmission, the RF unit 31 controls the transmission power of the OQPSK (Offset Quadri-Phase Shift Keying) modulated signal sent from the modem unit 32 with the AGC amplifier, under control from the control unit 35 described later. It then mixes the signal with a local oscillation signal of a predetermined frequency output from the PLL synthesizer to convert it to the RF band, amplifies it to high power with a PA (Power Amplifier), and radiates it from the antenna 23 via the duplexer.
[0041]
At the time of reception, the modem unit 32 separates the IF signal from the RF unit 31 into baseband I and Q (In-phase and Quadrature-phase) signals with a quadrature detector, digitizes them, and outputs them to the CDMA unit 33.
[0042]
On the other hand, at the time of transmission, the modem unit 32 converts the digital I / Q signal sent from the CDMA unit 33 into an analog signal, performs OQPSK modulation with a quadrature modulator, and sends the result to the RF unit 31.
[0043]
At the time of reception, the CDMA unit 33 inputs the digital signal from the modem unit 32 to a PN (Pseudo Noise) code timing extraction circuit and to a plurality of demodulation circuits that perform despreading and demodulation according to instructions from that timing circuit. It synchronizes the plurality of demodulated symbols output from these circuits, combines them in a combiner, and outputs the result to the audio processing unit 34.
[0044]
On the other hand, at the time of transmission, the CDMA unit 33 spreads the output symbols from the audio processing unit 34 and then band-limits them with a digital filter to form an I / Q signal, which is sent to the modulation / demodulation unit 32.
[0045]
At the time of reception, the audio processing unit 34 deinterleaves the output symbols from the CDMA unit 33 and performs error correction with a Viterbi decoder. It then expands the compressed digital signal into an ordinary digital audio signal with an audio processing DSP (Digital Signal Processor), converts that signal to analog, and drives the speaker 14 or, if necessary, the external speaker 24 to emit the sound.
[0046]
On the other hand, at the time of transmission, the audio processing unit 34 digitizes the analog audio signals input from the first microphone 18 and the second microphone 20, compresses the data amount with the audio processing DSP, performs error correction encoding with a convolutional encoder, interleaves the result, and sends the output symbols to the CDMA unit 33.
[0047]
A control unit 35 is connected to the RF unit 31, the modulation / demodulation unit 32, the CDMA unit 33, and the audio processing unit 34. A GPS receiver 36, an image capturing unit 37, a moving image processing unit 38, the main display unit 15, the sub display unit 22, a memory card 39, a vibrator unit 40, and an LED drive unit 41 are in turn connected to the control unit 35.
[0048]
Here, the control unit 35 is composed of a CPU, a ROM in which an operation program including the moving image communication operation described later is fixedly stored, a RAM used as a work memory, and the like, and controls the overall operation of the mobile phone 10.
[0049]
The GPS receiver 36 calculates the latitude, longitude, and altitude of the current position and the accurate current time based on the positioning information from a plurality of GPS satellites received by the GPS antenna 42, and outputs the calculated values to the control unit 35.
[0050]
Under the control of the control unit 35, the image capturing unit 37 controls the imaging operation of the CCD 44, which is disposed behind the optical lens system 43 on its imaging optical axis and together with it constitutes the first camera unit 16, and the imaging operation of the CCD 46, which is disposed behind the optical lens system 45 on its imaging optical axis and together with it constitutes the second camera unit 19. The two sets of image data obtained by this imaging are digitized and output.
[0051]
The moving image processing unit 38 generates moving image data by compressing the image data obtained from the image capturing unit 37 and the audio data obtained from the first microphone 18 and the second microphone 20 based on, for example, the MPEG4 (Moving Picture Experts Group 4) system. Conversely, it decompresses received moving image data to recover the original bitmap image data and audio data.
[0052]
The memory card 39 is detachably attached to the mobile phone 10 and stores moving image data captured by the phone itself, moving image data obtained by reception, and the like.
[0053]
The vibrator unit 40 vibrates with a vibration pattern and vibration intensity set in advance when an incoming call is received.
[0054]
The LED drive unit 41 includes a high-intensity white LED that constitutes the photographing light 21 and a drive circuit for that LED. If necessary, it emits auxiliary light toward the subject to be photographed by the second camera unit 19.
[0055]
Each of the main display unit 15 and the sub display unit 22 is composed of a reflective / transmissive color liquid crystal panel with a backlight. With the backlight turned on, the panel displays as a transmissive liquid crystal; with the backlight turned off, it can also display as a reflective liquid crystal using external light, although the brightness is somewhat reduced.
[0056]
Although not shown, the hinge portion 11 has a mechanism for detecting whether the upper housing 12 and the lower housing 13 are opened or folded. When the camera key 17a among the various keys 17 is operated to shift to a state in which a call is made while shooting with the first camera unit 16 and the second camera unit 19, the control unit 35 uses the information from this detection mechanism. With the upper housing 12 and the lower housing 13 opened as shown in FIG. 1A, it determines that the user of the mobile phone 10 intends to photograph himself or herself and another person at the same time with the first camera unit 16 and the second camera unit 19. It therefore stops the display on the sub display unit 22 and displays a monitor image based on the shooting by the first camera unit 16 and the second camera unit 19 on the main display unit 15.
[0057]
Next, the operation of the above embodiment will be described.
FIG. 3 shows the call processing in the videophone mode, which is basically executed by the control unit 35 based on the operation program fixedly stored in advance. Reproduction of the moving image data received from the mobile phone of the other party, which has the same functions, is assumed to be executed in parallel with this processing. Its description is omitted here, and only the steps from acquisition of the moving image data to its transmission are described.
[0058]
At the start of processing, the first microphone 18 and the second microphone 20 record sound in both directions, on the inner-surface side and the outer-surface side of the upper housing 12 (denoted “voice A, voice B” in the figure) (step A01), while the system waits for the timing at which the individual images making up the moving image are captured (step A02).
[0059]
For example, assuming that the frame rate of the moving image is 15 [frames / second], the resolution is 160 dots horizontally × 120 dots vertically, and the audio sampling frequency is 12 [kHz], a shooting timing arrives every 1/15 [second] while the audio signal is being sampled at 12 [kHz]. When this timing is detected in step A02, the first camera unit 16 and the second camera unit 19 capture images in both directions, on the inner-surface side and the outer-surface side of the upper housing 12 (denoted “image A, image B” in the figure) (step A03).
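The timing relationship in this example can be checked with a little arithmetic. The sketch below uses only the figures stated above (15 frames per second, 12 kHz sampling, 160 × 120 dots); the variable names are illustrative and do not come from the specification.

```python
from fractions import Fraction

FRAME_RATE = 15           # frames per second, as in the example above
SAMPLE_RATE = 12_000      # audio samples per second (12 kHz)
WIDTH, HEIGHT = 160, 120  # dots per frame

frame_period = Fraction(1, FRAME_RATE)         # seconds between captures
samples_per_frame = SAMPLE_RATE // FRAME_RATE  # audio samples per microphone per frame
dots_per_frame = WIDTH * HEIGHT                # dots per camera per frame

print(frame_period, samples_per_frame, dots_per_frame)  # 1/15 800 19200
```

Each captured frame therefore corresponds to 800 audio samples from each microphone.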
[0060]
At the same time, the sound pressure levels of the two audio data obtained by the recording are compared to determine which is greater (step A04).
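The specification does not say how the sound pressure level is measured. As a minimal sketch, assuming the level of each block of samples is taken as its root-mean-square amplitude, the comparison of step A04 could look like the following. The function names `rms_level` and `louder_side` are hypothetical.

```python
import math

def rms_level(samples):
    """Root-mean-square amplitude of one block of PCM samples (an assumed measure)."""
    if not samples:
        return 0.0
    return math.sqrt(sum(s * s for s in samples) / len(samples))

def louder_side(voice_a, voice_b):
    """Return 'A' when voice A is at least as loud as voice B, otherwise 'B'."""
    return "A" if rms_level(voice_a) >= rms_level(voice_b) else "B"

print(louder_side([100, -100, 100], [10, -10, 10]))  # A
```

Ties resolve to side A, which matches the "equal to or higher" condition applied to the first microphone 18.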
[0061]
If the sound pressure level of the audio data obtained by the first microphone 18 is equal to or higher than that of the audio data obtained by the second microphone 20, a picture-in-picture composite image is created in which the image data obtained by the first camera unit 16 serves as the image corresponding to that sound and the image data obtained by the second camera unit 19 is inserted into part of it, for example the lower right (step A05).
[0062]
This composite image can be created easily. The image data of 160 dots horizontally × 120 dots vertically obtained by the second camera unit 19 is thinned out by every other dot both vertically and horizontally to give image data of 80 dots horizontally × 60 dots vertically, which then replaces the lower-right quarter of the image data of 160 dots horizontally × 120 dots vertically obtained by the first camera unit 16.
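The thinning and replacement described above can be sketched in a few lines. This is an illustration only, assuming each frame is held as a list of rows of dots; the helper names `decimate` and `picture_in_picture` are hypothetical, and the string values stand in for real pixel data.

```python
def decimate(image, step_x=2, step_y=2):
    """Keep every step-th dot in each direction (plain thinning, no filtering)."""
    return [row[::step_x] for row in image[::step_y]]

def picture_in_picture(main, sub):
    """Overwrite the lower-right quarter of `main` with a half-size copy of `sub`."""
    small = decimate(sub)
    out = [list(row) for row in main]  # copy so the input frames stay untouched
    y0 = len(main) - len(small)        # first row of the inset
    x0 = len(main[0]) - len(small[0])  # first column of the inset
    for dy, row in enumerate(small):
        out[y0 + dy][x0:x0 + len(row)] = row
    return out

image_a = [["A"] * 160 for _ in range(120)]  # stand-in for the first camera frame
image_b = [["B"] * 160 for _ in range(120)]  # stand-in for the second camera frame
frame = picture_in_picture(image_a, image_b)
print(len(frame[0]), len(frame), frame[0][0], frame[119][159])  # 160 120 A B
```

Swapping the two arguments gives the opposite arrangement, with the second camera image as the main picture.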
[0063]
If it is determined in step A04 that the sound pressure level of the audio data obtained by the second microphone 20 is higher than that of the audio data obtained by the first microphone 18, a picture-in-picture composite image is created in which the image data obtained by the second camera unit 19 serves as the image corresponding to that sound and the image data obtained by the first camera unit 16 is inserted into part of it, for example the lower right (step A06).
[0064]
The composite image data obtained in step A05 or A06 is then combined with the superimposed result of the two sets of audio data for 1/15 [second] acquired between the previous image capture and the current one, and formatted into a predetermined data format (step A07). The formatted data is transmitted to the other party of the call (step A08), which completes the series of processing for one unit of image data. Processing then returns to step A01, and the above is repeated until the call in the videophone mode ends.
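As a rough illustration of step A07, the sketch below pairs one composite frame with the superimposed audio for the same 1/15 second. The specification does not define the "predetermined data format" or how the two audio streams are superimposed, so the `FramePacket` layout and the add-and-clip mixing for 16-bit samples are assumptions made for this example.

```python
from dataclasses import dataclass

@dataclass
class FramePacket:
    image: list  # composite image for this frame
    audio: list  # mixed audio samples covering the same 1/15 second

def superimpose(voice_a, voice_b, lo=-32768, hi=32767):
    """Mix two equal-length sample blocks by addition, clipped to the 16-bit range."""
    return [max(lo, min(hi, a + b)) for a, b in zip(voice_a, voice_b)]

def build_packet(composite_image, voice_a, voice_b):
    """Bundle one composite frame with the mixed audio that accompanies it."""
    return FramePacket(image=composite_image, audio=superimpose(voice_a, voice_b))

packet = build_packet([[0]], [1000, 30000, -30000], [2000, 30000, -30000])
print(packet.audio)  # [3000, 32767, -32768]
```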
[0065]
FIG. 4 illustrates the concept of the image data and the corresponding audio data that have passed through step A05 and been formatted into the predetermined format in step A07. The image data is a picture-in-picture composite image in which the image data obtained by the second camera unit 19 is inserted into part, for example the lower right, of the image data obtained by the first camera unit 16. The audio data is the superposition of the audio data obtained by the first microphone 18 and the audio data obtained by the second microphone 20.
[0066]
When the moving image data is actually compressed in accordance with, for example, the MPEG4 standard and then transmitted, the moving image processing unit 38 applies processing such as GOP (Group Of Pictures) structuring and motion compensation across the image data of a plurality of frames. The actual data therefore differs conceptually from a format in which the audio data for the corresponding time is attached to the image data of a single frame, as illustrated in the figure.
[0067]
In this way, by generating a picture-in-picture composite image in which the image corresponding to the side with the higher sound pressure level of the audio signal is enlarged and the image corresponding to the side with the lower sound pressure level is reduced, a natural-looking composite image can be generated, for example one in which the person speaking is shown larger while a single mobile phone 10 photographs two persons at the same time.
[0068]
(Second Embodiment)
A second embodiment when the present invention is applied to a CDMA mobile phone with a camera function will be described below with reference to the drawings.
[0069]
The external configuration is basically the same as that in FIG. 1 and the circuit configuration is the same as in FIG. 2, and therefore, the same reference numerals are used for the same parts, and illustration and description thereof are omitted.
[0070]
Next, the operation of the above embodiment will be described.
FIG. 5 shows the call processing in the videophone mode, which is basically executed by the control unit 35 based on the operation program fixedly stored in advance. Reproduction of the moving image data received from the mobile phone of the other party, which has the same functions, is assumed to be executed in parallel with this processing. Its description is omitted here, and only the steps from acquisition of the moving image data to its transmission are described.
[0071]
At the start of processing, the first microphone 18 and the second microphone 20 record sound in both directions, on the inner-surface side and the outer-surface side of the upper housing 12 (denoted “voice A, voice B” in the figure) (step B01), while the system waits for the timing at which the individual images making up the moving image are captured (step B02).
[0072]
For example, assuming that the frame rate of the moving image is 15 [frames / second], the resolution is 160 dots horizontally × 120 dots vertically, and the audio sampling frequency is 12 [kHz], a shooting timing arrives every 1/15 [second] while the audio signal is being sampled at 12 [kHz]. When this timing is detected in step B02, the first camera unit 16 and the second camera unit 19 capture images in both directions, on the inner-surface side and the outer-surface side of the upper housing 12 (denoted “image A, image B” in the figure) (step B03).
[0073]
At the same time, the sound pressure levels of the two audio data obtained by the recording are compared, and it is determined which one is greater (step B04).
[0074]
If the sound pressure level of the audio data obtained by the first microphone 18 is equal to or higher than that of the audio data obtained by the second microphone 20, the image data obtained by the first camera unit 16 is selected as the image corresponding to that sound (step B05).
[0075]
As this selected image, the image data having the resolution of 160 horizontal pixels × 120 vertical pixels obtained by the first camera unit 16 as described above is used as it is.
[0076]
If it is determined in step B04 that the sound pressure level of the audio data obtained by the second microphone 20 is higher than that of the audio data obtained by the first microphone 18, the image data obtained by the second camera unit 19 is selected as the image corresponding to that sound (step B06).
[0077]
The image data selected in step B05 or B06 is then combined with the superimposed result of the two sets of audio data for 1/15 [second] acquired between the previous image capture and the current one, and formatted into a predetermined data format (step B07). The formatted data is transmitted to the other party of the call (step B08), which completes the series of processing for one unit of image data. Processing then returns to step B01, and the above is repeated until the call in the videophone mode ends.
[0078]
FIG. 6 illustrates the concept of the image data and the corresponding audio data that have passed through step B05 and been formatted into the predetermined format in step B07. The image data consists only of the image data obtained by the first camera unit 16, while the audio data is the superposition of the audio data obtained by the first microphone 18 and the audio data obtained by the second microphone 20.
[0079]
When the moving image data is actually compressed in accordance with, for example, the MPEG4 standard and then transmitted, the moving image processing unit 38 applies processing such as GOP (Group Of Pictures) structuring and motion compensation across the image data of a plurality of frames. The actual data therefore differs conceptually from a format in which the audio data for the corresponding time is attached to the image data of a single frame, as illustrated in the figure.
[0080]
In this way, by selecting the image corresponding to the side whose audio signal has the higher sound pressure level at that time, a natural image can be generated, for example one that switches to the person who is speaking while a single mobile phone 10 photographs two persons at the same time.
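The selection of steps B04 to B06 reduces to a single comparison per frame. The sketch below is a minimal illustration; `select_image` is a hypothetical name, and the level values are made-up numbers in arbitrary units.

```python
def select_image(image_a, image_b, level_a, level_b):
    """Return the frame from the louder side; ties favour image A (steps B05/B06)."""
    return image_a if level_a >= level_b else image_b

# Made-up per-frame levels for (voice A, voice B).
levels = [(62.0, 48.0), (55.0, 55.0), (40.0, 58.0)]
chosen = [select_image("image A", "image B", a, b) for a, b in levels]
print(chosen)  # ['image A', 'image A', 'image B']
```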
[0081]
(Third embodiment)
A third embodiment when the present invention is applied to a CDMA mobile phone with a camera function will be described below with reference to the drawings.
[0082]
The external configuration is basically the same as that in FIG. 1 and the circuit configuration is the same as in FIG. 2, and therefore, the same reference numerals are used for the same parts, and illustration and description thereof are omitted.
[0083]
Next, the operation of the above embodiment will be described.
FIG. 7 shows the call processing in the videophone mode, which is basically executed by the control unit 35 based on the operation program fixedly stored in advance. Reproduction of the moving image data received from the mobile phone of the other party, which has the same functions, is assumed to be executed in parallel with this processing. Its description is omitted here, and only the steps from acquisition of the moving image data to its transmission are described.
[0084]
At the start of processing, the first microphone 18 and the second microphone 20 record sound in both directions, on the inner-surface side and the outer-surface side of the upper housing 12 (denoted “voice A, voice B” in the figure) (step C01), while the system waits for the timing at which the individual images making up the moving image are captured (step C02).
[0085]
For example, assuming that the frame rate of the moving image is 15 [frames / second], the resolution is 160 dots horizontally × 120 dots vertically, and the audio sampling frequency is 12 [kHz], a shooting timing arrives every 1/15 [second] while the audio signal is being sampled at 12 [kHz]. When this timing is detected in step C02, the first camera unit 16 and the second camera unit 19 capture images in both directions, on the inner-surface side and the outer-surface side of the upper housing 12 (denoted “image A, image B” in the figure) (step C03).
[0086]
Then, using the image data obtained by the first camera unit 16 and the image data obtained by the second camera unit 19, a composite image in which these are arranged on the left and right is created (step C04).
[0087]
For this composite image, the image data of 160 dots horizontally × 120 dots vertically obtained by each of the first camera unit 16 and the second camera unit 19 is thinned out by every other dot in the horizontal direction only, giving image data of 80 dots horizontally × 120 dots vertically. The two images are then simply joined side by side, with the image data obtained by the first camera unit 16 on the left and the image data obtained by the second camera unit 19 on the right, to give image data of 160 dots horizontally × 120 dots vertically.
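The side-by-side composition can be sketched as follows, assuming each frame is held as a list of rows of dots. The helper names `halve_width` and `side_by_side` are hypothetical, and the string values stand in for real pixel data.

```python
def halve_width(image):
    """Keep every other dot in the horizontal direction only."""
    return [row[::2] for row in image]

def side_by_side(left, right):
    """Join two half-width frames so `left` fills the left half and `right` the right half."""
    return [l + r for l, r in zip(halve_width(left), halve_width(right))]

image_a = [["A"] * 160 for _ in range(120)]  # stand-in for the first camera frame
image_b = [["B"] * 160 for _ in range(120)]  # stand-in for the second camera frame
frame = side_by_side(image_a, image_b)
print(len(frame[0]), len(frame), frame[0][79], frame[0][80])  # 160 120 A B
```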
[0088]
The composite image data obtained in step C04 is then formatted into a predetermined data format together with the two sets of audio data for 1/15 [second] acquired between the previous image capture and the current one, which are kept separate as the left channel (Lch) and the right channel (Rch) (step C05). The formatted data is transmitted to the other party of the call (step C06), which completes the series of processing for one unit of image data. Processing then returns to step C01, and the above is repeated until the call in the videophone mode ends.
[0089]
FIG. 8 illustrates the concept of the image data and the corresponding audio data that have passed through step C04 and been formatted into the predetermined format in step C05.
[0090]
As the image data, a single composite image is arranged in which the image data obtained by the first camera unit 16 is on the left and the image data obtained by the second camera unit 19 is on the right. As the audio data, stereo audio is arranged in which the audio data obtained by the first microphone 18 is the left channel and the audio data obtained by the second microphone 20 is the right channel.
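One common way to carry two separate channels in a single stream is to interleave the samples as left, right, left, right. The specification only states that the two audio streams are kept separate as Lch and Rch, so the interleaved layout below is an assumption, and `to_stereo` and `from_stereo` are hypothetical names.

```python
def to_stereo(voice_a, voice_b):
    """Interleave two mono blocks as L, R, L, R, ... with voice A on the left."""
    if len(voice_a) != len(voice_b):
        raise ValueError("both channels must cover the same number of samples")
    stereo = []
    for left, right in zip(voice_a, voice_b):
        stereo.extend((left, right))
    return stereo

def from_stereo(stereo):
    """Split an interleaved block back into (left, right) sample lists."""
    return stereo[0::2], stereo[1::2]

stereo = to_stereo([1, 2, 3], [10, 20, 30])
print(stereo)               # [1, 10, 2, 20, 3, 30]
print(from_stereo(stereo))  # ([1, 2, 3], [10, 20, 30])
```

Because the channels stay separable, the receiving side can route each voice to a different speaker.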
[0091]
Therefore, if the other party of the call also reproduces the data using an equivalent mobile phone 10, a composite image showing the two persons is displayed on the main display unit 15. In addition, the speaker 14 and the external speaker 24 can be driven separately to emit different sounds in accordance with the positional relationship of the persons, for example with the audio data obtained by the first microphone 18 and the audio data obtained by the second microphone 20 each emitted mainly from a different one of the two speakers.
[0092]
When the moving image data is actually compressed in accordance with, for example, the MPEG4 standard and then transmitted, the moving image processing unit 38 applies processing such as GOP (Group Of Pictures) structuring and motion compensation across the image data of a plurality of frames. The actual data therefore differs conceptually from a format in which the audio data for the corresponding time is attached to the image data of a single frame, as illustrated in the figure.
[0093]
Thus, by generating a composite image in which each person is arranged according to the positional relationship while a plurality of persons are being photographed, a natural-looking composite image can be generated.
[0094]
In addition, because the audio data obtained by the first microphone 18 and the second microphone 20 is transmitted separated as a stereo signal, the other party's device can output each voice separately if it is also equipped with two speakers. The sound image is then clearly localized, and a very high-quality moving image can be output in which the listener can infer who is speaking from the position the sound comes from.
[0095]
In the first and second embodiments, the image state is switched immediately according to which of the two audio signals has the higher sound pressure level. However, switching may instead take into account the time elapsed since the previous switch and the difference in sound pressure level between the two audio signals.
[0096]
In that case, immediately after the previous image switch, the image is not switched unless the difference in sound pressure level exceeds a certain threshold, and this threshold is reduced in steps as time elapses after the switch.
[0097]
By doing so, it can be avoided that the contents of the composite image are switched unnaturally in a short cycle and become difficult to see.
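This time-dependent threshold amounts to a form of hysteresis. The sketch below shows one way it could be arranged. The step boundaries (15 and 45 frames) and threshold values (12.0 and 6.0, in arbitrary level units) are invented for illustration, and `switch_threshold` and `SpeakerSwitcher` are hypothetical names. Start-up is treated like a fresh switch.

```python
def switch_threshold(frames_since_switch, steps=((15, 12.0), (45, 6.0)), floor=0.0):
    """Level difference required to switch; it drops in steps as time passes."""
    for limit, threshold in steps:
        if frames_since_switch < limit:
            return threshold
    return floor

class SpeakerSwitcher:
    """Tracks which side ('A' or 'B') is currently shown as the main image."""

    def __init__(self, initial="A"):
        self.active = initial
        self.frames_since_switch = 0

    def update(self, level_a, level_b):
        """Feed one frame's levels and return the side to show for that frame."""
        # How much louder the currently inactive side is than the active side.
        diff = (level_b - level_a) if self.active == "A" else (level_a - level_b)
        if diff > switch_threshold(self.frames_since_switch):
            self.active = "B" if self.active == "A" else "A"
            self.frames_since_switch = 0
        else:
            self.frames_since_switch += 1
        return self.active

sw = SpeakerSwitcher()
print(sw.update(50, 55))  # A  (5 does not exceed the initial threshold of 12)
print(sw.update(50, 70))  # B  (20 exceeds the threshold, so the image switches)
print(sw.update(60, 55))  # B  (too soon after the switch for a difference of 5)
```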
[0098]
Also, when the composite image is switched in the first and second embodiments, the images obtained may be held in a buffer memory for a certain amount of time, and an image going back by a fixed time lag, for example 0.2 [second], may be extracted and used for output so that the switch does not feel out of step with the sound.
[0099]
This is because, when the image is switched in response to a change in the audio state, using an image that goes back in time, rather than the image at the moment the change was detected, allows output to start from the state just before the person began speaking instead of from the middle of the utterance, which gives a visually very natural image.
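Holding frames for a fixed lag can be done with a small ring buffer. The sketch below is an illustration under the figures given in this description (15 frames per second and a 0.2 second lag, which is 3 frames); the class name `DelayedFrames` is hypothetical, and how the delayed frames are combined with the live ones at a switch is left open.

```python
from collections import deque

FRAME_RATE = 15                               # frames per second
LAG_SECONDS = 0.2                             # example time lag from the text
LAG_FRAMES = round(FRAME_RATE * LAG_SECONDS)  # 3 frames

class DelayedFrames:
    """Keeps the most recent frames so one from `lag_frames` captures ago can be read."""

    def __init__(self, lag_frames=LAG_FRAMES):
        self._buffer = deque(maxlen=lag_frames + 1)

    def push(self, frame):
        """Store the newest frame and return the oldest one still held."""
        self._buffer.append(frame)
        return self._buffer[0]

delay = DelayedFrames()
print([delay.push(n) for n in range(6)])  # [0, 0, 0, 0, 1, 2]
```

Until the buffer has filled, the oldest available frame is returned; after that, each output is exactly three captures behind the input.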
[0100]
Further, in each of the first to third embodiments, the signals generated from the two image signals and the two corresponding audio signals are transmitted immediately to the call partner. However, they may instead be recorded once on a recording medium such as the memory card 39 and then edited or reproduced, or they may be used as a file attached to an e-mail.
[0101]
Furthermore, the first to third embodiments all describe the case where the present invention is applied to a mobile phone with a camera function. However, the present invention is not limited to this and can also be applied to a digital camera capable of moving image shooting, a PDA with a camera function, a video camera device, a conference recording system, and the like.
[0102]
In addition, the present invention is not limited to the above-described embodiment, and various modifications can be made without departing from the scope of the invention.
[0103]
Further, the above embodiments include inventions at various stages, and various inventions can be extracted by appropriately combining a plurality of disclosed constituent elements. For example, even if some constituent elements are deleted from all the constituent elements shown in the embodiment, at least one of the problems described in the column of the problem to be solved by the invention can be solved, and described in the column of the effect of the invention. In a case where at least one of the obtained effects can be obtained, a configuration in which this configuration requirement is deleted can be extracted as an invention.
[0104]
[Effect of the invention]
According to the first aspect of the present invention, a natural-looking composite image can be generated and output, for example by changing the size of the corresponding images according to the sound pressure level of the audio signals at that time and combining them.
[0105]
According to the second aspect of the present invention, for example by selecting and outputting the image corresponding to the side with the higher sound pressure level of the audio signal at that time, a natural image can be generated and output, such as one that switches between persons according to who is speaking while a plurality of persons are being photographed.
[0106]
According to the third aspect of the present invention, for example when a plurality of persons are photographed, a natural-looking composite image in which each person is arranged according to the positional relationship can be generated and output.
[0107]
According to the invention described in claim 4, in addition to the effect of the invention described in claim 1, a picture-in-picture composite image is generated and output in which the image corresponding to the side with the higher sound pressure level of the audio signal at that time is enlarged and the image corresponding to the side with the lower sound pressure level is reduced. A natural-looking composite image can thus be generated and output, for example one in which the person speaking appears larger than the others while a plurality of persons are being photographed.
[0108]
According to the fifth aspect of the present invention, in addition to the effect of the third aspect of the invention, the audio signals are also output in stereo in accordance with the positional relationship of the plurality of image signals, so that more natural moving image content can be output.
[0109]
According to the invention described in claim 6, in addition to the effect of the invention described in any one of claims 1 to 3, it is possible to generate and output very natural moving image data combined with sound.
[0110]
According to the seventh aspect of the invention, in addition to the effect of the sixth aspect of the invention, stereo sound corresponding to the image can be output, so that very natural, high-quality moving image data with clear localization of the sound image can be generated and output.
[0111]
According to the invention described in claim 8, in addition to the effect of the invention described in claim 1 or 2, an image that goes back a fixed time is output in consideration of the time lag that arises when an image is output according to the comparison of the plurality of audio signals. With simple processing, a more natural image of, for example, the person being photographed can therefore be generated and output.
[0112]
According to the ninth aspect of the invention, in addition to the effects of the first to eighth aspects of the invention, the image obtained by photographing is recorded on a recording medium and can therefore be used for subsequent processing.
[0113]
According to the invention described in claim 10, in addition to the effect of the invention described in any one of claims 1 to 8, the invention can be applied not only to, for example, transmission of an e-mail with an image attached, but also to real-time communication using mutual images, such as a videophone call, with another device having the same function.
[0114]
According to the eleventh aspect of the invention, a natural-looking composite image can be generated and output, for example by changing the size of the corresponding images in accordance with the sound pressure level of the audio signals at that time and combining them.
[0115]
According to the twelfth aspect of the invention, a natural-looking composite image can be generated and output, for example by changing the size of the corresponding images in accordance with the sound pressure level of the audio signals at that time and combining them.
[Brief description of the drawings]
FIG. 1 is a diagram showing an external configuration of a mobile phone according to a first embodiment of the present invention.
FIG. 2 is a block diagram showing a functional configuration of an electronic circuit of the mobile phone according to the embodiment.
FIG. 3 is a flowchart showing the processing contents when acquiring moving image data according to the embodiment.
FIG. 4 is a diagram illustrating a data configuration generated per unit time according to the embodiment.
FIG. 5 is a flowchart showing the processing contents when acquiring moving image data according to the second embodiment of the present invention.
FIG. 6 is a diagram illustrating a data configuration generated per unit time according to the embodiment.
FIG. 7 is a flowchart showing the processing contents when acquiring moving image data according to the third embodiment of the present invention.
FIG. 8 is a diagram illustrating a data configuration generated per unit time according to the embodiment.
[Explanation of symbols]
10 ... mobile phone, 11 ... hinge portion, 12 ... upper housing, 13 ... lower housing, 14 ... speaker, 15 ... main display unit, 16 ... first camera unit, 17 ... various keys, 18 ... first microphone, 19 ... second camera unit, 20 ... second microphone, 21 ... photographing light, 22 ... sub display unit, 23 ... antenna, 24 ... external speaker, 31 ... RF unit, 32 ... modulation / demodulation unit, 33 ... CDMA unit, 34 ... audio processing unit, 35 ... control unit, 36 ... GPS receiver, 37 ... image capturing unit, 38 ... moving image processing unit, 39 ... memory card, 40 ... vibrator unit, 41 ... LED drive unit, 42 ... GPS antenna, 43 ... optical lens system, 44 ... CCD, 45 ... optical lens system, 46 ... CCD.

Claims

An imaging apparatus comprising:
a plurality of imaging means;
a plurality of audio input means provided corresponding to each of the plurality of imaging means;
shooting driving means for simultaneously driving the plurality of imaging means and the plurality of audio input means to shoot a moving image;
comparison means for comparing the plurality of audio signals obtained from the plurality of audio input means driven by the shooting driving means;
image synthesizing means for synthesizing the image signals obtained by the plurality of imaging means based on the comparison result obtained by the comparison means; and
output means for outputting the image signal obtained by the image synthesizing means.

An imaging apparatus comprising:
a plurality of imaging means;
a plurality of audio input means provided corresponding to each of the plurality of imaging means;
shooting driving means for simultaneously driving the plurality of imaging means and the plurality of audio input means to shoot a moving image;
comparison means for comparing the plurality of audio signals obtained from the plurality of audio input means driven by the shooting driving means;
image selecting means for selecting one of the image signals obtained by the plurality of imaging means based on the comparison result obtained by the comparison means; and
output means for outputting the image signal obtained by the image selecting means.

An imaging apparatus comprising:
a plurality of imaging means;
a plurality of audio input means provided corresponding to each of the plurality of imaging means;
shooting driving means for simultaneously driving the plurality of imaging means and the plurality of audio input means to shoot a moving image;
image synthesizing means for synthesizing the image signals obtained by the plurality of imaging means so as to reflect the positional relationship of the plurality of imaging means; and
output means for outputting the image signal obtained by the image synthesizing means.

The imaging apparatus according to claim 1, wherein the image synthesizing means generates an image signal in which an image corresponding to an audio signal having a lower sound pressure level is inserted into and combined with an image corresponding to an audio signal having a higher sound pressure level.

The imaging apparatus according to claim 3, wherein the output means outputs the plurality of audio signals obtained by the plurality of audio input means together with the image signal, in a state separated so as to reflect the positional relationship of the corresponding imaging means.

The image pickup apparatus according to claim 1, wherein the output means outputs the plurality of audio signals obtained by the plurality of audio input means together with the image signal.

The image pickup apparatus according to claim 6, wherein the output means outputs the plurality of audio signals in a separated state.

The image pickup apparatus according to claim 1, wherein the output means outputs an image signal from an earlier point in time, going back to allow for a certain time lag.
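
One possible reading of this claim is a short frame buffer: the decision made from the audio at the current instant is applied to frames captured a fixed number of frames earlier, so that the output changes slightly before the sound that triggered it. The sketch below follows that reading only; the class name `DelayedSelector` and the parameter `lag_frames` are hypothetical, and the source does not state how the time lag is implemented.

```python
from collections import deque
from typing import Any, Optional, Sequence


class DelayedSelector:
    """Apply the current selection to frames captured lag_frames earlier."""

    def __init__(self, lag_frames: int) -> None:
        if lag_frames < 0:
            raise ValueError("lag_frames must be non-negative")
        # Holds the current frame set plus the lag_frames sets before it.
        self._buffer: deque = deque(maxlen=lag_frames + 1)

    def push(self, frames: Sequence[Any], selected_index: int) -> Optional[Any]:
        """Store the newest frame set and return the delayed output frame.

        frames holds one frame per camera for the current instant;
        selected_index is the camera chosen from the current audio.
        Returns None until the buffer has filled.
        """
        self._buffer.append(tuple(frames))
        if len(self._buffer) < self._buffer.maxlen:
            return None
        return self._buffer[0][selected_index]
```

The cost of this design is a constant output delay of `lag_frames` frames, which matters less for recording than for live transmission.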

The image pickup apparatus according to claim 1, wherein the output means records at least the image signal on a recording medium.

The image pickup apparatus according to claim 1, wherein the output means transmits at least the image signal to a communication medium.

A shooting driving step of simultaneously driving a plurality of imaging units and a plurality of audio input units provided corresponding to each of the plurality of imaging units to shoot a moving image;
A comparison step of comparing a plurality of audio signals obtained by driving the plurality of audio input units in the shooting driving step;
An image synthesis step of synthesizing the image signals obtained by the plurality of imaging units based on the comparison result obtained in the comparison step;
An imaging method comprising: an output step of outputting the image signal obtained in the image synthesis step.

A program executed by a computer built in an imaging apparatus provided with a plurality of imaging units and a plurality of audio input units provided corresponding to each of the plurality of imaging units,
A shooting driving step of simultaneously driving the plurality of imaging units and the plurality of audio input units to shoot a moving image;
A comparison step of comparing a plurality of audio signals obtained by driving the plurality of audio input units in the shooting driving step;
An image synthesis step of synthesizing the image signals obtained by the plurality of imaging units based on the comparison result obtained in the comparison step;
A program for causing a computer to execute an output step of outputting an image signal obtained in the image synthesis step.