JP6167483B2

JP6167483B2 - Reproduction processing device, imaging device, and reproduction processing program

Info

Publication number: JP6167483B2
Application number: JP2012174997A
Authority: JP
Inventors: 麻理杉原
Original assignee: Nikon Corp
Current assignee: Nikon Corp
Priority date: 2012-08-07
Filing date: 2012-08-07
Publication date: 2017-07-26
Anticipated expiration: 2032-08-07
Also published as: JP2014036257A

Description

The present invention relates to a reproduction processing device, an imaging device, and a reproduction processing program.

Conventionally, the direction of a sound source can be obtained from sound recorded with a plurality of microphones.

For example, there is a technique in which a digital camera having a plurality of microphones captures a still image or a moving image while recording sound, estimates the direction of each sound source and its position in the image, generates direction-specific audio data and position data for each sound source, and records them in association with the image data (see, for example, Patent Document 1).

Patent Document 1: JP 2009-239348 A

However, the conventional technique merely reproduces the audio data of all sound sources, or of each sound source, while the image is reproduced and displayed. It cannot apply effects or staging to the audio data of each sound source according to the scene of the image, the viewpoint of the photographer or the subject, and so on.

In view of the above problem of the conventional technique, an object of the present invention is to provide a technique capable of reproducing the audio data of each sound source with effects and staging applied.

One aspect of a reproduction processing device exemplifying the present invention includes: an input unit that reads image data and audio data recorded when the image data was generated; a subject detection unit that detects a main subject from the image data; and a control unit that acquires, from the image data, information on the distance to the detected main subject and generates reproduction control data for controlling the volume of the audio data during reproduction based on the information on the distance, the direction of a sound source, and whether the sound source is within the angle of view.

The control unit may obtain the direction of at least one sound source from the audio data, generate audio data of that sound source, and generate the reproduction control data by weighting the sound source based on the direction of the sound source, the information on the distance, and whether the sound source is within the angle of view.

The control unit may obtain the direction of each of a plurality of sound sources from the audio data, generate audio data of each sound source, and generate the reproduction control data by weighting each sound source based on the direction of each sound source, the information on the distance, and whether the sound source is within the angle of view.

The device may further include a scene determination unit that determines the scene of the image data, and the control unit may weight the sound source based on the determination result of the scene determination unit together with the direction of the sound source, the information on the distance, and whether the sound source is within the angle of view.

The image data may consist of a plurality of frames captured consecutively in time series, and the control unit may further include a motion vector calculation unit that detects the motion accompanying panning of the imaging device during capture of the image data and calculates it as a motion vector of the frame. In that case, the control unit may weight the sound source in each frame based on the motion vector of each frame together with the direction of the sound source, the information on the distance, and whether the sound source is within the angle of view.

The image data may consist of a plurality of frames captured consecutively in time series, and the control unit may further include a motion vector calculation unit that detects the motion of the main subject and calculates a motion vector of the main subject. In that case, the control unit may weight the sound source in each frame based on the motion vector of the main subject together with the direction of the sound source, the information on the distance, and whether the sound source is within the angle of view.

The control unit may take a weighted average of the weighting of a sound source in a frame and the weighting of that sound source in adjacent frames.
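As an illustrative sketch only, the weighted averaging with adjacent frames could look like the following Python function. The kernel values (0.25, 0.5, 0.25), the function name `smooth_weights`, and the handling of the first and last frames are assumptions made for this example and do not appear in the specification.

```python
def smooth_weights(weights, kernel=(0.25, 0.5, 0.25)):
    """Weighted average of each frame's weight with its neighbours.

    weights: per-frame weights of ONE sound source, in frame order.
    kernel:  hypothetical (previous, current, next) averaging coefficients.
    """
    prev_k, cur_k, next_k = kernel
    total = prev_k + cur_k + next_k
    smoothed = []
    for m, w in enumerate(weights):
        # At either end of the clip the missing neighbour is replaced
        # by the current frame's own weight (an assumption).
        prev_w = weights[m - 1] if m > 0 else w
        next_w = weights[m + 1] if m + 1 < len(weights) else w
        smoothed.append((prev_k * prev_w + cur_k * w + next_k * next_w) / total)
    return smoothed
```

Averaging of this kind would keep the volume of a sound source from jumping abruptly when its weight changes between consecutive frames.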

The device may include a speaker unit that outputs the audio data externally as sound, and the control unit may control the volume of the audio data based on the reproduction control data and cause the speaker unit to output the sound.

The device may also include a display unit that displays the image data.

One aspect of an imaging device exemplifying the present invention includes an imaging unit that captures an object scene and generates image data, a microphone unit that receives sound and generates audio data, and the reproduction processing device of the present invention.

One aspect of a reproduction processing program exemplifying the present invention causes a computer to execute: an input procedure of reading image data and audio data recorded when the image data was generated; a subject detection procedure of detecting a main subject from the image data; and a control procedure of acquiring, from the image data, information on the distance to the detected main subject and generating reproduction control data for controlling the volume of the audio data during reproduction based on the information on the distance, the direction of a sound source, and whether the sound source is within the angle of view.

According to the present invention, the audio data of each sound source can be reproduced with effects and staging applied.

FIG. 1 is a diagram showing the configuration of a computer according to one embodiment of the present invention.
FIG. 2 is a flowchart showing reproduction processing by the computer according to the one embodiment.
FIG. 3 is a diagram showing an example of an object scene.
FIG. 4 is a diagram showing the configuration of the CPU in a computer according to another embodiment of the present invention.
FIG. 5 is a flowchart showing reproduction processing by the computer according to the other embodiment.
FIG. 6 is a diagram showing an example of consecutive frames of a moving image.
FIG. 7 is a diagram showing an example of motion vectors of frames.
FIG. 8 is a diagram listing the weight coefficients set according to the reproduction mode.
FIG. 9 is a diagram showing an example of motion vectors of a main subject.

<< One Embodiment >>
FIG. 1 is a diagram showing the configuration of a computer 100 operated as a reproduction processing device according to one embodiment of the present invention.

The computer 100 shown in FIG. 1(a) includes a CPU 10, a storage unit 11, an input/output interface (input/output I/F) 12, and a bus 13. The CPU 10, the storage unit 11, and the input/output I/F 12 are connected via the bus 13 so that they can exchange information. An output device 14, which displays the progress and results of image processing, and an input device 15, which accepts input from the user, are connected to the computer 100 via the input/output I/F 12. A general liquid crystal monitor, printer, or the like can be used as the output device 14, and a keyboard, mouse, or the like can be selected as appropriate for the input device 15. The computer 100 of this embodiment also has a speaker as part of the output device 14. In addition to the connections for the output device 14 and the input device 15, the input/output I/F 12 has a connection unit for connecting a digital camera (not shown) via a Universal Serial Bus (USB) cable or the like, and a slot for inserting a memory card (not shown) that was mounted in the digital camera (not shown).

The images processed by the computer 100 of this embodiment are still images and moving images captured by a digital camera (not shown) or the like, with audio data added to the header area. The audio data is a stereo recording of the sound around the object scene at the time of capture. The header area of the still image or moving image holds the exposure conditions at the time of capture (focal length, aperture value, shutter speed, ISO value, and so on) and at least the subject distance D of every main subject captured in the still image or moving image.

The CPU 10 is a processor that performs overall control of each part of the computer 100. For example, the CPU 10 reads the reproduction processing program stored in the storage unit 11 in response to a user instruction received by the input device 15. By executing the reproduction processing program, the CPU 10 operates as a subject detection unit 20 and a scene determination unit 21 (FIG. 1(b)) and performs reproduction processing on the image to be processed. The CPU 10 reproduces and displays the still image or moving image on the liquid crystal monitor of the output device 14 and outputs the sound of the added audio data to the speaker.

The subject detection unit 20 performs subject detection processing on the still image to be processed, or on each frame of the moving image, and detects a person's face region and the image regions of buildings, automobiles, and the like. This subject detection processing uses a known algorithm. For example, the subject detection unit 20 applies pattern matching or the like to the still image or frame, using templates of various patterns such as persons and buildings stored in the storage unit 11, and detects the image region of each main subject. The CPU 10 acquires the size, position, and so on of the detected image region of each main subject as subject information.

The scene determination unit 21 determines the scene of the object scene captured in the still image or frame using a known method. If the header area of the loaded still image or moving image contains scene mode information from the time of capture, the scene determination unit 21 determines the scene based on that scene mode information.

The storage unit 11 stores the control program, the reproduction processing program, and the like, as well as images read from the digital camera (not shown). The programs, images, and so on stored in the storage unit 11 can be referenced by the CPU 10 as needed via the bus 13. A storage device such as a general hard disk device or magneto-optical disk device can be used as the storage unit 11. Although the storage unit 11 is described as built into the computer 100, it may be an external storage device, in which case it is connected to the computer 100 via the input/output I/F 12.

Next, the reproduction processing performed by the computer 100 of this embodiment is described with reference to the flowchart of FIG. 2. In this embodiment, the image to be processed is a still image capturing an object scene 40 that includes a person 30, as shown in FIG. 3(a). FIG. 3(b) shows the object scene 40 after the still image was captured; the automobile 32 is outside the object scene 40 at the time of capture but is assumed to be approaching the person 30.

The user instructs the CPU 10 to start the reproduction processing program by using the input device 15 to enter the program's command or to double-click the program's icon displayed on the output device 14. The CPU 10 receives the instruction via the input/output I/F 12, then reads and executes the reproduction processing program stored in the storage unit 11. The CPU 10 starts processing from step S101.

Step S101: The CPU 10 reads the still image designated by the user via the input device 15, either from the storage unit 11 or from the digital camera (not shown) via the input/output I/F 12.

Step S102: The subject detection unit 20 detects the face region of the person 30 and the image region of the building 31 from the loaded still image. The CPU 10 acquires the size, position, and so on of the detected image region of each main subject, together with the subject distance D of each main subject stored in the header area of the still image, as subject information.

Step S103: Using a known method such as that of Patent Document 1, the CPU 10 obtains, from the stereo-recorded audio data added to the still image, the direction of the sound source of each sound contained in that audio data, extracts the audio signal of each sound source, and generates audio data for each sound source. In this embodiment, the sound sources are the voice of the person 30, music coming from the building 31, and the engine sound of the approaching automobile 32, and the CPU 10 generates audio data for each of them. The CPU 10 also identifies where each sound source is located on the still image based on the subject information and the direction of each sound source, and generates position information consisting of the direction of each sound source and the identified position. The CPU 10 records the audio data and position information of each sound source in an internal memory (not shown) in association with the subject information.

Because the automobile 32 is not within the angle of view of the still image, as shown in FIG. 3(a), the size and position of the image region in its subject information and its subject distance D(i) are assumed to be set to a predetermined value indicating that it is outside the angle of view, to infinity, or the like.

Step S104: So that the volume at which the audio data of each sound source is reproduced can be set according to the scene captured in the still image (described later), the CPU 10 determines whether each sound source is within the angle of view of the still image, based on the angle of view of the still image and the position information of each sound source. If a sound source is within the angle of view, the CPU 10 sets the flag UseFlag(i) of that sound source to 1. If the sound source is outside the angle of view, the CPU 10 sets UseFlag(i) to 0. Here, the index i identifies each sound source; in this embodiment, i = 0 for the person 30, i = 1 for the building 31, and i = 2 for the automobile 32.
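The determination in step S104 can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the method of the specification: the direction of each sound source is taken to be a horizontal angle in degrees measured from the optical axis, the angle of view is derived from the focal length in the header using a simple lens model with an assumed 36 mm sensor width, and the names `horizontal_angle_of_view` and `use_flag` as well as the example directions are hypothetical.

```python
import math

def horizontal_angle_of_view(focal_length_mm, sensor_width_mm=36.0):
    """Full horizontal angle of view in degrees (simple lens model).

    sensor_width_mm is an assumed value; the specification does not give one.
    """
    return 2.0 * math.degrees(math.atan(sensor_width_mm / (2.0 * focal_length_mm)))

def use_flag(azimuth_deg, angle_of_view_deg):
    """Return 1 if the source direction lies inside the angle of view, else 0."""
    return 1 if abs(azimuth_deg) <= angle_of_view_deg / 2.0 else 0

# Hypothetical directions for i = 0 (person 30), 1 (building 31), 2 (automobile 32).
azimuths = [5.0, -30.0, 70.0]
aov = horizontal_angle_of_view(18.0)          # about 90 degrees for an 18 mm lens
flags = [use_flag(a, aov) for a in azimuths]  # the automobile falls outside the view
```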

Step S105: The scene determination unit 21 determines the scene captured in the still image, and the CPU 10 weights each sound source for reproduction according to the determination result. In this embodiment, the scene determined by the scene determination unit 21 is one of "snap", "portrait", or "landscape". The weighting that the CPU 10 sets for each sound source in each scene is as follows.
A) "Snap"
The CPU 10 sets the weighting of each sound source for reproduction as a weight coefficient WT(i) = α / L, which varies with the distance L on the still image from the in-focus region at the time of capture, for example the face region of the person 30. The coefficient α is set to a predetermined value. For a sound source outside the angle of view of the still image, namely the automobile 32, the distance L is assumed to be set to a predetermined value, infinity, or the like.
B) "Portrait"
The CPU 10 sets the weight coefficients so that WT(i) of the sound source originating from a person has the largest value. For example, the CPU 10 sets the weight coefficient WT(i) of the person 30 to 1 and the weight coefficients WT(i) of the building 31 and the automobile 32 to 0.5.
C) "Landscape"
The CPU 10 sets the weight coefficients so that WT(i) of sound sources other than persons has a large value. For example, the CPU 10 sets the weight coefficient WT(i) of the person 30 to 0.2 and the weight coefficients WT(i) of the building 31 and the automobile 32 to 1.

The values of the weight coefficient WT(i) and the setting methods for each scene are examples; other values or other setting methods may be used. For example, in the "portrait" case, the weight coefficients WT(i) of the building 31 and the automobile 32 may be set in inverse proportion to their distance from the person 30.
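The scene-dependent weighting of step S105 can be sketched in Python as follows, using the example values given for "portrait" and "landscape" and WT(i) = α / L for "snap". The value of `ALPHA`, the lower bound `MIN_L` (added here only so that the in-focus source itself, with L = 0, does not cause a division by zero), and the function name `scene_weight` are assumptions for illustration.

```python
import math

ALPHA = 100.0  # hypothetical value of the predetermined coefficient alpha
MIN_L = 1.0    # hypothetical lower bound on L (pixels); not in the specification

def scene_weight(scene, is_person, image_distance=math.inf):
    """Weight coefficient WT(i) of one sound source for the given scene.

    image_distance: distance L on the image from the in-focus region;
    math.inf stands for a source outside the angle of view.
    """
    if scene == "snap":
        # WT(i) = alpha / L; an infinite L gives a weight of 0.0.
        return ALPHA / max(image_distance, MIN_L)
    if scene == "portrait":
        return 1.0 if is_person else 0.5
    if scene == "landscape":
        return 0.2 if is_person else 1.0
    raise ValueError(f"unknown scene: {scene!r}")
```

For instance, `scene_weight("portrait", True)` gives the person 30 a weight of 1.0, while `scene_weight("snap", False)` with the default infinite distance gives an out-of-view source a weight of 0.0.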

Step S106: Using the weight coefficient WT(i) of each sound source set in step S105, the CPU 10 calculates the amplification factor AMP(i), which determines the volume of each sound source during reproduction, from the following equation (1) and generates it as reproduction control data.
AMP(i) = UseFlag(i) × WT(i) / (β × D(i)) ... (1)
Here, the coefficient β normalizes the subject distance D(i) of each main subject.
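Equation (1) can be written as a short Python function. This is a sketch under assumptions: the function name `amplification`, the default value of `beta`, and the early return for sources outside the angle of view (which avoids dividing by whatever sentinel value D(i) holds in that case) are choices made for this example.

```python
def amplification(use_flag, wt, distance_d, beta=1.0):
    """AMP(i) = UseFlag(i) * WT(i) / (beta * D(i)), as in equation (1).

    distance_d must be positive for sources inside the angle of view.
    """
    if use_flag == 0:
        # Outside the angle of view: equation (1) yields 0 regardless of D(i).
        return 0.0
    return use_flag * wt / (beta * distance_d)

# "Portrait" example weights with hypothetical distances in metres and beta = 0.5.
amp_person = amplification(1, 1.0, 2.0, beta=0.5)             # person 30
amp_building = amplification(1, 0.5, 10.0, beta=0.5)          # building 31
amp_car = amplification(0, 0.5, float("inf"), beta=0.5)       # automobile 32
```

With these inputs, the nearby person is reproduced at ten times the amplification of the distant building, and the automobile outside the angle of view is muted.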

Step S107: For each sound source, the CPU 10 calculates the product of the audio data and the amplification factor AMP(i) of the reproduction control data to generate the reproduction audio data of that sound source. The CPU 10 reproduces and displays the still image on the liquid crystal monitor of the output device 14 and outputs the reproduction audio data of each sound source as sound from the speaker of the output device 14. The CPU 10 then ends the series of processing.
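The per-source product in step S107 can be sketched as follows. The specification states only that the audio data of each sound source is multiplied by AMP(i) and output; summing the scaled sources into a single track and clipping the result to the range [-1, 1] are assumptions of this example, as is the function name `mix_sources`.

```python
def mix_sources(sources, amps):
    """Scale each source's samples by its AMP(i) and sum them into one track.

    sources: list of sample lists (floats in [-1, 1]), one list per sound source.
    amps:    amplification factor AMP(i) for each source, in the same order.
    """
    length = max((len(s) for s in sources), default=0)
    mixed = [0.0] * length
    for samples, amp in zip(sources, amps):
        for t, x in enumerate(samples):
            mixed[t] += amp * x  # reproduction audio data of this source
    # Clip so the summed signal stays within the valid sample range (assumption).
    return [max(-1.0, min(1.0, v)) for v in mixed]
```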

When the image to be processed is a moving image, the computer 100 applies the reproduction processing shown in FIG. 2 to each frame of the moving image. That is, the computer 100 performs steps S102 to S105 on all frames of the moving image, then proceeds to step S106, calculates the amplification factor AMP of each sound source in each frame, and generates the reproduction control data.

As described above, in this embodiment, setting the weight coefficient of each sound source according to the scene of the captured image makes it possible to reproduce the audio data of each sound source with effects and staging applied.

<< Another Embodiment >>
The computer according to another embodiment of the present invention is the same as the computer 100 according to the one embodiment shown in FIG. 1, so a detailed description of each component is omitted.

The computer 100 of this embodiment differs from that of the one embodiment in two respects: 1) the image to be processed is a moving image only, and 2) the computer 100 has reproduction modes that set the weight coefficient WT of the audio data of each sound source according to whether reproduction is from the photographer's viewpoint or the subject's viewpoint. The photographer viewpoint mode reproduces the audio data of each sound source as the photographer operating the digital camera (not shown) would hear it, and the subject viewpoint mode reproduces the audio data of each sound source as it would be heard at the position of the main subject. Because only moving images are processed, the CPU 10 of this embodiment, by executing the reproduction processing program, operates as a motion vector calculation unit 22 in addition to the subject detection unit 20, as shown in FIG. 4.

The motion vector calculation unit 22 detects the motion accompanying panning of the digital camera (not shown) that captured the moving image as frame motion and calculates a motion vector of the frame; it also detects the motion of each main subject and calculates a motion vector of the main subject. Specifically, the motion vector calculation unit 22 applies known correlation processing to two adjacent frames of the moving image. Based on the correlation result, it detects the frame motion and calculates the motion vector of the frame from, for example, the amount of shift between the two frames in the background image region, that is, the region excluding the image regions of the main subjects detected by the subject detection unit 20. It then detects the motion of the main subject and calculates the motion vector of the main subject based on the amount of shift of the background image region and the amount of shift of the image region of the main subject.
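The relationship between the shift amounts and the two motion vectors can be illustrated with a one-dimensional Python sketch. It is a simplification under several assumptions: horizontal panning only, an exhaustive search that minimizes the mean absolute difference in place of the unspecified "known correlation processing", and sign conventions chosen for this example (the frame motion vector is taken as the opposite of the background shift, and the subject motion vector as the subject shift minus the background shift). The function names are hypothetical.

```python
def estimate_shift(prev_row, next_row, max_shift=3):
    """Shift s (pixels) such that next_row[x + s] best matches prev_row[x]."""
    best_shift, best_cost = 0, float("inf")
    n = len(prev_row)
    for s in range(-max_shift, max_shift + 1):
        pairs = [(prev_row[x], next_row[x + s]) for x in range(n) if 0 <= x + s < n]
        if not pairs:
            continue  # no overlap for this shift
        cost = sum(abs(a - b) for a, b in pairs) / len(pairs)
        if cost < best_cost:
            best_shift, best_cost = s, cost
    return best_shift

def motion_vectors(background_shift, subject_shift):
    """Return (frame motion, subject motion) under the assumed sign conventions."""
    frame_mv = -background_shift                  # camera pans opposite to the background shift
    subject_mv = subject_shift - background_shift  # subject motion relative to the scene
    return frame_mv, subject_mv

# Background row of pixel values: content moves 2 pixels to the right between frames.
prev_bg = [3, 8, 1, 6, 4, 9, 2, 7]
next_bg = [5, 0, 3, 8, 1, 6, 4, 9]
bg_shift = estimate_shift(prev_bg, next_bg)
# A tracked subject stays at the same image position (shift 0) while the camera pans.
frame_mv, subject_mv = motion_vectors(bg_shift, 0)
```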

Next, the reproduction processing of the computer 100 of this embodiment is described with reference to the flowchart of FIG. 5. The moving image to be processed in this embodiment captures the object scene 40 shown in FIG. 3. FIGS. 6(a) to 6(c) show, as an example, three consecutive frames of that moving image: the k-th, (k+1)-th, and (k+2)-th frames (k is a natural number). FIGS. 6(a) and 6(b) are frames captured while panning the digital camera (not shown) to follow the person 30. FIG. 6(c) is a frame captured while panning the digital camera (not shown) so that the automobile 33, which appeared from the left, is at the center of the image.

The user instructs the CPU 10 to start the reproduction processing program by using the input device 15 to enter the program's command or to double-click the program's icon displayed on the output device 14. The CPU 10 receives the instruction via the input/output I/F 12, then reads and executes the reproduction processing program stored in the storage unit 11. The CPU 10 starts processing from step S201.

Step S201: The CPU 10 reads the moving image designated by the user via the input device 15, either from the storage unit 11 or from the digital camera (not shown) via the input/output I/F 12. The CPU 10 preferably also accepts designation of the reproduction mode together with designation of the moving image to be reproduced.

Step S202: The subject detection unit 20 detects the image regions of the person 30 and other main subjects from each frame of the loaded moving image. The CPU 10 acquires the size, position, and so on of the detected image region of each main subject, together with the subject distance D of each main subject stored in the header area of the moving image, as subject information.

Step S203: Using a known method such as that of Patent Document 1, the CPU 10 obtains, from the stereo-recorded audio data added to the moving image, the direction of the sound source of each sound contained in that audio data, extracts the audio signal of each sound source, and generates audio data for each sound source. The CPU 10 also identifies where each sound source is located on each frame based on the subject information and the direction of each sound source, and generates position information consisting of the direction of each sound source and the identified position in each frame. The CPU 10 records the audio data and position information of each sound source in an internal memory (not shown) in association with the subject information.

Step S204: The motion vector calculation unit 22 applies correlation processing to two adjacent frames, detects the frame motion from the amount of shift in the background image region, and calculates the motion vector of the frame. The motion vector calculation unit 22 also calculates the motion vector of each main subject from the amount of shift of the background image region and the amount of shift in the image region of each main subject. The CPU 10 records the calculated motion vectors of the frame and of each main subject in an internal memory (not shown) in association with each frame.

Step S205: The CPU 10 determines whether the photographer viewpoint mode is set as the reproduction mode. If the photographer viewpoint mode is set, the CPU 10 proceeds to step S206 (YES side); if the subject viewpoint mode is set, it proceeds to step S207 (NO side).

Step S206: In the photographer viewpoint mode, in order to reproduce the audio data of each sound source as the photographer operating the digital camera (not shown) would hear it, the CPU 10 sets the weight coefficient WT(m, i) of each sound source in, for example, the m-th frame based on the motion vector of the frame in the m-th frame and the position information of each sound source. Specifically, the setting is made as follows.

FIGS. 7(a) to 7(c) show, with an arrow at the center of each frame, the direction of the frame motion vector in the k-th, (k+1)-th, and (k+2)-th frames shown in FIGS. 6(a) to 6(c). FIG. 8(a) lists the weight coefficients WT(m, i) of each sound source set in each frame based on the direction of the frame motion vector and the position information of each sound source. In the photographer viewpoint mode, the more a sound source lies in the direction of the frame motion vector and the closer it is to the center of the frame, the larger the weight coefficient assigned to it.

That is, because the k-th and (k+1)-th frames were captured with the person 30 at the center of the frame, the weight coefficient of the person 30 is set to the largest value, as shown in FIG. 8(a). Because the building 31 and the car 32 are on the side toward which the frame motion vector points and are approaching the person 30, their weight coefficients are set to larger values in the (k+1)-th frame than in the k-th frame. The car 33, on the other hand, is outside the angle of view in the k-th and (k+1)-th frames and on the side opposite to the direction of the frame motion vector, so its weight coefficient remains set to a small value.

The (k+2)-th frame, in contrast, was captured after the digital camera (not shown) was panned so that the car 33 is at the center of the frame, so the weight coefficient of the car 33 is set to the largest value. Because the person 30, the building 31, and the car 32 are on the side opposite to the direction of the frame motion vector, their weight coefficients are set to smaller values than in the k-th and (k+1)-th frames.

Note that the values of the weight coefficient WT(m, i) for each sound source in each frame shown in FIG. 8(a), and the method of setting them, are only an example; they are preferably set as appropriate according to the number of main subjects, the magnitude and direction of the frame motion vector, and so on.
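The rule described for the photographer viewpoint mode (a larger weight for a sound source that lies in the direction of the frame motion vector and is close to the frame center) could be sketched as follows. The text gives only the example table of FIG. 8(a), so the scoring formula, the function name `photographer_weight`, the `floor` value, and the normalized coordinate convention are all illustrative assumptions.

```python
import math


def photographer_weight(source_xy, motion_xy, floor=0.1):
    """Hypothetical weight WT(m, i) for one sound source in one frame.

    source_xy: source position relative to the frame center, in units where
               the frame half-width is 1.0 (assumed convention).
    motion_xy: frame motion vector, i.e. the panning direction.
    floor:     smallest weight ever returned (assumed value).
    """
    sx, sy = source_xy
    mx, my = motion_xy
    dist = math.hypot(sx, sy)
    # Closeness to the frame center: 1.0 at the center, smaller farther away.
    closeness = 1.0 / (1.0 + dist)
    norm = math.hypot(mx, my)
    if dist == 0.0 or norm == 0.0:
        alignment = 1.0  # no direction to compare against
    else:
        # Cosine between the source offset and the motion vector,
        # mapped from [-1, 1] to [0, 1].
        cos = (sx * mx + sy * my) / (dist * norm)
        alignment = 0.5 * (1.0 + cos)
    return max(floor, closeness * alignment)
```

With this sketch, a source at the frame center receives the largest weight, and a source on the side opposite to the panning direction falls back to the floor value, which mirrors the qualitative behavior described for the person 30 and the car 33.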

Step S207: In the subject viewpoint mode, in order to reproduce the audio data of each sound source as it would be heard by, for example, the person 30, the CPU 10 sets the weight coefficient WT(m, i) of each sound source in the m-th frame based on the direction of the motion vector of the person 30 and the position information of each sound source. Specifically, the coefficients are set as follows.

As in FIG. 7, FIGS. 9(a) to 9(c) show, with an arrow at the center of each frame, the direction of the motion vector of the person 30 in the k-th, (k+1)-th, and (k+2)-th frames shown in FIGS. 6(a) to 6(c). FIG. 8(b) lists the weight coefficients WT(m, i) of each sound source set in each frame based on the direction of the motion vector of the person 30 and the position information of each sound source. In the subject viewpoint mode, the more a sound source lies in the direction of the person's motion vector and the closer it is to the person 30, the larger the weight coefficient assigned to it.

That is, the motion vector of the person 30 points in the same direction from the k-th to the (k+2)-th frame, and the building 31 and the car 32 approach the person 30, so their weight coefficients are set to progressively larger values from the k-th to the (k+2)-th frame, as shown in FIG. 8(b). The car 33 is outside the angle of view in the k-th and (k+1)-th frames, so its weight coefficient is set to a small value of 0.1. In the (k+2)-th frame the car 33 is within the angle of view and close to the person 30, but it is on the side opposite to the direction of the motion vector of the person 30, so its weight coefficient is set to a value smaller than those of the other sound sources.

In the subject viewpoint mode, the audio data of each sound source is reproduced as the person 30 would hear it. Therefore, as shown in FIG. 8(b), a weight coefficient with a predetermined value such as 0.5 is set in advance so that the voice of the person 30 is reproduced at a low volume.

The values of the weight coefficient WT(m, i) for each sound source in each frame shown in FIG. 8(b), and the method of setting them, are likewise only an example; they are preferably set as appropriate according to the number of main subjects, the magnitude and direction of the motion vectors of the main subjects, and so on.
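A corresponding sketch for the subject viewpoint mode measures position and direction relative to the followed subject rather than the frame center. The values 0.5 (the subject's own voice) and 0.1 (a source outside the angle of view) are the example values quoted in the text; the scoring formula, the function name `subject_weight`, and its parameters are assumptions made for illustration.

```python
import math

OWN_VOICE_WEIGHT = 0.5    # example value from the text for the subject's own voice
OUT_OF_VIEW_WEIGHT = 0.1  # example value from the text for a source outside the angle of view


def subject_weight(source_xy, subject_xy, subject_motion_xy,
                   in_view=True, is_subject=False):
    """Hypothetical weight WT(m, i) as heard from the followed subject."""
    if is_subject:
        return OWN_VOICE_WEIGHT
    if not in_view:
        return OUT_OF_VIEW_WEIGHT
    # Offset of the source as seen from the subject.
    dx = source_xy[0] - subject_xy[0]
    dy = source_xy[1] - subject_xy[1]
    dist = math.hypot(dx, dy)
    mx, my = subject_motion_xy
    norm = math.hypot(mx, my)
    closeness = 1.0 / (1.0 + dist)
    if dist == 0.0 or norm == 0.0:
        alignment = 1.0  # no direction to compare against
    else:
        # 1.0 when the source is straight ahead of the subject, 0.0 when behind.
        alignment = 0.5 * (1.0 + (dx * mx + dy * my) / (dist * norm))
    # Assumed design choice: never go below the out-of-view weight.
    return max(OUT_OF_VIEW_WEIGHT, closeness * alignment)
```

A source ahead of the subject gains weight as it gets closer, while a source behind the subject stays at the small floor value even when it is near, as described for the car 33 in the (k+2)-th frame.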

Step S208: Using the following equation (2), the CPU 10 takes a weighted average, in the time-axis direction, of the weight coefficients WT(m, i) of each sound source set in step S206 or step S207, over N adjacent frames that include the m-th frame (N < the total number of frames in the moving image).

Here, the coefficient ε(j) indicates the degree to which the weight coefficient of the sound source in the j-th frame contributes to the m-th frame, and is set so that the contribution is greatest when j = m. The number of frames N used is preferably about 10 or fewer, and the range of j for the weighted average is preferably chosen as appropriate, for example from m to m+N-1, from m-N+1 to m, or from m-N/2 to m+N/2. This weighted average allows the CPU 10 to change the volume of each sound source smoothly.
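The temporal smoothing of Step S208 could look like the sketch below. Equation (2) itself is not reproduced in this text, so the normalization by the sum of the coefficients, the function name `smooth_weight`, and the representation of ε as a mapping from frame offset to coefficient are assumptions.

```python
def smooth_weight(weights, m, eps):
    """Weighted average <WT(m, i)> for one sound source i.

    weights: list of WT(j, i) indexed by frame number j.
    m:       index of the frame being smoothed (must be a valid index).
    eps:     dict mapping the offset (j - m) to the coefficient eps(j);
             it must contain offset 0, which should carry the largest value.
    """
    num = 0.0
    den = 0.0
    for offset, e in eps.items():
        j = m + offset
        if 0 <= j < len(weights):  # skip frames before the start or after the end
            num += e * weights[j]
            den += e
    # Dividing by the sum of the coefficients actually used is an assumption;
    # it keeps a constant weight sequence unchanged.
    return num / den
```

For example, with `eps = {-1: 0.25, 0: 0.5, 1: 0.25}` a one-frame spike in the weight is spread over its neighbors, so the volume of that source rises and falls gradually instead of jumping.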

Step S209: Using the weight coefficient <WT(m, i)> obtained by the weighted average in step S208, the CPU 10 calculates, by the following equation (3), the amplification factor AMP(m, i) that determines the volume of each sound source in each frame during reproduction, and generates the reproduction control data.
AMP(m, i) = <WT(m, i)> / (β × D(m, i)) ... (3)
Here, the coefficient β is a coefficient that normalizes each subject distance D(m, i) in the m-th frame.
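Equation (3) translates directly into code. In the sketch below the function name `amplification` and the input checks are assumptions, as is the choice in the usage note of taking β as the reciprocal of a reference distance; the text says only that β normalizes the subject distance.

```python
def amplification(avg_weight, distance, beta):
    """AMP(m, i) = <WT(m, i)> / (beta * D(m, i)), following equation (3).

    avg_weight: smoothed weight <WT(m, i)> from step S208.
    distance:   subject distance D(m, i); must be positive.
    beta:       normalization coefficient; must be positive.
    """
    if distance <= 0 or beta <= 0:
        raise ValueError("distance and beta must be positive")
    return avg_weight / (beta * distance)
```

If β is taken as 1 / 2.0 (a reference distance of 2 m, chosen only for illustration), a source with weight 0.8 at 2 m gets an amplification factor of 0.8, and the same source at 4 m gets 0.4, so more distant sources are reproduced more quietly.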

Step S210: For each sound source, the CPU 10 calculates the product of the audio data and the amplification factor AMP(m, i) in the reproduction control data, and generates the reproduction audio data of each sound source.
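The product in Step S210 could be applied sample by sample as sketched here. The function name `apply_gain`, the flat list of samples, and the fixed number of audio samples per video frame are assumptions; the text does not describe how audio samples are aligned with frames.

```python
def apply_gain(samples, amp_per_frame, samples_per_frame):
    """Scale one source's audio samples by the per-frame factor AMP(m, i).

    samples:           audio samples of a single sound source.
    amp_per_frame:     list of AMP(m, i) values, one per video frame m.
    samples_per_frame: number of audio samples spanned by one frame (assumed constant).
    """
    out = []
    for n, s in enumerate(samples):
        # Frame that this sample belongs to; trailing samples reuse the last frame.
        frame = min(n // samples_per_frame, len(amp_per_frame) - 1)
        out.append(s * amp_per_frame[frame])
    return out
```

Running this once per sound source yields the per-source reproduction audio data that Step S211 then outputs.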

Step S211: The CPU 10 reproduces and displays the moving image on the liquid crystal monitor of the output device 14, and outputs the reproduction audio data of each sound source as sound from the speaker of the output device 14. The CPU 10 then ends the series of processes.

As described above, in this embodiment, the weight coefficient of each sound source is set according to the motion vector of the frame or of the main subject in each captured frame, so that the audio data of each sound source can be reproduced with effects and staging applied.
<< Supplementary notes on the embodiments >>
(1) The reproduction processing apparatus of the present invention is realized by causing the computer 100 to execute a reproduction processing program, but the present invention is not limited to this. The present invention is also applicable to a reproduction processing program for realizing, on the computer 100, the processing of the reproduction processing apparatus according to the present invention, and to a medium on which that program is recorded.

The present invention is also applicable to a digital camera that has the reproduction processing program of the present invention. When the digital camera operates as the image processing apparatus of the present invention, the CPU 10 may implement the processing of the subject detection unit 20, the scene determination unit 21, and the motion vector calculation unit 22 in software, or these processes may be implemented in hardware using an ASIC. In this case, subject information, audio data, position information, and reproduction control data are preferably added, together with the exposure conditions, to the header area of a still image or moving image captured by the digital camera. The audio data added to the header area may be the audio extracted for each sound source, or the stereo recording before extraction.

(2) In the above embodiments, the computer 100 reproduces and displays a still image or moving image together with the audio data of each sound source, but the present invention is not limited to this, and only the audio data of each sound source may be reproduced.

(3) In the above embodiments, the weight coefficient of each sound source is set according to the scene determination result or the reproduction mode, and the reproduction control data and reproduction audio data are generated accordingly, but the present invention is not limited to this. For example, the CPU 10 may generate the reproduction control data and reproduction audio data taking the Doppler effect or the like into account, based on the motion vectors of the frame and the main subjects.

(4) In the other embodiment described above, the motion vector calculation unit 22 calculates the motion vectors of the frame and of each main subject based on correlation processing on two adjacent frames, but the present invention is not limited to this. For example, for a moving image compressed in a video format such as H.264, motion vectors are calculated during motion compensation in inter-frame prediction in order to improve compression efficiency. The motion vector calculation unit 22 may therefore obtain the motion vectors of the frame and of each main subject from those motion vectors.

Alternatively, the motion vector may be calculated using the motion of the subject detected by a subject tracking function.

When the digital camera (not shown) includes a sensor such as an acceleration sensor or an electronic gyro, the motion vector calculation unit 22 may calculate the frame motion vector based on the output value of that sensor.

(5) In the other embodiment described above, the main subject followed in the subject viewpoint mode is the person 30, but the present invention is not limited to this, and another main subject such as the building 31 or the car 32 may be followed.

The features and advantages of the embodiments will be apparent from the detailed description above. The claims are intended to cover the features and advantages of the embodiments described above without departing from their spirit and scope. Moreover, a person of ordinary skill in the art should readily be able to conceive of improvements and modifications; there is no intention to limit the scope of the inventive embodiments to what has been described, and suitable improvements and equivalents within the scope disclosed in the embodiments may also be relied upon.

10: CPU, 11: storage unit, 12: input/output I/F, 13: bus, 14: output device, 15: input device, 20: subject detection unit, 21: scene determination unit, 22: motion vector calculation unit, 100: computer

Claims

An input unit for reading image data and audio data recorded during the image data generation process;
A subject detection unit for detecting a main subject from the image data;
A control unit that acquires, from the image data, information on the distance to the detected main subject, and generates reproduction control data for controlling the volume based on the information on the distance, the direction of the sound source, and whether the sound source is within the angle of view;
A reproduction processing apparatus comprising:

The reproduction processing apparatus according to claim 1,
The control unit obtains, from the audio data, the direction of the sound source of at least one sound and generates audio data of the sound source, and generates the reproduction control data by weighting the sound source based on the direction of the sound source, the information on the distance, and whether the sound source is within the angle of view.

The reproduction processing apparatus according to claim 2, wherein
the control unit obtains, from the audio data, the direction of each of a plurality of the sound sources and generates audio data of each sound source, and generates the reproduction control data by weighting each sound source based on the direction of each sound source, the information on the distance, and whether the sound source is within the angle of view.

The reproduction processing apparatus according to claim 2 or claim 3, further comprising:
A scene determination unit for determining a scene of the image data;
wherein the control unit weights the sound source based on the direction of the sound source, the information on the distance, whether the sound source is within the angle of view, and a determination result of the scene determination unit.

The reproduction processing apparatus according to claim 2 or claim 3, wherein
The image data consists of a plurality of frames imaged continuously in time series,
The apparatus further comprises a motion vector calculation unit that detects motion accompanying panning of the imaging device at the time the image data was captured and calculates a motion vector of the frame, and
The control unit weights the sound source in each frame based on the direction of the sound source, the information on the distance, whether the sound source is within the angle of view, and the motion vector of each frame.

The reproduction processing apparatus according to claim 2 or claim 3, wherein
The image data consists of a plurality of frames imaged continuously in time series,
The apparatus further comprises a motion vector calculation unit that detects motion of the main subject and calculates a motion vector of the main subject, and
The control unit weights the sound source in each frame based on the direction of the sound source, the information on the distance, whether the sound source is within the angle of view, and the motion vector of the main subject.

The reproduction processing apparatus according to claim 5 or claim 6, wherein
The control unit takes a weighted average of the weight of the sound source in the frame and the weights of the sound source in adjacent frames.

The reproduction processing apparatus according to any one of claims 1 to 7, further comprising a speaker unit that outputs the audio data to the outside as audio.
The reproduction processing apparatus, wherein the control unit controls a volume of the audio data based on the reproduction control data and outputs sound to the speaker unit.

The reproduction processing apparatus according to any one of claims 1 to 8, further comprising:
A display unit for displaying the image data.

An imaging unit that images the object scene and generates image data;
A microphone unit that receives sound and generates sound data;
A reproduction processing apparatus according to claim 1;
An imaging apparatus comprising:

An input procedure for reading image data and audio data recorded during the image data generation process;
A subject detection procedure for detecting a main subject from the image data;
A control procedure for acquiring, from the image data, information on the distance to the detected main subject, and generating reproduction control data for controlling the volume based on the information on the distance, the direction of the sound source, and whether the sound source is within the angle of view;
A reproduction processing program for causing a computer to execute.