JP6525029B2

JP6525029B2 - Reproduction processing apparatus, imaging apparatus and reproduction processing program

Info

Publication number: JP6525029B2
Application number: JP2017127824A
Authority: JP
Inventors: 麻理杉原
Original assignee: Nikon Corp
Current assignee: Nikon Corp
Priority date: 2017-06-29
Filing date: 2017-06-29
Publication date: 2019-06-05
Anticipated expiration: 2032-08-07
Also published as: JP2017204869A

Description

The present invention relates to a reproduction processing apparatus, an imaging apparatus, and a reproduction processing program.

Conventionally, the direction of a sound source can be obtained from audio recorded with a plurality of microphones.

For example, there is a technique in which a digital camera having a plurality of microphones captures a still image or a moving image while recording audio, estimates the direction of each sound source in the audio and its position in the image, generates direction-specific audio data and position data for each sound source, and records them in association with the image data (see, for example, Patent Document 1).

Patent Document 1: JP 2009-239348 A

However, the prior art merely reproduces the audio data of all sound sources, or of each sound source, while the image is reproduced and displayed. It cannot reproduce the audio data of each sound source with effects or staging applied according to the scene of the image or the viewpoint of the photographer or the subject.

In view of the above problem of the prior art, an object of the present invention is to provide a technique capable of reproducing the audio data of each sound source with effects and staging applied.

To solve the above problem, one aspect of a reproduction processing apparatus exemplifying the present invention includes: an input unit that reads audio data recorded when image data generated by imaging a subject is processed; and a control unit that determines, based on the angle of view of the image and position information of a sound source, whether the sound source is within the angle of view of the displayed image, and that generates, based on whether the sound source is within the angle of view, reproduction control data for controlling the volume of the audio data during reproduction.

The control unit may obtain the direction of at least one sound source from the audio data to generate audio data of that sound source, and may generate the reproduction control data by weighting the sound source based on the direction of the sound source and whether the sound source is within the angle of view.

The control unit may also obtain the direction of each of a plurality of sound sources from the audio data to generate audio data of each sound source, and may generate the reproduction control data by weighting each sound source based on the direction of each sound source and whether the sound source is within the angle of view.

The apparatus may further include a scene determination unit that determines the scene of the image data, and the control unit may weight the sound source based on the determination result of the scene determination unit together with the direction of the sound source and whether the sound source is within the angle of view.

The input unit may input a plurality of pieces of image data captured in succession. The control unit may further include a motion vector calculation unit that detects motion accompanying panning of the imaging apparatus while the subject is imaged and calculates a motion vector based on the plurality of pieces of image data. The control unit may then weight the sound source in each frame based on the direction of the sound source, whether the sound source is within the angle of view, and the motion vector.

A display unit that displays an image based on the image data may also be provided.

One aspect of an imaging apparatus exemplifying the present invention includes an imaging unit that captures an object field and generates image data, a microphone unit that receives sound and generates audio data, and the reproduction processing apparatus of the present invention.

One aspect of a reproduction processing program exemplifying the present invention causes a computer to execute: an input procedure of reading audio data recorded when image data generated by imaging a subject is processed; and a control procedure of determining, based on the angle of view of the image and position information of a sound source, whether the sound source is within the angle of view of the displayed image, and of generating, based on whether the sound source is within the angle of view, reproduction control data for controlling the volume of the audio data during reproduction.

According to the present invention, the audio data of each sound source can be reproduced with effects and staging applied.

FIG. 1 is a diagram showing the configuration of a computer according to one embodiment of the present invention.
FIG. 2 is a flowchart showing reproduction processing by the computer according to the one embodiment.
FIG. 3 is a diagram showing an example of an object field.
FIG. 4 is a diagram showing the configuration of the CPU in a computer according to another embodiment of the present invention.
FIG. 5 is a flowchart showing reproduction processing by the computer according to the other embodiment.
FIG. 6 is a diagram showing an example of consecutive frames of a moving image.
FIG. 7 is a diagram showing an example of frame motion vectors.
FIG. 8 is a diagram listing the weight coefficients set according to the reproduction mode.
FIG. 9 is a diagram showing an example of the motion vector of a main subject.

<<One Embodiment>>
FIG. 1 is a diagram showing the configuration of a computer 100 that operates as a reproduction processing apparatus according to one embodiment of the present invention.

The computer 100 shown in FIG. 1(a) includes a CPU 10, a storage unit 11, an input/output interface (input/output I/F) 12, and a bus 13. The CPU 10, the storage unit 11, and the input/output I/F 12 are connected via the bus 13 so that they can exchange information. An output device 14, which displays the progress and results of image processing, and an input device 15, which accepts input from the user, are each connected to the computer 100 via the input/output I/F 12. A general liquid crystal monitor, printer, or the like can be used as the output device 14, and a keyboard, mouse, or the like can be selected as appropriate for the input device 15. The computer 100 of this embodiment also has a speaker as part of the output device 14. In addition to the connections for the output device 14 and the input device 15, the input/output I/F 12 has a connection unit for connecting a digital camera (not shown) via a Universal Serial Bus (USB) cable or the like, and a slot into which a memory card (not shown) used in the digital camera (not shown) is inserted.

The images processed by the computer 100 of this embodiment are still images or moving images captured by a digital camera (not shown) or the like, with audio data added to the header area. The audio data is a stereo recording of the sound around the object field at the time of imaging. The header area of the still image or moving image is assumed to contain, along with the exposure conditions at the time of imaging (focal length, aperture value, shutter speed, ISO value, and so on), at least the subject distance D of every main subject captured in the still image or moving image.

The CPU 10 is a processor that performs overall control of each part of the computer 100. For example, based on an instruction from the user received by the input device 15, the CPU 10 reads the reproduction processing program stored in the storage unit 11. By executing the reproduction processing program, the CPU 10 operates as a subject detection unit 20 and a scene determination unit 21 (FIG. 1(b)) and performs reproduction processing on the image to be processed. The CPU 10 reproduces and displays the still image or moving image on the liquid crystal monitor of the output device 14 and outputs the sound of the added audio data to the speaker.

The subject detection unit 20 applies subject detection processing to the still image to be processed, or to each frame of the moving image, and detects image areas such as a person's face area, a building, or an automobile. This subject detection processing uses a known algorithm. For example, the subject detection unit 20 detects the image area of a main subject by applying pattern matching or similar processing to the still image or frame, using templates of various patterns, such as persons and buildings, stored in the storage unit 11. The CPU 10 acquires the size, position, and so on of the image area of each detected main subject as subject information.

The scene determination unit 21 determines the scene of the object field captured in the still image or frame using a known method. When the header area of the read still image or moving image contains scene mode information from the time of imaging, the scene determination unit 21 determines the scene based on that scene mode information.

The storage unit 11 records the control program, the reproduction processing program, and so on, as well as images read from the digital camera (not shown). The CPU 10 can refer to the programs, images, and so on stored in the storage unit 11 via the bus 13 as needed. A storage device such as a general hard disk device or magneto-optical disk device can be selected and used as the storage unit 11. Although the storage unit 11 is described as built into the computer 100, it may be an external storage device, in which case it is connected to the computer 100 via the input/output I/F 12.

Next, reproduction processing by the computer 100 of this embodiment is described with reference to the flowchart of FIG. 2. In this embodiment, the image to be processed is a still image of an object field 40 that includes a person 30, as shown in FIG. 3(a). FIG. 3(b) shows the object field 40 after the still image was captured; the automobile 32 is outside the object field 40 at the time of imaging but is approaching the person 30.

The user instructs the CPU 10 to start the reproduction processing program by using the input device 15 to enter the program's command, or by double-clicking the program's icon displayed on the output device 14, for example. The CPU 10 receives the instruction via the input/output I/F 12, then reads and executes the reproduction processing program stored in the storage unit 11. The CPU 10 starts processing from step S101.

Step S101: The CPU 10 reads the still image designated by the user via the input device 15, either from the storage unit 11 or from the digital camera (not shown) via the input/output I/F 12.

Step S102: The subject detection unit 20 detects the face area of the person 30 and the image area of the building 31 from the read still image. The CPU 10 acquires, as subject information, the size, position, and so on of the image area of each detected main subject together with the subject distance D of each main subject added to the header area of the still image.

Step S103: Using a known method such as that of Patent Document 1, the CPU 10 obtains, from the stereo-recorded audio data added to the still image, the direction of the sound source of each sound contained in that audio data, extracts the audio signal of each sound source, and generates audio data for each sound source. In this embodiment, the sound sources are the voice of the person 30, music coming from the building 31, and the engine sound of the approaching automobile 32, and the CPU 10 generates audio data for these sound sources. The CPU 10 also identifies where each sound source is located on the still image based on the subject information and the direction of each sound source, and generates position information consisting of the direction of each sound source and the identified position. The CPU 10 records the audio data and position information of each sound source in an internal memory (not shown) in association with the subject information.

As shown in FIG. 3(a), the automobile 32 is not within the angle of view of the still image. The image-area size and position and the subject distance D(i) in the subject information of the automobile 32 are therefore assumed to be set to a predetermined value indicating that it is outside the angle of view, to infinity, or the like.

Step S104: So that the volume at which the audio data of each sound source is reproduced can be set according to the scene captured in the still image, as described later, the CPU 10 determines whether each sound source is within the angle of view of the still image, based on the angle of view of the still image and the position information of each sound source. When a sound source is within the angle of view, the CPU 10 sets the flag UseFlag(i) of that sound source to 1. When a sound source is outside the angle of view, the CPU 10 sets the flag UseFlag(i) to 0. Here, the index i identifies each sound source; in this embodiment, i = 0 for the person 30, i = 1 for the building 31, and i = 2 for the automobile 32.
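The determination in step S104 can be sketched as follows. This is only an illustration under stated assumptions, not the method of the patent: it assumes that the direction of each sound source is available as a horizontal angle in degrees relative to the optical axis and that the horizontal angle of view is known. The names `SoundSource`, `azimuth_deg`, and `set_use_flags`, and the numeric values, are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class SoundSource:
    name: str
    azimuth_deg: float  # assumed: horizontal direction relative to the optical axis
    use_flag: int = 0   # UseFlag(i): 1 inside the angle of view, 0 outside


def set_use_flags(sources, horizontal_fov_deg):
    """Set UseFlag(i) to 1 for sources inside the angle of view, otherwise 0."""
    half_fov = horizontal_fov_deg / 2.0
    for src in sources:
        src.use_flag = 1 if abs(src.azimuth_deg) <= half_fov else 0
    return sources


# Hypothetical directions for the three sound sources of the embodiment.
sources = [
    SoundSource("person 30", azimuth_deg=-5.0),      # i = 0
    SoundSource("building 31", azimuth_deg=15.0),    # i = 1
    SoundSource("automobile 32", azimuth_deg=40.0),  # i = 2, outside the frame
]
set_use_flags(sources, horizontal_fov_deg=60.0)
flags = [s.use_flag for s in sources]
```

With a 60-degree angle of view, only the automobile 32 at 40 degrees falls outside the half-angle of 30 degrees, so its flag stays at 0.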

Step S105: The scene determination unit 21 determines the scene captured in the still image, and the CPU 10 weights each sound source for reproduction according to the determination result of the scene determination unit 21. The scene determined by the scene determination unit 21 of this embodiment is one of "snap", "portrait", and "landscape". The weighting set by the CPU 10 for each sound source in each scene is described below.
A) "Snap"
The CPU 10 sets the weight of each sound source for reproduction as a weight coefficient WT(i) = α/L, which varies with the distance L on the still image from the in-focus area at the time the still image was captured, for example the face area of the person 30. The coefficient α is set to a predetermined value. When a sound source is outside the angle of view of the still image, as with the automobile 32, its distance L is assumed to be set to a predetermined value, infinity, or the like.
B) "Portrait"
The CPU 10 sets the weight coefficient WT(i) so that the sound source originating from a person has the largest value. For example, the CPU 10 sets the weight coefficient WT(i) of the person 30 to 1 and the weight coefficients WT(i) of the building 31 and the automobile 32 to 0.5.
C) "Landscape"
The CPU 10 sets the weight coefficient WT(i) so that sound sources other than persons have large values. For example, the CPU 10 sets the weight coefficient WT(i) of the person 30 to 0.2 and the weight coefficients WT(i) of the building 31 and the automobile 32 to 1.

The values of the weight coefficient WT(i) in each scene, and the way they are set, are only an example; other values or other setting methods may be used. For example, in the "portrait" case, the weight coefficients WT(i) of the building 31 and the automobile 32 may be set in inverse proportion to their distance from the person 30.
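The scene-dependent weighting of step S105 can be sketched as a small lookup. The example values 1, 0.5, and 0.2 and the form WT(i) = α/L come from the description above; everything else is an assumption made for illustration, including the value of α, the floor `MIN_L` that keeps L = 0 from dividing by zero, and the names `scene_weights`, `is_person`, and `L`.

```python
import math

ALPHA = 100.0  # assumed value of the predetermined coefficient α
MIN_L = 1.0    # assumed floor (in pixels) so that L = 0 does not divide by zero


def scene_weights(scene, sources, alpha=ALPHA):
    """Return the weight coefficient WT(i) of each source for the given scene."""
    weights = []
    for src in sources:
        if scene == "snap":
            wt = alpha / max(src["L"], MIN_L)  # WT(i) = α / L
        elif scene == "portrait":
            wt = 1.0 if src["is_person"] else 0.5
        elif scene == "landscape":
            wt = 0.2 if src["is_person"] else 1.0
        else:
            raise ValueError(f"unknown scene: {scene}")
        weights.append(wt)
    return weights


# Hypothetical on-image distances L from the in-focus area (the face of person 30).
sources = [
    {"name": "person 30", "is_person": True, "L": 0.0},
    {"name": "building 31", "is_person": False, "L": 200.0},
    {"name": "automobile 32", "is_person": False, "L": math.inf},  # outside the frame
]
portrait = scene_weights("portrait", sources)
landscape = scene_weights("landscape", sources)
snap = scene_weights("snap", sources)
```

Representing an out-of-frame source by L = infinity makes its "snap" weight fall to zero without a special case, which matches the option of setting L to infinity mentioned in the text.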

Step S106: Using the weight coefficient WT(i) of each sound source set in step S105, the CPU 10 calculates the amplification factor AMP(i), which determines the volume of each sound source during reproduction, from the following equation (1) and generates it as the reproduction control data.
AMP(i) = UseFlag(i) × WT(i) / (β × D(i)) ... (1)
Here, the coefficient β normalizes the subject distance D(i) of each main subject.
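Equation (1) can be evaluated directly, as in the sketch below. The explicit early return for sources that are outside the angle of view or at infinite distance is an assumption added so that the result is a clean zero; the value of β and the subject distances in the example are hypothetical.

```python
import math


def amplification(use_flag, wt, distance, beta=1.0):
    """AMP(i) = UseFlag(i) * WT(i) / (beta * D(i)), following equation (1)."""
    # Assumed guard: a source outside the angle of view (UseFlag = 0) or one
    # whose subject distance is set to infinity contributes no volume.
    if use_flag == 0 or math.isinf(distance):
        return 0.0
    return use_flag * wt / (beta * distance)


# "Portrait" weights from step S105 with hypothetical subject distances in metres.
use_flags = [1, 1, 0]              # person 30, building 31, automobile 32
weights = [1.0, 0.5, 0.5]
distances = [2.0, 10.0, math.inf]
amps = [
    amplification(u, w, d, beta=0.5)
    for u, w, d in zip(use_flags, weights, distances)
]
```

In this example the person 30 is reproduced at full gain, the building 31 at a tenth of that, and the automobile 32 is muted because it is outside the angle of view.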

Step S107: For each sound source, the CPU 10 calculates the product of the audio data and the amplification factor AMP(i) of the reproduction control data to generate reproduction audio data for that sound source. The CPU 10 reproduces and displays the still image on the liquid crystal monitor of the output device 14 and outputs the reproduction audio data of each sound source as sound from the speaker of the output device 14. The CPU 10 then ends the series of processing.
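The per-source product in step S107 can be sketched as below. The text specifies only the multiplication of each sound source's audio data by AMP(i); summing the scaled tracks into a single output signal is an assumption made here so that the example yields something a speaker could play. The function names and sample values are hypothetical.

```python
def apply_gain(samples, amp):
    """Scale one sound source's samples by its amplification factor AMP(i)."""
    return [amp * s for s in samples]


def mix(tracks):
    """Sum equal-length tracks sample by sample (assumed mixing step)."""
    return [sum(values) for values in zip(*tracks)]


# Hypothetical two-sample tracks and amplification factors for the three sources.
tracks = {
    "person 30": [0.5, -0.5],
    "building 31": [1.0, 1.0],
    "automobile 32": [0.8, 0.8],
}
amps = {"person 30": 1.0, "building 31": 0.1, "automobile 32": 0.0}

playback = {name: apply_gain(samples, amps[name]) for name, samples in tracks.items()}
mixed = mix(list(playback.values()))
```

A real implementation would also need to guard against clipping after summation, which this sketch leaves out.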

When the image to be processed is a moving image, the computer 100 applies the reproduction processing shown in FIG. 2 to each frame of the moving image. That is, after applying steps S102 to S105 to all frames of the moving image, the computer 100 proceeds to step S106, calculates the amplification factor AMP of each sound source in each frame, and generates the reproduction control data.

As described above, in this embodiment, setting the weight coefficient of each sound source according to the scene of the captured image makes it possible to reproduce the audio data of each sound source with effects and staging applied.
<<Other Embodiment>>
The computer according to another embodiment of the present invention is the same as the computer 100 according to the one embodiment shown in FIG. 1, and detailed description of each component is omitted.

The computer 100 of this embodiment differs from that of the one embodiment in two respects: 1) the images to be processed are moving images only, and 2) when reproducing a moving image, the computer 100 has reproduction modes that set the weight coefficient WT of the audio data of each sound source according to whether the photographer's viewpoint or the subject's viewpoint is used. The photographer viewpoint mode reproduces the audio data of each sound source as the photographer using the digital camera (not shown) would hear it. The subject viewpoint mode reproduces the audio data of each sound source as it would be heard at the position of the main subject. Because the images to be processed are moving images only, the CPU 10 of this embodiment, by executing the reproduction program, operates as a motion vector calculation unit 22 together with the subject detection unit 20, as shown in FIG. 4.

The motion vector calculation unit 22 detects, as frame motion, the motion accompanying panning of the digital camera (not shown) that captured the moving image and calculates a frame motion vector. It also detects the motion of a main subject and calculates a motion vector of the main subject. Specifically, the motion vector calculation unit 22 applies known correlation processing to two adjacent frames of the moving image. Based on the correlation result, the motion vector calculation unit 22 detects the frame motion and calculates the frame motion vector from, for example, the amount of displacement between the two frames in the background image area, that is, the area excluding the image area of the main subject detected by the subject detection unit 20. The motion vector calculation unit 22 also detects the motion of the main subject and calculates its motion vector based on the displacement of the background image area and the displacement of the image area of the main subject.
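The relation between the two displacements and the two motion vectors can be sketched as follows. The text does not give formulas or sign conventions, so both functions rest on assumptions: the frame motion vector is taken to point opposite to the apparent background displacement (the direction of the pan), and the motion of the main subject is taken as its image displacement minus the background displacement. The correlation processing that would produce the displacements is not shown, and all names are hypothetical.

```python
def frame_motion_vector(background_shift):
    """Assumed convention: the pan direction is opposite to the background shift."""
    dx, dy = background_shift
    return (-dx, -dy)


def subject_motion_vector(subject_shift, background_shift):
    """Assumed: subject motion = subject image shift minus background shift."""
    return (
        subject_shift[0] - background_shift[0],
        subject_shift[1] - background_shift[1],
    )


# Hypothetical displacements in pixels between two adjacent frames.
# The camera follows the person, so the person stays put in the image
# while the background slides 8 pixels to the left.
background_shift = (-8, 0)
person_shift = (0, 0)

frame_vec = frame_motion_vector(background_shift)
person_vec = subject_motion_vector(person_shift, background_shift)
```

Under these conventions a subject tracked by the pan has the same motion vector as the frame, and a subject fixed to the background has a zero motion vector.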

Next, the reproduction processing of the computer 100 of this embodiment is described with reference to the flowchart of FIG. 5. The moving image to be processed in this embodiment captures the object field 40 shown in FIG. 3. FIGS. 6(a) to 6(c) show, as an example, three consecutive frames of the moving image, namely the k-th, (k+1)-th, and (k+2)-th frames (k is a natural number). FIGS. 6(a) and 6(b) are frames captured while panning the digital camera (not shown) to follow the person 30. FIG. 6(c) is a frame captured while panning the digital camera (not shown) so that an automobile 33, which appeared from the left, is at the center of the image.

The user instructs the CPU 10 to start the reproduction processing program by using the input device 15 to enter the program's command, or by double-clicking the program's icon displayed on the output device 14, for example. The CPU 10 receives the instruction via the input/output I/F 12, then reads and executes the reproduction processing program stored in the storage unit 11. The CPU 10 starts processing from step S201.

Step S201: The CPU 10 reads the moving image designated by the user via the input device 15, either from the storage unit 11 or from the digital camera (not shown) via the input/output I/F 12. The CPU 10 preferably accepts the designation of the reproduction mode together with the designation of the moving image to be reproduced.

Step S202: The subject detection unit 20 detects image areas such as that of the person 30 from each frame of the read moving image. The CPU 10 acquires, as subject information, the size, position, and so on of the image area of each detected main subject together with the subject distance D of each main subject added to the header area of the moving image.

Step S203: Using a known method such as that of Patent Document 1, the CPU 10 obtains, from the stereo-recorded audio data added to the moving image, the direction of the sound source of each sound contained in that audio data, extracts the audio signal of each sound source, and generates audio data for each sound source. The CPU 10 also identifies where each sound source is located on each frame based on the subject information and the direction of each sound source, and generates position information consisting of the direction of each sound source in each frame and the identified position. The CPU 10 records the audio data and position information of each sound source in an internal memory (not shown) in association with the subject information.

Step S204: The motion vector calculation unit 22 applies correlation processing to two adjacent frames, detects the frame motion from the displacement in the background image area, and calculates the frame motion vector. The motion vector calculation unit 22 also calculates the motion vector of each main subject from the displacement of the background image area and the displacement of the image area of each main subject. The CPU 10 records the calculated motion vectors of the frame and of each main subject in an internal memory (not shown) in association with each frame.

Step S205: The CPU 10 determines whether the photographer viewpoint mode is set as the reproduction mode. When the photographer viewpoint mode is set, the CPU 10 proceeds to step S206 (YES side); when the subject viewpoint mode is set, it proceeds to step S207 (NO side).

Step S206: In the photographer viewpoint mode, in order to reproduce the audio data of each sound source as the photographer using the digital camera (not shown) would hear it, the CPU 10 sets, for example, the weight coefficient WT(m, i) of each sound source in the m-th frame based on the frame motion vector in the m-th frame and the position information of each sound source. Specifically, the setting is made as follows.

FIGS. 7(a) to 7(c) show, as an arrow at the center of each frame, the direction of the frame motion vector in the k-th, (k+1)-th, and (k+2)-th frames shown in FIGS. 6(a) to 6(c). FIG. 8(a) lists the weight coefficient WT(m, i) of each sound source in each frame, set based on the direction of the frame motion vector and the position information of each sound source. In the photographer viewpoint mode, a sound source that lies in the direction of the frame motion vector and is closer to the center of the frame is given a larger weight coefficient.

That is, because the k-th and (k+1)-th frames were captured so that the person 30 is at the center of the frame, the weight coefficient of the person 30 is set to the largest value, as shown in FIG. 8(a). Because the building 31 and the automobile 32 are on the side toward which the frame motion vector points and are coming closer to the person 30, their weight coefficients are set to larger values in the (k+1)-th frame than in the k-th frame. The automobile 33, on the other hand, is outside the angle of view in the k-th and (k+1)-th frames and on the side opposite to the direction of the frame motion vector, so its weight coefficient remains set to a small value.

The (k+2)-th frame, on the other hand, was captured after the digital camera (not shown) was panned to bring the car 33 to the center of the frame, so the weighting coefficient of the car 33 is set to the largest value. The person 30, the building 31, and the car 32 lie on the side opposite to the direction of the frame motion vector, so their weighting coefficients are set smaller than in the k-th and (k+1)-th frames.

The values of the weighting coefficients WT(m, i) for each sound source in each frame shown in FIG. 8(a), and the way they are set, are only an example; they are preferably set as appropriate according to the number of main subjects, the magnitude and direction of the frame motion vector, and the like.
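
The text gives no formula for the photographer viewpoint weights, only the tendency that sources lying in the direction of the frame motion vector and near the frame center receive larger values. The following Python sketch is one possible rule with that tendency, not the method of the embodiment. The function name `photographer_weight`, the cosine-based alignment term, the half-diagonal falloff, and the bounds `w_min` and `w_max` are all assumptions made for illustration. The motion vector is assumed to be given in image coordinates and to point toward the side the view is moving to.

```python
import math


def photographer_weight(source_xy, motion_xy, frame_size, w_min=0.1, w_max=1.0):
    """Hypothetical weight for one sound source in one frame.

    source_xy  : (x, y) position of the source in image coordinates
                 (may lie outside the frame when the source is out of view).
    motion_xy  : (vx, vy) frame motion vector in image coordinates.
    frame_size : (width, height) of the frame in pixels.
    """
    width, height = frame_size
    cx, cy = width / 2.0, height / 2.0
    dx, dy = source_xy[0] - cx, source_xy[1] - cy
    dist = math.hypot(dx, dy)

    # Closeness to the frame center: 1 at the center, 0 at or beyond the
    # half diagonal, so sources outside the angle of view fall to w_min.
    half_diag = math.hypot(cx, cy)
    closeness = max(0.0, 1.0 - dist / half_diag)

    # Alignment with the frame motion vector: cosine mapped from [-1, 1] to [0, 1].
    m_norm = math.hypot(motion_xy[0], motion_xy[1])
    if dist == 0.0 or m_norm == 0.0:
        alignment = 1.0  # no direction to compare, so apply no penalty
    else:
        cos = (dx * motion_xy[0] + dy * motion_xy[1]) / (dist * m_norm)
        alignment = 0.5 * (1.0 + cos)

    return w_min + (w_max - w_min) * closeness * alignment
```

With this rule a source at the frame center receives `w_max`, a source on the side opposite to the motion vector receives `w_min`, and a source outside the frame also receives `w_min`, which mirrors the qualitative behavior described for the person 30 and the car 33.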

Step S207: In the subject viewpoint mode, the CPU 10 sets the weighting coefficient WT(m, i) of each sound source in the m-th frame based on the direction of the motion vector of the person 30 and the position information of each sound source, so that, for example, the audio data of each sound source is reproduced as the person 30 would hear it. Specifically, the coefficients are set as follows.

As in FIG. 7, FIGS. 9(a) to 9(c) show, as an arrow at the center of each frame, the direction of the motion vector of the person 30 in the k-th, (k+1)-th, and (k+2)-th frames shown in FIGS. 6(a) to 6(c). FIG. 8(b) lists the weighting coefficients WT(m, i) of each sound source, set in each frame based on the direction of the motion vector of the person 30 and the position information of each sound source. In the subject viewpoint mode, a sound source is given a larger weighting coefficient the more closely it lies in the direction of the person's motion vector and the closer it is to the person 30.

That is, the motion vector of the person 30 points in the same direction from the k-th to the (k+2)-th frame, and the building 31 and the car 32 come closer to the person 30, so, as shown in FIG. 8(b), their weighting coefficients are set to progressively larger values from the k-th to the (k+2)-th frame. The car 33 is outside the angle of view in the k-th and (k+1)-th frames, so its weighting coefficient is set to a small value of 0.1. In the (k+2)-th frame the car 33 is within the angle of view and close to the person 30, but it lies on the side opposite to the direction of the motion vector of the person 30, so its weighting coefficient is set smaller than those of the other sound sources.

In the subject viewpoint mode, the audio data of each sound source is reproduced as the person 30 would hear it. Therefore, as shown in FIG. 8(b), a weighting coefficient of a predetermined value such as 0.5 is preset for the person 30 so that the person's own voice is reproduced at a low volume.

The values of the weighting coefficients WT(m, i) for each sound source in each frame shown in FIG. 8(b), and the way they are set, are likewise only an example; they are preferably set as appropriate according to the number of main subjects, the magnitude and direction of the motion vectors of the main subjects, and the like.
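
As an illustration of the subject viewpoint mode, the sketch below builds a per-frame weight table in the spirit of FIG. 8(b). Only two numbers come from the text: the preset 0.5 for the tracked subject's own voice and the 0.1 used for a source outside the angle of view. Everything else is an assumption for illustration, including the function name `subject_viewpoint_weights`, the `reach` distance, and the cosine-based alignment term; the embodiment does not specify a formula.

```python
import math

OWN_VOICE_WEIGHT = 0.5    # preset for the tracked subject's own voice (from the text)
OUT_OF_VIEW_WEIGHT = 0.1  # small value cited for a source outside the angle of view


def subject_viewpoint_weights(subject_xy, motion_xy, sources, reach=400.0):
    """Return {name: weight} for one frame (hypothetical rule).

    subject_xy : (x, y) position of the tracked subject.
    motion_xy  : (vx, vy) motion vector of the tracked subject.
    sources    : {name: (x, y) or None}; None marks a source outside the view.
    reach      : assumed distance at which the closeness term reaches zero.
    """
    weights = {"subject": OWN_VOICE_WEIGHT}
    m_norm = math.hypot(motion_xy[0], motion_xy[1])
    for name, pos in sources.items():
        if pos is None:
            weights[name] = OUT_OF_VIEW_WEIGHT
            continue
        dx, dy = pos[0] - subject_xy[0], pos[1] - subject_xy[1]
        dist = math.hypot(dx, dy)
        closeness = max(0.0, 1.0 - dist / reach)
        if dist == 0.0 or m_norm == 0.0:
            alignment = 1.0  # no direction to compare, so apply no penalty
        else:
            cos = (dx * motion_xy[0] + dy * motion_xy[1]) / (dist * m_norm)
            alignment = 0.5 * (1.0 + cos)
        # Larger when the source is near the subject and ahead of its motion.
        weights[name] = OUT_OF_VIEW_WEIGHT + (1.0 - OUT_OF_VIEW_WEIGHT) * closeness * alignment
    return weights
```

A source directly behind the moving subject ends up at the floor value even when it is close, which corresponds to the treatment of the car 33 in the (k+2)-th frame.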

Step S208: Using the following equation (2), the CPU 10 takes a weighted average, along the time axis, of the weighting coefficient WT(m, i) of each sound source set in step S206 or step S207, over N adjacent frames that include the m-th frame (N < total number of frames in the moving image).

Here, the coefficient ε(j) indicates the degree to which the weighting coefficient of a sound source in the j-th frame contributes to the m-th frame, and is set so that the contribution is greatest when j = m. The number of frames N is preferably about 10 or fewer, and the range of j for the weighted average is preferably chosen as appropriate, for example from m to m+N-1, from m-N+1 to m, or from m-N/2 to m+N/2. This weighted averaging allows the CPU 10 to change the volume of each sound source smoothly.
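
Equation (2) itself does not appear in this text. One form consistent with the description is a normalized weighted average, <WT(m, i)> = Σ ε(j)·WT(j, i) / Σ ε(j), with the sums taken over the chosen window of j. The Python sketch below assumes that normalized form, a window centered on m, and a triangular ε(j) that peaks at j = m. These three choices and the name `smooth_weight` are assumptions, not details taken from the patent.

```python
def smooth_weight(wt, m, n=5):
    """Weighted temporal average of one source's weights around frame m.

    wt : list of per-frame weights WT(j, i) for a single source i.
    m  : index of the frame being smoothed.
    n  : window length N (assumed odd and centered on m here).
    """
    half = n // 2
    lo = max(0, m - half)              # clamp the window at the start of the clip
    hi = min(len(wt) - 1, m + half)    # and at the end
    num = 0.0
    den = 0.0
    for j in range(lo, hi + 1):
        eps = (half + 1) - abs(j - m)  # triangular epsilon(j), largest at j == m
        num += eps * wt[j]
        den += eps
    return num / den
```

For example, with weights `[0, 0, 1, 1, 1]` and N = 3, the value at m = 2 becomes 0.75 instead of jumping straight from 0 to 1, which is the smoothing of the volume that the step aims for.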

Step S209: Using the averaged weighting coefficient <WT(m, i)> obtained in step S208, the CPU 10 calculates, by the following equation (3), the amplification factor AMP(m, i) that determines the volume of each sound source in each frame at the time of reproduction, and generates the reproduction control data.
AMP(m, i) = <WT(m, i)> / (β × D(m, i)) ... (3)
Here, the coefficient β normalizes the subject distance D(m, i) of each subject in the m-th frame.
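
Equation (3) can be written directly as a small function. The text does not say how β is chosen, so the helper `normalizing_beta` below, which sets β to the reciprocal of the smallest subject distance in the frame so that the nearest source has β × D = 1, is purely a hypothetical choice. The function names are likewise illustrative.

```python
def normalizing_beta(distances):
    """Hypothetical choice of beta: scale so the nearest source has beta * D == 1."""
    nearest = min(distances)
    if nearest <= 0:
        raise ValueError("subject distances must be positive")
    return 1.0 / nearest


def amplification(wt_avg, distance, beta):
    """Equation (3): AMP(m, i) = <WT(m, i)> / (beta * D(m, i))."""
    if distance <= 0 or beta <= 0:
        raise ValueError("distance and beta must be positive")
    return wt_avg / (beta * distance)


# Two sources with the same averaged weight at 2 m and 4 m:
beta = normalizing_beta([2.0, 4.0])
near_gain = amplification(0.8, 2.0, beta)
far_gain = amplification(0.8, 4.0, beta)
```

Under this choice the nearer source keeps its averaged weight as its gain and the source at twice the distance is reproduced at half that gain.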

Step S210: For each sound source, the CPU 10 multiplies the audio data by the amplification factor AMP(m, i) in the reproduction control data to generate the reproduction audio data of that sound source.
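
A minimal sketch of this product for one frame follows, assuming each source's audio is available as a list of float samples in the range [-1, 1]. The names `apply_gain` and `mix` are hypothetical. The final mixing and clipping step is an added assumption about how the per-source data could be sent to a single speaker; the text only describes the per-source multiplication.

```python
def apply_gain(samples_by_source, amp_by_source):
    """Scale each source's samples for one frame by its amplification factor."""
    return {
        name: [amp_by_source[name] * x for x in samples]
        for name, samples in samples_by_source.items()
    }


def mix(scaled_by_source):
    """Assumed output stage: sum the scaled sources and clip to [-1, 1]."""
    length = max((len(s) for s in scaled_by_source.values()), default=0)
    mixed = [0.0] * length
    for samples in scaled_by_source.values():
        for n, x in enumerate(samples):
            mixed[n] += x
    return [max(-1.0, min(1.0, v)) for v in mixed]
```

Keeping the per-source scaling separate from the mix lets the same reproduction control data drive either a combined output or playback of the individual sources.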

Step S211: The CPU 10 reproduces and displays the image on the liquid crystal monitor of the output device 14 and outputs the reproduction audio data of each sound source as sound from the speaker of the output device 14. The CPU 10 then ends the series of processes.

As described above, in this embodiment, the weighting coefficient of each sound source is set according to the motion vector of the frame or of the main subject in each captured frame, so the audio data of each sound source can be reproduced with effects and staging applied.
<< Supplementary items of the embodiment >>
(1) The reproduction processing apparatus of the present invention is realized by causing the computer 100 to execute the reproduction processing program, but the present invention is not limited to this. The present invention is also applicable to a reproduction processing program that implements the processing of the reproduction processing apparatus on the computer 100, and to a medium on which that program is recorded.

The present invention is also applicable to a digital camera equipped with the reproduction processing program of the present invention. When the digital camera operates as the image processing apparatus of the present invention, the CPU 10 may implement the processes of the subject detection unit 20, the scene determination unit 21, and the motion vector calculation unit 22 in software, or these processes may be implemented in hardware using an ASIC. In this case, subject information, audio data, position information, and reproduction control data are preferably added, together with the exposure conditions, to the header area of a still image or moving image captured by the digital camera. The audio data added to the header area may be the data extracted for each sound source or the stereo recording before extraction.

(2) In the above embodiment, the computer 100 reproduces and displays a still image or moving image together with reproducing the audio data of each sound source, but the present invention is not limited to this; only the audio data of each sound source may be reproduced.

(3) In the above embodiment, the weighting coefficient of each sound source is set according to the scene determination result or the reproduction mode to generate the reproduction control data and the reproduction audio data, but the present invention is not limited to this. For example, the CPU 10 may generate the reproduction control data and the reproduction audio data taking the Doppler effect and the like into account, based on the motion vector of the frame or of a main subject.
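
The text mentions the Doppler effect only in passing. As a reminder of what such a correction involves, the sketch below evaluates the standard frequency ratio for a moving source and a stationary listener, f_observed / f_source = c / (c - v), where v is the source's speed toward the listener. The function name and the default speed of sound of 343 m/s are assumptions. How a radial speed in meters per second would be derived from image motion vectors and subject distances is not described in the source and is left out here.

```python
def doppler_factor(radial_speed, speed_of_sound=343.0):
    """Ratio f_observed / f_source for a source moving toward a stationary listener.

    radial_speed : speed of the source toward the listener in m/s
                   (positive when approaching, negative when receding).
    """
    if radial_speed >= speed_of_sound:
        raise ValueError("radial speed must be below the speed of sound")
    return speed_of_sound / (speed_of_sound - radial_speed)
```

An approaching source gives a factor above 1 (higher pitch) and a receding one a factor below 1, so a playback pitch shift by this factor could accompany the volume change for a passing car.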

(4) In the other embodiment described above, the motion vector calculation unit 22 calculates the motion vectors of the frame and of each main subject by correlation processing on two adjacent frames, but the present invention is not limited to this. For example, in a moving image compressed in a format such as H.264, motion vectors are calculated for motion compensation in inter-frame prediction in order to improve compression efficiency. The motion vector calculation unit 22 may therefore obtain the motion vectors of the frame and of each main subject from those motion vectors.
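
One way such codec motion vectors could be reduced to a frame vector and a subject vector is sketched below. It assumes the per-block vectors have already been extracted by a decoder into plain tuples `(x, y, dx, dy)`; no particular decoder API is implied. Taking the median over all blocks as the frame motion, and the subject motion as the median inside a bounding box minus the frame motion, are illustrative choices, as are the function names. The sign convention of the vectors depends on the codec and is not addressed here.

```python
from statistics import median


def frame_motion(blocks):
    """Median block vector, used here as a robust estimate of the frame motion.

    blocks : list of (x, y, dx, dy) tuples, one per coded block.
    """
    return (median(b[2] for b in blocks), median(b[3] for b in blocks))


def subject_motion(blocks, box):
    """Median vector of blocks inside box, relative to the frame motion.

    box : (left, top, right, bottom) bounding box of the main subject.
    Returns None when no block falls inside the box.
    """
    left, top, right, bottom = box
    inside = [b for b in blocks if left <= b[0] < right and top <= b[1] < bottom]
    if not inside:
        return None
    fx, fy = frame_motion(blocks)
    sx, sy = frame_motion(inside)
    return (sx - fx, sy - fy)
```

The median keeps a small moving subject from skewing the frame estimate, and subtracting the frame motion separates the subject's own movement from camera panning.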

The motion vector may also be calculated using the motion of the subject detected by a subject tracking function.

When the digital camera (not shown) includes a sensor such as an acceleration sensor or an electronic gyro, the motion vector calculation unit 22 may calculate the frame motion vector based on the output value of that sensor.

(5) In the other embodiment described above, the main subject followed in the subject viewpoint mode is the person 30, but the present invention is not limited to this; another main subject, such as the building 31 or the car 32, may be followed instead.

The features and advantages of the embodiments will be apparent from the foregoing detailed description. The claims are intended to cover the features and advantages of the embodiments described above to the extent that they do not depart from their spirit and scope. Furthermore, a person having ordinary skill in the art should readily be able to conceive of any improvements and modifications; there is no intention to limit the scope of the inventive embodiments to what is described above, and suitable improvements and equivalents falling within the scope disclosed in the embodiments may also be relied upon.

10 CPU, 11 storage unit, 12 input/output I/F, 13 bus, 14 output device, 15 input device, 20 subject detection unit, 21 scene determination unit, 22 motion vector calculation unit, 100 computer

Claims

A reproduction processing apparatus comprising:
an input unit that reads audio data recorded at the time of processing of image data generated by imaging a subject; and
a control unit that determines, based on the angle of view of a displayed image and position information of a sound source, whether the sound source is within the angle of view of the image, and that generates, based on whether the sound source is within the angle of view, reproduction control data that controls the volume of the audio data.

The reproduction processing apparatus according to claim 1, wherein
the control unit determines the direction of at least one sound source from the audio data to generate audio data of the sound source, and generates the reproduction control data by weighting the sound source based on the direction of the sound source and whether the sound source is within the angle of view.

The reproduction processing apparatus according to claim 2, wherein
the control unit determines the direction of each of a plurality of sound sources from the audio data to generate audio data of each sound source, and generates the reproduction control data by weighting each sound source based on the direction of that sound source and whether that sound source is within the angle of view.

The reproduction processing apparatus according to claim 2 or 3, further comprising
a scene determination unit that determines a scene of the image data, wherein
the control unit weights the sound source based on the determination result of the scene determination unit together with the direction of the sound source and whether the sound source is within the angle of view.

The reproduction processing apparatus according to claim 2 or 3, wherein
the input unit inputs a plurality of continuously captured image data,
the control unit further comprises a motion vector calculation unit that detects motion associated with panning of the imaging device at the time of capturing the subject and calculates a motion vector based on the plurality of image data, and
the control unit weights the sound source in each frame based on the direction of the sound source, whether the sound source is within the angle of view, and the motion vector.

The reproduction processing apparatus according to any one of claims 1 to 5, further comprising
a display unit that displays an image based on the image data.

An imaging device comprising:
an imaging unit that captures a subject scene and generates image data;
a microphone unit that receives sound and generates audio data; and
the reproduction processing apparatus according to any one of claims 1 to 6.

A reproduction processing program that causes a computer to execute:
an input procedure of reading audio data recorded at the time of processing of image data generated by imaging a subject; and
a control procedure of determining, based on the angle of view of a displayed image and position information of a sound source, whether the sound source is within the angle of view of the image, and generating, based on whether the sound source is within the angle of view, reproduction control data that controls the volume of the audio data.