JP2023035585A

JP2023035585A - Imaging apparatus, method for controlling imaging apparatus, and program

Info

Publication number: JP2023035585A
Application number: JP2021142563A
Authority: JP
Inventors: Tatsuo Nishino (西野達雄)
Original assignee: Canon Inc
Current assignee: Canon Inc
Priority date: 2021-09-01
Filing date: 2021-09-01
Publication date: 2023-03-13

Abstract

PROBLEM TO BE SOLVED: To enable reproduction of realistic sound. SOLUTION: An imaging apparatus has an imaging means and a connection means to which a first microphone and a second microphone are connected. The imaging apparatus adjusts the gain for an audio signal input from the first microphone based on the defocus amount of an image captured by the imaging means. SELECTED DRAWING: Figure 1

Description

The present invention relates to an imaging apparatus that acquires sound from a plurality of microphones.

In recent years, imaging apparatuses have become widespread that can record a subject's voice, ambient sound, and the like together with images by picking up sound with a plurality of microphones (hereinafter also "mics") while shooting. Patent Document 1 discloses a technique for emphasizing sound arriving from the in-focus direction during shooting.

Patent Document 1: JP 2012-15651 A

If the sounds picked up by the plurality of microphones can each be adjusted to match the captured image, sound can be reproduced with a sense of realism for that image. The technique disclosed in Patent Document 1 makes it possible to reproduce sound that emphasizes the left/right or upper/lower sides relative to the subject. However, it is difficult to reproduce realistic sound along the shooting direction of the imaging apparatus, that is, the depth direction.

Accordingly, an object of the present invention is to enable reproduction of sound with a sense of realism.

An imaging apparatus according to the present invention includes: an imaging means; a connection means that connects a first microphone and a second microphone; and an adjustment means that adjusts the gain for an audio signal input from the first microphone, based on the defocus amount of an image captured by the imaging means.

According to the present invention, realistic sound reproduction can be achieved.

FIG. 1 is a diagram showing a configuration example of an imaging apparatus.
FIG. 2 is a control flowchart of the imaging apparatus.
FIG. 3 is a diagram showing an example of shooting with the imaging apparatus.
FIG. 4 is a diagram showing an example of images captured by the imaging apparatus.
FIG. 5 is a diagram showing an example in which the subject moves during shooting.
FIG. 6 is a control flowchart of an imaging apparatus according to a fourth embodiment.
FIG. 7 is a diagram showing an example of shooting with the imaging apparatus according to the fourth embodiment.

Embodiments of the present invention are described below with reference to the drawings. The following embodiments do not limit the present invention, and not every combination of features described in the embodiments is necessarily essential to the solution provided by the present invention. The configurations of the embodiments can be modified or changed as appropriate according to the specifications of the apparatus to which the present invention is applied and to various conditions (conditions of use, usage environment, and the like). Parts of the embodiments described below may also be combined as appropriate. In each embodiment, identical components are given the same reference numerals.

This embodiment uses as an example an imaging apparatus that picks up sound with a plurality of microphones while shooting, so it can record the voice of the main subject, the surrounding ambient sound, and the like. Here the microphones are the built-in microphone of the imaging apparatus and a microphone near the subject (a microphone held by the person who is the subject). This is only an example. More microphones may be included, such as a microphone near the imaging apparatus or a microphone at a position away from the subject.

If each sound picked up by the plurality of microphones can be adjusted to suit the captured image, sound can be reproduced with a sense of realism for that image. For example, realistic sound reproduction should be possible if the volumes of the sound picked up by the microphone near the subject and the sound picked up by the camera's built-in microphone are each adjusted appropriately according to the distance to the subject. In general, however, adjusting sound to match the captured image is complicated and time-consuming, and it places a heavy burden on the user. An example is setting suitable volumes for the voice of the main subject and for the surrounding ambient sound. In addition, when the subject moves during movie recording, the volume is often unsuitable for the image of the subject after it has moved. The user then has to adjust the volume after recording, for example by adjusting the sound of each microphone while watching the recorded movie.
Furthermore, placing more microphones widens the spatial extent of the reproduced sound, but it also makes the volume adjustment for each microphone more complicated. This makes adjusting the volume after shooting even more laborious. Alternatively, sound from the in-focus direction could be emphasized, as in the technique of Patent Document 1. However, that technique cannot adjust sound along the shooting direction of the imaging apparatus, that is, the depth direction, so it is difficult to reproduce sound with a sense of realism for the captured image.

<First Embodiment>
To enable realistic sound reproduction, the imaging apparatus according to the first embodiment has the configuration described below and performs the processing described below. This achieves sound reproduction with a clear sense of perspective and presence for the captured image. The imaging apparatus of the first embodiment separates the main subject being shot from the rest of the captured image (the background). It then determines how blurred the background is from a defocus map, which gives the defocus amount (amount of focus deviation) at each pixel coordinate of the captured image. That is, while the main subject is in focus, the imaging apparatus judges the background to be more blurred as the background defocus amount becomes larger. Background blur increases as the aperture value decreases, as the F-number decreases, and as the focal length increases. The imaging apparatus of this embodiment therefore determines the gain for the audio data from the defocus amount representing the background blur, also taking into account the aperture value, F-number, and focal length at the time of shooting. It records the audio data, with the volume adjusted based on that gain, together with the captured image data. Specifically, the larger the background defocus amount, the larger the imaging apparatus makes the gain for sound picked up by its built-in microphone relative to the gain for sound picked up by an external microphone near the main subject. It then records the audio data adjusted based on those gains.
In other words, the smaller the background defocus amount, the smaller the gain for sound picked up by the built-in microphone becomes relative to the gain for sound picked up by the external microphone near the subject. Alternatively, the imaging apparatus of this embodiment may make the gain for the built-in microphone relatively larger than the gain for the external microphone near the main subject only when the background defocus amount is larger than a predetermined threshold. In other words, when the background defocus amount is equal to or less than the threshold, it may make the gain for the built-in microphone relatively smaller than the gain for the external microphone.
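The threshold variant described above can be written as a small decision rule. What follows is a minimal sketch, not the patent's implementation. The function name, the use of decibels, and the fixed 6 dB offset are illustrative assumptions. The external (near-subject) microphone keeps its base gain, and only the built-in microphone's gain moves above or below it.

```python
from __future__ import annotations

# Hypothetical sketch of the threshold variant. Gains are in dB; the
# 6 dB offset is an illustrative assumption, not a value from the patent.


def relative_gains_threshold(background_defocus: float,
                             threshold: float,
                             base_gain_db: float = 0.0,
                             offset_db: float = 6.0) -> tuple[float, float]:
    """Return (built_in_gain_db, external_gain_db)."""
    external_gain_db = base_gain_db
    if background_defocus > threshold:
        # Strongly blurred background: ambient (built-in) sound relatively louder.
        built_in_gain_db = base_gain_db + offset_db
    else:
        # Defocus at or below the threshold: ambient sound relatively quieter.
        built_in_gain_db = base_gain_db - offset_db
    return built_in_gain_db, external_gain_db


if __name__ == "__main__":
    print(relative_gains_threshold(0.8, threshold=0.5))  # (6.0, 0.0)
    print(relative_gains_threshold(0.2, threshold=0.5))  # (-6.0, 0.0)
```

A defocus value exactly equal to the threshold falls into the "equal to or less than" branch, matching the wording of the text.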

FIG. 1 is a block diagram showing a configuration example of an imaging apparatus according to the first embodiment that implements the above.
The imaging optical system unit 102 includes a focus lens unit, a zoom lens unit, an aperture/shutter unit, and the like. The actuator control unit 103 includes a motor driver IC. It drives the various actuators of the imaging optical system unit 102, such as those of the focus lens unit, the zoom lens unit, and the aperture/shutter unit. These actuators are driven based on actuator drive instruction data generated by the central control unit 101, described later.

The imaging unit 104 includes an image sensor such as a CMOS or CCD sensor. It photoelectrically converts the optical image formed on the sensor surface by the imaging optical system unit 102 into an imaging signal (electrical signal) and outputs that signal to the imaging processing unit 105. The imaging processing unit 105 converts the imaging signal input from the imaging unit 104 into a digital image signal (image data) and outputs it to the image processing unit 106.

The image processing unit 106 performs image processing suited to the intended use on the image data input from the imaging processing unit 105. This processing includes, for example:
- electronic image stabilization by image cropping and rotation;
- subject detection, which detects a subject in an image;
- calculation of the area ratio occupied by a subject (such as a person) in an image.
These are known processes, so their detailed description is omitted. The image processing unit 106 outputs the processed image data to the central control unit 101.

The sound pickup unit 108 is a sound acquisition unit that includes the built-in microphone of the imaging apparatus. It converts sound captured by the built-in microphone into an electrical signal, then converts that into a digital audio signal (audio data), and outputs it to the audio processing unit 107. In the first embodiment, the built-in microphone of the imaging apparatus corresponds to the first microphone.

The external sound pickup device 200 is an external sound acquisition unit (external microphone) that is wirelessly connected to the imaging apparatus. It transmits a radio signal containing the audio data it has picked up. In the first embodiment, the external microphone of the external sound pickup device 200 corresponds to the second microphone. The external sound pickup device 200 is, for example, a wireless microphone, and the radio signal it transmits is received by the wireless unit 109 in the imaging apparatus 100. The external sound pickup device 200 may instead be, for example, a wired microphone with a phone plug. This embodiment is described using the case where the external sound pickup device 200 is a wireless microphone.

The wireless unit 109 performs wireless communication of image data and other data in compliance with wireless standards such as Wi-Fi (registered trademark) and BLE (Bluetooth LE; Bluetooth is a registered trademark). In the imaging apparatus of this embodiment, the wireless unit 109 receives radio signals containing audio data from the external sound pickup device 200. It converts these signals into received data and sends the data to the central control unit 101. The central control unit 101 then extracts the audio data picked up by the external sound pickup device 200 from the received data. It converts that audio data into a form the audio processing unit 107 can process and outputs it to the audio processing unit 107.

When the external sound pickup device 200 is a wired microphone, the analog audio signal it picks up is input to the imaging apparatus 100 via the microphone jack of the imaging apparatus 100. An analog-to-digital converter (not shown) converts this analog audio signal into a digital audio signal (audio data), which is then input to the audio processing unit 107.

The audio processing unit 107 performs audio processing on the audio data picked up by the sound pickup unit 108 and by the external sound pickup device 200. In this embodiment, the audio processing unit 107 adjusts the volume based on gains determined by the central control unit 101, as described later.
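Applying per-microphone gains and combining the two sources can be illustrated with a short sketch. The patent does not say how the adjusted streams are combined or stored. This example assumes two mono streams of float samples in [-1.0, 1.0] with equal length and sample rate, gains given in dB, and a plain sum with hard clipping. All of these are illustrative choices.

```python
from __future__ import annotations

# Minimal sketch: apply per-microphone gains (dB) to two mono PCM streams
# and sum them. Float samples in [-1.0, 1.0], equal lengths, and hard
# clipping are assumptions made for illustration.


def db_to_linear(gain_db: float) -> float:
    """Convert a gain in decibels to a linear amplitude factor."""
    return 10.0 ** (gain_db / 20.0)


def mix_with_gains(built_in: list[float], external: list[float],
                   built_in_gain_db: float, external_gain_db: float) -> list[float]:
    """Scale each stream by its gain, sum sample by sample, and clip."""
    if len(built_in) != len(external):
        raise ValueError("streams must have the same length")
    g_in = db_to_linear(built_in_gain_db)
    g_ext = db_to_linear(external_gain_db)
    mixed = []
    for a, b in zip(built_in, external):
        s = g_in * a + g_ext * b
        mixed.append(max(-1.0, min(1.0, s)))  # keep within the valid range
    return mixed
```

A real device might keep the two adjusted tracks separate instead of mixing them. The gain arithmetic is the same either way.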

The central control unit 101 is a control unit that includes a CPU, ROM, and RAM (not shown). It controls the imaging apparatus 100 as a whole and performs various arithmetic operations. The ROM stores the control program according to this embodiment, initial setting values, and the like. The CPU of the central control unit 101 reads the control program from the ROM, loads it into the RAM, and executes it, which realizes the arithmetic processing of the central control unit 101. In this embodiment, the CPU executes the following processing:
- analyzing the defocus map of the captured image;
- determining the degree of background blur from the defocus amount;
- calculating gains for volume adjustment, taking the aperture value, F-number, and focal length into account.
Details of this processing are described later.

The operation unit 111 includes operation devices provided for the user to operate the imaging apparatus 100, such as buttons, switches, and a touch panel. It also includes an operation signal acquisition unit that acquires user operations on these devices as operation signals. Although not shown, the operation devices of the operation unit 111 include a power button, a shutter button, a movie recording button, and the like.

The storage unit 112 stores various data in association with each other, such as image data obtained by shooting and audio data picked up by the sound pickup unit 108 and the external sound pickup device 200 during that shooting. The storage unit 112 includes storage media: an internal storage medium (internal memory) of the imaging apparatus and a removable storage medium such as an SD card.

The display unit 113 includes a display device such as an LCD (liquid crystal display) or an organic EL display. It displays images as needed based on the image data output from the image processing unit 106. The display unit 113 also displays UI (user interface) screen images generated by the central control unit 101. When the operation unit 111 includes a touch panel, the touch panel is provided on the screen of the display unit 113.

The input/output terminal unit 114 inputs and outputs communication signals and image signals to and from external devices.
The audio reproduction unit 110 includes a speaker. It converts audio data into an electrical signal (analog audio signal) and drives the speaker with that signal to reproduce sound.
The power supply unit 115 supplies each unit (component) of the imaging apparatus with the power suited to that unit's purpose.
The power supply control unit 116 controls the individual start-up and shut-down of the different power supply types of the power supply unit 115.

The following describes, with reference to FIGS. 2 to 4, how the audio data acquired by the external sound pickup device 200 and the audio data acquired by the sound pickup unit 108 (the built-in microphone) are processed. The external sound pickup device 200 is assumed to be a wireless microphone. As described above, in this embodiment the external sound pickup device 200 is placed near the subject and picks up the subject's voice, while the sound pickup unit 108 picks up the surrounding sound. In particular, the description assumes that the external sound pickup device 200 is worn (or held) by a person who is the subject and mainly picks up that person's speech. The sound pickup unit 108 is assumed to mainly pick up ambient sound, that is, sound from the entire surroundings of the imaging apparatus 100.

FIG. 2 is a control flowchart of the imaging apparatus 100 according to this embodiment. Each step in the flowchart of FIG. 2 is executed by the CPU of the central control unit 101. For brevity, the following description refers to the central control unit 101 as executing it.

First, when the user presses the movie recording button, which is part of the operation unit 111 of the imaging apparatus 100, the imaging apparatus 100 starts movie recording. Once recording starts, image signals captured by the imaging unit 104 and processed by the imaging processing unit 105 are input to the image processing unit 106.

When movie recording starts, the central control unit 101 also reads various initial setting values from the ROM in step S201. These include the initial gain for sound picked up by the built-in microphone (sound pickup unit 108) and the initial gain for sound picked up by the wireless microphone (external sound pickup device 200). The audio data acquired by the sound pickup unit 108 is input to the audio processing unit 107. The data transmitted wirelessly from the external sound pickup device 200 is received by the wireless unit 109. As described above, the central control unit 101 converts it into audio data that the audio processing unit 107 can process, and that audio data is then input to the audio processing unit 107.

Next, in step S202, the central control unit 101 causes the image processing unit 106 to perform subject detection processing, which detects the main subject in the captured image. In this embodiment, subject detection finds the region of the main subject, a person, in the captured movie frames. That is, it includes processing that separates the main-subject region from the background region of the image.
Further, in step S203, the central control unit 101 causes the image processing unit 106 to perform image analysis processing. This calculates the defocus amount of the background region, that is, the region of the image other than the main subject. The central control unit 101 thereby acquires the background defocus amount as information representing the degree of background blur.
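The patent does not say how a single background defocus amount is derived from the per-pixel defocus map. The sketch below shows one plausible approach under stated assumptions. The defocus map is a 2-D list of signed values, because defocus can indicate front or back focus. The subject mask from step S202 marks main-subject pixels with True. The background value is taken as the median of the absolute defocus over non-subject pixels. The function name and the choice of median are hypothetical.

```python
from __future__ import annotations

# Hypothetical sketch of step S203: reduce a per-pixel defocus map to one
# background defocus amount, excluding pixels the subject mask (step S202)
# marks as main subject. The median of absolute values is an assumption.
from statistics import median


def background_defocus(defocus_map: list[list[float]],
                       subject_mask: list[list[bool]]) -> float:
    """Median absolute defocus over background (non-subject) pixels."""
    values = [abs(d)
              for row_d, row_m in zip(defocus_map, subject_mask)
              for d, is_subject in zip(row_d, row_m)
              if not is_subject]
    if not values:
        raise ValueError("no background pixels in the mask")
    return median(values)


if __name__ == "__main__":
    dmap = [[0.0, 0.1, 2.0],
            [0.0, -1.5, 3.0]]
    mask = [[True, True, False],
            [True, False, False]]
    print(background_defocus(dmap, mask))  # 2.0
```

The median is less sensitive than the mean to a few mis-segmented pixels at the subject boundary. A mean or a percentile would also fit the description.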

Next, in step S204, the central control unit 101 acquires the aperture value, F-number, focal length, and similar values of the imaging optical system unit 102. An imaging apparatus can generally obtain these values, so details are omitted. In this embodiment, the aperture value, F-number, and focal length are collectively called the correction information of the imaging apparatus. The central control unit 101 also focuses on the main subject by controlling the actuator control unit 103 to move the optical lenses of the imaging optical system unit 102 along the optical axis. Furthermore, the central control unit 101 may calculate the distance from the main subject to the imaging apparatus 100 (main subject distance). It does this from the positions of the optical lenses along the optical axis when the main subject is in focus.
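The patent does not specify how the main subject distance is computed from the lens position. As an illustration only, the ideal thin-lens equation 1/f = 1/s + 1/v relates focal length f, subject distance s, and lens-to-image distance v. Solving for s gives s = f·v / (v − f). Real camera lenses are thick, multi-element designs, so treat this as an idealization. The function name is hypothetical.

```python
# Illustrative sketch: main-subject distance from the in-focus image
# distance using the ideal thin-lens equation 1/f = 1/s + 1/v.
# This idealization is an assumption; the patent gives no formula.


def subject_distance_mm(focal_length_mm: float, image_distance_mm: float) -> float:
    """Return the subject distance s = f * v / (v - f) in millimetres."""
    if image_distance_mm <= focal_length_mm:
        raise ValueError("image distance must exceed focal length for a real object")
    return focal_length_mm * image_distance_mm / (image_distance_mm - focal_length_mm)
```

For example, a 50 mm lens focused with its image plane 52.5 mm behind it gives subject_distance_mm(50.0, 52.5) = 1050.0 mm.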

Next, in step S205, the central control unit 101 determines the gain for the audio from each microphone, namely the sound pickup unit 108 and the external sound pickup device 200. It bases these gains on the background defocus amount acquired in step S203 and the correction information of the imaging apparatus acquired in step S204. In this embodiment, the audio from the external sound pickup device 200 keeps its initially set gain (its volume is not changed). The gain for the audio from the sound pickup unit 108, on the other hand, is determined from the background defocus amount with the correction information taken into account. The larger the background defocus amount, the larger the central control unit 101 makes the gain for the audio acquired by the sound pickup unit 108 relative to the gain for the audio acquired by the external sound pickup device 200.
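Step S205 only requires that the built-in microphone's gain rise with the background defocus amount, taking the correction information into account, while the external microphone keeps its initial gain. The sketch below is one hypothetical way to meet those requirements. It treats the aperture value and F-number as a single F-number input. It forms a "blur index" that grows with defocus, with smaller F-numbers, and with longer focal lengths. It then maps that index linearly to a capped boost in dB. The reference values, max_boost_db, and saturation are invented for illustration.

```python
from __future__ import annotations

# Hypothetical sketch of step S205. The weighting, reference values, and
# boost limits are illustrative assumptions, not values from the patent.
from dataclasses import dataclass


@dataclass
class CorrectionInfo:
    f_number: float          # aperture value / F-number, treated as one input here
    focal_length_mm: float


def determine_gains(background_defocus: float,
                    info: CorrectionInfo,
                    initial_built_in_db: float,
                    initial_external_db: float,
                    ref_f_number: float = 8.0,
                    ref_focal_length_mm: float = 50.0,
                    max_boost_db: float = 12.0,
                    saturation: float = 4.0) -> tuple[float, float]:
    """Return (built_in_gain_db, external_gain_db)."""
    # A smaller F-number and a longer focal length both strengthen background
    # blur, so they scale the blur index up relative to the reference values.
    blur_index = (abs(background_defocus)
                  * (ref_f_number / info.f_number)
                  * (info.focal_length_mm / ref_focal_length_mm))
    # Map the blur index to a weight in [0, 1], then to a gain boost in dB.
    weight = min(blur_index / saturation, 1.0)
    built_in_db = initial_built_in_db + max_boost_db * weight
    # The external (near-subject) microphone keeps its initial gain.
    return built_in_db, initial_external_db
```

Because the external gain never changes, raising the built-in gain is what makes the ambient sound relatively louder as the background blurs, which is the behavior the text describes.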

Then, in step S206, the audio processing unit 107 adjusts the volume of the audio data according to the gains determined in step S205. The central control unit 101 stores this adjusted audio data in the storage unit 112, associated with the captured image data.

Next, in step S207, the central control unit 101 determines whether the user has input an instruction to stop movie recording by pressing the movie recording button again. If a stop instruction has been input, the central control unit 101 ends the processing of the flowchart of FIG. 2, that is, the movie recording processing. If the button has not been pressed again, meaning no stop instruction has been input, the central control unit 101 executes step S208 and the subsequent steps.

In step S208, the central control unit 101 determines whether the defocus amount has changed from the previous value acquired in step S203. If the defocus amount has not changed, the central control unit 101 returns to step S202 and executes step S202 and the subsequent steps. If the defocus amount has changed, it executes step S209.

In step S209, the central control unit 101 determines whether the correction information of the imaging apparatus has changed from the previous values acquired in step S204. If the correction information has not changed, the central control unit 101 returns to step S202 and executes step S202 and the subsequent steps. If the correction information has changed, it returns to step S207 and executes step S207 and the subsequent steps.
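The branching of steps S208 and S209, as stated above, can be summarized as a small decision function. The flow proceeds to step S207 only when both the background defocus amount and the correction information differ from their previous values. Otherwise it returns to step S202. This is a sketch of the described branching only. The function name, the tuple representation of the correction information, and the tolerance parameter for noisy measurements are hypothetical additions.

```python
# Sketch of the S208/S209 branching as described in the text. The tolerance
# parameter is a hypothetical addition; 0.0 treats any difference as a change.


def next_step(prev_defocus: float, cur_defocus: float,
              prev_info: tuple, cur_info: tuple,
              tolerance: float = 0.0) -> str:
    """Return the label of the step the flowchart moves to next."""
    defocus_changed = abs(cur_defocus - prev_defocus) > tolerance  # step S208
    if not defocus_changed:
        return "S202"
    info_changed = cur_info != prev_info                            # step S209
    return "S207" if info_changed else "S202"
```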

FIG. 3 shows an example of a shooting situation using the imaging apparatus 100 according to this embodiment. The imaging apparatus 100 is shooting with a composition in which a main subject 303 and a background subject 304 both fit within the angle of view. The background subject in FIG. 3 is a house, but the background in this embodiment may include various other subjects, such as more distant buildings and mountains. In FIG. 3, the built-in microphone of the imaging apparatus 100 is shown as a microphone 301 for convenience. An external microphone 302, serving as the external sound pickup device 200, is placed near the main subject 303.

FIG. 4 shows examples of how the main subject 403 and the background subject 404 appear in an image captured by the imaging apparatus of this embodiment with the composition shown in FIG. 3. FIG. 4A shows an example in which the background subject 404 is only slightly blurred relative to the main subject 403, and FIG. 4B shows an example in which the background subject 404 is strongly blurred relative to the main subject 403. That is, when shooting with the composition of FIG. 3, the main subject 403 and the background subject 404 may both appear sharp without blur, as in FIG. 4A, or the background subject 404 may appear blurred relative to the main subject 403, as in FIG. 4B. For example, when the aperture value is large, the F-number is large, or the focal length is short, the main subject 403 and the background subject 404 often both appear without blur, as in FIG. 4A. Conversely, when the aperture value is small, the F-number is small, or the focal length is long, the background subject 404 often appears blurred relative to the main subject 403, as in FIG. 4B. The central control unit 101 therefore judges the degree of background blur, expressed as a defocus amount, from analysis results such as a defocus map of the captured image, and determines the gain for the audio while also taking the aperture value, F-number, and focal length into account.
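
One way to reduce a defocus map to a single background value is to average the blur magnitude over every position outside the main-subject region (claim 3 speaks of a background area that excludes the subject area). The sketch below assumes a 2-D map of signed defocus values and a boolean subject mask of the same shape; the averaging rule is an illustrative choice, and the aperture, F-number, and focal-length weighting mentioned above is not modeled because the text does not specify it.

```python
from typing import Sequence


def background_defocus(defocus_map: Sequence[Sequence[float]],
                       subject_mask: Sequence[Sequence[bool]]) -> float:
    """Mean absolute defocus over the positions outside the main-subject mask.

    defocus_map:  per-pixel (or per-block) signed defocus values (assumed layout).
    subject_mask: True where the position belongs to the focused main subject.
    """
    total = 0.0
    count = 0
    for defocus_row, mask_row in zip(defocus_map, subject_mask):
        for value, is_subject in zip(defocus_row, mask_row):
            if not is_subject:
                total += abs(value)  # front and back focus errors both count as blur
                count += 1
    if count == 0:
        raise ValueError("no background pixels: the subject mask covers the frame")
    return total / count
```

For example, with `defocus_map = [[0.0, 0.1], [2.0, -4.0]]` and the top row masked as subject, the function averages `|2.0|` and `|-4.0|` and returns `3.0`.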

In this embodiment, the larger the background defocus amount, the larger the central control unit 101 makes the gain of the environmental sound acquired by the built-in microphone relative to the gain of the main subject's sound acquired by the external microphone. As a result, the more strongly the background is blurred, the louder the environmental sound from the built-in microphone becomes relative to the main subject's sound from the external microphone. That is, when the main subject 303 is close to the imaging apparatus 100, the imaging apparatus 100 records the sound picked up by the sound pickup unit 108 at a higher level, so that the sound around the main subject 303 is recorded relatively loudly. Conversely, the smaller the background defocus amount, the smaller the central control unit 101 makes the gain of the environmental sound from the built-in microphone relative to the gain of the main subject's sound from the external microphone. As a result, the less the background is blurred, the quieter the environmental sound from the built-in microphone becomes relative to the main subject's sound from the external microphone. That is, when the main subject 303 is far from the imaging apparatus 100, the imaging apparatus 100 records the sound picked up by the sound pickup unit 108 at a lower level, so that the sound around the main subject 303 is recorded relatively quietly.
As a result, the imaging apparatus 100 can record natural environmental sound that matches the distance to the subject.
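
The relationship above only requires that the built-in microphone's gain rise relative to the external microphone's gain as background defocus grows. A minimal sketch, assuming the external (main-subject) microphone is held at 0 dB and the built-in gain follows a clamped linear-in-decibels curve; the -12 dB starting point, 3 dB-per-unit slope, and 0 dB cap are hypothetical values, and the correction information is omitted.

```python
def ambient_gain_db(background_defocus: float,
                    base_db: float = -12.0,
                    slope_db_per_unit: float = 3.0,
                    max_db: float = 0.0) -> float:
    """Gain (dB) applied to the built-in (ambient) microphone.

    The external (main-subject) microphone is held at 0 dB, so this value is
    also the built-in gain relative to the external one. It rises with the
    background defocus amount and is capped at max_db. All constants are
    illustrative assumptions.
    """
    if background_defocus < 0:
        raise ValueError("expected a non-negative blur magnitude")
    return min(base_db + slope_db_per_unit * background_defocus, max_db)


def db_to_linear(db: float) -> float:
    """Convert a dB gain to a linear amplitude factor."""
    return 10.0 ** (db / 20.0)


def mix(ambient: list, subject: list, background_defocus: float) -> list:
    """Mix per sample: subject mic at unity gain plus the scaled ambient mic."""
    g = db_to_linear(ambient_gain_db(background_defocus))
    return [s + g * a for a, s in zip(ambient, subject)]
```

Working in decibels keeps the mapping perceptually even; the cap stops the environmental sound from overtaking the main subject's voice, which is a design choice rather than something the text prescribes.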

Note that when the background defocus amount is greater than a predetermined threshold, the central control unit 101 may make the gain for the sound picked up by the built-in microphone relatively larger than the gain for the sound picked up by the external microphone near the main subject. In other words, when the background defocus amount is equal to or less than the predetermined threshold, the central control unit 101 may make the gain for the sound picked up by the built-in microphone relatively smaller than the gain for the sound picked up by the external microphone near the main subject.
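
The threshold variant replaces the continuous curve with two levels. In this sketch the external microphone is again assumed to sit at 0 dB, and the +6 dB / -6 dB levels are hypothetical; note that a defocus amount exactly equal to the threshold falls on the "quieter" side, matching "equal to or less than" in the text.

```python
def ambient_gain_db_threshold(background_defocus: float,
                              threshold: float,
                              high_db: float = 6.0,
                              low_db: float = -6.0) -> float:
    """Two-level gain for the built-in mic relative to the external mic (at 0 dB).

    defocus >  threshold -> built-in mic relatively louder  (high_db > 0 dB)
    defocus <= threshold -> built-in mic relatively quieter (low_db  < 0 dB)
    """
    if low_db >= high_db:
        raise ValueError("low_db must be below high_db")
    return high_db if background_defocus > threshold else low_db
```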

<Second Embodiment>
Next, the imaging apparatus 100 of the second embodiment will be described. The configuration of the imaging apparatus 100 of the second embodiment is the same as that shown in FIG. 1, and its control flowchart is the same as that shown in FIG. 2, so their illustration and description are omitted.

FIG. 5 shows an example of a shooting situation with the imaging apparatus 100 of the second embodiment, in which the imaging apparatus 100 is shooting with a composition that places a main subject 503 and a background subject 504 within the angle of view. The first embodiment described an example in which the imaging apparatus 100 has only the built-in microphone of the sound pickup unit 108 and a single external microphone serves as the external sound pickup device 200. In the second embodiment, by contrast, a plurality of external microphones serving as the external sound pickup device 200 are connected to the imaging apparatus 100. In the example of FIG. 5, the external sound pickup device 200 includes external microphones 501, 502, and 505, which are connected to the imaging apparatus 100 by wire or wirelessly. The external microphone 501 is placed near the imaging apparatus 100, the external microphone 502 near the main subject 503, and the external microphone 505 near the background subject 504.
In the second embodiment, of these microphones, the built-in microphone of the imaging apparatus 100, the external microphone 501 near the imaging apparatus 100, and the external microphone 505 near the background subject 504 each pick up the surrounding environmental sound. That is, in the second embodiment, the built-in microphone of the imaging apparatus 100 corresponds to the first microphone, and the external microphone 501 near the imaging apparatus 100 and the external microphone 505 near the background subject 504 correspond to third microphones. The external microphone 502 near the main subject 503, on the other hand, picks up the voice of the main subject 503; that is, it corresponds to the second microphone. The imaging apparatus 100 of the second embodiment records the audio data picked up by its built-in microphone and by the external microphones 501, 502, and 505 together with the captured images.

In the second embodiment, as in the preceding embodiment, the imaging apparatus 100 performs focus control on the main subject 503, so the main subject 503 is assumed to be captured without blur even when it moves. The background subject 504, on the other hand, is not in focus and therefore appears blurred; the degree of this blur depends on the distance from the imaging apparatus 100 to the main subject 503 and on the aperture value, F-number, and focal length of the imaging apparatus 100.

FIGS. 5A and 5B show an example in which, in a composition that places the main subject 503 and the background subject 504 within the angle of view, the main subject 503 moves within that composition. FIG. 5A shows the main subject 503 near the background subject 504 and far from the imaging apparatus 100, and FIG. 5B shows the main subject 503 close to the imaging apparatus 100 and away from the background subject 504. In other words, FIGS. 5A and 5B illustrate the main subject 503 moving from the position in FIG. 5A to the position in FIG. 5B, or from the position in FIG. 5B to the position in FIG. 5A. When the imaging apparatus 100 is performing focus control on the main subject 503, the defocus amount representing the degree of blur of the background subject 504 is larger in FIG. 5B than in FIG. 5A. Thus, when the main subject 503 moves toward or away from the imaging apparatus 100, as in FIGS. 5A and 5B, the defocus amount of the background subject 504 increases or decreases with that movement.

In the second embodiment, as in the preceding embodiment, the central control unit 101 of the imaging apparatus 100 judges the degree of blur of the background subject 504 from the defocus amount in the defocus map of the captured image. The central control unit 101 then determines the gain for the audio of each microphone from the defocus amount of the background subject 504, taking the correction information of the imaging apparatus into account. As in the first embodiment, however, the gain for the audio of the external microphone 502 near the main subject 503 is not adjusted. The central control unit 101 therefore uses the defocus amount of the background subject 504 to determine the gains for the audio of the built-in microphone of the imaging apparatus 100, the external microphone 501 near the imaging apparatus 100, and the external microphone 505 near the background subject 504. Specifically, the larger the defocus amount of the background subject 504, the more the central control unit 101 sets the gains to raise the volume of the built-in microphone and the external microphones 501 and 505; the smaller that defocus amount, the more it sets the gains to lower their volume.
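
The per-microphone rule above can be sketched as a function that returns one gain per microphone. The string identifiers, the choice to give all three ambient microphones the same gain, and the clamped linear-in-decibels curve are assumptions for illustration; the correction information is not modeled.

```python
from typing import Dict

# Hypothetical identifiers for the FIG. 5 microphones.
AMBIENT_MICS = ("builtin", "ext_501", "ext_505")  # first and third microphones
SUBJECT_MIC = "ext_502"                           # second microphone, not adjusted


def second_embodiment_gains(background_defocus: float,
                            base_db: float = -12.0,
                            slope_db_per_unit: float = 3.0,
                            max_db: float = 0.0) -> Dict[str, float]:
    """Per-microphone gains (dB) driven by the background defocus amount."""
    ambient_db = min(base_db + slope_db_per_unit * background_defocus, max_db)
    gains = {mic: ambient_db for mic in AMBIENT_MICS}
    gains[SUBJECT_MIC] = 0.0  # the main-subject microphone keeps its level
    return gains
```

Separate curves per ambient microphone (for example, a different slope for the microphone near the background subject) would fit the same structure.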

Furthermore, in the second embodiment, when the main subject moves toward or away from the imaging apparatus 100, the central control unit 101 adds to or subtracts from the gain obtained from the defocus amount of the background subject 504 a gain corresponding to the increase or decrease in defocus amount caused by the movement of the main subject. Based on the resulting gain, the central control unit 101 controls the audio processing unit 107 to adjust the volume of the built-in microphone, the external microphone 501 near the imaging apparatus 100, and the external microphone 505 near the background subject 504.

That is, in the imaging apparatus 100 of the second embodiment, when the main subject moves toward or away from the imaging apparatus, a gain corresponding to the increase or decrease in defocus amount caused by that movement is added to or subtracted from the gain calculated from the background defocus amount and the correction information of the imaging apparatus. As a result, even when the main subject moves in the depth direction, the imaging apparatus of the second embodiment can adjust the sound along the shooting direction of the imaging apparatus, that is, the depth direction, making it possible to reproduce realistic sound with a clear sense of perspective that matches the captured image.
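
The movement correction can be read as adding a term proportional to the change in background defocus between two moments. This is a minimal sketch under that reading: the proportional form and the `db_per_unit` constant are assumptions, since the text only says a corresponding gain is added or subtracted.

```python
def movement_adjusted_gain_db(base_gain_db: float,
                              defocus_before: float,
                              defocus_after: float,
                              db_per_unit: float = 3.0) -> float:
    """Add (or subtract) a gain term for the defocus change caused by subject motion.

    base_gain_db:   gain already derived from background defocus + correction info.
    defocus_before: background defocus before the main subject moved.
    defocus_after:  background defocus after the move.
    db_per_unit:    assumed dB per unit of defocus change.
    """
    delta = defocus_after - defocus_before
    # Subject moves toward the camera -> background blur grows -> delta > 0 -> gain rises.
    # Subject moves away              -> background blur shrinks -> delta < 0 -> gain falls.
    return base_gain_db + db_per_unit * delta
```

For instance, a base gain of -6 dB with the background defocus rising from 2.0 to 3.0 yields -3 dB under these assumed constants.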

The processing of adding to or subtracting from the gain obtained from the background defocus amount a gain corresponding to the increase or decrease in defocus amount caused by the main subject moving toward or away from the imaging apparatus can also be applied to the imaging apparatus of the first embodiment described above.

<Third Embodiment>
Next, the imaging apparatus 100 of the third embodiment will be described. The configuration of the imaging apparatus 100 of the third embodiment is the same as that shown in FIG. 1, and its control flowchart is generally the same as that shown in FIG. 2, so their illustration and description are omitted.

In the first embodiment described above, the more strongly the background is blurred, the larger the gain for the sound picked up by the built-in microphone or other ambient microphone of the imaging apparatus was made relative to the gain for the sound picked up by the external microphone near the main subject. Conversely, the gain for the environmental sound may instead be made smaller. In other words, the smaller the background defocus amount, the larger the gain of the sound picked up by the built-in microphone may be made relative to the gain of the sound picked up by the external microphone near the subject. Likewise, when the background defocus amount is greater than a predetermined threshold, the gain for the sound picked up by the built-in microphone may be made relatively smaller than the gain for the sound picked up by the external microphone near the main subject.

The imaging apparatus 100 of the third embodiment is an example in which the user can freely switch, via the operation unit 111, how the audio gain is determined from the background defocus amount. The imaging apparatus 100 of the third embodiment has a function that, in response to a user operation, switches between a setting that makes the gain of the sound picked up by the built-in microphone relatively larger than the gain of the sound picked up by the external sound pickup device 200 according to the background defocus amount, and a setting that makes it relatively smaller.
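
The user-selectable behavior amounts to choosing the sign of the defocus-dependent term. In this sketch the enum names, the -6 dB midpoint, the slope, and the 6 dB limit are hypothetical; the external microphone is assumed to stay at 0 dB.

```python
from enum import Enum


class BlurGainMode(Enum):
    """User-selectable behavior (hypothetical names)."""
    BOOST_AMBIENT = "boost"  # more background blur -> built-in mic relatively louder
    CUT_AMBIENT = "cut"      # more background blur -> built-in mic relatively quieter


def builtin_gain_db(background_defocus: float, mode: BlurGainMode,
                    base_db: float = -6.0, slope_db_per_unit: float = 3.0,
                    limit_db: float = 6.0) -> float:
    """Built-in mic gain relative to the external mic (held at 0 dB)."""
    delta = min(slope_db_per_unit * background_defocus, limit_db)
    if mode is BlurGainMode.BOOST_AMBIENT:
        return base_db + delta
    return base_db - delta
```

Keeping both settings symmetric around one midpoint means switching modes never changes the mix when the background is sharp; that symmetry is an illustrative choice, not a requirement of the text.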

<Fourth Embodiment>
The first to third embodiments described above gave examples in which the imaging apparatus adjusts the volume of each sound picked up by the plurality of microphones during shooting according to the degree of background blur, and records the adjusted audio data together with the captured image data. The imaging apparatus 100 of the fourth embodiment records the audio data picked up by the plurality of microphones during shooting as-is, without adjusting the recording volume. The fourth embodiment describes an apparatus with an automatic editing function that, when the moving image is edited later, determines the gain for the audio data from the background defocus amount and the correction information of the imaging apparatus and adjusts the volume automatically. The automatic editing function with volume adjustment according to the fourth embodiment may be provided in the imaging apparatus 100, or may be implemented by an information processing apparatus such as a personal computer, smartphone, or tablet terminal executing an application program.

This embodiment is described using an example in which the imaging apparatus 100 has the automatic editing function with volume adjustment. The configuration of the imaging apparatus 100 of the fourth embodiment is the same as that shown in FIG. 1, so its illustration is omitted.

FIG. 6 is a control flowchart of the imaging apparatus 100 of the fourth embodiment. The processing of each step in the flowchart of FIG. 6 is executed by the CPU of the central control unit 101; for simplicity, it is described below as being executed by the central control unit 101. FIG. 7A is a control flowchart covering moving-image shooting, sound pickup, and their recording in the imaging apparatus 100. FIG. 7B is a control flowchart covering automatic editing processing, in which the recorded moving image and audio are played back during moving-image editing and the gain for the audio is determined from the background defocus amount and the correction information of the imaging apparatus to adjust the volume, up to recording the edited audio data in association with the images. The processing of steps S601 to S604 and S607 to S609 in FIG. 7A is the same as that of the corresponding steps S201 to S204 and S207 to S209 in FIG. 2, so its description is omitted.

After step S604 in FIG. 7A, in step S605 the central control unit 101 determines the recording volume for each microphone of the sound pickup unit 108 and the external sound pickup device 200. In the fourth embodiment, the recording volume of each microphone is the volume corresponding to the gain initially set for that microphone.

Next, in step S606, the central control unit 101 stores in the storage unit 112 the audio data picked up by each microphone, the defocus amount acquired in step S603, and the correction information of the imaging apparatus acquired in step S604, in association with the time code of the captured image.
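
The record written in step S606 can be pictured as one entry per time code. The field names, the HH:MM:SS:FF time-code string, and the in-memory dictionary standing in for the storage unit 112 are all hypothetical; the point is only that unadjusted audio, defocus amount, and correction information share a time-code key so the editor can find them later.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class FrameRecord:
    """What step S606 stores per time code (field names are hypothetical)."""
    timecode: str                  # e.g. "00:00:01:12" (HH:MM:SS:FF)
    audio: Dict[str, List[float]]  # mic id -> unadjusted samples for this frame
    defocus: float                 # background defocus amount from step S603
    correction: Dict[str, float]   # correction info from step S604


class RecordStore:
    """In-memory stand-in for the storage unit 112."""

    def __init__(self) -> None:
        self._records: Dict[str, FrameRecord] = {}

    def store(self, record: FrameRecord) -> None:
        self._records[record.timecode] = record

    def load(self, timecode: str) -> FrameRecord:
        return self._records[timecode]

    def timecodes(self) -> List[str]:
        # Zero-padded HH:MM:SS:FF strings sort chronologically as plain strings.
        return sorted(self._records)
```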

Next, the control flowchart of the moving-image editing processing in FIG. 7B will be described. This embodiment takes as an example the case in which the imaging apparatus 100 has an automatic editing function that determines the audio gain from the background defocus amount and the correction information of the imaging apparatus and adjusts the volume. Because the editing processing for the moving image itself is existing processing, its detailed description is omitted, and the description focuses on the volume adjustment performed along with the moving-image editing.

When the user starts the automatic editing processing, in step S611 the central control unit 101 reads the time code of the captured image from the storage unit 112. At the same time, the central control unit 101 reads from the storage unit 112 the audio data of each microphone, the defocus amount, and the correction information of the imaging apparatus recorded in association with that time code. The audio data consists of the audio data acquired and recorded by the built-in microphone of the imaging apparatus 100 and the audio data acquired and recorded by the external microphone serving as the external sound pickup device 200.

Next, in step S612, the central control unit 101 adjusts the volume of each microphone's audio data based on the background defocus amount and the correction information of the imaging apparatus. In the fourth embodiment, the central control unit 101 adjusts the volume of the audio data picked up and recorded by the built-in microphone with a gain determined from the background defocus amount and the correction information of the imaging apparatus, while leaving the audio data of the external microphone at its recorded volume. The central control unit 101 then determines, for each timing given by the time code, the mixing ratio between the built-in microphone audio (that is, the adjusted audio described above) and the external microphone audio, and mixes them.
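
Step S612 can be sketched as a per-time-code re-mix in which only the built-in track is scaled. The `Clip` layout, the defocus-to-gain curve (-12 dB at zero defocus, +3 dB per unit, capped at 0 dB), and mixing by summation are assumptions for illustration; the correction information is left out.

```python
from typing import Dict, List, Tuple

# time code -> (built-in samples, external samples, background defocus)
Clip = Dict[str, Tuple[List[float], List[float], float]]


def gain_from_defocus_db(defocus: float) -> float:
    """Assumed mapping: -12 dB at zero defocus, +3 dB per unit, capped at 0 dB."""
    return min(-12.0 + 3.0 * defocus, 0.0)


def auto_edit(clip: Clip) -> Dict[str, List[float]]:
    """Re-mix each time code: adjust only the built-in mic, keep the external as recorded."""
    edited: Dict[str, List[float]] = {}
    for timecode in sorted(clip):
        builtin, external, defocus = clip[timecode]
        g = 10.0 ** (gain_from_defocus_db(defocus) / 20.0)  # dB -> linear factor
        edited[timecode] = [e + g * b for b, e in zip(builtin, external)]
    return edited
```

Because the raw tracks were stored unadjusted, the same clip can be re-edited later with a different curve without any loss.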

Then, in step S613, the central control unit 101 stores the audio data mixed in step S612 and the image data in the storage unit 112 as the edited image and audio, and ends the automatic editing processing.

The fourth embodiment has been described using an example in which the imaging apparatus performs the volume adjustment. When the volume adjustment is instead implemented by an information processing apparatus such as a personal computer or smartphone, the processing described above is executed by the information processing apparatus. In this case, the information processing apparatus executes an application program that implements the automatic editing processing according to the fourth embodiment. Although its hardware configuration is not illustrated, an information processing apparatus that implements the automatic editing processing includes a CPU, a ROM, a RAM, an auxiliary storage device, a display unit, an operation unit, a communication I/F, a bus, and the like. The CPU controls the entire apparatus using computer programs and data stored in the ROM and RAM, and executes the automatic editing processing including the volume adjustment described above.
The information processing apparatus may also have one or more pieces of dedicated hardware separate from the CPU, with the dedicated hardware executing at least part of the processing performed by the CPU. Examples of dedicated hardware include ASICs (application-specific integrated circuits), FPGAs (field-programmable gate arrays), and DSPs (digital signal processors). The ROM stores programs and the like that do not need to be changed. The RAM temporarily stores programs and data supplied from the auxiliary storage device, data supplied from outside via the communication I/F, and the like. The auxiliary storage device consists of an HDD, an SSD, or the like, and stores various data such as image data, audio data, time codes, defocus amounts, correction information of the imaging apparatus, and other control parameters. The display unit consists of, for example, a liquid crystal display or an LED display, and displays a GUI (graphical user interface) or the like through which the user operates the information processing apparatus. The operation unit consists of, for example, a keyboard, a mouse, a joystick, or a touch panel, and inputs various instructions to the CPU in response to user operations. The CPU also operates as a display control unit that controls the display unit and as an operation control unit that controls the operation unit. The communication I/F is used for communication with devices external to the information processing apparatus. For example, when the information processing apparatus is connected to an external device by wire, a communication cable is connected to the communication I/F; when the information processing apparatus has a function for communicating wirelessly with external devices, the communication I/F includes an antenna. The bus connects the units of the information processing apparatus and transfers information between them.
Note that in the fourth embodiment, the external devices connected to the information processing apparatus are the imaging apparatus described above, other information processing apparatuses, and the like. Although the display unit and the operation unit are assumed to be inside the information processing apparatus, at least one of them may exist outside it as a separate device. The information processing apparatus also need not include a display unit or an operation unit.

FIG. 7 is a diagram used to explain an example of subjects and the imaging apparatus assumed in the fourth embodiment. In FIG. 7, an external microphone 705 is assumed to be placed near a main subject 701 on which the imaging apparatus 100 focuses. In the example of FIG. 7, other subjects 702 and 703 are present in addition to the main subject 701. Because the imaging apparatus 100 is focused on the main subject 701, the other subjects 702 and 703 are assumed to be blurred to some extent. When the imaging apparatus 100 shoots a moving image, not only the sound picked up by the external microphone 705 near the main subject 701 but also the sounds of the other subjects 702 and 703 are recorded, the latter as part of the environmental sound.

When the imaging apparatus 100 later executes the automatic editing processing, the volume of the sounds of the subjects 702 and 703, which were out of focus in the imaging apparatus 100, is adjusted relative to the sound picked up by the external microphone 705 near the main subject 701 according to their degree of blur.

As described above, the imaging apparatuses of the first to fourth embodiments can appropriately adjust the sound along the shooting direction, that is, the depth direction, making it possible to reproduce realistic sound with a clear sense of perspective that matches the captured image.

The present invention can also be implemented by processing in which a program that implements one or more functions of the above-described embodiments is supplied to a system or apparatus via a network or a storage medium, and one or more processors in a computer of the system or apparatus read and execute the program. It can also be implemented by a circuit (for example, an ASIC) that implements one or more functions.
All of the above-described embodiments merely show specific examples for carrying out the present invention, and the technical scope of the present invention should not be construed to be limited by these. That is, the present invention can be embodied in various forms without departing from its technical concept or main features.

100: imaging apparatus, 101: central control unit, 102: optical system unit, 104: imaging unit, 106: image processing unit, 107: audio processing unit, 108: sound pickup unit, 109: wireless unit, 110: audio reproduction unit, 112: storage unit, 113: display unit, 200: external sound pickup device

Claims

1. An imaging apparatus comprising:
imaging means;
a first microphone;
connection means for connecting a second microphone; and
adjustment means for adjusting a gain for an audio signal input from the first microphone based on a defocus amount of an image captured by the imaging means.

2. The imaging apparatus according to claim 1, wherein
the first microphone is a built-in microphone of the imaging apparatus, and
the second microphone is a microphone wirelessly connected via the connection means.

3. The imaging apparatus according to claim 1 or 2, wherein the adjustment means obtains a defocus amount of a background area, which is the captured image excluding the area of the subject to be photographed, and adjusts the gain based on the defocus amount of the background area.

4. The imaging apparatus according to claim 3, wherein, as the defocus amount of the background area increases, the adjustment means adjusts the gain for the audio signal input from the first microphone to be relatively larger with respect to the gain for the audio signal input from the second microphone.

5. The imaging apparatus according to claim 3, wherein, when the defocus amount of the background area is greater than a threshold, the adjustment means adjusts the gain for the audio signal input from the first microphone to be relatively large with respect to the gain for the audio signal input from the second microphone.

6. The imaging apparatus according to claim 3, wherein, as the defocus amount of the background area increases, the adjustment means adjusts the gain for the audio signal input from the first microphone to be relatively smaller with respect to the gain for the audio signal input from the second microphone.

7. The imaging apparatus according to claim 3, wherein, when the defocus amount of the background area is greater than a threshold, the adjustment means adjusts the gain for the audio signal input from the first microphone to be relatively small with respect to the gain for the audio signal input from the second microphone.

8. The imaging apparatus according to claim 3, wherein the adjustment means switches, in accordance with a user operation, between two settings. In the first setting, the gain for the audio signal input from the first microphone is adjusted to be relatively large with respect to the gain for the audio signal input from the second microphone, according to the defocus amount of the background area. In the second setting, that gain is adjusted to be relatively small.

9. The imaging apparatus according to any one of claims 3 to 8, wherein, when the subject to be photographed moves, the adjustment means further adjusts the gain that was set based on the defocus amount of the background area. The further adjustment corresponds to the increase or decrease in the defocus amount of the background area caused by the movement of the subject.

10. The imaging apparatus according to any one of claims 1 to 9, wherein
a third microphone is also connected to the connection means, and
the adjustment means also adjusts a gain for an audio signal input from the third microphone based on the defocus amount of the captured image.

11. The imaging apparatus according to claim 10, wherein the third microphone includes at least one of:
a microphone arranged in the vicinity of the imaging apparatus and connected to the connection means by wire or wirelessly; and
a microphone arranged at a position different from the position of the subject to be photographed and connected to the connection means by wire or wirelessly.

12. The imaging apparatus according to any one of claims 3 to 11, wherein the adjustment means adjusts the gain taking into account at least one of an aperture value, an F-number, and a focal length of an optical system that forms an optical image on the imaging means.

13. The imaging apparatus according to any one of claims 1 to 12, further comprising recording means for recording the gain-adjusted audio signal, the audio signal input from the second microphone, and the captured image.

14. A control method for an imaging apparatus having imaging means, a first microphone, and connection means for connecting a second microphone, the control method comprising:
a control step of adjusting a gain for an audio signal input from the first microphone based on a defocus amount of an image captured by the imaging means.

15. A program for causing a computer of an imaging apparatus to function as the adjustment means of the imaging apparatus according to any one of claims 1 to 13.
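The gain rules recited in claims 4 to 9 and 12 can be sketched as follows. This is a hypothetical illustration, not Canon's implementation, and several parts of it are assumptions:

- The dB scale, the tuning values `slope_db` and `step_db`, and all names are assumptions.
- The defocus amount is treated as a non-negative scalar.
- The thin-lens circle-of-confusion formula is a swapped-in, textbook stand-in for the defocus amount, which the claims do not define numerically.

```python
from enum import Enum
from typing import Optional


class Direction(Enum):
    """User-selectable setting (claim 8): which way the first-microphone gain moves."""
    EMPHASIZE_FIRST = 1   # claims 4 and 5: relatively larger as background blur grows
    SUPPRESS_FIRST = -1   # claims 6 and 7: relatively smaller as background blur grows


def relative_gain_db(defocus: float, direction: Direction,
                     threshold: Optional[float] = None,
                     slope_db: float = 3.0, step_db: float = 6.0) -> float:
    """Gain of the first microphone relative to the second, in dB.

    threshold is None -> continuous rule (claims 4 and 6): offset grows with defocus.
    threshold given   -> step rule (claims 5 and 7): fixed offset once defocus > threshold.
    slope_db and step_db are hypothetical tuning values, not taken from the patent.
    """
    if defocus < 0:
        raise ValueError("defocus amount must be non-negative")
    if threshold is None:
        offset = slope_db * defocus
    else:
        offset = step_db if defocus > threshold else 0.0
    return direction.value * offset


def compensate_for_motion(gain_db: float, old_defocus: float, new_defocus: float,
                          direction: Direction, slope_db: float = 3.0) -> float:
    """Claim 9: shift an already-set gain by the change in background defocus caused
    by subject movement, in the same direction as the continuous rule above."""
    return gain_db + direction.value * slope_db * (new_defocus - old_defocus)


def background_blur_mm(focal_length_mm: float, f_number: float,
                       focus_dist_mm: float, background_dist_mm: float) -> float:
    """Thin-lens blur-circle diameter on the sensor for a background point (claim 12).

    Standard circle-of-confusion approximation, used here as a stand-in for the
    defocus amount: c = (f / N) * |S2 - S1| / S2 * f / (S1 - f),
    where S1 is the focus distance and S2 the background distance.
    """
    if focal_length_mm <= 0 or f_number <= 0 or background_dist_mm <= 0:
        raise ValueError("optical parameters must be positive")
    if focus_dist_mm <= focal_length_mm:
        raise ValueError("focus distance must exceed the focal length")
    aperture_mm = focal_length_mm / f_number  # entrance-pupil diameter f/N
    return (aperture_mm * abs(background_dist_mm - focus_dist_mm) / background_dist_mm
            * focal_length_mm / (focus_dist_mm - focal_length_mm))
```

Take a 50 mm lens at f/2, focused at 2 m, with the background at 10 m. `background_blur_mm(50, 2, 2000, 10000)` gives about 0.51 mm, which could then be passed to `relative_gain_db` as the defocus amount. Opening the aperture or lengthening the focal length increases the blur and therefore the gain offset.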