JP2020072311A

JP2020072311A - Information acquisition device, information acquisition method, information acquisition program, and information acquisition system

Info

Publication number: JP2020072311A
Application number: JP2018203265A
Authority: JP
Inventors: 貴大中代 (Takahiro Nakadai); 野中　修 (Osamu Nonaka)
Original assignee: Olympus Corp
Current assignee: Olympus Corp
Priority date: 2018-10-29
Filing date: 2018-10-29
Publication date: 2020-05-07
Anticipated expiration: 2038-10-29
Also published as: JP7219049B2; JP7428763B2; JP2022184863A

Abstract

PROBLEM TO BE SOLVED: To obtain a preferable sound corresponding to a captured image by selecting and adjusting a sound from an external sound collecting device. SOLUTION: An information acquisition device includes a feature extraction unit 11g that acquires a first sound collected by a built-in sound collecting device 13, which is built into an imaging device 10 and collects ambient sound around the imaging device, and extracts a feature of the acquired first sound; a sound acquisition unit 21b that acquires a third sound by performing at least one of selection and adjustment on a plurality of second sounds collected by a plurality of microphones having different sensitivity distribution directions, based on the feature of the first sound extracted by the feature extraction unit; and a synchronization processing unit 11h that synchronizes the third sound acquired by the sound acquisition unit with an image of a subject captured by the imaging device. SELECTED DRAWING: Figure 1

Description

The present invention relates to an information acquisition device, an information acquisition method, an information acquisition program, and an information acquisition system that acquire video and sound from an imaging device and a sound collecting device.

In recent years, many portable devices with a photographing function (photographing devices), such as digital cameras, can shoot moving images as well as still images. Some photographing devices have a built-in microphone that collects ambient sound and can record AV data containing both video and sound when shooting a moving image.

Furthermore, some photographing devices have a terminal to which an external microphone can be attached, and devices that can record the video obtained by moving-image shooting together with the sound acquired by the external microphone have been commercialized. For example, when the external microphone can be placed away from the photographing device, placing it near the subject that is the sound collection target makes it possible to obtain sound with a good S/N ratio from the external microphone.

Japanese Unexamined Patent Application Publication No. 2005-151471 (JP 2005-151471 A)

However, when photographing wild birds, for example, the user carrying the photographing device can move to a position that is relatively far from the subject yet allows the subject to be photographed well, whereas movement of an external microphone placed near the subject may be restricted. The external microphone therefore cannot always acquire sound suited to the video acquired by the photographing device.

Patent Document 1 discloses a TV conference system that uses a plurality of microphones and cameras and accurately selects the speaker. However, that system includes a voiceprint authentication unit that authenticates conference participants whose voiceprints are registered, an imaging adjustment unit that controls a television camera device so as to capture the speaker optimally, and so on, and the apparatus is therefore large in scale.

An object of the present invention is to provide an information acquisition device, an information acquisition method, an information acquisition program, and an information acquisition system that can acquire an image captured by a photographing device together with a preferable sound corresponding to that image, by selecting or adjusting the sound from an external sound collecting device based on the features of the sound acquired by a built-in sound collecting device.

An information acquisition device according to one aspect of the present invention includes: a feature extraction unit that acquires a first sound collected by a built-in sound collecting device, which is built into an imaging device and collects ambient sound around the imaging device, and extracts a feature of the acquired first sound; a sound acquisition unit that acquires a third sound by performing at least one of selection and adjustment on a plurality of second sounds collected by a plurality of microphones having different sensitivity distribution directions, based on the feature of the first sound extracted by the feature extraction unit; and a synchronization processing unit that synchronizes the third sound acquired by the sound acquisition unit with video of a subject captured by the imaging device.

An information acquisition method according to one aspect of the present invention is a method performed in an information acquisition device that includes a feature extraction unit, a sound acquisition unit, and a synchronization processing unit. The feature extraction unit acquires a first sound collected by a built-in sound collecting device, which is built into an imaging device and collects ambient sound around the imaging device, and extracts a feature of the acquired first sound. The sound acquisition unit acquires a third sound by performing at least one of selection and adjustment on a plurality of second sounds collected by a plurality of microphones having different sensitivity distribution directions, based on the extracted feature of the first sound. The synchronization processing unit synchronizes the acquired third sound with video of a subject captured by the imaging device.

An information acquisition program according to one aspect of the present invention causes a computer to execute a procedure of: acquiring a first sound collected by a built-in sound collecting device, which is built into an imaging device and collects ambient sound around the imaging device, and extracting a feature of the acquired first sound; acquiring a third sound by performing at least one of selection and adjustment on a plurality of second sounds collected by a plurality of microphones having different sensitivity distribution directions, based on the extracted feature of the first sound; and synchronizing the acquired third sound with video of a subject captured by the imaging device.

An information acquisition system according to one aspect of the present invention includes: an imaging device that includes a built-in sound collecting device that collects ambient sound and a feature extraction unit that extracts a feature of a first sound collected by the built-in sound collecting device; an external sound collecting device that includes a plurality of microphones having different sensitivity distribution directions and that acquires a third sound by performing at least one of selection and adjustment on a plurality of second sounds collected by the plurality of microphones, based on the feature of the first sound extracted by the feature extraction unit of the imaging device; and a playback device that performs synchronization processing to synchronize the third sound acquired by the external sound collecting device with video of a subject captured by the imaging device.

According to the present invention, selecting or adjusting the sound from the external sound collecting device based on the features of the sound acquired by the built-in sound collecting device makes it possible to acquire an image captured by the photographing device together with a sound corresponding to that image.

A block diagram showing an information acquisition device according to the first embodiment of the present invention.
An explanatory diagram showing an example of the external appearance of the camera 10 and the external sound collecting device 20.
An explanatory diagram for explaining a shooting situation.
An explanatory diagram for explaining a shooting situation.
An explanatory diagram showing the positional relationship among the camera 10, the external sound collecting device 20, and the subject at the time of shooting.
A flowchart for explaining the operation of the camera 10.
A flowchart for explaining the operation of the external sound collecting device 20.
A block diagram showing an information acquisition device according to the second embodiment of the present invention.
A flowchart for explaining the operation of the camera 50.
A block diagram showing an information acquisition device according to the third embodiment of the present invention.
An explanatory diagram for explaining the external appearance of the camera 60 and the recorder 70.
An explanatory diagram for explaining the difference between the angle from the ST sound collecting unit 13 to the subject and the angle from the ST sound collecting unit 72 to the subject.
A flowchart for explaining the operation of the camera 60.
A flowchart for explaining the operation of the recorder 70.
A flowchart for explaining the operation of the playback device 80.

Hereinafter, embodiments of the present invention will be described in detail with reference to the drawings.

(First embodiment)
FIG. 1 is a block diagram showing an information acquisition device according to the first embodiment of the present invention. FIG. 2 is an explanatory diagram showing an example of the external appearance of the camera 10 and the external sound collecting device 20. The present embodiment employs a freely movable imaging device that has a built-in sound collecting device and can shoot moving images of a subject, together with an external sound collecting device that can collect sound in a plurality of directions, each with a predetermined directional characteristic. By selecting or adjusting the sound from the external sound collecting device based on a comparison of the features of the sounds collected by the built-in sound collecting device and the external sound collecting device, the embodiment makes it possible to acquire the video captured by the imaging device together with sound suited to that video. In the present embodiment, sound suited to the video means, for example, sound that contains the sound produced by the main subject at as high an S/N ratio as possible.

The information acquisition device of the present embodiment may be configured within the imaging device, within the external sound collecting device, or distributed across the imaging device and the external sound collecting device; it may also be configured as a device independent of both. FIG. 1 shows an example in which the information acquisition device is distributed across the imaging device and the external sound collecting device.

First, the external appearance of the camera 10, which is the imaging device, and of the external sound collecting device 20 will be described with reference to FIG. 2.

The camera 10 shown in FIG. 2 has a housing 10a in which the circuits of FIG. 1 are housed and a lens barrel 12b in which an optical system 12a, described later, is housed. A shutter button 15a, which is part of an operation unit 15 described later, is provided on the upper surface of the housing 10a.

The external sound collecting device 20 shown in FIG. 2 has a housing 20a in which the circuits of FIG. 1 are housed. The housing 20a is cylindrical, and the sound collecting portions of a plurality of microphones 22a, which constitute a multi sound collecting unit 22 described later, protrude outward from its peripheral surface. For example, twelve microphones 22a are arranged at 30-degree intervals, each facing the normal direction of the side surface of the housing 20a. The multi sound collecting unit is illustrated here as a plurality of microphones whose collectable-sound distributions point in different directions; any number of microphones greater than one suffices, and the unit need not be a single body but may be configured by combining a plurality of devices.
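As an illustration of the example arrangement above, the following minimal sketch computes the nominal sensitivity-peak direction of each microphone and finds the microphone nearest a given bearing. The function names, the choice of 0 degrees for the first microphone, and the idea of looking up a microphone by bearing are assumptions made for illustration; the specification does not define them.

```python
def microphone_directions(count=12):
    """Nominal peak direction (degrees) of each microphone,
    assuming equal spacing around the cylindrical housing."""
    step = 360.0 / count
    return [i * step for i in range(count)]


def nearest_microphone(bearing_deg, count=12):
    """Index of the microphone whose peak direction is closest to bearing_deg."""
    directions = microphone_directions(count)

    def angular_gap(a, b):
        # Smallest absolute difference between two bearings, in degrees.
        d = abs(a - b) % 360.0
        return min(d, 360.0 - d)

    return min(range(count), key=lambda i: angular_gap(directions[i], bearing_deg))
```

With twelve microphones the step is 30 degrees, matching the example in the text; a different count simply changes the spacing.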

In FIG. 1, the camera 10 constituting the imaging device is provided with a control unit 11. The control unit 11 may be configured by a processor using a CPU, an FPGA, or the like and may control each unit by operating according to a program stored in a memory (not shown), or some or all of its functions may be realized by hardware electronic circuits.

The camera 10 includes an imaging unit 12 and an ST sound collecting unit 13. The imaging unit 12 has the optical system 12a and an image sensor (not shown). The optical system 12a includes lenses for zooming and focusing, a diaphragm, and the like (not shown), as well as a zoom (variable magnification) mechanism, a focus mechanism, and a diaphragm mechanism (not shown) that drive them. The image sensor is a CCD, a CMOS sensor, or the like, and the optical system 12a guides the optical image of the subject onto its imaging surface. The image sensor photoelectrically converts the optical image of the subject to obtain a captured image (imaging signal) of the subject.

A shooting control unit 11a configured in the control unit 11 drives and controls the zoom mechanism, the focus mechanism, and the diaphragm mechanism of the optical system 12a to adjust zoom, aperture, and focus. A focus and angle-of-view information unit 11c acquires information on zoom, aperture, and focus from the optical system 12a and outputs it to the shooting control unit 11a. This feedback allows the shooting control unit 11a to set zoom, aperture, and focus to the desired values. The imaging unit 12 captures images under the control of the shooting control unit 11a and outputs the imaging signal of the captured images (moving images and still images) to the control unit 11.

A sound collection control and processing unit 11e is configured in the control unit 11 and controls the ST sound collecting unit 13. The ST sound collecting unit 13, which serves as the built-in sound collecting device, is configured by a stereo microphone or the like. Under the control of the sound collection control and processing unit 11e, it collects sound around the camera 10 to obtain a sound signal and outputs the acquired sound (hereinafter also referred to as the internal sound or the first sound) to the control unit 11. The ST sound collecting unit 13 is assumed to have its sensitivity peak in the shooting direction of the camera 10, that is, in the optical axis direction of the optical system 12a.

The camera 10 is provided with the operation unit 15. The operation unit 15 includes a release button, function buttons, and various switches, dials, ring members, and the like (not shown) for shooting mode setting, parameter operation, and so on, and outputs operation signals based on user operations to the control unit 11. The control unit 11 controls each unit based on the operation signals from the operation unit 15.

The control unit 11 takes in the captured images (moving images and still images) from the imaging unit 12. An image processing unit 11b of the control unit 11 performs predetermined signal processing on the captured images, such as color adjustment, matrix conversion, noise removal, and various other kinds of signal processing.

The camera 10 is provided with a display unit 16, which has a display screen such as an LCD (liquid crystal display). The display screen is provided, for example, on the back of the housing of the camera 10. The control unit 11 causes the display unit 16 to display the captured image processed by the image processing unit 11b. The control unit 11 can also cause the display unit 16 to display the various menus, warnings, and the like of the camera 10.

The camera 10 is provided with communication units 18a and 18b. Under the control of the control unit 11, the communication units 18a and 18b can exchange information with the external sound collecting device 20. The communication unit 18a can communicate by short-range wireless such as Bluetooth (registered trademark), and the communication unit 18b can communicate by wireless LAN such as Wi-Fi (registered trademark). The communication units 18a and 18b are not limited to Bluetooth and Wi-Fi and may employ various other communication methods. The control unit 11 can receive a sound signal from the external sound collecting device 20 via the communication unit 18a or 18b.

The camera 10 is provided with a recording unit 17. The recording unit 17 is configured by a predetermined recording medium, records information given from the control unit 11, and can output the recorded information to the control unit 11. A card interface, for example, can be employed as the recording unit 17, in which case the recording unit 17 can record image data on a recording medium such as a memory card.

In the present embodiment, the recording unit 17 has a collected-sound image/audio recording unit 17a, a cooperation information unit 17b, and an external sound recording unit 17c. The control unit 11 can compress the captured image after signal processing and give the compressed image to the recording unit 17 for recording. This image is recorded in the collected-sound image/audio recording unit 17a. The camera 10 is provided with a clock unit 19, and the control unit 11 can use the time information from the clock unit 19 to associate the moving image acquired by the imaging unit 12 with the sound acquired by the ST sound collecting unit 13 and record them in the collected-sound image/audio recording unit 17a.

The cooperation information unit 17b stores information on communication with the external sound collecting device 20. By controlling the communication units 18a and 18b based on the information read from the cooperation information unit 17b, the control unit 11 can exchange information with the external sound collecting device 20. As a result of this communication, the control unit 11 can receive a sound signal from the external sound collecting device 20. The control unit 11 can give the received sound to the external sound recording unit 17c and record it as the external sound (also referred to as the third sound).

In the present embodiment, a sound acquisition unit 11f is configured in the control unit 11. The sound acquisition unit 11f has a feature extraction unit 11g for designating the sound to be acquired as the external sound. The feature extraction unit 11g extracts the sound feature of the internal sound acquired by the ST sound collecting unit 13. The sound acquisition unit 11f transmits the sound feature obtained by the feature extraction unit 11g to the external sound collecting device 20 via the communication unit 18a or 18b as information designating the sound to be acquired as the external sound. The sound feature may be determined by analyzing, for example, the frequency range, how the frequency changes, and how the strength (sound amplitude) changes, and the feature extraction unit 11g may extract these states using any of various known determination methods.

To judge these states comprehensively, it is also conceivable to extract a specific sound component from the input sound using an inference engine obtained by machine learning. For this, a certain amount of training data may be created, pairing sound information of a specific time width with the sound extracted from it as correct-answer data, and machine learning may be performed so that these can be discriminated. A further option is to analyze the sound in combination with changes in images acquired in synchronization with it.
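The non-learned features mentioned above (how the strength and the frequency change over time) can be sketched as follows. This is a minimal illustration, not the method of the specification, which leaves the choice to known determination methods. The function name `extract_features`, the frame length, and the use of a zero-crossing count as a rough frequency estimate are all assumptions.

```python
import math


def extract_features(samples, sample_rate, frame_len=256):
    """Per-frame (rms, approx_freq_hz) pairs for a mono signal.

    rms tracks the change in strength over time; approx_freq_hz is a
    zero-crossing estimate that tracks the change in frequency.
    """
    features = []
    for start in range(0, len(samples) - frame_len + 1, frame_len):
        frame = samples[start:start + frame_len]
        rms = math.sqrt(sum(x * x for x in frame) / frame_len)
        # Count sign changes; a pure tone crosses zero twice per cycle.
        crossings = sum(
            1 for a, b in zip(frame, frame[1:]) if (a < 0) != (b < 0)
        )
        approx_freq = crossings * sample_rate / (2.0 * frame_len)
        features.append((rms, approx_freq))
    return features
```

A zero-crossing estimate is only meaningful for sounds dominated by a single tone; a spectral analysis would be the more usual choice for broadband sound, at the cost of more computation on the camera side.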

The external sound collecting device 20 is provided with a control unit 21. The control unit 21 may be configured by a processor using a CPU or the like and may control each unit by operating according to a program stored in a memory (not shown), or some or all of its functions may be realized by hardware electronic circuits. The external sound collecting device 20 has communication units 26a and 26b. Under the control of the control unit 21, the communication units 26a and 26b can exchange information with the camera 10. The communication unit 26a can communicate by short-range wireless such as Bluetooth (registered trademark), and the communication unit 26b can communicate by wireless LAN such as Wi-Fi (registered trademark). The communication units 26a and 26b are not limited to Bluetooth and Wi-Fi and may employ various other communication methods. Via the communication unit 26a or 26b, the control unit 21 can receive sound feature information from the camera 10 and transmit a sound signal to the camera 10.

The external sound collecting device 20 is provided with the multi sound collecting unit 22. The multi sound collecting unit 22 is configured by, for example, a plurality of microphones (not shown), arranged so that each has a predetermined directional characteristic (sensitivity distribution) in a different direction. That is, the multi sound collecting unit 22 acquires a plurality of sounds (hereinafter also referred to as externally collected sounds or second sounds) collected by a plurality of microphones whose sensitivity peaks point in mutually different directions. A sound collection control unit 21a is configured in the control unit 21 and can control sound collection by the multi sound collecting unit 22.

The external sound collecting device 20 is provided with an operation unit 23. The operation unit 23 includes various switches, dials, ring members, and the like (not shown) for recording mode setting, parameter operation, and so on, and outputs operation signals based on user operations to the control unit 21. The control unit 21 controls each unit based on the operation signals from the operation unit 23. When control information is given from the control unit 11 of the camera 10 via the communication units 26a and 26b, the control unit 21 may instead control each unit based on that control information. In this case, the control unit 11 of the camera 10 can control recording in the external sound collecting device 20.

The external sound collecting device 20 is provided with a recording unit 25. The recording unit 25 is configured by a predetermined recording medium, records information given from the control unit 21, and can output the recorded information to the control unit 21. A card interface, for example, can be employed as the recording unit 25, in which case the recording unit 25 can record image data on a recording medium such as a memory card.

The recording unit 25 has a sound recording unit 25a and a cooperation information unit 25b. The control unit 21 can give the externally collected sounds after signal processing to the sound recording unit 25a for recording. The external sound collecting device 20 is provided with a clock unit 24, and the control unit 21 can use the time information from the clock unit 24 to add time information to the externally collected sounds acquired by the multi sound collecting unit 22 and record them in the sound recording unit 25a.

In the present embodiment, a sound acquisition unit 21b is provided to determine the third sound to be transmitted to the camera 10. The sound acquisition unit 21b has a feature extraction unit 21c. The feature extraction unit 21c has the same configuration as the feature extraction unit 11g and extracts the sound feature of each of the plurality of externally collected sounds (second sounds) acquired by the multi sound collecting unit 22. Via the communication unit 26a or 26b, the sound acquisition unit 21b receives from the camera 10 the information designating the sound to be transmitted as the external sound (third sound), namely the sound feature of the internal sound (first sound) acquired by the camera 10. The sound acquisition unit 21b compares the sound feature received from the camera 10 with the sound feature of each externally collected sound extracted by the feature extraction unit 21c, and thereby selects the externally collected sound to be output as the external sound.

For example, the sound acquisition unit 21b selects the externally collected sound whose sound features are most similar to those of the internal sound, and transmits the selected sound, together with time information, to the camera 10 as the external sound (third sound) via the communication unit 26a or 26b. When a plurality of externally collected sounds have sound features whose similarity exceeds a predetermined threshold, the sound acquisition unit 21b may select, from among them, the externally collected sound with the highest S/N as the external sound.
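
As an illustration only, the selection rule described above could be sketched as follows. The description does not specify how similarity is measured, so cosine similarity between feature vectors, the function names, and the default threshold are all assumptions made for this sketch.

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity of two equal-length feature vectors (assumed metric)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def select_microphone(first_feature, second_features, snr_db, threshold=0.9):
    """Return the index of the microphone whose sound becomes the third sound.

    first_feature:   feature vector of the internal (first) sound
    second_features: one feature vector per external microphone (second sounds)
    snr_db:          one S/N value per external microphone
    threshold:       hypothetical similarity threshold
    """
    sims = [cosine_similarity(first_feature, f) for f in second_features]
    candidates = [i for i, s in enumerate(sims) if s > threshold]
    if len(candidates) >= 2:
        # Several sounds are similar enough: prefer the highest S/N.
        return max(candidates, key=lambda i: snr_db[i])
    # Otherwise fall back to the single most similar sound.
    return max(range(len(sims)), key=lambda i: sims[i])
```

With a permissive threshold the S/N decides among the similar candidates; with a strict threshold the most similar sound is chosen.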

The sound acquisition unit 21b may also be capable of sound processing such as individually adjusting the gain of the sound signals collected by the multi sound collecting unit 22 or mixing them at a predetermined ratio. The sound acquisition unit 21b may select one or more externally collected sounds, adjust the selected sounds, and thereby acquire the external sound (third sound). Stereo sound may also be acquired as the external sound.
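
The gain adjustment and mixing mentioned above could look like the following minimal sketch. The function name and the representation of each signal as a list of samples are assumptions; the description does not prescribe any particular mixing ratio.

```python
def mix_signals(signals, gains):
    """Mix several microphone signals into one, applying one gain per signal.

    signals: list of sample lists, one per selected microphone
    gains:   list of per-microphone gains (the "predetermined ratio")
    """
    if len(signals) != len(gains):
        raise ValueError("one gain is required per signal")
    length = min(len(s) for s in signals)  # truncate to the shortest signal
    return [
        sum(g * s[n] for g, s in zip(gains, signals))
        for n in range(length)
    ]
```

For example, gains of 0.5 and 0.5 average two microphones, while gains of 1.0 and 0.0 reduce to plain selection of the first microphone.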

A synchronization processing unit 11h is configured in the control unit 11 of the camera 10. The synchronization processing unit 11h synchronizes the external sound acquired by the sound acquisition unit 11f with the video of the subject acquired by the imaging unit 12 and records it in the external sound recording unit 17c. For example, the synchronization processing unit 11h may synchronize the video of the subject with the external sound by comparing the waveform of the sound signal collected simultaneously with the video signal acquired by the imaging unit 12 against the waveform of the external sound.
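
The description does not say how the two waveforms are compared. One common way, shown here purely as an assumed technique, is to search for the lag that maximizes the cross-correlation between the internal sound (the reference) and the external sound. The function name and the brute-force search are illustrative only.

```python
def estimate_lag(reference, other, max_lag):
    """Return the lag in samples by which `other` trails `reference`.

    A positive result means the same event appears later in `other`.
    The lag is found by maximizing the raw cross-correlation.
    """
    best_lag = 0
    best_score = float("-inf")
    for lag in range(-max_lag, max_lag + 1):
        score = 0.0
        for n, r in enumerate(reference):
            m = n + lag
            if 0 <= m < len(other):
                score += r * other[m]
        if score > best_score:
            best_lag, best_score = lag, score
    return best_lag
```

Dividing the returned lag by the sampling rate gives the time offset to apply to the external sound before it is recorded with the video.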

Because a reference signal exists (the sound recorded by the internal sound recording unit built into the camera), associating information from outside, for example synchronizing it, can be simplified at the level of system configuration. This is because imaging and sound acquisition within the same device are managed and recorded under the same clock signal. What is already synchronized inside the camera can be referred to as the correct state, and, based on this reference, operations such as eliminating the delay caused by the time sound takes to travel from a distant object become simple.
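
The size of the sound-travel delay mentioned above can be estimated with simple arithmetic. The sketch below assumes a speed of sound of about 343 m/s (air at roughly 20 degrees C); the function names and the example distances are hypothetical and do not come from this description.

```python
SPEED_OF_SOUND_M_PER_S = 343.0  # assumption: air at roughly 20 degrees C


def acoustic_delay_s(distance_m, speed=SPEED_OF_SOUND_M_PER_S):
    """Time in seconds for sound to travel distance_m."""
    return distance_m / speed


def arrival_gap_s(subject_to_camera_m, subject_to_external_m):
    """How much later the sound reaches the camera than the external microphone."""
    return acoustic_delay_s(subject_to_camera_m) - acoustic_delay_s(subject_to_external_m)


def delay_in_frames(distance_m, fps):
    """Acoustic delay expressed in video frames at the given frame rate."""
    return acoustic_delay_s(distance_m) * fps
```

For a subject 34.3 m from the camera, the built-in microphone hears the sound about 0.1 s after the event appears in the image, which is about three frames at 30 frames per second.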

Next, the operation of the embodiment configured in this way will be described with reference to FIGS. 3 to 7. FIGS. 3 and 4 are explanatory diagrams showing how shooting is performed, and FIG. 5 is an explanatory diagram showing the positional relationship among the camera 10, the external sound collecting device 20, and the subject at the time of shooting. FIG. 6 is a flowchart for explaining the operation of the camera 10, and FIG. 7 is a flowchart for explaining the operation of the external sound collecting device 20.

In the examples of FIGS. 3 and 4, the camera 10 has the display screen 16a of the display unit 16 on the back of the housing 10a. The user, for example, holds the housing 10a by hand and, while viewing the display screen 16a with the bird 41 (the subject) captured within the field of view, presses the shutter button 15a to shoot. In FIGS. 3 and 4, the shooting range of the camera 10 is shown enclosed in a frame.

In the example of FIG. 3, the bird 41 is perched on a branch of the tree 31a; in the example of FIG. 4, the bird 41 is perched on a branch of the tree 31b next to the tree 31a. Grass 32 grows on the ground between the trees 31a, 31b and the camera 10. The external sound collecting device 20 is placed relatively close to the trees 31a and 31b. For example, when shooting wild birds, the external sound collecting device 20 may be installed in advance near a tree where wild birds are likely to perch. The user carrying the camera 10, on the other hand, shoots from a position relatively far from the wild bird so as not to scare it away and to avoid obstacles.

Therefore, when the call of the bird 41 is to be collected, the externally collected sound (second sound) from each microphone 22a of the multi sound collecting unit 22 is expected to have a better S/N than the sound (first sound) collected by the ST sound collecting unit 13 built into the camera 10. For video, a sufficiently high-quality image can be obtained even from a position relatively far from the subject by using a telephoto lens or the like. For sound, however, noise increases as the distance from the subject grows, and the quality of the collected sound of the target degrades. Collecting sound with a microphone closer to the target therefore yields sound with less noise.

When the power is turned on, the control unit 11 of the camera 10 determines in step S1 of FIG. 6 whether the imaging mode has been designated. When the imaging mode has not been designated, the control unit 11 shifts to the designated mode, for example a cooperation mode for making settings and exchanging data for cooperation with the external sound collecting device 20, or a playback mode for playing back recorded images.

When the imaging mode is designated, the control unit 11 determines in the next step S2 whether cooperation with the external sound collecting device is designated. When cooperation is not designated, the control unit 11 moves the process to step S6 and determines whether a recording start operation has been performed or whether recording is in progress. When cooperation is designated, the control unit 11 performs, in step S3, imaging and sound recording for designating the external sound, prior to the actual imaging and recording initiated by user operation.

In the next step S4, the feature extraction unit 11g of the control unit 11 extracts the sound features of the internal sound (first sound) from the ST sound collecting unit 13, transmits the extracted sound features to the external sound collecting device 20 via the communication units 18a and 18b (step S5), and advances the process to step S6. The ST sound collecting unit 13 has its peak sensitivity in the shooting direction, so the internal sound it acquires is expected to contain the sound features of the call of the bird 41, the subject.

Meanwhile, when the power is turned on, the control unit 21 of the external sound collecting device 20 determines in step S21 of FIG. 7 whether cooperation with the camera 10 is set. When cooperation is not set, the control unit 21 moves the process to step S25 and determines whether sound collection has been instructed. The control unit 21 may determine that sound collection has been instructed when the camera 10 transmits information indicating the start of recording. When sound collection has not been instructed, the control unit 21 moves the process to step S31 and executes other processing, for example a cooperation mode for making settings and exchanging data for cooperation, or a playback mode for playing back recorded sound. In the cooperation mode, the sound features acquired by the feature extraction unit 11g of the camera 10 are received by the communication units 26a and 26b and recorded in the recording unit 25.

When cooperation is set, the control unit 21 moves the process from step S21 to step S22 to determine the external sound (third sound) to be transmitted to the camera 10, and collects sound with all the microphones of the multi sound collecting unit 22. In step S23, when the control unit 21 has acquired the externally collected sounds (second sounds) from all the microphones of the multi sound collecting unit 22, the feature extraction unit 21c extracts their sound features. The sound acquisition unit 21b compares the sound features of each externally collected sound (second sound) with the sound features of the internal sound (first sound) read from the recording unit 25.

Based on the comparison of sound features, the sound acquisition unit 21b determines which of the microphones of the multi sound collecting unit 22 will collect the sound to be output as the external sound (step S24). For example, the sound acquisition unit 21b calculates the similarity between the sound features of the first sound and those of each second sound, selects the microphone that collected the second sound with the highest S/N among the second sounds whose similarity exceeds a predetermined threshold, and sets the selected microphone as the microphone for collecting the external sound.

This S/N determination may require deciding which component is the signal (S) and which is the noise (N). The feature determination technique described above may be used for this, or the decision may be inferred from the relationship between image information of the subject and the sound that the subject emits. For example, birdsong and human voices correlate with movements of the mouth and throat, so a characteristic part may be detected in the image by color, shading, or shape, and the sound component whose pattern of change matches the pattern of change of that part may be chosen as the signal (S). The remaining components are then treated as noise (N).

In addition, the image reveals which object the photographer is interested in and aiming at (sound spreads easily and is hard to aim at, whereas an image arrives as light travelling in a straight line, so the aim is clear). One technical solution is therefore to determine the image features of the object (shape, color distribution, motion) using an image dictionary or the like and, if the object is a person, to extract the sound component matching the features of a human voice as the signal (S) and treat everything else as noise (N). If a bird is detected, the relevant sound is the bird's call; if a cat, the cat's voice; if a musical instrument, the sound specific to that instrument. Which component of the acquired sound corresponds to it can be readily determined by preparing a dictionary or database listing the sound features associated with images, or an inference model obtained by machine learning. These can be implemented with a recording unit and an arithmetic unit. Alternatively, components with a specific frequency or a specific pattern of change, such as wind noise, air conditioning, or crowd noise, may simply be judged to be noise (N), and sound components with other features chosen as the signal (S). In some cases, for example the flow of water and a bird's call, both sounds are important, and measures may be taken to capture both cleanly. An inference model obtained by machine learning may be used to decide when multiple sounds are important; in that case, the training data for machine learning is annotated so that multiple sound components are selected.

Furthermore, the microphone decision for the multi sound collecting unit need not select a single microphone; multiple microphones may be selected. The sound of one microphone may also be processed using the sound collected by multiple microphones. When the stereo effect is important, the same kind of microphone selection may be made separately for the left and right channels, and the sound obtained from multiple microphones may be processed, using information such as the built-in microphone's sound and the position of the object on the screen, so that a natural stereo impression results.
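
Once the components have been classified as signal or noise by one of the methods above, the S/N itself is a simple ratio. The following sketch assumes the sound has already been split into frequency bands with one energy value per band, and that the set of band indices belonging to the signal is known; both the band representation and the function name are assumptions made for illustration.

```python
import math


def snr_db(band_energy, signal_bands):
    """S/N in decibels given per-band energies and the indices judged to be signal.

    band_energy:  list of non-negative energies, one per frequency band
    signal_bands: set of band indices classified as signal (S); the rest is noise (N)
    """
    signal = sum(e for i, e in enumerate(band_energy) if i in signal_bands)
    noise = sum(e for i, e in enumerate(band_energy) if i not in signal_bands)
    if noise == 0.0:
        return math.inf
    if signal == 0.0:
        return -math.inf
    return 10.0 * math.log10(signal / noise)
```

Computing this value for every microphone gives the per-microphone S/N used when choosing among sounds of similar features.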

FIG. 5 shows the positional relationship among the camera 10, the multi sound collecting unit 22, and the subject (bird 41) in the examples of FIGS. 3 and 4. In FIG. 5, the distance from the camera 10 to the multi sound collecting unit 22 is about D m, and the distance from the camera 10 to the bird 41 (the subject) is about D0 m. FIG. 5 shows an example in which the microphones 22a of the external sound collecting device 20 consist of twelve microphones M1 to M12 arranged in sequence at 30-degree intervals. Broken lines indicate the sensitivity peak directions of the microphones M2, M7, and M11, and solid lines indicate the sensitivity peak direction DM1 of the microphone M1 and the sensitivity peak direction DM2 of the microphone M12.

The sensitivity peak direction DM1 of the microphone M1 coincides with the direction toward the bird 41 in FIG. 3, and the sensitivity peak direction DM2 of the microphone M12 coincides with the direction toward the bird 41 in FIG. 4. Therefore, in the example of FIG. 3, the sound features of the sound collected by the microphone M1 are expected to be the most similar to those of the internal sound, and in the example of FIG. 4, the sound features of the sound collected by the microphone M12 are expected to be the most similar.

However, when, for example, no sound source other than the bird 41 is present, the similarities between the sound features of the externally collected sounds of several microphones and those of the internal sound may all exceed the predetermined threshold and take roughly the same value. Even then, because the microphones have different sensitivity distribution directions, their peak sensitivity directions differ. When the microphones M1 to M12 collect the call of the bird 41, the microphone yielding the highest S/N is expected to be the microphone M1 in the case of FIG. 3 and the microphone M12 in the case of FIG. 4. Accordingly, the microphone M1 is set as the microphone for collecting the external sound in the example of FIG. 3, and the microphone M12 in the example of FIG. 4.
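
The reasoning above, that the microphone whose peak direction points at the source gives the best S/N, can be illustrated with a simple directivity model. The cardioid pattern, the angle convention (index 0 peaking at 0 degrees, indices increasing in 30-degree steps), and the function names are assumptions; the description only states that the twelve microphones are spaced at 30-degree intervals.

```python
import math


def mic_peak_directions(count=12):
    """Peak sensitivity direction in degrees for each microphone (assumed layout)."""
    return [i * 360.0 / count for i in range(count)]


def cardioid_gain(peak_deg, source_deg):
    """Gain of an ideal cardioid microphone toward a source (assumed pattern)."""
    return 0.5 * (1.0 + math.cos(math.radians(source_deg - peak_deg)))


def best_mic(source_deg, count=12):
    """0-based index of the microphone with the highest gain toward the source."""
    peaks = mic_peak_directions(count)
    return max(range(count), key=lambda i: cardioid_gain(peaks[i], source_deg))
```

Under this model, a source near 0 degrees is best captured by index 0 (corresponding to M1) and a source near 330 degrees by index 11 (corresponding to M12).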

When sound collection has been instructed, the multi sound collecting unit 22 moves the process from step S25 to step S26 and collects sound. The control unit 21 determines whether cooperation with the camera 10 is designated and, when it is, transmits the sound from the microphone selected in step S24 to the camera 10 as the third sound (step S28). The control unit 21 determines in step S29 whether sound collection has ended, and repeats steps S26 to S29 until an end operation is performed. When the end of sound collection is determined, the recorded sound is saved as a file and the process returns to step S21.

Meanwhile, suppose the user operates the operation unit 15 of the camera 10 to start imaging the subject. When the control unit 11 determines in step S6 that the start of recording has been instructed, it controls the imaging unit 12 to image the subject and controls the ST sound collecting unit 13 to collect ambient sound (step S7). The control unit 11 starts recording the captured images and collected sound in the captured image/sound recording unit 17a of the recording unit 17.

In step S8, the control unit 11 determines whether cooperation with the external sound collecting device 20 is in effect. When it is not, the control unit 11 determines in step S12 whether a shooting end operation has been performed. When cooperation is in effect, the control unit 11 receives the external sound transmitted from the external sound collecting device 20 in step S9 and acquires the internal sound in step S10. The synchronization processing unit 11h of the control unit 11 compares the received external sound with the internal sound, thereby synchronizes the external sound with the video obtained from the imaging unit 12, records it in the external sound recording unit 17c (step S11), and moves the process to step S12.

In step S12, the control unit 11 determines whether a shooting end operation has been performed, and repeats steps S1, S2, and S6 to S11 until it is. When the shooting end operation is performed, the control unit 11 saves the recorded images and sound as a file (step S13) and returns the process to step S1.

FIGS. 6 and 7 describe an example in which the external sound is received from the external sound collecting device 20 at the same time as the camera 10 captures images and collects the internal sound, and recording proceeds while the video and the external sound are synchronized. Alternatively, the external sound collecting device 20 may record the external sound and transmit it as a file to the camera 10 after imaging, so that the video and the external sound are then recorded in a synchronized state.

As described above, the present embodiment employs an external sound collecting device capable of collecting sound in each of a plurality of directions with a predetermined directivity, and selects or adjusts the sound from the external sound collecting device based on a comparison of the features of the sounds collected by the built-in sound collecting device and the external sound collecting device. The imaging device thereby acquires video together with sound suited to that video, and can record the suitable sound in synchronization with the video. As a result, video combined with the most suitable sound is obtained automatically, without the extremely laborious post-shooting editing work of combining the video with suitable sound.

As described above, the information acquisition device may be configured in either the camera 10 or the external sound collecting device 20; for example, it may be configured in the camera 10 alone. In that case, a general microphone device having only a plurality of microphones with different peak sensitivity directions and a communication unit that transfers the sound collected by each microphone to the camera 10 can be used as the external sound collecting device 20.

(Second Embodiment)
FIG. 8 is a block diagram showing an information acquisition device according to the second embodiment of the present invention. In FIG. 8, components identical to those in FIG. 1 are denoted by the same reference signs and their description is omitted. The first embodiment assumes that the sound features of the sound emitted by the subject can be extracted from the sound acquired by the ST sound collecting unit 13, the built-in sound collecting device, and places the external sound collecting device near the subject so that external sound of the subject with a good S/N can be acquired. However, because the ST sound collecting unit 13 is relatively far from the subject, the sound collected by the ST sound collecting unit 13 may not allow the sound features of the subject to be extracted reliably. The present embodiment therefore uses a database in which image features and sound features are recorded, making it possible to extract the sound features of the sound produced by the subject reliably. In the present embodiment, the camera 50 differs from the camera 10 of FIG. 1 in that an image feature extraction unit 14 and an image/sound database (DB) unit 17d are added.

In FIG. 8, the image feature extraction unit 14 is configured by a processor or the like, extracts the image features of the image captured and acquired by the imaging unit 12, and outputs them to the control unit 11. The recording unit 17 is provided with an image/sound database (DB) unit 17d. The image/sound DB unit 17d stores information on the image features of various objects and the sound features of various objects, that is, the sound features of environmental sounds.

The sound acquisition unit 11f determines the type of the main subject being imaged by the imaging unit 12 by comparing the image features extracted by the image feature extraction unit 14 with the image features stored in the image/sound DB unit 17d. The sound acquisition unit 11f reads from the image/sound DB unit 17d the sound features of the sound produced by the determined main subject and, by comparing them with the sound features of the internal sound acquired by the feature extraction unit 11g, generates the sound feature information to be designated to the external sound collecting device 20.

For example, when the main subject is a bird, the sound acquisition unit 11f determines the species of bird from the image features in the image/sound DB unit 17d and reads the sound features of the call of that species from the image/sound DB unit 17d. Using the sound features read out, the sound acquisition unit 11f removes noise components from the collected internal sound and generates the sound feature information to be designated to the external sound collecting device 20.

Likewise, when the main subject is a musical instrument, the sound acquisition unit 11f determines the type of instrument from the image features in the image/sound DB unit 17d and reads the sound features of the sound of that instrument from the image/sound DB unit 17d. Using the sound features read out, the sound acquisition unit 11f removes noise components from the collected internal sound and generates the sound feature information to be designated to the external sound collecting device 20. In the same way, the sound acquisition unit 11f can also extract, for example, the sound features of the voice of a desired person from among a large number of people.

Although FIG. 8 has been described with an example in which the object is determined based on the image features extracted by the image feature extraction unit 14, the type of sound to be extracted may instead be specified by a user input operation.

Next, the operation of the embodiment configured in this way will be described with reference to the flowchart of FIG. 9. FIG. 9 is a flowchart for explaining the operation of the camera 50. In FIG. 9, steps identical to those in FIG. 6 are denoted by the same reference signs and their description is omitted. The flow of FIG. 9 differs from that of FIG. 6 in that step S41 is adopted in place of step S4.

In step S41, the sound acquisition unit 11f extracts sound features using the image/sound DB unit 17d. For example, in the examples of FIGS. 3 and 4, the grass 32 may rustle in the wind, and the ST sound collecting unit 13 may pick up this rustling at a higher level than the call of the bird 41, the subject. The sound acquisition unit 11f therefore identifies the main subject from the information stored in the image/sound DB unit 17d based on the image features extracted by the image feature extraction unit 14, and reads the sound features for the identified main subject from the image/sound DB unit 17d. Based on the sound features read out and the sound features of the collected internal sound, the sound acquisition unit 11f determines the noise components and extracts the sound features of the sound emitted by the main subject. The sound acquisition unit 11f transmits the extracted sound feature information to the external sound collecting device 20 via the communication units 18a and 18b.
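
The flow of step S41 could be sketched roughly as below. Everything concrete in this sketch is hypothetical: the database contents, the use of Euclidean distance to match image features, the representation of sound features as per-band energies, and the simple masking rule used to discard noise bands. The description does not specify any of these.

```python
import math

# Hypothetical image/sound database: label -> (image feature, sound band profile).
IMAGE_SOUND_DB = {
    "bird": ([0.9, 0.1], [0.0, 0.2, 1.0, 0.6]),
    "guitar": ([0.1, 0.9], [0.8, 1.0, 0.3, 0.0]),
}


def identify_subject(image_feature, db):
    """Return the label whose stored image feature is closest to the input."""
    return min(db, key=lambda label: math.dist(image_feature, db[label][0]))


def extract_subject_feature(internal_bands, profile, floor=0.1):
    """Keep only the bands where the subject's stored profile is significant.

    Bands whose profile value is at or below `floor` are treated as noise
    (for example, rustling grass) and set to zero.
    """
    return [e if p > floor else 0.0 for e, p in zip(internal_bands, profile)]


def feature_for_external_device(image_feature, internal_bands, db=IMAGE_SOUND_DB):
    """Combine both steps: identify the subject, then mask the internal sound."""
    label = identify_subject(image_feature, db)
    return label, extract_subject_feature(internal_bands, db[label][1])
```

In this toy example, strong low-band energy in the internal sound is discarded when the image indicates a bird, so the feature sent to the external sound collecting device reflects only the bands where a bird call is expected.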

Other operations are the same as in the first embodiment.

As described above, this embodiment provides the same effects as the first embodiment. In addition, this embodiment uses not only the sound feature of the internal sound but also image feature information as the information for selecting the external sound. Consequently, even when the internal sound contains sounds other than those emitted by the target main subject, a sound feature based on the sound emitted by the main subject can be reliably extracted and used to select the external sound.

In each of the above embodiments, an example has been described in which sound collected by a single microphone is recorded in the camera 10 as the external sound; however, a plurality of sounds, for example stereo sound, may be selected and recorded as the external sound.

(Third Embodiment)
FIG. 10 is a block diagram showing an information acquisition device according to the third embodiment of the present invention. In FIG. 10, the same components as those in FIG. 1 are denoted by the same reference signs, and their description is omitted. FIG. 11 is an explanatory diagram showing an example of the external appearance of the camera 60 and the recorder 70.

As described above, the external sound collecting device is, for example, placed at a fixed position near the subject, whereas the camera is carried by the user and can be moved freely at a position relatively far from the subject. The camera and the external sound collecting device may therefore differ in distance and direction to the subject, and the external sound collecting device cannot always obtain sound that suits the video captured by the camera. For example, even when a bird moves from right to left on the screen, the external sound may not convey that movement. In this embodiment, therefore, the device acquires, as sound suited to the video, stereo sound in which the sound image localization of the sound emitted by the main subject matches the position of the main subject in the image as closely as possible, and which contains the sound emitted by the main subject at, for example, as high an S/N as possible.

In this embodiment as well, the information acquisition device may be configured in the imaging device, in the external sound collecting device, distributed across the imaging device and the external sound collecting device, or in a device independent of these devices. FIG. 10 shows an example in which the information acquisition device is distributed across the camera 60, the recorder 70 constituting the external sound collecting device, and the playback device 80.

First, the external appearance of the camera 60, which is the imaging device, and the recorder 70, which is the external sound collecting device, will be described with reference to FIG. 11.

The camera 60 shown in FIG. 11 has a housing 10a that houses the circuits of FIG. 10 and a lens barrel 12b that houses an optical system 12a. A shutter button 15a constituting the operation unit 15 is provided on the top surface of the housing 10a.

The recorder 70 shown in FIG. 11 has a housing 70a that houses the circuits of FIG. 10. The housing 70a has a cubic shape, and the sound collecting portions of two microphones 72R and 72L, which constitute an ST sound collecting unit 72 described later, protrude from its end.

In FIG. 10, the camera 60 constituting the imaging device differs from the camera 10 of FIG. 1 in that a recording unit 61 is adopted in place of the recording unit 17 and an image feature extraction unit 14 is added. The image feature extraction unit 14 is configured with a processor or the like, extracts image features of the image captured by the imaging unit 12, and outputs them to the control unit 11. For example, the image feature extraction unit 14 can extract information such as the on-screen position and size of the main subject as image features.

The recording unit 61 records the captured image supplied from the control unit 11 and the collected stereo internal sound. The recording unit 61 is provided with a cooperation information unit 61a, which stores information about communication with the recorder 70 and the playback device 80. By controlling the communication units 18a and 18b based on the information read from the cooperation information unit 61a, the control unit 11 can exchange information with the recorder 70 and the playback device 80. The control unit 11 can transmit the sound feature information of the internal sound (first sound) to the recorder 70 and transmit the captured video and the internal sound to the playback device 80. The control unit 11 can also transmit the image feature information extracted by the image feature extraction unit 14 to the recorder 70 together with the sound feature information of the internal sound.

The recorder 70 differs from the external sound collecting device 20 of FIG. 1 in that an ST sound collecting unit 72, a control unit 71, and a recording unit 73 are adopted in place of the multi sound collecting unit 22, the control unit 21, and the recording unit 25, respectively. The ST sound collecting unit 72 has two microphones 72R and 72L. The microphones 72R and 72L have, for example, identical characteristics and are arranged so that their sensitivity peak directions differ from each other by a predetermined angle.

The control unit 71 includes a sound collection control unit 71a and a sound acquisition unit 71b. The sound collection control unit 71a controls sound collection by the ST sound collecting unit 72. The sound acquisition unit 71b can acquire the externally collected sound picked up by the microphones 72R and 72L of the ST sound collecting unit 72 as stereo sound. The control unit 71 supplies the externally collected sound, which is the stereo sound picked up by the ST sound collecting unit 72, to the recording unit 73 for recording.

The recording unit 73 is provided with a cooperation information unit 73a, which stores information about communication with the camera 60 and the playback device 80. By controlling the communication units 26a and 26b based on the information read from the cooperation information unit 73a, the control unit 71 can exchange information with the camera 60 and the playback device 80. The control unit 71 can receive the sound feature information of the internal sound (first sound) from the camera 60 and transmit the external sound to the playback device 80.

In this embodiment, the sound acquisition unit 71b can obtain the external sound by adjusting the two externally collected sounds based on the received sound feature of the internal sound. For example, the sound acquisition unit 71b may acquire, as the external sound, stereo sound in which the phase and level of each of the two externally collected sounds have been adjusted individually based on the sound feature of the internal sound.

For example, when the camera 60 collects the internal sound with the subject located at the center of the screen, the two internal sounds that make up the stereo sound acquired by the ST sound collecting unit 13 are expected to have substantially the same phase and level. In contrast, depending on the position and orientation of the microphones 72R and 72L relative to the subject, the phase and level of the subject's sound picked up by the microphone 72R may differ from those picked up by the microphone 72L. Adjusting the phase and level of the externally collected sounds picked up by the microphones 72R and 72L therefore makes it possible to obtain, as the external sound, stereo sound whose phase and level are matched.
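The phase and level matching described above can be sketched as follows, under stated assumptions. Phase adjustment is approximated here by an integer sample shift found with a brute-force cross-correlation, and level adjustment by an RMS ratio; the specification does not say how the adjustment is computed, and all function names are hypothetical.

```python
import math


def rms(samples):
    """Root-mean-square level of a sample list (0.0 for an empty list)."""
    return math.sqrt(sum(v * v for v in samples) / len(samples)) if samples else 0.0


def best_lag(ref, sig, max_lag):
    """Return the lag for which sig[n + lag] correlates best with ref[n]."""
    best, best_score = 0, float("-inf")
    for lag in range(-max_lag, max_lag + 1):
        score = sum(r * sig[n + lag]
                    for n, r in enumerate(ref) if 0 <= n + lag < len(sig))
        if score > best_score:
            best, best_score = lag, score
    return best


def match_phase_and_level(left, right, max_lag=8):
    """Shift and scale `right` so that it lines up with `left` (illustrative only)."""
    lag = best_lag(left, right, max_lag)
    n = len(right)
    # Advance the right channel by `lag` samples, padding with zeros.
    aligned = [right[i + lag] if 0 <= i + lag < n else 0.0 for i in range(n)]
    level = rms(aligned)
    gain = rms(left) / level if level > 0 else 1.0
    return [gain * v for v in aligned], lag, gain
```

A real implementation would more likely use fractional delays or per-band processing; the integer shift is chosen here only to keep the example short.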

The sound acquisition unit 71b may also change the adjustment amount based on the distance from the ST sound collecting unit 13 to the subject and the distance from the ST sound collecting unit 72 to the subject. FIG. 12 is an explanatory diagram illustrating the difference between the angle from the ST sound collecting unit 13 to the subject and the angle from the ST sound collecting unit 72 to the subject. FIG. 12 shows an example in which the recorder 70 is placed on the optical axis of the camera 60. As shown in FIG. 12, when the subject is displaced by X1 from the optical axis of the camera 60, the angle from the optical axis is θ1 for the camera 60 (ST sound collecting unit 13), which is relatively far from the subject, whereas it is θ2, which is larger than θ1, for the recorder 70 (ST sound collecting unit 72), which is relatively close to the subject. Changing the adjustment amount applied to the externally collected sound according to the distances from the camera 60 and the recorder 70 to the subject therefore yields external sound better suited to the video.
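The geometry of FIG. 12 can be checked numerically with a short sketch. It assumes that both devices lie on the camera's optical axis, that the distances are measured along that axis, and that the subject is displaced laterally by X1; the function names and the sample distances are illustrative and do not appear in the specification.

```python
import math


def off_axis_angle_deg(lateral_offset, axial_distance):
    """Angle between the optical axis and the direction to the subject."""
    return math.degrees(math.atan2(lateral_offset, axial_distance))


def camera_angle_from_recorder(theta2_deg, d_camera, d_recorder):
    """Convert the angle seen from the recorder into the angle seen from the camera.

    Uses tan(theta1) = X1 / d_camera and tan(theta2) = X1 / d_recorder,
    so tan(theta1) = tan(theta2) * d_recorder / d_camera.
    """
    t = math.tan(math.radians(theta2_deg)) * d_recorder / d_camera
    return math.degrees(math.atan(t))


# Example: subject 1 m off axis, camera 10 m away, recorder 2 m away.
theta1 = off_axis_angle_deg(1.0, 10.0)
theta2 = off_axis_angle_deg(1.0, 2.0)
```

With these sample values the recorder sees the subject at a much larger angle than the camera does, which is why an adjustment that depends on both distances is needed to reproduce the localization perceived at the camera position.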

The sound acquisition unit 71b may also use the image feature information to adjust the phase and level of each of the two externally collected sounds individually. For example, when the image feature information indicates that the subject is located at the edge of the screen, the phases and levels of the externally collected sounds are made to differ from each other based on the image features. In this way, the sound image localization obtained from the external sound can be made to substantially match the localization of the subject's sound as perceived by the photographer, in accordance with the position of the subject on the screen. The sound acquisition unit 71b is not limited to adjusting phase and level; it may also apply, for example, frequency-domain adjustment to the two externally collected sounds.
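One simple way to derive level differences from the on-screen position is a constant-power pan law, sketched below. This is a substituted, well-known technique used for illustration, not the method of the specification; it covers only the level part of the adjustment, and the normalized coordinate `x_norm` (0.0 at the left edge, 1.0 at the right edge) is an assumed representation of the image feature.

```python
import math


def pan_gains(x_norm):
    """Return (left_gain, right_gain) for a subject at horizontal position x_norm.

    x_norm is clamped to [0.0, 1.0]; 0.5 means the center of the screen.
    The gains satisfy left**2 + right**2 == 1 (constant power).
    """
    x = min(max(x_norm, 0.0), 1.0)
    angle = x * math.pi / 2.0
    return math.cos(angle), math.sin(angle)


def apply_pan(left, right, x_norm):
    """Scale the two externally collected channels according to the subject position."""
    g_left, g_right = pan_gains(x_norm)
    return [g_left * v for v in left], [g_right * v for v in right]
```

A subject at the center gets equal gains in both channels, while a subject at the left edge is reproduced only in the left channel.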

The sound acquisition unit 71b transmits the acquired external sound to the playback device 80. The sound acquisition unit 71b may transmit stereo sound whose phase and level have already been adjusted to the playback device 80 as the external sound, or it may transmit the stereo sound before phase and level adjustment, together with the adjustment values, to the playback device 80 as the external sound information.
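The two transmission options can be modeled with a small data structure, shown here as a sketch. The class names, fields, and the form of the adjustment values (an integer sample shift and two gains) are assumptions made for illustration; the specification does not define a payload format.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass
class Adjustment:
    """Hypothetical adjustment values sent alongside unadjusted stereo sound."""
    lag_right: int = 0        # samples by which the right channel is advanced
    gain_left: float = 1.0
    gain_right: float = 1.0


@dataclass
class ExternalSound:
    """External sound as received by the playback side."""
    left: List[float]
    right: List[float]
    adjustment: Optional[Adjustment] = None  # None: channels are already adjusted


def resolve(payload: ExternalSound) -> Tuple[List[float], List[float]]:
    """Return playable stereo, applying the adjustment values if they were sent."""
    if payload.adjustment is None:
        return payload.left, payload.right
    a = payload.adjustment
    n = len(payload.right)
    shifted = [payload.right[i + a.lag_right] if 0 <= i + a.lag_right < n else 0.0
               for i in range(n)]
    return ([a.gain_left * v for v in payload.left],
            [a.gain_right * v for v in shifted])
```

Sending the unadjusted sound with the adjustment values keeps the original recording intact and leaves the final adjustment to the playback side.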

The playback device 80 may be configured with a computer, a smartphone, a tablet terminal, or the like. The playback device 80 includes a control unit 81. The control unit 81 may be configured with a processor using a CPU, an FPGA, or the like and may control each unit by operating according to a program stored in a memory (not shown), or some or all of its functions may be implemented by hardware electronic circuits.

The playback device 80 is provided with an operation unit 83. The operation unit 83 includes various switches, dials, ring members, and the like (not shown) for playback mode setting, parameter operation, and so on, and outputs operation signals based on user operations to the control unit 81. The control unit 81 controls each unit based on the operation signals from the operation unit 83. The communication unit 82, under the control of the control unit 81, communicates with the camera 60 and the recorder 70 to exchange information. Via the communication unit 82, the control unit 81 receives the video and the internal sound from the camera 60 and the external sound from the recorder 70.

The control unit 81 is provided with a synchronous playback unit 81a, which can synchronize the received internal sound, external sound, and video. For example, the synchronous playback unit 81a may synchronize the video of the subject with the external sound by comparing the waveform of the sound signal collected simultaneously with the video signal acquired by the imaging unit 12 against the waveform of the external sound.
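The waveform comparison used for synchronization can be sketched as an envelope correlation. The sketch assumes both sounds share one sample rate and compares coarse amplitude envelopes rather than raw samples, because the two devices record the same event from different positions; the frame size, search range, and function names are hypothetical choices.

```python
def envelope(samples, frame):
    """Mean absolute amplitude per non-overlapping frame."""
    return [sum(abs(v) for v in samples[i:i + frame]) / frame
            for i in range(0, len(samples) - frame + 1, frame)]


def find_offset_frames(ref_env, ext_env, max_offset):
    """Return d (in frames) for which ext_env[n + d] best matches ref_env[n]."""
    best, best_score = 0, float("-inf")
    for d in range(-max_offset, max_offset + 1):
        score = sum(r * ext_env[n + d]
                    for n, r in enumerate(ref_env) if 0 <= n + d < len(ext_env))
        if score > best_score:
            best, best_score = d, score
    return best


def sync_offset_seconds(internal, external, sample_rate, frame, max_offset):
    """Delay of the external sound relative to the internal sound, in seconds."""
    d = find_offset_frames(envelope(internal, frame),
                           envelope(external, frame), max_offset)
    return d * frame / sample_rate
```

A positive result means the event appears later in the external recording, so the external sound would be advanced by that amount before being played with the video.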

The playback device 80 is provided with a playback unit 84 and a recording unit 85. The recording unit 85 receives the internal sound, external sound, and video from the control unit 81 and records them in synchronization with one another. The playback unit 84 includes a display unit and a speaker (not shown) and, under the control of the control unit 81, can play back and output the external sound and video synchronized by the synchronous playback unit 81a.

Next, the operation of the embodiment configured as described above will be described with reference to FIGS. 13 to 15. FIG. 13 is a flowchart for explaining the operation of the camera 60, FIG. 14 is a flowchart for explaining the operation of the recorder 70, and FIG. 15 is a flowchart for explaining the operation of the playback device 80.

Assume now that, in the examples of FIGS. 3 and 4 described above, the camera 10 and the external sound collecting device 20 are replaced with the camera 60 and the recorder 70, respectively. That is, the recorder 70 is placed relatively close to the trees 31a and 31b, and the user carrying the camera 60 shoots from a position relatively far from the bird 41 so that the bird 41 does not fly away and so that obstacles are avoided.

In this case as well, therefore, when the call of the bird 41 is to be collected, the externally collected sound (second sound) from the microphones 72R and 72L of the ST sound collecting unit 72 of the recorder 70 is expected to provide a better S/N than the sound (first sound) collected by the ST sound collecting unit 13 built into the camera 60.

When the power is turned on, the control unit 11 of the camera 60 determines in step S41 of FIG. 13 whether the imaging mode has been specified. If the imaging mode has not been specified, the control unit 11 shifts to the specified mode, for example a cooperation mode for performing settings and transmission/reception for cooperation with the recorder 70 and the playback device 80, or a playback mode for playing back recorded images.

When the imaging mode is specified, the control unit 11, in the next step S42, starts shooting a moving image and collecting sound with the ST sound collecting unit 13, and starts recording the moving image and the internal sound, which is stereo sound, in the recording unit 61. When an end operation is performed, the control unit 11 ends the video and sound recording and converts the video and internal sound in the recording unit 61 into a file.

In step S43, the control unit 11 determines whether cooperation with the recorder 70 has been specified. If cooperation has not been specified, the control unit 11 returns the process to step S41; if cooperation has been specified, it advances the process to step S44.

In the next step S44, the feature extraction unit 11g of the control unit 11 extracts the sound feature of the internal sound (first sound), which is the stereo sound from the ST sound collecting unit 13. After removing noise from the extracted sound feature, it transmits the feature to the recorder 70 via the communication units 18a and 18b (step S45).

The image feature extraction unit 14 determines, from the captured image, the image corresponding to the sound, that is, the main subject (step S46), extracts the image features and transmits them to the recorder 70 (step S47), and returns the process to step S41. Steps S46 and S47 can be omitted when the recorder 70 does not use image feature information to adjust the externally collected sound.

Meanwhile, when the power is turned on, the control unit 71 of the recorder 70 determines in step S51 of FIG. 14 whether the sound recording mode has been specified. If the sound recording mode has not been specified, the control unit 71 executes another specified mode, such as a playback mode. When the sound recording mode is specified, the control unit 71, in the next step S52, starts sound collection by the ST sound collecting unit 72 and starts recording the externally collected sound, which is stereo sound, in the recording unit 73.

In step S53, the control unit 71 determines whether cooperation with the camera 60 has been set. If cooperation has not been set, the control unit 71 advances the process to step S56 and performs normal sound recording. That is, the externally collected sound, which is the stereo sound acquired by the ST sound collecting unit 72, is recorded in the recording unit 73 as is.

If cooperation with the camera 60 has been set, the control unit 71 proceeds from step S53 to step S54 and acquires the sound feature of the internal sound and the image features from the camera 60. Using time information, the control unit 71 records the sound feature of the internal sound in the recording unit 73 together with the externally collected sound (step S55). As described above, the sound acquisition unit 71b may instead adjust the phase and level of the externally collected sound from the ST sound collecting unit 72 using the sound feature of the internal sound and the image features, and record the external sound obtained after the adjustment in the recording unit 73.
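Recording the received features with time information, as in step S55, can be pictured as a time-indexed log kept next to the externally collected sound. The class below is a hypothetical sketch: the specification does not describe the storage format, and the sketch assumes that features arrive in increasing time order.

```python
import bisect


class FeatureLog:
    """Time-stamped features stored alongside the externally collected sound."""

    def __init__(self):
        self._times = []      # capture times in seconds, increasing
        self._features = []   # feature received from the camera at each time

    def add(self, time_s, feature):
        """Append a feature; callers are assumed to add in time order."""
        if self._times and time_s < self._times[-1]:
            raise ValueError("features must be added in increasing time order")
        self._times.append(time_s)
        self._features.append(feature)

    def at(self, time_s):
        """Return the most recent feature at or before time_s, or None."""
        index = bisect.bisect_right(self._times, time_s) - 1
        return self._features[index] if index >= 0 else None
```

With such a log, a later adjustment pass could look up which feature was valid for any position in the recorded sound.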

In this embodiment, the video acquired by the camera 60 and the external sound acquired by the recorder 70 are played back in synchronization by the playback device 80. Assume that the user operates the operation unit 83 to instruct moving image playback. In step S61 of FIG. 15, the control unit 81 determines whether moving image playback has been specified; when it has been specified, the process proceeds to step S62. If moving image playback has not been specified, the control unit 81 executes another specified mode.

In step S62, the control unit 81 acquires the video and the internal sound from the camera 60 and the external sound from the recorder 70 via the communication unit 82. The control unit 81 may also receive the externally collected sound and the adjustment values as the external sound. In step S63, the synchronous playback unit 81a synchronizes the received internal sound, external sound, and video. That is, the synchronous playback unit 81a synchronizes the video of the subject with the external sound by comparing the waveform of the sound signal collected simultaneously with the video signal acquired by the imaging unit 12 against the waveform of the external sound, and outputs the synchronized video and external sound to the playback unit 84.

The external sound supplied to the playback unit 84 is the externally collected sound adjusted according to the sound feature of the internal sound and the image features, and has a sound image localization that corresponds to the position of the main subject in the video. In this way, sound suited to the video displayed on the display screen of the playback unit 84 is output from the speaker of the playback unit 84.

As described above, this embodiment makes it possible to output sound suited to the video in synchronization with it, regardless of the positional relationship among the imaging device, the external sound collecting device, and the subject.

In the above embodiments, a digital camera has been described as the imaging device; however, the camera may be a digital single-lens reflex camera or a compact digital camera, a video camera or a movie camera, or, of course, a camera built into a personal digital assistant (PDA) such as a mobile phone or smartphone.

The present invention is not limited to the above embodiments as they stand; at the implementation stage, it can be embodied with the constituent elements modified without departing from the gist of the invention. Various inventions can also be formed by appropriately combining the constituent elements disclosed in the above embodiments. For example, some of the constituent elements shown in an embodiment may be omitted. Furthermore, constituent elements from different embodiments may be combined as appropriate.

Even where the operation flows in the claims, the specification, and the drawings are described using terms such as "first" and "next" for convenience, this does not mean that the steps must be carried out in that order. It also goes without saying that individual steps of these operation flows may be omitted as appropriate where they do not affect the essence of the invention.

Of the techniques described here, the control described mainly with reference to the flowcharts can in many cases be set by a program, which may be stored on a recording medium or in a recording unit. The program may be recorded on the recording medium or in the recording unit at the time of product shipment, may be supplied on a distributed recording medium, or may be downloaded via the Internet.

A portion described as a "unit" (section) in the embodiments may be configured with a dedicated circuit or with a combination of general-purpose circuits and, as needed, with a combination of a microcomputer that operates according to pre-programmed software, a processor such as a CPU, or a sequencer such as an FPGA. A design in which an external device takes over part or all of the control is also possible; in that case, a wired or wireless communication circuit is interposed. Communication may be performed over Bluetooth, Wi-Fi, a telephone line, or the like, or over USB or the like. Dedicated circuits, general-purpose circuits, and control units may be integrated and configured as an ASIC.

10: camera, 11: control unit, 11a: shooting control unit, 11b: image processing unit, 11c: focus and angle-of-view information unit, 11e: sound collection control and processing unit, 11f: sound acquisition unit, 11g: feature extraction unit, 11h: synchronization processing unit, 12: imaging unit, 12a: optical system, 13: ST sound collecting unit, 14: image feature extraction unit, 15: operation unit, 16: display unit, 17: recording unit, 17a: collected image and sound recording unit, 17b: cooperation information unit, 17c: external sound recording unit, 17d: image/sound DB unit, 18a, 18b, 26a, 26b: communication units, 20: external sound collecting device, 21: control unit, 21a: sound collection control unit, 21b: sound acquisition unit, 21c: feature extraction unit, 22: multi sound collecting unit, 25: recording unit, 25a: sound information unit, 25b: cooperation information unit, 25c: sound information unit.

Claims

An information acquisition device comprising:
a feature extraction unit that acquires a first sound collected by a built-in sound collecting device that is built into an imaging device and collects ambient sound around the imaging device, and that extracts a feature of the acquired first sound;
a sound acquisition unit that, based on the feature of the first sound extracted by the feature extraction unit, performs at least one of selection and adjustment on a plurality of second sounds collected by a plurality of microphones having different sensitivity distribution directions to obtain a third sound; and
a synchronization processing unit that synchronizes the third sound acquired by the sound acquisition unit with an image of a subject captured by the imaging device.

The information acquisition device according to claim 1, wherein
the feature extraction unit acquires the first sound and the second sounds, and extracts the feature of the first sound and features of the second sounds, and
the sound acquisition unit obtains the third sound by performing at least one of selection and adjustment on the plurality of second sounds based on a comparison between the feature of the first sound and the features of the second sounds extracted by the feature extraction unit.

The information acquisition device, wherein the sound acquisition unit acquires the third sound from the plurality of second sounds by selection based on the similarity between the feature of the first sound and the feature of the second sound.

The information acquisition device according to claim 3, wherein the sound acquisition unit acquires, as the third sound, the sound with the highest S/N ratio among a plurality of second sounds selected based on the similarity between the feature of the first sound and the feature of the second sound.
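
The two-stage selection recited in the claim above (filter the second sounds by similarity to the first sound, then take the one with the highest S/N) could be sketched as follows. This is only an illustrative sketch: the claims do not specify the feature representation, the similarity measure, or how S/N is estimated, so the `Candidate` structure, the cosine similarity, and the `min_similarity` threshold are all assumptions introduced here.

```python
import math
from dataclasses import dataclass
from typing import List, Optional, Sequence


@dataclass
class Candidate:
    """One second sound from the external device (hypothetical structure)."""
    name: str
    feature: Sequence[float]  # assumed: a numeric feature vector, e.g. per-band energies
    snr_db: float             # assumed: S/N in dB, estimated elsewhere


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity of two vectors; 0.0 if either vector is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def select_third_sound(first_feature: Sequence[float],
                       candidates: List[Candidate],
                       min_similarity: float = 0.9) -> Optional[Candidate]:
    """Keep candidates similar to the first sound, then pick the highest S/N."""
    similar = [c for c in candidates
               if cosine_similarity(first_feature, c.feature) >= min_similarity]
    if not similar:
        return None  # no second sound resembles the first sound closely enough
    return max(similar, key=lambda c: c.snr_db)
```

In this sketch a candidate with a high S/N but a dissimilar feature is never chosen, which mirrors the order of operations in the claim: similarity decides eligibility, S/N decides the winner.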

The information acquisition device according to claim 2, further comprising a database of features of environmental sounds,
wherein the feature extraction unit extracts the feature of the first sound using the features of the environmental sounds.

The information acquisition device according to claim 5, wherein
the imaging device further comprises an image feature extraction unit that extracts a feature of the image of the subject obtained by imaging, and
the feature extraction unit extracts the feature of the first sound by referring to the database based on the feature of the image extracted by the image feature extraction unit.

The information acquisition device according to claim 1, wherein the sound acquisition unit obtains the third sound by adjusting at least one of a phase and a level of the plurality of second sounds based on the feature of the first sound extracted by the feature extraction unit.
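
One possible reading of the phase and level adjustment in the claim above is a simple delay-and-sum mix. The sketch below is an assumption-laden illustration, not the method of the specification: it treats the first sound's waveform and RMS level as the "feature", approximates phase adjustment by an integer-sample shift found through cross-correlation, and assumes all signals have the same sample rate. The function names and the `max_lag` parameter are hypothetical.

```python
import math
from typing import List, Sequence


def rms(x: Sequence[float]) -> float:
    """Root-mean-square level of a signal (0.0 for an empty signal)."""
    return math.sqrt(sum(v * v for v in x) / len(x)) if x else 0.0


def best_lag(reference: Sequence[float], signal: Sequence[float], max_lag: int) -> int:
    """Lag in samples that maximizes cross-correlation with the reference.

    A positive lag means the signal is delayed relative to the reference.
    """
    best, best_score = 0, float("-inf")
    for lag in range(-max_lag, max_lag + 1):
        score = 0.0
        for n, r in enumerate(reference):
            m = n + lag
            if 0 <= m < len(signal):
                score += r * signal[m]
        if score > best_score:
            best, best_score = lag, score
    return best


def shift(signal: Sequence[float], lag: int) -> List[float]:
    """Advance the signal by `lag` samples, zero-padding and keeping its length."""
    n = len(signal)
    return [signal[i + lag] if 0 <= i + lag < n else 0.0 for i in range(n)]


def adjust_and_mix(reference: Sequence[float],
                   second_sounds: Sequence[Sequence[float]],
                   max_lag: int = 8) -> List[float]:
    """Align each second sound to the reference, match its level, and average."""
    target = rms(reference)
    mixed = [0.0] * len(reference)
    for s in second_sounds:
        aligned = shift(s, best_lag(reference, s, max_lag))  # phase adjustment
        level = rms(aligned)
        gain = target / level if level > 0.0 else 0.0         # level adjustment
        for i in range(min(len(mixed), len(aligned))):
            mixed[i] += gain * aligned[i] / len(second_sounds)
    return mixed
```

For example, two copies of the reference that arrive with different delays and gains are brought back into alignment and equal level before averaging, so they reinforce rather than cancel each other.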

The information acquisition device according to claim 1, further comprising a recording unit that records the image and the third sound synchronized by the synchronization processing unit.

The information acquisition device according to claim 1, further comprising a communication unit that transmits the feature of the first sound to the sound acquisition unit when the image is captured by the imaging device.

The information acquisition device according to claim 1, wherein the feature extraction unit and the sound acquisition unit are built into at least one of the imaging device and an external sound collecting device having the plurality of microphones.

The information acquisition device according to claim 10, wherein
the synchronization processing unit is built into a reproduction device, and
the reproduction device includes a communication unit that acquires the image and the third sound from the imaging device and the external sound collecting device.

An information acquisition method in an information acquisition device including a feature extraction unit, a sound acquisition unit, and a synchronization processing unit, wherein
the feature extraction unit acquires a first sound collected by a built-in sound collecting device that is built into an imaging device and collects ambient sound of the imaging device, and extracts a feature of the acquired first sound,
the sound acquisition unit obtains a third sound by performing at least one of selection and adjustment on a plurality of second sounds collected by a plurality of microphones having different sensitivity distribution directions, based on the extracted feature of the first sound, and
the synchronization processing unit synchronizes the acquired third sound with an image of a subject captured by the imaging device.

An information acquisition program for causing a computer to execute a procedure of:
acquiring a first sound collected by a built-in sound collecting device that is built into an imaging device and collects ambient sound of the imaging device, and extracting a feature of the acquired first sound;
obtaining a third sound by performing at least one of selection and adjustment on a plurality of second sounds collected by a plurality of microphones having different sensitivity distribution directions, based on the extracted feature of the first sound; and
synchronizing the acquired third sound with an image of a subject captured by the imaging device.
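
The synchronizing step of the procedure above could, under simple assumptions, reduce to mapping each video frame to a range of audio samples. The claims do not say how synchronization is achieved; the sketch below assumes that the imaging device and the external sound collecting device stamp their recordings against a shared clock, and the function name and all parameters are hypothetical.

```python
from typing import Tuple


def audio_span_for_frame(frame_index: int,
                         fps: float,
                         sample_rate: int,
                         video_start_s: float,
                         audio_start_s: float) -> Tuple[int, int]:
    """Return the (first, last) sample indices of the third sound for one frame.

    Start times are in seconds on a clock assumed to be shared by both devices.
    Indices are clamped at 0 when the audio recording starts after the frame.
    """
    frame_start = video_start_s + frame_index / fps
    frame_end = video_start_s + (frame_index + 1) / fps
    first = round((frame_start - audio_start_s) * sample_rate)
    last = round((frame_end - audio_start_s) * sample_rate)
    return max(first, 0), max(last, 0)
```

With 30 frames per second and 48 kHz audio, each frame covers 1600 samples, and an audio recording that started half a second before the video is read from sample 24000 onward for the first frame.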

An information acquisition system, comprising:
an imaging device including a built-in sound collecting device that collects ambient sound, and a feature extraction unit that extracts a feature of a first sound collected by the built-in sound collecting device;
an external sound collecting device that includes a plurality of microphones having different sensitivity distribution directions and obtains a third sound by performing at least one of selection and adjustment on a plurality of second sounds collected by the plurality of microphones, based on the feature of the first sound extracted by the feature extraction unit of the imaging device; and
a reproduction device that performs synchronization processing for synchronizing the third sound acquired by the external sound collecting device with an image of a subject captured by the imaging device.