JP2016140055A

JP2016140055A - Moving image sound recording system, moving image sound recording device, moving image sound recording program, and moving image sound recording method

Info

Publication number: JP2016140055A
Application number: JP2015226788A
Authority: JP
Inventors: 亮人相場; Akihito Aiba
Original assignee: Ricoh Co Ltd
Current assignee: Ricoh Co Ltd
Priority date: 2015-01-23
Filing date: 2015-11-19
Publication date: 2016-08-04
Anticipated expiration: 2035-11-19
Also published as: JP6631193B2

Abstract

PROBLEM TO BE SOLVED: To enable a sound signal to be acquired with high accuracy when a moving image signal and a sound signal are acquired at the same time, even if the relation between the position where the moving image signal is acquired and the position where the sound signal is acquired changes. SOLUTION: A moving image sound recording system 1 includes: moving image photographing means 21; voice recording means 31; voice enhancement means 32 for enhancing, among the voice signals acquired by the voice recording means 31, the voice signal of a voice coming from an arbitrary direction; photographing parameter acquisition means 24 for acquiring information representing the photographing direction of the moving image photographing means 21 and information representing the positional relation between the moving image photographing means 21 and the voice recording means 31; and enhancement parameter control means 33 for controlling the direction of the voice enhanced by the voice enhancement means 32 on the basis of the information acquired by the photographing parameter acquisition means 24. SELECTED DRAWING: Figure 1

Description

The present invention relates to a moving image sound recording system, a moving image sound recording device, a moving image sound recording program, and a moving image sound recording method for recording moving images and sound.

Some devices, such as video cameras, smartphones (high-function mobile phones), tablet terminals, and video conference devices, have a built-in camera and microphone and can acquire an audio signal with the microphone in parallel with acquiring a moving image signal with the camera.

However, these devices have the problem that unwanted sound unrelated to the subject photographed by the camera is picked up by the microphone. As a device that addresses this problem, there is a portable terminal that includes a camera that acquires a moving image signal, a microphone that acquires an audio signal, estimation means for estimating the position of the person being photographed relative to the terminal on the basis of the position of that person in the moving image signal acquired by the camera and the parameter information that the camera uses for photographing (angle-of-view information, focal length information, and the like), and adjustment means for adjusting the directivity of the microphone toward that relative position (Patent Document 1).

However, this portable terminal presupposes that the camera and the microphone are both built into the same device and that their positional relationship does not change. It therefore cannot be applied when the camera and the microphone are provided in separate devices that exchange the moving image signal or the audio signal by wireless communication or the like, or when each device is carried and moved, because the positional relationship then varies.

The present invention has been made to solve this problem. Its object is to enable a sound signal to be acquired with high accuracy when a moving image signal and a sound signal are acquired at the same time, even if the relationship between the position where the moving image signal is acquired and the position where the sound signal is acquired changes.

The present invention is a moving image sound recording system including: moving image acquisition means for photographing a subject and acquiring a moving image signal; sound acquisition means for recording sound and acquiring a sound signal; photographing parameter acquisition means for acquiring information representing the photographing direction of the moving image acquisition means and information representing the positional relationship between the moving image acquisition means and the sound acquisition means; and sound enhancement means for enhancing, among the sound signals acquired by the sound acquisition means, the sound signal from a predetermined direction on the basis of the information acquired by the photographing parameter acquisition means.

According to the present invention, when a moving image signal and a sound signal are acquired at the same time, the sound signal can be acquired with high accuracy even if the relationship between the position where the moving image signal is acquired and the position where the sound signal is acquired changes.

A block diagram showing the configuration of the moving image audio recording system according to an embodiment of the present invention.
A diagram for explaining the relationship between photographing parameter information and device state information in the moving image audio recording system according to the embodiment of the present invention.
A diagram for explaining an example of the flow of an operation that adjusts the position and orientation of the moving image photographing device relative to the audio recording device to a predetermined state in the moving image audio recording system according to the embodiment of the present invention.
A flowchart showing the flow of processing in which the moving image audio recording system according to the embodiment of the present invention acquires the reference state of the photographing parameter information.
A diagram for explaining the correspondence between the photographing direction of the moving image photographing device and the directivity of the audio recording device in the embodiment of the present invention.
A diagram for explaining the control of the directivity of the audio recording device when the position of the moving image photographing device relative to the audio recording device changes in the moving image audio recording system according to the embodiment of the present invention.
A flowchart showing the operation of the moving image audio recording system according to the embodiment of the present invention.
A block diagram showing the hardware configuration of a computer system that implements the moving image audio recording system according to the embodiment of the present invention.

Hereinafter, embodiments of the present invention will be described in detail with reference to the drawings.
<Moving image audio recording system>
FIG. 1 is a block diagram showing the configuration of a moving image audio recording system 1 according to an embodiment of the present invention. As shown in the figure, the moving image audio recording system 1 according to the embodiment (hereinafter, this system) consists of a moving image photographing device 2 and an audio recording device 3.

The moving image photographing device 2 photographs a subject and acquires a moving image signal. The audio recording device 3 records voice and acquires an audio signal (an electrical signal corresponding to the voice). The moving image photographing device 2 also combines the moving image signal that it acquired with the audio signal acquired by the audio recording device 3 to generate a moving image audio signal, and stores it.

Note that the audio recording device 3 does not only record voice (sound produced by human speech) and generate an audio signal. When other sounds (physical noises, machine operating sounds, ambient noise, and the like) are present in the surroundings, it also generates electrical signals corresponding to those sounds. In other words, the audio recording device 3 records all sounds, including voice, and generates sound signals, which are the electrical signals corresponding to them. Strictly speaking, therefore, the "audio recording device 3" should be called the "sound recording device 3" and the "moving image audio recording system 1" the "moving image sound recording system 1", but for convenience this embodiment uses the names "audio recording device 3" and "moving image audio recording system 1".

The moving image photographing device 2 includes moving image photographing means 21, reference point recognition means 22, device state acquisition means 23, photographing parameter acquisition means 24, transmission means 25, reception means 26, moving image audio combining means 27, and moving image audio storage means 28. Of these, the reference point recognition means 22, the photographing parameter acquisition means 24, and the moving image audio combining means 27 are implemented by control means 20 having a CPU, a ROM, and a RAM. That is, they are functional blocks realized when the CPU executes a computer program stored in the ROM, such as the moving image audio recording program, using the RAM as a work area.

The audio recording device 3 includes audio recording means 31, audio enhancement means 32, enhancement parameter control means 33, reception means 34, and transmission means 35. Of these, the audio enhancement means 32 and the enhancement parameter control means 33 are implemented by control means 30 having a CPU, a ROM, and a RAM. That is, they are functional blocks realized when the CPU executes a computer program stored in the ROM, such as the moving image audio recording program, using the RAM as a work area.

《Moving image photographing device》
The moving image photographing means 21 in the moving image photographing device 2 is, for example, a camera, and photographs a subject to acquire a moving image signal. The moving image photographing means 21 functions as the moving image acquisition means according to the present invention.

The reference point recognition means 22 is means for setting the moving image photographing device 2 and the audio recording device 3 in a predetermined positional relationship (hereinafter, the initial positional relationship) before the moving image audio recording system 1 starts acquiring the moving image signal and the audio signal.

More specifically, it recognizes whether the position of the audio recording device 3 on the image formed from the moving image signal is a predetermined position (hereinafter, a position on the image formed from the moving image signal is referred to as image coordinates), and outputs the result as reference point match information. The reference point match information is, for example, “true” when the image coordinates of the audio recording device 3 are at the predetermined position and “false” when they are not. A technique such as pattern matching is used for the recognition (details are described later with reference to FIG. 3). There may be more than one reference point, in which case the means determines whether every point is at its own predetermined coordinates.
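The reference point check described above can be sketched as follows. This is a minimal illustration in Python, not the implementation disclosed in the source: the function name `reference_points_match` and the pixel tolerance `tol_px` are assumptions (the text only says that each point must be at its predetermined coordinates), and the marker detection step, for example by pattern matching, is assumed to have already produced the detected image coordinates.

```python
from typing import Sequence, Tuple

Point = Tuple[float, float]  # image coordinates (x, y) in pixels


def reference_points_match(
    detected: Sequence[Point],
    expected: Sequence[Point],
    tol_px: float = 2.0,  # assumed tolerance; the source does not specify one
) -> bool:
    """Return True only if every detected reference point lies at its
    predetermined image coordinates, within tol_px on each axis."""
    if len(detected) != len(expected):
        return False
    return all(
        abs(dx - ex) <= tol_px and abs(dy - ey) <= tol_px
        for (dx, dy), (ex, ey) in zip(detected, expected)
    )
```

With several reference points, the result is true only when all of them match, which mirrors the "all points at their predetermined coordinates" condition in the text.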

The device state acquisition means 23 acquires device state information. The device state information consists of quantities such as the acceleration and angular acceleration of the moving image photographing device 2. For example, it can be expressed as a set of three values (a_X, a_Y, ω_Z): the acceleration a_X in the X-axis direction, the acceleration a_Y in the Y-axis direction, and the angular acceleration ω_Z about the Z axis, in XYZ orthogonal coordinates set in three-dimensional space. Here, the plane containing the X axis and the Y axis is horizontal, and the Z axis is vertical. These accelerations and angular accelerations can be obtained using, for example, an acceleration sensor and an angular acceleration sensor.

The photographing parameter acquisition means 24 acquires photographing parameter information of the moving image photographing device 2. The photographing parameter information is, for example, the current coordinates (information representing the positional relationship) and angle (information representing the photographing direction) of the moving image photographing device 2, taking the state in which the moving image photographing device 2 is set in the initial positional relationship described above as the origin (the reference value of the information representing the positional relationship) and 0 degrees (the reference value of the information representing the photographing direction).

For example, this information can be expressed as a set of three values (X, Y, θ). These are estimated from the device state information and the reference point match information of the reference point recognition means 22. That is, if the time interval at which the device state information is acquired is Δt and the previously estimated photographing parameter information is (X₀, Y₀, θ₀), the current photographing parameter information can be calculated by Formulas [1] to [3] below.

X = X₀ + a_X Δt² … Formula [1]
Y = Y₀ + a_Y Δt² … Formula [2]
θ = θ₀ + ω_Z Δt² … Formula [3]
In these formulas, a_X Δt², a_Y Δt², and ω_Z Δt² represent the double integrals over time of the acceleration a_X, the acceleration a_Y, and the angular acceleration ω_Z, respectively.
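As a rough illustration of Formulas [1] to [3], the update from the previous estimate to the current one might be written as below. This Python sketch applies the formulas literally, so each increment is the sampled value multiplied by Δt², which is how the source expresses the double integral. The names `update_pose` and `track` are hypothetical, and an integrator that also carries velocity between samples would be a different technique that is not shown here.

```python
from typing import Iterable, Tuple

Pose = Tuple[float, float, float]         # (X, Y, theta)
DeviceState = Tuple[float, float, float]  # (a_X, a_Y, omega_Z)


def update_pose(previous: Pose, state: DeviceState, dt: float) -> Pose:
    """Apply Formulas [1]-[3] once: add (value * dt^2) to each component."""
    x0, y0, theta0 = previous
    a_x, a_y, omega_z = state
    dt2 = dt * dt
    return (x0 + a_x * dt2, y0 + a_y * dt2, theta0 + omega_z * dt2)


def track(states: Iterable[DeviceState], dt: float,
          start: Pose = (0.0, 0.0, 0.0)) -> Pose:
    """Fold a sequence of device state samples into a pose, starting from
    the reference state (0, 0, 0) unless another start pose is given."""
    pose = start
    for state in states:
        pose = update_pose(pose, state, dt)
    return pose
```

For example, `track(samples, dt=0.01)` would return the estimated (X, Y, θ) after all samples, where `samples` is an assumed list of (a_X, a_Y, ω_Z) readings taken every 10 ms.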

Here, for (X₀, Y₀, θ₀) at the first estimation, one approach is to take the state in which the reference point match information is “true” and the device state information indicates that the device is stationary, that is, (a_X, a_Y, ω_Z) = (0, 0, 0), as the reference state, that is, (X₀, Y₀, θ₀) = (0, 0, 0), and to start the estimation from there. In this case, the photographing parameter information representing the current position and photographing direction of the moving image photographing device 2 is obtained by calculating the double integral over time of the device state information (a_X, a_Y, ω_Z) starting from the reference state.

FIG. 2 is a diagram for explaining the relationship between the photographing parameter information and the device state information. FIG. 2A shows the photographing parameter information, and FIG. 2B shows the device state information.

In FIG. 2A, point 101 at (X, Y, θ) = (0, 0, 0) represents the photographing parameter information of the moving image photographing device 2 in the initial positional relationship, and point 102 at (X, Y, θ) = (X₁, Y₁, θ₁) represents the current photographing parameter information of the moving image photographing device 2. Arrow 103 represents the current photographing direction (θ₁) of the moving image photographing device 2. Point 100 at (X, Y, θ) = (0, Y_ref, 0) represents the photographing parameter information for the audio recording device 3 in the initial positional relationship. That is, in the initial positional relationship, the audio recording device 3 is located at a distance Y_ref from the moving image photographing device 2 in the Y-axis direction (= the photographing direction indicated by arrow 103).

In FIG. 2B, a_X, a_Y, and ω_Z represent the directions of the acceleration in the X-axis direction, the acceleration in the Y-axis direction, and the angular acceleration about the Z axis at an arbitrary point 105 (X, Y, θ).

Returning to the description of FIG. 1, the transmission means 25 transmits the photographing parameter information to the audio recording device 3. The communication may be wired or wireless. The reception means 26 receives the audio signal that has been recorded and enhanced by the audio recording device 3 (hereinafter, the enhanced audio signal). This communication may also be wired or wireless.

The moving image audio combining means 27 combines the moving image signal acquired by the moving image photographing means 21 with the enhanced audio signal received by the reception means 26 to form an associated moving image audio signal. The moving image audio storage means 28 consists of, for example, a hard disk, a solid state disk, or an SD memory, and stores the moving image audio signal.

《Audio recording device》
The audio recording means 31, which serves as the sound acquisition means according to the present invention, consists of, for example, a microphone array, and records voice to generate an audio signal. The audio enhancement means 32 generates, from the audio signal, an enhanced audio signal in which voice arriving from an arbitrary direction is enhanced. The enhancement method is, for example, beamforming with the microphone array or switching between microphones whose directivity points in different directions. Beamforming with the microphone array is described in detail later.

The reception means 34 receives the photographing parameter information from the moving image photographing device 2. The enhancement parameter control means 33 controls the enhancement parameters of the audio enhancement means 32 on the basis of the photographing parameter information received by the reception means 34. The enhancement parameters are described in detail later. The transmission means 35 transmits the enhanced audio signal generated by the audio enhancement means 32 to the moving image photographing device 2.

In this embodiment, the moving image audio storage means 28 is on the moving image photographing device 2 side, which receives the audio signal from the audio recording device 3. Conversely, the moving image audio storage means may be provided on the audio recording device 3 side, which then receives the moving image signal from the moving image photographing device 2. The moving image audio storage means may also be provided in yet another device.

In this embodiment, the moving image audio signal is ultimately stored. Alternatively, output means such as a display and a speaker may be provided and the signal output from them, or the signal may be transmitted to another device over a network for uses such as video conferencing.

This embodiment also assumes that, once the moving image photographing device 2 and the audio recording device 3 have been recognized as being in the initial positional relationship, the audio recording device 3 does not move and only the moving image photographing device 2 moves. For that reason the device state acquisition means 23 is provided in the moving image photographing device 2. However, device state acquisition means may also be provided in the audio recording device 3 so that the device state information of the audio recording device 3 is acquired as well. With this configuration, the moving image photographing device 2 can acquire the photographing parameter information even if the audio recording device 3 moves.

<Acquisition of the reference state of the photographing parameter information>
An example of the flow of an operation that adjusts the position and orientation of the moving image photographing device 2 relative to the audio recording device 3 to a predetermined state is described with reference to FIG. 3. Here, as shown in FIG. 3A, the moving image photographing device 2 is a tablet terminal with a camera, and the audio recording device 3 is a wireless microphone that can communicate wirelessly with the moving image photographing device 2.

This operation is performed under the following first to third premises.
First premise: Three points on the audio recording device 3 are used as reference points. Here, as shown in FIG. 3A, the centers of three cross-shaped markers Pa, Pb, and Pc attached to the audio recording device 3 are used as the reference points.
Second premise: The image coordinates of these three points when photographed by the moving image photographing device 2 at the predetermined position and in the predetermined orientation are (xa, ya), (xb, yb), and (xc, yc), respectively.
Third premise: At that time, the audio recording device 3 must be stationary.

While viewing the image of the audio recording device 3 being photographed, the user moves the moving image photographing device 2 so that the centers of the markers Pa, Pb, and Pc, which are the three reference points, coincide with (x₁, y₁), (x₂, y₂), and (x₃, y₃), respectively. At this time, it is preferable to display guides at the points (x₁, y₁), (x₂, y₂), and (x₃, y₃) on the image being photographed so that the user can check the alignment. Here, as shown in FIG. 3B, arrows P₁, P₂, and P₃ pointing to (x₁, y₁), (x₂, y₂), and (x₃, y₃) are displayed on the display 200.

When, as a result of the user moving the moving image photographing device 2, the coordinates of the three reference points on the image being photographed coincide with (x₁, y₁), (x₂, y₂), and (x₃, y₃), respectively, as shown in FIG. 3C, the reference point recognition means 22 outputs “true” as the reference point match information.

In addition, the device state acquisition means 23 acquires the acceleration and angular acceleration of the moving image photographing device 2 from the sensors and outputs the device state information. If the moving image photographing device 2 is stationary, the device state information is (0, 0, 0).

On receiving “true” as the reference point match information and (0, 0, 0) as the device state information, the photographing parameter acquisition means 24 takes the state at that moment as the reference state for calculating coordinates and angles. That is, it sets the photographing parameter information to (X, Y, θ) = (0, 0, 0), and the operation ends.

FIG. 4 is a flowchart showing the flow of processing in which the moving image audio recording system 1 acquires the reference state of the photographing parameter information.

First, the moving image photographing means 21 acquires a moving image signal (step S1). Next, the reference point recognition means 22 determines whether the image coordinates of the reference points on the moving image formed from the acquired moving image signal (for example, the image coordinates (xa, ya), (xb, yb), and (xc, yc) of the centers of the markers Pa, Pb, and Pc in FIG. 3B) coincide with the predetermined coordinates (for example, (x₁, y₁), (x₂, y₂), and (x₃, y₃)) (step S2). If they coincide (step S2: YES), the processing proceeds to step S3; if they do not (step S2: NO), it returns to step S1.

In step S3, the device state acquisition means 23 acquires the device state information (a_X, a_Y, ω_Z). Next, the photographing parameter acquisition means 24 determines whether this device state information indicates that the moving image photographing device 2 is stationary, that is, whether (a_X, a_Y, ω_Z) = (0, 0, 0) (step S4). If it indicates that the device is stationary (step S4: YES), the processing proceeds to step S5; if it does not (step S4: NO), it returns to step S1.

In step S5, the photographing parameter acquisition means 24 sets the current photographing parameter information to the reference state. That is, when (a_X, a_Y, ω_Z) = (0, 0, 0), it sets the reference state (X₀, Y₀, θ₀) = (0, 0, 0) as the photographing parameter information (X, Y, θ) of the moving image photographing device 2. As a result, for example, when the centers of the markers Pa, Pb, and Pc in FIG. 3 coincide with the tips of the arrows P₁, P₂, and P₃ and the moving image photographing device 2 is stationary, the photographing parameter information at point 101 in FIG. 2A is set.
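Steps S1 to S5 can be sketched as a single loop. The Python code below is only an illustration under stated assumptions: each sample is assumed to already contain the detected marker coordinates for one frame together with the device state read at that moment, and the function name `acquire_reference_state`, the pixel tolerance `tol_px`, and the stillness threshold `still_eps` are hypothetical. The source itself tests for exact coincidence and for exactly (0, 0, 0).

```python
from typing import Iterable, Optional, Sequence, Tuple

Point = Tuple[float, float]
DeviceState = Tuple[float, float, float]  # (a_X, a_Y, omega_Z)
Pose = Tuple[float, float, float]         # (X, Y, theta)
Sample = Tuple[Sequence[Point], DeviceState]


def acquire_reference_state(
    samples: Iterable[Sample],
    expected: Sequence[Point],
    tol_px: float = 2.0,      # assumed matching tolerance
    still_eps: float = 1e-3,  # assumed threshold for "stationary"
) -> Optional[Tuple[int, Pose]]:
    """Return (sample index, reference pose) for the first sample that
    passes both checks, or None if no sample qualifies."""
    for index, (detected, state) in enumerate(samples):   # S1: next frame
        # S2: do all reference points sit at their predetermined coordinates?
        if len(detected) != len(expected):
            continue
        if any(abs(dx - ex) > tol_px or abs(dy - ey) > tol_px
               for (dx, dy), (ex, ey) in zip(detected, expected)):
            continue
        # S3/S4: is the device stationary according to its sensors?
        if any(abs(value) > still_eps for value in state):
            continue
        # S5: adopt this moment as the reference state (X0, Y0, theta0).
        return index, (0.0, 0.0, 0.0)
    return None
```

Returning to the next sample on either failed check corresponds to the "return to step S1" branches of the flowchart.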

<Relationship between the photographing direction of the moving image photographing device and the directivity of the audio recording device>
FIG. 5 is a diagram for explaining the correspondence between the photographing direction of the moving image photographing device 2 and the directivity of the audio recording device 3 in the moving image audio recording system 1.

FIG. 5A shows a state in which the photographing direction of the moving image photographing means 21 is the direction 111 toward the audio recording device 3. In this figure, p, q, r, and s denote microphones (the audio recording means 31); that is, four microphones are arranged at the vertices of a square.

In this state, the directivity of the audio recording device 3 is made bipolar, consisting of directivity 121, which has its peak in the direction facing the center of the moving image photographing means 21, and directivity 122, which has its peak in the opposite direction.

FIG. 5B shows a state in which the photographing direction of the moving image photographing means 21 points in direction 112, rotated counterclockwise by θ from the direction toward the audio recording device 3. In this state, the directivity of the audio recording device 3 is made unipolar, consisting of directivity 123, which has its peak in direction 113, rotated clockwise by φ from the direction toward the moving image photographing means 21.

FIG. 5C shows a state in which the shooting direction of the moving image shooting means 21 points in a direction 114 rotated clockwise by θ from the direction toward the audio recording device 3. In this state, the directivity of the audio recording device 3 is made unipolar, consisting of a directivity 124 whose peak lies in a direction 115 rotated counterclockwise by φ from the direction toward the moving image shooting means 21.

As shown in FIGS. 5A, 5B, and 5C, the sound source of interest is normally considered to be located in the shooting direction 111, 112, or 114 of the moving image shooting means 21, so setting the directivity of the audio recording device 3 to 122, 123, or 124, respectively, emphasizes the sound from that source. To realize these directivities, the enhancement parameter control means 33 of the audio recording device 3 outputs to the audio enhancement means 32, as a control signal, the type of directivity to be formed according to the shooting direction information θ (unipolar or bipolar) and the value of φ indicating its orientation. The audio enhancement means 32 then forms the directivity based on that control signal.

An example of a table showing the relationship between θ and φ is shown in Table 1 below. Here, θ = 0 is the direction 111 shown in FIG. 5A, that is, the direction from the center of the moving image shooting means 21 toward the microphones of the audio recording device 3. Clockwise is the positive direction of θ, and counterclockwise is the negative direction. Thus θ is approximately −π/4 in FIG. 5B and approximately π/4 in FIG. 5C.

In Table 1, φ = 0 is the direction from the audio recording device 3 toward the center of the moving image shooting means 21, that is, the direction θ = π. Clockwise is the positive direction of φ, and counterclockwise is the negative direction.

Therefore, the relationship between FIGS. 5A, 5B, and 5C and Table 1 is as follows.
FIG. 5A: "−π/6 < θ ≤ π/6" ... "bipolar, φ = 0"
FIG. 5B: "7π/6 < θ ≤ 11π/6" ... "unipolar, φ = π/4"
FIG. 5C: "π/6 < θ ≤ 5π/6" ... "unipolar, φ = −π/4"
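
The three cases listed above can be written as a small lookup, sketched here in Python. The function name and the return format are assumptions made for illustration. Only the three ranges stated in the text are encoded; the remaining range, 5π/6 < θ ≤ 7π/6, is not given in the text, so the sketch returns None there rather than guessing.

```python
import math

def directivity_for(theta):
    """Map the shooting direction theta (radians, clockwise positive) to
    (kind, phi) for the three cases listed in the text.

    Returns None for angles that the listed cases do not cover.
    """
    two_pi = 2.0 * math.pi
    # Wrap theta into the interval (-pi/6, 11*pi/6].
    t = (theta + math.pi / 6) % two_pi - math.pi / 6
    if t <= -math.pi / 6:
        t += two_pi
    if t <= math.pi / 6:
        return ("bipolar", 0.0)               # FIG. 5A
    if t <= 5 * math.pi / 6:
        return ("unipolar", -math.pi / 4)     # FIG. 5C
    if 7 * math.pi / 6 < t <= 11 * math.pi / 6:
        return ("unipolar", math.pi / 4)      # FIG. 5B
    return None
```

Wrapping θ first means that θ ≈ −π/4 (FIG. 5B) and its equivalent 7π/4 select the same case.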

The audio signal is enhanced by a method such as beamforming with a microphone array. For example, when directivity is formed in the direction φ, delay-and-sum beamforming yields the enhanced audio signal according to Equation [4] below.

Y(ω) = W^H(ω) z(ω) ... Equation [4]
Here, ω is the angular frequency of the spectrum of the audio signal, Y is the spectrum of the enhanced audio signal, z is the spectrum of the input audio signal, W is the filter coefficient for enhancement, and H denotes the complex conjugate transpose. z and W are vectors, given by Equations [5] and [6] below, respectively.

z(ω) = [Z_1(ω), ..., Z_M(ω)]^T ... Equation [5]
W(ω) = [W_1(ω), ..., W_M(ω)]^T ... Equation [6]
Here, the subscript of Z denotes the microphone number, M is the number of microphones, and T denotes the transpose.

Here, assuming that the positions of the microphones and the direction of the sound source to be emphasized lie in the same plane, the value of W is given by Equation [7] below.
W_m(ω) = exp{ j (ω/c) (x_m sin φ + y_m cos φ) } ... Equation [7]
Here, c is the speed of sound, (x_m, y_m) are the coordinates of the m-th microphone, and φ is the direction of the sound source whose audio signal is to be emphasized.
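
Equations [4] and [7] can be sketched for a single frequency bin in plain Python. This is a minimal illustration, not the implementation of the embodiment: the function names, the value 343 m/s for c, and any microphone coordinates passed in are assumptions.

```python
import cmath
import math

SPEED_OF_SOUND = 343.0  # m/s; assumed value for c

def steering_weights(mic_xy, phi, omega, c=SPEED_OF_SOUND):
    """Equation [7]: W_m = exp{ j (omega/c) (x_m sin(phi) + y_m cos(phi)) }."""
    k = omega / c
    return [cmath.exp(1j * k * (x * math.sin(phi) + y * math.cos(phi)))
            for x, y in mic_xy]

def enhance(z, w):
    """Equation [4]: Y = W^H z, the sum over microphones of conj(W_m) * Z_m."""
    return sum(wm.conjugate() * zm for wm, zm in zip(w, z))
```

With four microphones at the vertices of a square, as in FIG. 5A, an input spectrum equal to the weight vector itself gives Y = M = 4, while an input matched to a different direction gives a smaller magnitude. That difference is the directivity.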

<Directivity control of the audio recording device when the position of the moving image shooting device relative to the audio recording device changes>
FIG. 6 is a diagram for explaining directivity control of the audio recording device when the position of the moving image shooting device relative to the audio recording device changes.

In this case, for example, the control signal is switched according to which of the regions a to h, obtained by dividing the 360 degrees (2π) around the audio recording device 3 into eight, contains the moving image shooting device 2. For example, when the moving image shooting device 2 is located in region c as illustrated, the shooting direction θ and the control signal (type of directivity and orientation φ) are set as shown in Table 2 below.

In this table, θ = 0 is the direction from the audio recording device 3 toward the center of region e. Clockwise is the positive direction of θ, and counterclockwise is the negative direction. φ = 0 is the direction from the audio recording device 3 toward the center of region a. Clockwise is the positive direction of φ, and counterclockwise is the negative direction.

By preparing a table holding the data shown in this table for each region, an appropriate directivity can be formed even when the direction of the moving image shooting device 2 relative to the audio recording device 3 changes. When the moving image shooting device 2 is located in a region other than region c (hereinafter, the region of interest), the values of the θ ranges in Table 2 are replaced by values to which the angular difference between region c and the region of interest has been added.
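
The region selection and the per-region shift of the θ ranges can be sketched in Python as follows. Several points are assumptions made for this sketch, not statements from the text: the regions a to h are taken to be laid out clockwise, each spanning π/4 with region a centered on φ = 0; the function names and the row format (theta_lo, theta_hi, kind, phi) are hypothetical; and no values from Table 2 are reproduced.

```python
import math

REGIONS = "abcdefgh"          # assumed to be laid out clockwise
SECTOR = 2.0 * math.pi / 8    # each region spans pi/4

def region_of(bearing):
    """Region containing a device seen at `bearing` (radians, clockwise
    positive, 0 = centre of region a) from the audio recording device."""
    index = int(math.floor((bearing + SECTOR / 2) / SECTOR)) % 8
    return REGIONS[index]

def shift_theta_ranges(rows_for_c, region):
    """Add the angular difference between region c and `region` to the
    theta bounds of each (theta_lo, theta_hi, kind, phi) row."""
    delta = (REGIONS.index(region) - REGIONS.index("c")) * SECTOR
    return [(lo + delta, hi + delta, kind, phi)
            for lo, hi, kind, phi in rows_for_c]
```

Only the θ bounds are shifted, following the wording of the text; the sign of the shift depends on the assumed clockwise layout of the regions.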

Although the value of φ is changed in four steps here, it may be changed in more steps, for example by varying it continuously according to the value of θ.

<Operation of the moving image audio recording system>
FIG. 7 is a flowchart showing the operation of the moving image audio recording system 1.

First, in the moving image shooting device 2, the device state acquisition means 23 acquires device state information (step S11). Next, the shooting parameter acquisition means 24 estimates shooting parameter information from the device state information acquired in step S11 (step S12). The reference state used here is the one already set by the processing shown in FIG. 4.

Next, the transmission means 25 transmits the shooting parameter information estimated in step S12 to the audio recording device 3 (step S13). The moving image shooting means 21 then acquires a moving image signal (step S14).

In the audio recording device 3, the reception means 34 receives the shooting parameter information (step S21), and the enhancement parameter control means 33 controls the audio enhancement parameter according to the shooting parameter information received in step S21 (step S22).

Next, the audio recording means 31 acquires an audio signal (step S23), and the audio enhancement means 32 enhances the audio signal acquired in step S23 based on the audio enhancement parameter obtained in step S22, thereby obtaining an enhanced audio signal (step S24). The transmission means 35 then transmits the enhanced audio signal obtained in step S24 to the moving image shooting device 2, and the processing on the audio recording device 3 side ends.

In the moving image shooting device 2, the reception means 26 receives the enhanced audio signal (step S15), and the moving image/audio combining means 27 combines the moving image signal acquired in step S14 with the enhanced audio signal received in step S15 to obtain a moving image audio signal (step S16). The moving image/audio storage means 28 then stores the moving image audio signal obtained in step S16 (step S17), and the processing on the moving image shooting device 2 side ends.
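
The order of steps S11 to S17 and S21 to S24 can be summarized in a Python sketch. Everything in it is hypothetical: `cam` and `rec` stand in for the moving image shooting device 2 and the audio recording device 3, every method name is invented for illustration, and the transmissions between the two devices are reduced to plain function calls.

```python
from types import SimpleNamespace

def run_cycle(cam, rec):
    """One pass through steps S11-S17 (shooting side) and S21-S24 (audio side).

    All method names on cam and rec are assumptions made for this sketch.
    """
    state = cam.get_state()                    # S11: device state information
    params = cam.estimate_params(state)        # S12: (X, Y, theta)
    # S13 / S21: the parameters are sent to the audio recording device.
    control = rec.control_enhancement(params)  # S22: directivity type and phi
    video = cam.capture()                      # S14: moving image signal
    audio = rec.record()                       # S23: audio signal
    enhanced = rec.enhance(audio, control)     # S24: enhanced audio signal
    # S15: the enhanced signal is sent back to the shooting device.
    combined = cam.combine(video, enhanced)    # S16: moving image audio signal
    cam.store(combined)                        # S17
    return combined

# Stub devices, only to show the data flow.
stored = []
cam = SimpleNamespace(
    get_state=lambda: (0.0, 0.0, 0.0),
    estimate_params=lambda s: (0.0, 0.0, 0.0),
    capture=lambda: "video",
    combine=lambda v, a: (v, a),
    store=stored.append,
)
rec = SimpleNamespace(
    control_enhancement=lambda p: ("bipolar", 0.0),
    record=lambda: "audio",
    enhance=lambda a, c: (a, c),
)
result = run_cycle(cam, rec)
```

In the system described here the two sides run on separate devices, so the calls marked S13/S21 and S15 would be actual transmissions rather than in-process calls.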

<Computer system that realizes the moving image audio recording system>
FIG. 8 is a block diagram showing the hardware configuration of a computer system that realizes the moving image shooting device 2 and the audio recording device 3.

The moving image shooting device 2 and the audio recording device 3 can each be realized by a general-purpose computer system such as the one shown in FIG. 8. In this computer system, a CPU (Central Processing Unit) 11, a ROM (Read Only Memory) 12, a RAM (Random Access Memory) 13, an HDD (Hard Disk Drive) 14, and an I/F (interface) 15 are connected via a bus 10, and a display unit 16 such as an LCD (Liquid Crystal Display) and an operation unit 17 are connected to the I/F 15.

The CPU 11 is a computing means and controls the operation of the entire computer system. The ROM 12 is a read-only nonvolatile storage medium that stores programs such as firmware. The RAM 13 is a volatile storage medium that allows high-speed reading and writing of information, and is used as a work area when the CPU 11 processes information. The HDD 14 is a readable and writable nonvolatile storage medium that stores an OS (Operating System), various control programs, application programs, and the like. The I/F 15 connects the bus to various hardware, networks, and the like, and controls them. The display unit 16 is a visual user interface that lets the user check the state of the computer system. The operation unit 17 is a user interface, such as a keyboard or a mouse, through which the user inputs information to the computer system.

As described in detail above, the moving image audio recording system 1 according to the embodiment of the present invention has the following features (1) to (5).
(1) The position of the moving image shooting means 21 relative to the audio recording means 31 and the shooting direction of the moving image shooting means 21 are estimated, the direction from which sound arrives at the audio recording means 31 is identified from the estimation result, and the directivity of the audio recording means 31 is changed accordingly. A good audio signal can therefore be acquired even when the positional relationship between the audio recording means 31 and the moving image shooting means 21 changes.
(2) Sound arriving from outside the shooting range can be suppressed relatively strongly, according to the estimated shooting direction of the moving image shooting means 21.
(3) The position and shooting direction of the moving image shooting device 2 can be estimated from the acceleration and angular acceleration of the moving image shooting device 2.
(4) The reference position of the moving image shooting means 21 relative to the audio recording means 31 and the reference shooting direction of the moving image shooting means 21 can be used for estimating the position and the shooting direction.
(5) The reference position of the moving image shooting means 21 relative to the audio recording means 31 and the reference shooting direction of the moving image shooting means 21 can be obtained while the user is using the system.

In the embodiment described above, the moving image shooting device 2 and the audio recording device 3 are separate devices, the moving image shooting device 2 incorporates the moving image shooting means (camera) 21, and the audio recording device 3 incorporates the audio recording means (microphone) 31. Alternatively, the moving image shooting device and the audio recording device may be combined into a single moving image audio recording device, with the moving image shooting means and the audio recording means provided separately from that device.

Description of reference signs: 1 ... moving image audio recording system, 2 ... moving image shooting device, 3 ... audio recording device, 21 ... moving image shooting means, 22 ... reference point recognition means, 23 ... device state acquisition means, 24 ... shooting parameter acquisition means, 31 ... audio recording means, 32 ... audio enhancement means, 33 ... enhancement parameter control means.

Japanese Unexamined Patent Application Publication No. 2011-41096 (JP 2011-41096 A)

Claims

A moving image sound recording system comprising:
moving image acquisition means for shooting a subject and acquiring a moving image signal;
sound acquisition means for recording sound and acquiring a sound signal;
shooting parameter acquisition means for acquiring information indicating the shooting direction of the moving image acquisition means and information indicating the positional relationship between the moving image acquisition means and the sound acquisition means; and
sound enhancement means for enhancing, based on the information acquired by the shooting parameter acquisition means, a sound signal from a predetermined direction among the sound signals acquired by the sound acquisition means.

The moving image sound recording system according to claim 1,
wherein the sound enhancement means suppresses relatively strongly the sound arriving from outside the shooting range estimated from the information acquired by the shooting parameter acquisition means.

The moving image sound recording system according to claim 1,
wherein the information indicating the positional relationship is information indicating the position of the moving image acquisition means relative to the sound acquisition means.

The moving image sound recording system according to claim 1,
further comprising device state acquisition means for acquiring information representing the state of movement of the moving image acquisition means,
wherein the shooting parameter acquisition means acquires the information indicating the positional relationship between the moving image acquisition means and the sound acquisition means using the information acquired by the device state acquisition means.

The moving image sound recording system according to claim 4,
wherein the information representing the state of movement is information representing acceleration and angular acceleration.

The moving image sound recording system according to claim 4 or 5,
further comprising reference point recognition means for determining, from the moving image signal acquired by the moving image acquisition means, whether the coordinates of a predetermined reference point in space match predetermined coordinates on the image,
wherein the shooting parameter acquisition means acquires a reference value of the information indicating the shooting direction and a reference value of the information indicating the positional relationship when the reference point recognition means determines that the coordinates match and the information representing the state of movement generated by the device state acquisition means indicates that the moving image acquisition means is stationary.

The moving image sound recording system according to claim 6,
further comprising means for displaying guidance for adjusting the position and orientation of the moving image acquisition means until the shooting parameter acquisition means acquires the reference value of the information indicating the shooting direction and the reference value of the information indicating the positional relationship.

A moving image sound recording device comprising:
moving image acquisition means for shooting a subject and acquiring a moving image signal;
sound acquisition means for recording sound and acquiring a sound signal;
shooting parameter acquisition means for acquiring information indicating the shooting direction of the moving image acquisition means and information indicating the positional relationship between the moving image acquisition means and the sound acquisition means; and
sound enhancement means for enhancing, based on the information acquired by the shooting parameter acquisition means, a sound signal from a predetermined direction among the sound signals acquired by the sound acquisition means.

A moving image sound recording program for causing a computer to process a moving image signal acquired by moving image acquisition means and a sound signal acquired by sound acquisition means, the program causing the computer to function as:
shooting parameter acquisition means for acquiring information indicating the shooting direction of the moving image acquisition means and information indicating the positional relationship between the moving image acquisition means and the sound acquisition means; and
sound enhancement means for enhancing, based on the information acquired by the shooting parameter acquisition means, a sound signal from a predetermined direction among the sound signals acquired by the sound acquisition means.

A moving image sound recording method comprising:
a moving image acquisition step of shooting a subject and acquiring a moving image signal;
a sound acquisition step of recording sound and acquiring a sound signal;
a shooting parameter acquisition step of acquiring information indicating the shooting direction in the moving image acquisition step and information indicating the relationship between the shooting position in the moving image acquisition step and the recording position in the sound acquisition step; and
a sound enhancement step of enhancing, based on the information acquired in the shooting parameter acquisition step, a sound signal from a predetermined direction among the sound signals acquired in the sound acquisition step.