JP2017175598A

JP2017175598A - Sound collecting device and sound collecting method

Info

Publication number: JP2017175598A
Application number: JP2016239974A
Authority: JP
Inventors: Yuki Noda (野田 佑樹); Yasushi Watabe (渡部 康); Takeshi Hatakeyama (畠山 武士)
Original assignee: Panasonic Intellectual Property Management Co Ltd
Current assignee: Panasonic Intellectual Property Management Co Ltd
Priority date: 2016-03-22
Filing date: 2016-12-12
Publication date: 2017-09-28

Abstract

PROBLEM TO BE SOLVED: To provide a sound collecting device and a sound collecting method capable of effectively collecting sound at a specified position. SOLUTION: The sound collecting device includes: a sound collecting position input interface for inputting sound collecting position information, which represents the position at which voice is to be collected; a voice interface for inputting voice signals collected by two or more sound collecting parts disposed at predetermined positions; a record medium that stores sound collecting part disposition information, which represents the disposition of the sound collecting parts; and a controller that selects among the voice signals output by the sound collecting parts based on the sound collecting position information and the sound collecting part disposition information. SELECTED DRAWING: Figure 3

Description

The present disclosure relates to a sound collection device and a sound collection method for collecting sound.

Patent Document 1 discloses a configuration in which, when a specific subject is designated in a captured image, a synthesized audio signal corresponding to that subject is generated from a plurality of audio signals. With this configuration, only the sound of the designated subject can be emphasized and reproduced.

Patent Document 1: JP 2008-193196 A

The present disclosure provides a sound collection device and a sound collection method that are effective for collecting sound at a designated position.

The sound collection device according to the present disclosure includes: a sound collection position input interface for inputting sound collection position information indicating a sound collection position; an audio interface for inputting audio signals collected by two or more sound collection units arranged in advance at predetermined positions; a recording medium that stores sound collection unit arrangement information indicating the arrangement of the sound collection units; and a controller that selects an audio signal output by the sound collection units based on the sound collection position information and the sound collection unit arrangement information.

The sound collection method according to the present disclosure includes: a first step of inputting sound collection position information indicating a sound collection position; a second step of inputting two or more audio signals collected by sound collection units arranged in advance at predetermined positions; and a third step of selecting an audio signal output by the sound collection units based on the sound collection position information and sound collection unit arrangement information indicating the arrangement of the sound collection units.

The sound collection device and the sound collection method according to the present disclosure are effective for collecting sound at a designated position.

FIG. 1 is a block diagram showing the configuration of a sound collection system including the sound collection device according to Embodiment 1.
FIG. 2 is an overhead view showing an example of the relationship among the sound collection position, the microphone positions, and the imaging range of the camera in Embodiment 1.
FIG. 3 is a flowchart showing the processing of the sound collection device in Embodiment 1.
FIG. 4 is an overhead view showing an example of the relationship between the seats and the microphone positions in Embodiment 2.
FIG. 5 is an overhead view showing an example of the relationship among the sound collection position, the microphone positions, and the arrangement of obstacles in Embodiment 3.
FIG. 6 is a flowchart showing the processing of the sound collection device in Embodiment 3.

Hereinafter, embodiments are described in detail with reference to the drawings as appropriate. However, unnecessarily detailed description may be omitted. For example, detailed descriptions of well-known matters and redundant descriptions of substantially identical configurations may be omitted. This is to keep the following description from becoming unnecessarily redundant and to facilitate understanding by those skilled in the art.

The accompanying drawings and the following description are provided so that those skilled in the art can fully understand the present disclosure, and are not intended to limit the subject matter described in the claims.

(Embodiment 1)
1. Configuration of Sound Collection System and Sound Collection Device
FIG. 1 is a block diagram showing the configuration of a sound collection system including the sound collection device according to the present embodiment. In FIG. 1, the sound collection system includes a sound collection device 100, a client 200, a camera 301a, a camera 301b, a microphone 302a, and a microphone 302b. Here, the camera 301a and the camera 301b are collectively referred to as the camera (imaging unit) 301. The present embodiment describes a configuration with two cameras 301, but one or more cameras 301 suffice. Likewise, the microphone 302a and the microphone 302b are collectively referred to as the microphone (sound collection unit) 302. The present embodiment describes a configuration with two microphones 302, but any number of two or more microphones 302 may be used.

The sound collection device 100 includes a CPU (controller) 101, a memory (recording medium) 102, a network interface (sound collection position input interface) 103, a video interface 104, and an audio interface 105.

The CPU 101 executes a computer program stored in the memory 102 and selects the optimum microphone from among the microphones 302 used for sound collection. Through this selection, the CPU 101 selects the audio data output by the microphones 302. The method of selecting the optimum microphone is described later.

The memory (recording medium) 102 stores the position coordinates at which the cameras 301 and the microphones 302 are arranged, in a coordinate system set arbitrarily within the space where the sound collection system is installed. The position coordinates of the microphones 302 are an example of sound collection unit arrangement information indicating the arrangement of the microphones 302. The position coordinates of the cameras 301 are an example of imaging unit arrangement information indicating the arrangement of the cameras 301.

In addition, the memory 102 stores specific position information indicating one or more specific positions within a predetermined space. The specific position information includes, for example, sound collection positions and the positions of seats arranged within the predetermined space (details are described later).

The network interface (sound collection position input interface) 103 is the interface through which the sound collection device 100 communicates with the client 200. Sound collection position information indicating a sound collection position is input to the network interface 103. Specifically, the network interface 103 receives from the client 200 a pixel position on the video data that indicates the sound collection position (an example of sound collection position information) and passes it to the CPU 101. The network interface 103 also transmits the video data and audio data sent from the CPU 101 to the client 200.

The video interface 104 is an interface that connects the cameras 301 to the CPU 101. Specifically, the video interface 104 is connected to one or more cameras (imaging units) 301 that are arranged in advance at predetermined positions and capture video of a predetermined range.

The audio interface 105 is an interface that connects the microphones 302 to the CPU 101. The two audio signals collected by the microphones (sound collection units) 302 arranged in advance at predetermined positions are input to the audio interface 105.

The client 200 includes a CPU 201, a memory 202, a network interface 203, and an input/output interface 206. The client 200 is connected to the sound collection device 100 via the network interface 203.

The input/output interface 206 includes, for example, a display 206a, a touch panel 206b, and a speaker 206c. The display 206a can receive, via the sound collection device 100, the video data captured by the cameras 301 and display it. The user can designate a specific position in the video using the touch panel 206b attached to the display 206a. The speaker 206c can receive, via the sound collection device 100, the audio data collected by the microphones 302 and reproduce it.

The camera 301 is installed, for example, on the ceiling of an aircraft cabin. The camera 301 transmits the captured video data to the sound collection device 100.

The microphone 302 is installed, for example, on the ceiling of an aircraft cabin. The microphone 302 transmits the collected audio data to the sound collection device 100.

The CPU 101 is an example of a controller. The memory 102 is an example of a recording medium. The network interface 103 is an example of a sound collection position input interface for inputting sound collection position information indicating a sound collection position. The camera 301 is an example of an imaging unit. The microphone 302 is an example of a sound collection unit.

2. Operation of the Sound Collection Device
The operation of the sound collection device 100 configured as described above is described below.

FIG. 2 is an overhead view showing an example of the relationship among the sound collection position, the positions of the microphones 302, and the imaging range of the camera 301. In FIG. 2, the camera 301 is installed, for example, on the ceiling of an aircraft cabin, so its imaging range is the area inside the frame shown in FIG. 2. Near the center of the imaging range, the head and nose of a passenger appear as seen from above. From the positions of the head and nose, it can be seen that the passenger is facing left in the figure. That is, the passenger is considered to be standing (or sitting) with the microphone 302a in front and the microphone 302b behind. Here, the sound collection position P is designated as an arbitrary position within the imaging range. In FIG. 2, the sound collection position P indicates the position of the passenger.

The microphone 302a is arranged at the position in the figure that is a distance Da from the sound collection position P (the position of the passenger), and its position coordinates are denoted as the microphone position Pa. That is, the microphone 302a is located at a place Pa in front of the passenger, at a distance Da from the passenger.

The microphone 302b is arranged at the position in the figure that is a distance Db from the sound collection position P, and its position coordinates are denoted as the microphone position Pb. That is, the microphone 302b is located at a place Pb behind the passenger, at a distance Db from the passenger. The microphones 302 may be arranged outside the imaging range of the camera 301.

The operation of the sound collection device in the present embodiment is described below, taking the arrangement described above as an example.

FIG. 3 is a flowchart showing the processing of the sound collection device 100 according to the present embodiment.

In FIG. 3, first, on receiving from the client 200 the designation of a sound collection position (the place where a certain passenger is located in the aircraft), the sound collection device 100 calculates the sound collection position P (S101).

The sound collection position P is designated, for example, as follows. The input/output interface 206 in the client 200 has a display 206a and a touch panel 206b (see FIG. 1). The display 206a receives, via the sound collection device 100, the video data captured by the camera 301 and displays it. The user then uses the touch panel 206b to designate an arbitrary position in the video shown on the display 206a. The CPU 201 in the client 200 calculates the pixel position corresponding to the designated position in the video (an example of sound collection position information) and transmits the calculated pixel position to the sound collection device 100 via the network interface 203. The CPU 101 in the sound collection device 100 then calculates the sound collection position P from the pixel position in the video, based on the video data captured by the camera 301 and the camera position, as coordinates in the coordinate system set arbitrarily within the space where the sound collection system is installed.
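
The description does not say how the pixel position is converted into room coordinates. As a minimal sketch only, the following assumes a ceiling camera looking straight down, image axes aligned with the room axes, and a known scale in metres per pixel. `OverheadCamera`, `pixel_to_room`, and every field name are hypothetical and do not come from the patent.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class OverheadCamera:
    """Hypothetical calibration for a ceiling camera looking straight down."""
    x: float                 # room x-coordinate under the image centre (m)
    y: float                 # room y-coordinate under the image centre (m)
    width_px: int            # image width in pixels
    height_px: int           # image height in pixels
    metres_per_pixel: float  # assumed uniform scale at floor level


def pixel_to_room(cam: OverheadCamera, px: float, py: float) -> tuple[float, float]:
    """Map a designated pixel to 2-D room coordinates (sound collection position P).

    Assumes the image axes are parallel to the room axes and ignores lens distortion.
    """
    dx = (px - cam.width_px / 2) * cam.metres_per_pixel
    dy = (py - cam.height_px / 2) * cam.metres_per_pixel
    return (cam.x + dx, cam.y + dy)


if __name__ == "__main__":
    cam = OverheadCamera(x=2.0, y=3.0, width_px=640, height_px=480, metres_per_pixel=0.01)
    print(pixel_to_room(cam, 420, 140))
```

A real installation would more likely use a calibrated homography or the full camera model; the linear mapping here only illustrates the step from pixel position to the coordinates of P.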

Next, the CPU (controller) 101 in the sound collection device 100 determines whether a person (passenger) is present within a certain distance of the designated position in the video. This distance is typically on the order of several centimeters to several meters.

Whether a person is present is determined as follows. The CPU 101 recognizes the positions of the head and nose of a person (passenger) in the image of the imaging range. When the CPU 101 determines that a person is present, it uses the recognized positions of that person's head and nose to calculate the face direction X, which indicates the direction the person is facing, as the sound collection direction (S102). Here, the sound collection direction is the direction in which the target emits the sound to be collected; in other words, the direction in which the collected sound is uttered. In the case of a person (passenger), the person speaks toward the face direction X, which is the direction the person is facing, so the face direction X is specified as the sound collection direction.
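
The recognition method itself is not specified. Assuming the head centre and the nose have already been located in the overhead coordinate system, one plausible way to obtain the face direction X is the unit vector from the head centre to the nose. The function name and the tuple representation below are illustrative assumptions.

```python
import math


def face_direction(head_center: tuple[float, float],
                   nose: tuple[float, float]) -> tuple[float, float]:
    """Return a unit vector pointing from the head centre toward the nose.

    Seen from above, the nose lies on the side of the head the person is
    facing, so this vector approximates the face direction X.
    """
    dx = nose[0] - head_center[0]
    dy = nose[1] - head_center[1]
    norm = math.hypot(dx, dy)
    if norm == 0:
        raise ValueError("head centre and nose coincide; direction is undefined")
    return (dx / norm, dy / norm)


if __name__ == "__main__":
    # Passenger facing left in the figure, as in FIG. 2.
    print(face_direction((1.0, 1.0), (0.9, 1.0)))
```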

Further, the CPU 101 calculates a weight coefficient for each microphone 302 based on its position relative to the sound collection position P and the face direction X (S103). The method of determining the weight coefficients in the present embodiment is described here with reference to FIG. 2.

The CPU 101 assigns a weight coefficient w1 to each microphone 302 that lies within the angle range of −θ to +θ, centered on the sound collection position P and measured from the direction indicated by the face direction X. The CPU 101 assigns a weight coefficient w2, whose value is larger than w1, to each microphone 302 that lies outside the angle range of −θ to +θ. That is, in the arrangement shown in FIG. 2, w1 is assigned to the microphone 302a, which lies within the angle range of −θ to +θ, and w2 is assigned to the microphone 302b, which lies outside it.
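
The weight assignment of step S103 can be sketched as follows, assuming a 2-D coordinate system and a face direction given as a vector. The function name, argument names, and the numeric values of θ, w1, and w2 in the example are illustrative assumptions; the description states only that w2 is larger than w1.

```python
import math


def weight_for_microphone(p, face_dir, mic_pos, theta_deg, w1, w2):
    """Return w1 if the microphone lies within ±theta of the face direction, else w2.

    p, face_dir and mic_pos are (x, y) tuples; theta_deg is in degrees.
    """
    vx, vy = mic_pos[0] - p[0], mic_pos[1] - p[1]
    # Unsigned angle between the face direction and the direction to the microphone.
    cross = face_dir[0] * vy - face_dir[1] * vx
    dot = face_dir[0] * vx + face_dir[1] * vy
    angle = abs(math.degrees(math.atan2(cross, dot)))
    return w1 if angle <= theta_deg else w2


if __name__ == "__main__":
    p = (0.0, 0.0)
    face = (-1.0, 0.0)  # passenger facing left, as in FIG. 2
    print(weight_for_microphone(p, face, (-2.0, 0.0), 45.0, 1.0, 3.0))  # microphone in front
    print(weight_for_microphone(p, face, (1.0, 0.0), 45.0, 1.0, 3.0))   # microphone behind
```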

The values of the angle θ and the weight coefficients w1 and w2 are determined in advance, for example, from knowledge of how the attenuation of the volume level of a person's speech depends on the angle relative to the person's face direction. The values of the angle θ and the weight coefficients w1 and w2 are stored in the memory 102 in the sound collection device 100 as processing parameters of the CPU 101.

Next, based on the coordinates of the microphone positions Pa and Pb stored in the memory (recording medium) 102 and the coordinates of the calculated sound collection position P, the CPU 101 calculates the straight-line distances Da and Db from the sound collection position P to the microphones 302a and 302b, respectively (S104).

Further, based on the weight coefficients w1 and w2 determined for each microphone 302 and the calculated straight-line distances Da and Db, the CPU 101 calculates the weighted distances Dwa and Dwb from the sound collection position P to the microphones 302a and 302b, respectively (S105).

Here, the weighted distances Dwa and Dwb are calculated, for example, by the following relations:
Dwa = Da × w1
Dwb = Db × w2
Finally, the CPU 101 selects the audio signal corresponding to the microphone 302 whose weighted distance from the calculated sound collection position P is smallest (S106). That is, the CPU (controller) 101 selects the audio signal corresponding to the sound collection unit with the smallest weighted distance, where the weighted distance is calculated from a weight coefficient, determined by the relative positional relationship between the position indicated by the sound collection position information and the position indicated by each piece of sound collection unit arrangement information, and from the straight-line distance between the sound collection position P and each sound collection unit.
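
Steps S104 to S106 can be sketched as below, assuming the weight coefficients have already been assigned to each microphone. The dictionary layout and the function name are assumptions made for illustration; the coordinates and weights in the example are invented.

```python
import math


def select_microphone(p, mic_positions, weights):
    """Return the id of the microphone with the smallest weighted distance.

    p             -- sound collection position P as a coordinate tuple
    mic_positions -- dict mapping microphone id to its coordinate tuple
    weights       -- dict mapping microphone id to its weight coefficient
    """
    def weighted_distance(mic_id):
        # S104: straight-line distance; S105: multiply by the weight coefficient.
        return math.dist(p, mic_positions[mic_id]) * weights[mic_id]

    # S106: choose the microphone that minimises the weighted distance.
    return min(mic_positions, key=weighted_distance)


if __name__ == "__main__":
    p = (0.0, 0.0)
    mics = {"302a": (-2.0, 0.0), "302b": (1.0, 0.0)}
    # Dwa = 2.0 * 1.0 = 2.0 and Dwb = 1.0 * 3.0 = 3.0, so 302a is chosen
    # even though 302b is physically closer.
    print(select_microphone(p, mics, {"302a": 1.0, "302b": 3.0}))
```

With equal weights the same call would pick the physically nearest microphone, which shows how the weight coefficients bias the choice toward microphones in front of the speaker.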

The selected audio signal is transmitted, for example, to the client 200, where it is reproduced by the speaker 206c of the input/output interface 206 so that the user can listen to it. This completes the processing of the sound collection device 100.

As described above, the sound collection device 100 of the present embodiment assigns the weight coefficient w1 to each microphone 302 that lies within the angle range of −θ to +θ measured from the face direction X, and assigns the weight coefficient w2, whose value is larger than w1, to each microphone 302 that lies outside that range. As a result, the weighted distance of a microphone 302 within the angle range of −θ to +θ tends to be relatively small.

That is, the CPU 101 selects the audio signal corresponding to the sound collection unit with the smallest weighted distance (Dwa, Dwb), which is calculated from the weight coefficient (w1, w2), determined by the relative positional relationship between the position indicated by the sound collection position information and the position indicated by each piece of sound collection unit arrangement information, and from the straight-line distance (Da, Db) between the sound collection position P and each microphone (sound collection unit) 302.

With this selection method, a microphone 302 within the angle range of −θ to +θ, that is, a microphone 302 located in the face direction X, where the volume level of human speech is attenuated least, is more likely to be selected. The sound collection device 100 of the present embodiment can therefore effectively collect sound at the designated position.

The weight coefficient w2, which is assigned to microphones 302 outside the angle range of −θ to +θ, may be set to a value very much larger than w1. In this case, the weighted distance of a microphone 302 outside the angle range of −θ to +θ becomes sufficiently larger than the weighted distance of a microphone 302 within that range. As a result, when the microphone 302 with the smallest weighted distance is selected, microphones 302 outside the angle range of −θ to +θ can be excluded from selection. That is, from among the microphones 302 that lie within a predetermined angle range (−θ to +θ) centered on the sound collection direction (face direction X) as seen from the sound collection position P, the CPU 101 selects the audio signal corresponding to the microphone 302 whose straight-line distance to the sound collection position P, calculated from the sound collection position information and the sound collection unit arrangement information, is smallest.

Alternatively, without using the weight coefficients w1 and w2, the CPU 101 may select, from among the microphones 302 that lie within the predetermined angle range (−θ to +θ) centered on the sound collection direction (face direction X) as seen from the sound collection position P, the audio signal corresponding to the microphone 302 whose straight-line distance to the sound collection position P is smallest.
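
This weight-free variant amounts to a hard angular filter followed by a nearest-neighbour choice. The sketch below assumes 2-D coordinates. Returning `None` when no microphone falls within the range is an assumption of this example, since the description does not say what happens in that case.

```python
import math


def nearest_microphone_in_range(p, face_dir, mic_positions, theta_deg):
    """Return the id of the nearest microphone within ±theta of the face direction.

    Returns None if no microphone lies within the angle range (assumed behaviour).
    """
    best_id, best_distance = None, math.inf
    face_norm = math.hypot(face_dir[0], face_dir[1])
    for mic_id, pos in mic_positions.items():
        vx, vy = pos[0] - p[0], pos[1] - p[1]
        distance = math.hypot(vx, vy)
        if distance == 0:
            angle = 0.0  # microphone exactly at P: treat as in range
        else:
            cos_a = (face_dir[0] * vx + face_dir[1] * vy) / (distance * face_norm)
            # Clamp to guard against rounding slightly outside [-1, 1].
            angle = math.degrees(math.acos(max(-1.0, min(1.0, cos_a))))
        if angle <= theta_deg and distance < best_distance:
            best_id, best_distance = mic_id, distance
    return best_id


if __name__ == "__main__":
    mics = {"302a": (-2.0, 0.0), "302b": (1.0, 0.0)}
    print(nearest_microphone_in_range((0.0, 0.0), (-1.0, 0.0), mics, 60.0))
```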

With the methods described above, the optimum microphone is selected and sound at the designated position is collected effectively.

The coordinate system set arbitrarily within the space where the sound collection system is installed may be a two-dimensional coordinate system that ignores height information or a three-dimensional coordinate system that takes height information into account. In the case of a three-dimensional coordinate system, the height of the sound collection position is calculated based on the video data captured by two or more cameras 301 and the position coordinates of the cameras 301 stored in the memory 102.

3. Effects
As described above, in the present embodiment, the sound collection device 100 includes: a network interface (sound collection position input interface) 103 for inputting sound collection position information indicating a sound collection position; an audio interface 105 for inputting two or more audio signals collected by the microphones 302 arranged in advance at predetermined positions; a memory (recording medium) 102 that stores sound collection unit arrangement information indicating the arrangement of the microphones 302; and a CPU (controller) 101 that selects an audio signal output by the microphones (sound collection units) 302 based on the sound collection position information and the sound collection unit arrangement information.

This makes it easier for the CPU 101 to select a microphone 302 that is well placed to collect the sound at the designated position.

(Embodiment 2)
Embodiment 2 is described with reference to FIGS. 3 and 4.

FIG. 4 is an overhead view (part of a layout diagram) showing an example of the relationship between the seats 401a, 401b, and 401c and the positions of the microphones (sound collection units) 302. The sound collection system of Embodiment 2 differs from that of Embodiment 1 in two parts of the microphone selection processing of the sound collection device shown in FIG. 3: the method of calculating the sound collection position P (step S101) and the method of calculating the sound collection direction (step S102). The configuration and operation of the other parts are the same as in Embodiment 1 (see FIG. 1), so a detailed description is omitted.

First, the method by which the sound collection device 100 of the present embodiment calculates the sound collection position P (step S101) is described. The input/output interface 206 in the client 200 of the present embodiment has a display 206a and a touch panel 206b (see FIG. 1). The memory 202 in the client 200 stores data of a layout diagram of the seats arranged at predetermined positions in the space.

The display 206a displays this layout diagram. While viewing the layout diagram shown on the display 206a, the user can designate any seat using the touch panel. When the user designates a seat, the client 200 transmits the designated seat number (sound collection position information) to the sound collection device 100 via the network interface 203 in the client 200.

The memory 102 (recording medium) in the sound collection device 100 stores an arrangement database that associates seat numbers with the position coordinates of the seats in the coordinate system set arbitrarily within the space where the sound collection system is installed. That is, the positions of the seats arranged within the predetermined space are specific position information indicating specific positions, and this specific position information is stored in the memory 102.

Because passengers sit in the seats, the position of a seat arranged within the predetermined space can serve as a sound collection position. Accordingly, on receiving the designation of a seat number, the sound collection device 100 refers to the seat arrangement database stored in the memory 102 and determines the position coordinates of the seat corresponding to the designated seat number as the sound collection position P.
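
In this embodiment, step S101 reduces to a table lookup. As a sketch only, the seat arrangement database is modelled here as a plain dictionary from seat number to position coordinates. The seat identifiers reuse the reference numerals of FIG. 4 purely as example keys, and the coordinates are invented for illustration.

```python
def sound_position_for_seat(seat_number, seat_db):
    """Look up the position coordinates of a seat and return them as position P.

    seat_db is a hypothetical stand-in for the seat arrangement database:
    a dict mapping seat number to an (x, y) coordinate tuple.
    """
    try:
        return seat_db[seat_number]
    except KeyError:
        raise ValueError(f"unknown seat number: {seat_number!r}") from None


if __name__ == "__main__":
    seat_db = {"401a": (0.0, 0.0), "401b": (0.5, 0.0), "401c": (1.0, 0.0)}
    print(sound_position_for_seat("401b", seat_db))
```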

Next, the method of calculating the sound collection direction (step S102) in the present embodiment will be described. In the present embodiment, the arrangement direction Y of the seat corresponding to the designated seat number is determined as the sound collection direction. The seat arrangement direction Y is the direction faced by the front surface of the seat back (the surface that the seated person's back touches). In other words, it is the direction in which a passenger's face normally points when the passenger is seated.

If all seats in the space face the same direction, the seat arrangement direction Y may be set as a processing parameter of the CPU (controller) 101 in the sound collection device 100. If the arrangement direction differs from seat to seat, the seat arrangement database may store the arrangement direction Y as information associated with each seat number. Furthermore, the CPU 101 may read the arrangement direction Y corresponding to the designated seat number from the seat arrangement database and use it as the sound collection direction. The subsequent processing is the same as in Embodiment 1, so a detailed description is omitted.
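The lookup of steps S101 and S102 can be sketched as follows. This is a minimal illustration, not the implementation of the disclosure: the seat numbers, coordinates, and the names `SEAT_DB`, `DEFAULT_DIRECTION`, and `resolve_seat` are hypothetical, and the database is assumed to be a simple in-memory mapping.

```python
from typing import NamedTuple, Optional, Tuple

class Seat(NamedTuple):
    position: Tuple[float, float]              # seat coordinates in the room coordinate system
    direction: Optional[Tuple[float, float]]   # per-seat arrangement direction Y, or None

# Hypothetical arrangement database keyed by seat number.
SEAT_DB = {
    "12A": Seat((3.0, 4.0), None),          # no per-seat direction stored
    "12B": Seat((4.0, 4.0), (0.0, 1.0)),    # this seat faces a different way
}

# Hypothetical processing parameter: the common direction used when a seat stores none.
DEFAULT_DIRECTION = (0.0, -1.0)

def resolve_seat(seat_number):
    """Return (sound collection position P, sound collection direction) for a seat number."""
    seat = SEAT_DB[seat_number]
    direction = seat.direction if seat.direction is not None else DEFAULT_DIRECTION
    return seat.position, direction
```

Storing an optional per-seat direction with a global fallback covers both cases described above: a cabin where every seat faces the same way, and one where directions differ by seat.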

To clarify the difference in the sound collection direction between Embodiment 1 and Embodiment 2, a description is given with reference to FIG. 4. In FIG. 4, the seats 401 are seats 401a, 401b, and 401c, shown as an excerpt of the seats arranged in the space. Assume that the seat 401b is designated as the sound collection position P.

The microphone 302a is placed at the position shown in the figure, at a distance Da from the sound collection position P (the position where the passenger sits), and its position coordinates are denoted as the microphone position Pa. That is, the microphone 302a is located at a position Pa in front of the seated passenger, at a distance Da from the passenger.

The microphone 302b is placed at the position shown in the figure, at a distance Db from the sound collection position P, and its position coordinates are denoted as the microphone position Pb. That is, the microphone 302b is located at a position Pb behind the seated passenger, at a distance Db from the passenger.

These representations are obtained by simply replacing the face direction X in FIG. 2 with the seat arrangement direction Y, and the processing is otherwise the same. As a result, a microphone 302 within the angle range of -θ to +θ, that is, a microphone 302 located in front of the seat with respect to its arrangement direction, where the volume level of speech uttered at the seat position is attenuated only slightly, is more likely to be selected.

For example, the CPU (controller) 101 in the sound collection device 100 selects the audio signal corresponding to the microphone (sound collection unit) 302 with the smallest weighted distance (Dwa, Dwb). The weighted distance is calculated from a weighting factor (w1, w2), which is determined by the position of each sound collection unit relative to the seat arrangement direction Y, and the linear distance (Da, Db) between the sound collection position P and each microphone (sound collection unit) 302.
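The weighted-distance selection can be sketched as below. Several details are assumptions made only for illustration: the weighted distance is taken to be the weighting factor multiplied by the linear distance, the factor takes one of two values depending on whether the microphone lies within -θ to +θ of the direction Y, and the default values of `theta_deg`, `w_inside`, and `w_outside` are hypothetical.

```python
import math

def weighted_distance(p, direction, mic, theta_deg=45.0, w_inside=1.0, w_outside=2.0):
    """Linear distance from P to the microphone, scaled by a weight chosen from its bearing."""
    vx, vy = mic[0] - p[0], mic[1] - p[1]
    dist = math.hypot(vx, vy)
    # Signed angle between the seat arrangement direction Y and the vector from P to the microphone.
    angle = math.degrees(math.atan2(direction[0] * vy - direction[1] * vx,
                                    direction[0] * vx + direction[1] * vy))
    # Assumption: a smaller weight inside the +/-theta range favours microphones in front.
    weight = w_inside if abs(angle) <= theta_deg else w_outside
    return weight * dist

def select_microphone(p, direction, mics, **params):
    """Return the name of the microphone with the smallest weighted distance."""
    return min(mics, key=lambda name: weighted_distance(p, direction, mics[name], **params))

# Layout loosely modelled on FIG. 4: 302a is 3 units in front, 302b is 2 units behind.
MICS = {"302a": (0.0, 3.0), "302b": (0.0, -2.0)}
selected = select_microphone((0.0, 0.0), (0.0, 1.0), MICS)
```

With these illustrative weights the front microphone 302a is chosen even though 302b is physically closer, which is the behaviour the paragraph above describes.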

Alternatively, the sound collection position P may be selected from one or more seats arranged in a predetermined direction, and the CPU 101 may select, from among the microphones (sound collection units) 302 located in a predetermined direction from the sound collection position P, the one with the smallest linear distance to the sound collection position P, calculated from the sound collection position information and the sound collection unit arrangement information. The predetermined direction may be the forward direction with respect to the sound collection position P.
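A hard directional filter of this kind might look like the following sketch. The test for "forward" (a positive projection onto the forward vector) and the function name `nearest_forward_microphone` are assumptions; the disclosure does not specify how the direction test is carried out.

```python
import math

def nearest_forward_microphone(p, forward, mics):
    """Return the name of the closest microphone ahead of P, or None if none lies ahead."""
    ahead = {}
    for name, (mx, my) in mics.items():
        vx, vy = mx - p[0], my - p[1]
        # Assumption: a microphone is "forward" when its offset has a positive
        # component along the forward vector.
        if forward[0] * vx + forward[1] * vy > 0:
            ahead[name] = math.hypot(vx, vy)
    return min(ahead, key=ahead.get) if ahead else None
```

Unlike a weighted distance, this variant never selects a microphone behind the seat, so the caller has to decide what to do when the function returns `None`.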

(Embodiment 3)
Embodiment 3 will be described with reference to FIGS. 5 and 6. The configuration of the sound collection system in Embodiment 3 is the same as in Embodiment 1 (see FIG. 1), so a detailed description is omitted. For example, the configuration for designating the sound collection position P is the same as in Embodiment 1.

FIG. 5 is an overhead view (part of a layout) showing an example of the positional relationship among the sound collection position P, the positions of the microphones 302 (sound collection units), and an obstacle. In FIG. 5, the obstacle has the same height as the space. The figure thus shows a simple case in which the height information of the sound collection position P, the position Pa of the microphone 302a, and the position Pb of the microphone 302b can be ignored and the arrangement can be treated in a two-dimensional coordinate system.

Assume that the origin O of a spatially discretized coordinate system, set arbitrarily in the space, is at the position shown in the figure. In this discretized coordinate system, dx is the minimum unit of distance in the x direction (rightward in FIG. 5), and dy is the minimum unit of distance in the y direction (downward in FIG. 5). Positions of people and objects are expressed as integer multiples of dx and dy.

In this coordinate system, when the sound collection position P shown in the figure is designated as an arbitrary position, its coordinates are (3dx, 4dy).

The microphone 302a is placed at the position shown in the figure, and its position coordinates Pa are (3dx, dy). Likewise, the microphone 302b is placed at the position shown in the figure, and its position coordinates Pb are (3dx, 6dy).

In addition, an object having the same height as the space is placed as an obstacle at the position shown in the figure. Specifically, assume that the coordinates on the discretized grid corresponding to the arrangement range of the object, namely (2dx, 2dy), (3dx, 2dy), (4dx, 2dy), (2dx, 3dy), (3dx, 3dy), and (4dx, 3dy), are stored as a database (object arrangement information) in the memory (recording medium) 102 in the sound collection device 100.

The operation of the sound collection device 100 in the present embodiment is described below, taking the arrangement above as an example.

FIG. 6 is a flowchart showing the processing of the sound collection device 100 in the present embodiment. In FIG. 6, on receiving the designation of a sound collection position from the client 200, the sound collection device 100 calculates the sound collection position P (S301). The sound collection position P is designated in the same way as described in Embodiment 1. Based on the coordinates of the sound collection position P and of the microphones 302a and 302b, the CPU (controller) 101 calculates the equations of the line segments connecting the sound collection position P to the position Pa of the microphone 302a and to the position Pb of the microphone 302b (S302).

For example, if the coordinates of the sound collection position P are (Px1, Py1) and the coordinates of a given microphone 302 are (Px2, Py2), the line segment connecting the sound collection position P and the position of the microphone 302 is expressed as follows.
(i) When Px1 ≠ Px2:
Y = (Py2 - Py1) / (Px2 - Px1) × (X - Px1) + Py1
where Px1 <= X <= Px2 (assuming Px1 < Px2)
(ii) When Px1 = Px2 and Py1 ≠ Py2:
X = Px1
where Py1 <= Y <= Py2 (assuming Py1 < Py2)
In the example shown in FIG. 5, the line segment connecting the sound collection position P and the position Pa of the microphone 302a is
X = 3dx, where 1dy <= Y <= 4dy
and the line segment connecting the sound collection position P and the position Pb of the microphone 302b is
X = 3dx, where 4dy <= Y <= 6dy

Next, based on the sound collection position P, the position of each microphone 302, and the object arrangement information, the CPU (controller) 101 selects the audio signal corresponding to the microphone 302 with the smallest linear distance to the sound collection position P from among the microphones 302 for which no object lies on the line segment connecting the sound collection position P and the microphone position. More specifically, the CPU 101 first determines whether there is at least one microphone 302 with no object between it and the sound collection position P. To do so, the CPU 101 determines, for each calculated line segment, whether the segment passes through the object arrangement range. The CPU 101 determines that a microphone 302 whose line segment does not pass through the object arrangement range (does not cross the obstacle) is a microphone 302 with no object between it and the sound collection position P (S303). The memory (recording medium) 102 stores, as a database, the digital values of all coordinates corresponding to the range occupied by the object in the coordinate system set arbitrarily in the space.

For each line segment equation calculated in step S302, the CPU 101 then calculates the digital values of all coordinates that the segment can take and compares them with the object coordinates held in the database stored in the memory 102, thereby determining whether the line segment passes through the object arrangement range.

In the example shown in FIG. 5, the coordinate values corresponding to the object arrangement range are (2dx, 2dy), (3dx, 2dy), (4dx, 2dy), (2dx, 3dy), (3dx, 3dy), and (4dx, 3dy), and the line segment connecting the sound collection position P and the position Pa of the microphone 302a can take the coordinate values (3dx, 2dy) and (3dx, 3dy). The line segment connecting the sound collection position P and the position Pa of the microphone 302a is therefore determined to pass through the object arrangement range. By contrast, the line segment connecting the sound collection position P and the position Pb of the microphone 302b is determined not to pass through the object arrangement range. In this way, the CPU 101 identifies the microphones 302 for which no object lies on the line segment connecting the sound collection position P and the microphone position.
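The segment test of steps S302 and S303 can be sketched on the FIG. 5 grid as follows, with coordinates written as integer multiples of dx and dy. The function names are hypothetical, and rounding non-integer Y values to the nearest grid index in case (i) is an assumption, since the text does not say how such values are digitized.

```python
from fractions import Fraction
import math

def segment_cells(p, q):
    """Grid cells on the segment from p to q, following cases (i) and (ii)."""
    (x1, y1), (x2, y2) = sorted([p, q])   # order so that x1 <= x2 (and y1 <= y2 when x1 == x2)
    if x1 == x2:                          # case (ii): X = Px1, Py1 <= Y <= Py2
        return [(x1, y) for y in range(y1, y2 + 1)]
    slope = Fraction(y2 - y1, x2 - x1)    # case (i): Y is a linear function of X
    # Assumption: non-integer Y values are rounded to the nearest grid index.
    return [(x, math.floor(slope * (x - x1) + y1 + Fraction(1, 2)))
            for x in range(x1, x2 + 1)]

def is_blocked(p, mic, obstacle_cells):
    """True if any cell on the segment from P to the microphone belongs to the obstacle."""
    return any(cell in obstacle_cells for cell in segment_cells(p, mic))

# FIG. 5 layout in grid units.
P, PA, PB = (3, 4), (3, 1), (3, 6)
OBSTACLE = {(2, 2), (3, 2), (4, 2), (2, 3), (3, 3), (4, 3)}
blocked_a = is_blocked(P, PA, OBSTACLE)   # segment passes through (3, 2) and (3, 3)
blocked_b = is_blocked(P, PB, OBSTACLE)   # segment covers only (3, 4), (3, 5), (3, 6)
```

Stepping only along X in case (i) follows the equations as written, but it can skip cells on steep segments; a production implementation would probably step along the longer axis or use a standard line-rasterization algorithm.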

If the CPU 101 determines that there is at least one microphone 302 whose line segment does not pass through the object arrangement range (Yes in S303), it selects, from among those microphones 302, the audio signal corresponding to the microphone 302 with the smallest linear distance from the sound collection position P (S304).

That is, from among the microphones (sound collection units) 302 for which no object lies on the line segment connecting the sound collection position P and the microphone, the CPU (controller) 101 selects the audio signal corresponding to the microphone 302 with the smallest linear distance to the sound collection position P, calculated from the sound collection position information and the sound collection unit arrangement information. Alternatively, the CPU 101 may select the audio signal corresponding to the microphone 302 with the smallest weighted distance, calculated from a weighting factor that differs depending on whether an object lies on the line segment connecting the sound collection position P and each microphone 302, and the linear distance between the sound collection position P and each microphone 302.

If the CPU 101 determines that the line segments of all microphones 302 pass through the object arrangement range (No in S303), it selects the audio signal corresponding to the microphone 302 with the smallest linear distance to the sound collection position P (S305).

In the example shown in FIG. 5, step S303 determines that the microphone 302b is the only microphone 302 whose line segment does not pass through the object arrangement range. Among the microphones whose line segments do not pass through the object arrangement range, the microphone 302 closest to the sound collection position P is therefore the microphone 302b, so the audio signal corresponding to the microphone 302b is selected. In this way, a microphone 302 with no object on the line segment connecting it to the sound collection position P, that is, a microphone 302 for which the obstacle attenuates the volume level of the sound only slightly, is more likely to be selected.
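The branch between S304 and S305 can be sketched as below. The sketch assumes the obstruction result for each microphone has already been computed and is passed in as a boolean; the function name and the data layout are hypothetical.

```python
import math

def select_by_line_of_sight(p, mics):
    """mics maps a name to (position, blocked). Returns the name of the selected microphone."""
    def dist(name):
        (mx, my), _ = mics[name]
        return math.hypot(mx - p[0], my - p[1])

    clear = [name for name, (_, blocked) in mics.items() if not blocked]
    # S303 Yes -> S304: nearest unobstructed microphone.
    # S303 No  -> S305: every microphone is obstructed, so take the nearest overall.
    return min(clear or mics, key=dist)

# FIG. 5 layout in grid units: 302a is obstructed, 302b is not.
FIG5 = {"302a": ((3, 1), True), "302b": ((3, 6), False)}
selected = select_by_line_of_sight((3, 4), FIG5)
```

For the FIG. 5 layout this selects 302b, matching the result described above. When every entry is marked as obstructed, the same function falls back to plain nearest-distance selection.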

In the present embodiment, to simplify the description, the coordinate system set arbitrarily in the space where the sound collection system is installed was treated as a two-dimensional coordinate system that ignores height information. A three-dimensional coordinate system that takes height into account may be used instead. In that case, the height of the sound collection position is calculated based on the video data captured by two or more cameras 301 and the position coordinates of the cameras 301 stored in the memory 102.

(Other Embodiments)
Embodiments 1 to 3 have been described above as examples of the technology disclosed in the present application. However, the technology of the present disclosure is not limited to these and is also applicable to embodiments in which changes, replacements, additions, omissions, and the like have been made. The components described in Embodiments 1, 2, and 3 can also be combined to form a new embodiment. Other embodiments are illustrated below.

Embodiments 1 to 3 described the sound collection device 100 that appropriately selects a microphone 302 in a sound collection system in which a plurality of microphones (sound collection units) 302 are arranged in a space. However, each microphone 302 arranged in the space may be replaced with a microphone array composed of a plurality of microphone elements, and the sound collection device may select an audio signal obtained by applying signal processing to the audio signals of the microphone elements.

Embodiments 1 to 3 also described the sound collection device 100 that, in a sound collection system in which a plurality of microphones 302 are arranged in a space, selects the single microphone 302 best suited to sound collection. However, the sound collection device may instead weight and add the audio signals from the plurality of microphones 302 according to predetermined conditions and select the resulting weighted sum as the audio signal to be collected. Specifically, in FIGS. 2 and 4, the audio signals from microphones 302 within the angle range of -θ to +θ may be given relatively large weights and those from microphones 302 outside that range relatively small weights before the weighted addition. Furthermore, when a plurality of microphones 302 exist within the angle range of -θ to +θ, the weight may be made larger the shorter the distance between the sound collection position P and the microphone position, and smaller as the distance increases.
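One way to picture this weighted addition is the sketch below. The gain formula is an assumption made for illustration: a base gain chosen by whether the microphone lies within -θ to +θ, divided by (1 + distance), then normalized so the gains sum to one. For simplicity the distance falloff is applied to every microphone, not only to those inside the angle range, and all names and default values are hypothetical.

```python
import math

def mixing_weights(p, direction, mics, theta_deg=45.0, w_inside=1.0, w_outside=0.2):
    """Normalized per-microphone gains: larger inside +/-theta and for shorter distances."""
    raw = {}
    for name, (mx, my) in mics.items():
        vx, vy = mx - p[0], my - p[1]
        dist = math.hypot(vx, vy)
        angle = math.degrees(math.atan2(direction[0] * vy - direction[1] * vx,
                                        direction[0] * vx + direction[1] * vy))
        base = w_inside if abs(angle) <= theta_deg else w_outside
        raw[name] = base / (1.0 + dist)   # assumption: gain falls off as 1 / (1 + distance)
    total = sum(raw.values())
    return {name: w / total for name, w in raw.items()}

def mix(signals, weights):
    """Weighted sum of equally long sample lists, one list per microphone."""
    n = len(next(iter(signals.values())))
    return [sum(weights[name] * signals[name][i] for name in signals) for i in range(n)]
```

Normalizing the gains keeps the overall level of the mixed signal comparable to that of a single microphone, which is a design choice of this sketch and is not stated in the text.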

Embodiment 3 described a sound collection device that selects the audio signal corresponding to the microphone 302 with the smallest linear distance from the sound collection position P, from among the microphones 302 for which no object lies on the line segment connecting the sound collection position P and the microphone position. However, the sound collection device may instead calculate, based on the object arrangement information, a weighting factor that differs depending on whether an object lies on the line segment connecting the sound collection position and the position of each microphone 302, and select the audio signal corresponding to the microphone 302 with the smallest weighted distance, obtained by multiplying the linear distance from the sound collection position to the microphone 302 by this weighting factor.
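This obstacle-dependent weighting can be sketched as follows. The multiplication of distance by the weighting factor follows the description above, while the two factor values `w_clear` and `w_blocked`, the function name, and the data layout are hypothetical.

```python
import math

def select_by_weighted_distance(p, mics, w_clear=1.0, w_blocked=3.0):
    """mics maps a name to (position, blocked). The smallest weight * distance wins."""
    def weighted(name):
        (mx, my), blocked = mics[name]
        factor = w_blocked if blocked else w_clear   # hypothetical factor values
        return factor * math.hypot(mx - p[0], my - p[1])
    return min(mics, key=weighted)
```

Compared with excluding obstructed microphones outright, this variant can still pick an obstructed microphone when it is much closer than any unobstructed one: with the default factors, an obstructed microphone at distance 1 (weighted distance 3) beats a clear one at distance 5.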

The embodiments described above are intended to illustrate the technology of the present disclosure, and various changes, replacements, additions, omissions, and the like can be made within the scope of the claims or their equivalents.

The present disclosure is applicable to sound collection devices that collect sound at a designated position. Specifically, it is applicable to sound collection devices installed in the cabin of a moving body such as an aircraft.

100 Sound collection device
101 CPU (controller)
102 Memory (recording medium)
103 Network interface (sound collection position input interface)
104 Video interface
105 Audio interface
200 Client
201 CPU
202 Memory
203 Network interface
206 Input/output interface
206a Display
206b Touch panel
206c Speaker
301 Camera (imaging unit)
301a Camera
301b Camera
302 Microphone (sound collection unit)
302a Microphone
302b Microphone
401 Seat
401a Seat
401b Seat
401c Seat

Claims

A sound collection position input interface for inputting sound collection position information indicating a sound collection position;
An audio interface for inputting two or more audio signals to be collected by a sound collection unit arranged in advance at a predetermined position;
A recording medium storing sound collection unit arrangement information indicating arrangement information of the sound collection unit;
A sound collection device comprising: a controller that selects an audio signal output by the sound collection unit based on the sound collection position information and the sound collection unit arrangement information.

The controller selects, from among the sound collection units located within a predetermined angle range centered on a sound collection direction with the sound collection position as a reference, the audio signal corresponding to the sound collection unit having the smallest linear distance between the sound collection position and the sound collection unit, the distance being calculated based on the sound collection position information and the sound collection unit arrangement information,
The sound collecting device according to claim 1.

A video interface that inputs a video image of one or more imaging units that are arranged in advance at a predetermined position and that captures a video image of a predetermined range;
The recording medium further stores imaging unit arrangement information indicating the arrangement of the imaging unit,
The controller determines the face direction of the person reflected within a certain distance range from the sound collection position,
The face direction is the sound collection direction,
The sound collecting device according to claim 2.

The recording medium stores specific position information indicating one or more specific positions in a predetermined space,
The sound collection position is selected from the one or more specific positions.
The sound collecting device according to claim 2.

The specific position is a position of a seat arranged in a predetermined space.
The sound collecting device according to claim 4.

The sound collection direction is a direction in which a seat is arranged.
The sound collecting device according to claim 5.

The controller is calculated based on a weighting factor determined by a relative positional relationship between the sound collection position information and each sound collection unit arrangement information, and a linear distance between the sound collection position and each sound collection unit. Selecting an audio signal corresponding to the sound collection unit having the smallest weighted distance.
The sound collecting device according to claim 1.

A video interface for inputting video output from one or more imaging units that are pre-arranged at a predetermined position and capture a video in a predetermined range;
The recording medium further stores imaging unit arrangement information indicating the arrangement of the imaging unit,
The controller determines a face direction of a person reflected within a certain distance range from the sound collection position,
and selects the audio signal corresponding to the sound collection unit having the smallest weighted distance, calculated based on a weighting factor determined by the position of each sound collection unit, as indicated by the sound collection unit arrangement information, relative to the face direction, and a linear distance between the sound collection position and each of the sound collection units,
The sound collecting device according to claim 7.

The recording medium stores specific position information indicating one or more specific positions in a predetermined space,
The sound collection position is selected from the one or more specific positions.
The sound collecting device according to claim 7.

The specific position is a position of a seat arranged in a predetermined space.
The sound collecting device according to claim 9.

The controller selects the audio signal corresponding to the sound collection unit having the smallest weighted distance, calculated based on a weighting factor determined by the position of each sound collection unit, as indicated by the sound collection unit arrangement information, relative to the direction in which the seat is arranged, and a linear distance between the sound collection position and each of the sound collection units,
The sound collecting device according to claim 10.

The controller selects, from among the sound collection units located in a predetermined direction with respect to the sound collection position, the audio signal corresponding to the sound collection unit having the smallest linear distance between the sound collection position and the sound collection unit, the distance being calculated based on the sound collection position information and the sound collection unit arrangement information,
The sound collecting device according to claim 1.

The sound collection position is selected from one or more seats arranged in a predetermined direction,
The controller selects, from among the sound collection units located in the forward direction with respect to the sound collection position, the sound collection unit having the smallest linear distance between the sound collection position and the sound collection unit, the distance being calculated based on the sound collection position information and the sound collection unit arrangement information,
The sound collecting device according to claim 12.

The recording medium further stores arrangement information of an object fixedly arranged in a predetermined space,
The controller identifies, based on the arrangement information of the object, the sound collection units for which the object does not exist on a line segment connecting the sound collection position and the sound collection unit, and selects, from among the sound collection units for which the object does not exist on the line segment, the audio signal corresponding to the sound collection unit having the smallest linear distance between the sound collection position and the sound collection unit, the distance being calculated based on the sound collection position information and the sound collection unit arrangement information,
The sound collecting device according to claim 1.

The recording medium further stores arrangement information of an object fixedly arranged in a predetermined space,
The controller selects the audio signal corresponding to the sound collection unit having the smallest weighted distance, calculated based on a weighting factor that, according to the arrangement information of the object, differs depending on whether the object exists on a line segment connecting the sound collection position and each sound collection unit, and a linear distance between the sound collection position and each of the sound collection units,
The sound collecting device according to claim 1.

A first step of inputting sound collection position information indicating a sound collection position;
A second step of inputting two or more audio signals collected by a sound collection unit arranged in advance at a predetermined position;
A sound collecting method comprising: a third step of selecting an audio signal output by the sound collecting unit based on the sound collecting position information and sound collecting unit arrangement information indicating arrangement information of the sound collecting unit.

The third step includes a weighting factor determined by a relative positional relationship between the position indicated by the sound collection position information and the position indicated by the sound collection unit arrangement information, and the sound collection position and each sound collection unit. Selecting an audio signal corresponding to the sound collection unit having the smallest weighted distance calculated based on the linear distance;
The sound collection method according to claim 16.

The third step includes: the sound collection position calculated based on the sound collection position information and the sound collection section arrangement information among the sound collection sections in a predetermined direction with respect to the sound collection position; Selecting an audio signal corresponding to the sound collection unit having the smallest linear distance to each sound collection unit;
The sound collection method according to claim 16.

In the third step, a sound collection unit where the object does not exist is specified on a line segment connecting the sound collection position and each sound collection unit based on the arrangement information of the object fixedly arranged in a predetermined space. Of the sound collection units where the object does not exist on the line segment, the sound collection position calculated based on the sound collection position information and the sound collection unit arrangement information and each of the sound collection units Selecting an audio signal corresponding to the sound pickup part having the smallest linear distance;
The sound collection method according to claim 16.

The third step calculates, based on the arrangement information of an object fixedly arranged in a predetermined space, a weighting factor that differs depending on whether the object exists on a line segment connecting the sound collection position and each sound collection unit, and selects the audio signal corresponding to the sound collection unit having the smallest weighted distance, calculated based on the weighting factor and a linear distance between the sound collection position and each sound collection unit,
The sound collection method according to claim 16.