JP2017195581A - Signal processing device, signal processing method and program

Info

Publication number: JP2017195581A
Application number: JP2016086506A
Authority: JP
Inventors: 典朗多和田; Noriaki Tawada
Original assignee: Canon Inc
Current assignee: Canon Inc
Priority date: 2016-04-22
Filing date: 2016-04-22
Publication date: 2017-10-26

Abstract

PROBLEM TO BE SOLVED: To provide a signal processing device that achieves both clear reproduction of the sound of a point sound source and reproduction of atmospheric sound that is uniform in all directions.

SOLUTION: In a signal processing device 1, an acoustic signal input unit 2 acquires directional sounds associated with a plurality of directivity directions from acoustic signals collected by a plurality of microphone elements 3. To achieve both clear reproduction of the point sound source and uniform, omnidirectional reproduction of atmospheric sound, a signal analysis processing unit 13 optimizes the directivity directions of the acquired directional sounds. Among the gain values of the beam patterns that indicate the directivity of the directional sound for each of the plurality of directivity directions, the signal analysis processing unit 13 makes the ratio of the largest gain value in the detected point sound source direction to the other gain values larger than a predetermined value. The signal analysis processing unit 13 also makes the deviation of the gain of the combined beam pattern, obtained by combining the directional beam patterns associated with the plurality of directivity directions, smaller than a predetermined value. The signal analysis processing unit 13 controls each directivity direction so that both conditions are satisfied.

SELECTED DRAWING: Figure 1

Description

The present invention relates to a signal processing device, a signal processing method, and a program for acquiring directional sound from an acoustic signal.

A technique is known in which multi-channel acoustic signals are acquired from sound collected by a microphone array having a plurality of microphone elements, and the sound for each direction (hereinafter referred to as "directional sound") is acquired from those acoustic signals. If the directional sounds acquired for all directions can be presented to the user as though each were being reproduced from its own direction, a high sense of presence, as if the user were actually at the scene, can be realized.
Besides acquiring directional sound from the acoustic signals of sound collected with a microphone array whose microphone elements are themselves directional, there is a method based on filtering. In this method, the multi-channel acoustic signals obtained by collecting sound with an omnidirectional microphone array are filtered with directivity-forming filter coefficients corresponding to a desired directivity direction, so that directional sound can be acquired for an arbitrary directivity direction.
In the technique disclosed in Patent Literature 1, the directivity is rotated by switching the filter coefficients of a directivity-forming filter, and the direction in which a sound source exists is estimated. The filter coefficients corresponding to the estimated direction are then selected, and sound is collected with the directivity pointed toward the sound source.

Japanese Patent No. 4898907

Directional sound contains the sound of a source located in a specific direction (hereinafter referred to as a "point sound source") and the sound of non-directional, diffuse sources (hereinafter referred to as "atmospheric sound"). To present the directional sounds of all directions to the user as though each were being reproduced from its own direction, it is important not only to reproduce the sound of the point sound source clearly but also to reproduce the atmospheric sound evenly in all directions.
Clear reproduction of the sound of a point sound source contributes to the sense of direction of the sound, whereas even reproduction of atmospheric sound in all directions contributes to the "sense of envelopment", the feeling of being surrounded by the sound field. Both the sense of direction and the sense of envelopment are important elements of the realism of sound.

However, although the technique disclosed in Patent Literature 1 can be expected to reproduce the sound of a point sound source clearly by collecting sound with the directivity pointed in the direction of the source, it gives no consideration to reproducing atmospheric sound evenly in all directions.
The present invention has been made to solve the above problem, and its object is to achieve both clear reproduction of the sound of a point sound source and even reproduction of atmospheric sound in all directions.

A signal processing device according to the present invention comprises: acquisition means for acquiring a directional sound for each of a plurality of directivity directions from acoustic signals of sound collected by a plurality of microphone elements; detection means for detecting the direction of a point sound source from the acoustic signals; and control means for controlling each of the directivity directions so that, among the gain values of the beam patterns indicating the directivity of the directional sound corresponding to each of the plurality of directivity directions, the ratio of the largest gain value in the direction of the point sound source detected by the detection means to the other gain values is larger than a predetermined value, and so that the deviation of the gain of a combined beam pattern obtained by combining the plurality of directional beam patterns is smaller than a predetermined value.

According to the present invention, it is possible to achieve both clear reproduction of the sound of a point sound source and even reproduction of atmospheric sound in all directions.

FIG. 1 is a block diagram showing the configuration of a signal processing system according to Embodiment 1 of the present invention.
FIG. 2 is a block diagram showing the configuration of a computer of Embodiment 1.
FIG. 3 is a flowchart showing the flow of signal analysis processing of Embodiment 1.
FIG. 4 is an explanatory diagram of directivity direction control.
FIG. 5 is an explanatory diagram of directivity direction control.
FIG. 6 is an explanatory diagram of directivity direction control.
FIG. 7 is a flowchart showing the flow of signal analysis processing according to Embodiment 2 of the present invention.

DESCRIPTION OF EMBODIMENTS
Embodiments for carrying out the present invention are described in detail below with reference to the accompanying drawings. The embodiments described below are examples of means for realizing the present invention and should be modified or changed as appropriate according to the configuration of the apparatus to which the present invention is applied and to various conditions. The present invention is not limited to the following embodiments.

<Embodiment 1>
(Overall configuration)
FIG. 1 is a block diagram showing the configuration of the signal processing system of Embodiment 1 of the present invention.
This signal processing system includes a signal processing device 1 that performs signal processing on acoustic signals, an acoustic signal input unit 2 that inputs acoustic signals to the signal processing device 1, a microphone array 3 that collects sound and outputs acoustic signals, and headphones 4 that reproduce sound according to the supplied acoustic signal. The signal processing device 1 includes a system control unit 11 that controls the entire device, a storage unit 12 that stores various data such as acoustic signals, a signal analysis processing unit 13 that performs analysis processing on the acoustic signals, and an acoustic signal output unit 14 that outputs an acoustic signal to the headphones 4. The acoustic signal input unit 2 is connected to the signal processing device 1 wirelessly or by wire. The signal processing device 1 is likewise connected to the headphones 4 wirelessly or by wire.
Although FIG. 1 shows a configuration in which the headphones 4 are provided outside the signal processing device 1, the headphones 4 may be included in the signal processing device 1, or the signal processing device 1 may be built into the headphones 4.

The microphone array 3 includes, for example, M (here, six) microphone elements 3a, 3b, 3c, 3d, 3e, and 3f. As long as at least two microphone elements are provided, directivity can be formed in an arbitrary directivity direction by selecting the filter coefficients in the filter processing described later, and directional sound can be acquired. The number of microphone elements M is therefore not limited to six. Each of the microphone elements 3a to 3f collects the surrounding sound, generates an analog acoustic signal, and outputs that signal to the acoustic signal input unit 2. The surrounding sound includes the sound of point sound sources and atmospheric sound. The sound of a point sound source includes, for example, the sounds of people, animals, vehicles, and musical instruments, and the sound of a ball being spiked in a sport such as volleyball. Atmospheric sound includes background sound, for example reflections and reverberation indoors and environmental sound outdoors.

The acoustic signal input unit 2 applies amplification, A/D conversion, and similar processing to the six-channel acoustic signals from the microphone elements 3a to 3f, and generates six-channel digital acoustic signals at a period corresponding to a predetermined audio sampling rate. The acoustic signal input unit 2 inputs the generated digital acoustic signals to the storage unit 12.
The headphones 4 include a reproduction unit 4R that reproduces sound for the right ear and a reproduction unit 4L that reproduces sound for the left ear. The headphones 4 reproduce sound according to the acoustic signal supplied from the acoustic signal output unit 14.
The storage unit 12 stores the six-channel acoustic signals input from the acoustic signal input unit 2. The storage unit 12 also stores head-related transfer functions (HRTFs) for the left and right ears and the filter coefficients that form the directivity used to acquire directional sound.

(Overview of the reproduction operation)
Through the signal analysis processing described later, the signal analysis processing unit 13 detects the directions of point sound sources from the six-channel acoustic signals stored in the storage unit 12 and, according to the detected directions, controls the directivity directions of the directional sounds to be acquired. Through the same signal analysis processing, the signal analysis processing unit 13 acquires the directional sound for each directivity direction from the six-channel acoustic signals and generates, from the acquired directional sounds, the acoustic signal to be reproduced by the headphones 4 (hereinafter referred to as the "headphone reproduction signal"). In this generation, the signal analysis processing unit 13 places a virtual speaker in the directivity direction of each directional sound acquired from the six-channel acoustic signals stored in the storage unit 12. Taking into account the HRTFs stored in the storage unit 12, the signal analysis processing unit 13 then sums, separately for the left and right ears, the acoustic signals corresponding to the directional sounds in the directions of the virtual speakers to generate the headphone reproduction signal. The signal analysis processing unit 13 inputs the generated headphone reproduction signal to the acoustic signal output unit 14.
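The virtual-speaker summation described above can be sketched as follows. This is only an illustration under stated assumptions: the HRTFs are taken to be available as time-domain impulse responses (HRIRs) keyed by directivity direction, and the names `convolve` and `render_binaural` are hypothetical, not taken from this document.

```python
from typing import Dict, List, Sequence, Tuple


def convolve(x: Sequence[float], h: Sequence[float]) -> List[float]:
    """Full linear convolution of a signal x with an impulse response h."""
    y = [0.0] * (len(x) + len(h) - 1)
    for n, xn in enumerate(x):
        for k, hk in enumerate(h):
            y[n + k] += xn * hk
    return y


def render_binaural(
    directional_sounds: Dict[float, Sequence[float]],
    hrirs: Dict[float, Tuple[Sequence[float], Sequence[float]]],
) -> Tuple[List[float], List[float]]:
    """Sum HRIR-filtered directional sounds into left and right ear signals.

    directional_sounds maps a directivity direction (degrees) to its samples.
    hrirs maps the same direction to an assumed (left_ir, right_ir) pair.
    """
    length = max(
        len(sound) + max(len(hrirs[angle][0]), len(hrirs[angle][1])) - 1
        for angle, sound in directional_sounds.items()
    )
    left = [0.0] * length
    right = [0.0] * length
    for angle, sound in directional_sounds.items():
        h_left, h_right = hrirs[angle]
        # Each virtual speaker contributes to both ears through its own HRIR pair.
        for i, v in enumerate(convolve(sound, h_left)):
            left[i] += v
        for i, v in enumerate(convolve(sound, h_right)):
            right[i] += v
    return left, right
```

A practical implementation would more likely filter in the frequency domain, where the directional sounds are already available as Fourier coefficients; the time-domain form is used here only because it is short.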

The acoustic signal output unit 14 applies D/A conversion and amplification to the headphone reproduction signal input from the signal analysis processing unit 13 and supplies the result to the headphones 4. The headphones 4 reproduce sound according to the headphone reproduction signal supplied from the acoustic signal output unit 14.
By reproducing sound from a headphone reproduction signal generated according to the directional sounds in this way, the sound can be presented to the user as though actual loudspeakers were placed around the user and were reproducing the sound of each channel (each directional sound).

(Hardware configuration)
Each functional block shown in FIG. 1 is stored as a program in a storage unit such as the ROM 22 described later and is executed by the CPU 21. At least some of the functional blocks shown in FIG. 1 may instead be realized in hardware. In that case, for example, a predetermined compiler may be used to automatically generate a dedicated circuit on an FPGA (Field Programmable Gate Array) from the program that realizes each step. A gate array circuit may also be formed in the same manner as with an FPGA and realized as hardware, or the functions may be realized by an ASIC (Application Specific Integrated Circuit).

FIG. 2 shows an example of the hardware configuration of the signal processing device 1. The signal processing device 1 includes a CPU 21, a ROM 22, a RAM 23, an external memory 24, an input unit 25, and an output unit 26.
The CPU 21 performs various computations and controls each part of the signal processing device 1 according to input signals and programs. Specifically, the CPU 21 detects the directions of point sound sources, controls the directivity directions used when acquiring directional sound, acquires the directional sounds, generates the headphone reproduction signal, and so on. The functional blocks of FIG. 1 described above illustrate the functions executed by the CPU 21.

The RAM 23 stores temporary data and serves as the working memory of the CPU 21. The ROM 22 stores the programs for executing the functional units shown in FIG. 1 and various setting information. The external memory 24 is, for example, a removable memory card whose data can be read by inserting it into a PC (personal computer) or the like.
A predetermined area of the RAM 23 or the external memory 24 is used as the storage unit 12.
The input unit 25 stores the acoustic signals input from the acoustic signal input unit 2 in the area of the RAM 23 or the external memory 24 that is used as the storage unit 12. The output unit 26 supplies the headphone reproduction signal generated by the CPU 21 to the headphones 4.

(Signal analysis processing)
The processing in the flowchart of FIG. 3 is carried out by the CPU 21 of the signal processing device 1 executing a program stored in the ROM 22 or the like. This processing is executed, for example, when a headphone reproduction signal is generated from the acoustic signals stored in the storage unit 12 and reproduced by the headphones 4, and it is started in response to a start instruction from the user.
The signal analysis processing of this embodiment is described below along the flowchart of FIG. 3. Unless otherwise noted, the processing in the flowchart of FIG. 3 is performed by the signal analysis processing unit 13.
Before the processing of FIG. 3 starts, the M-channel acoustic signals (six channels in the configuration of FIG. 1) collected by the M microphone elements 3a to 3f (six in the case of FIG. 1) are assumed to be stored in the storage unit 12. Through the processing of FIG. 3, the signal analysis processing unit 13 acquires D directional sounds from the M acoustic signals stored in the storage unit 12 and generates the headphone reproduction signal from the D directional sounds. Because directivity can be formed in an arbitrary directivity direction by selecting the filter coefficients in the directivity-forming filter processing, the number of directional sounds D may be the same as or different from the number of acoustic signal channels M.
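As a rough illustration of how one directional sound might be obtained from the M channels by directivity-forming filter coefficients, the sketch below applies a set of complex weights to the M Fourier coefficients of one frequency bin. The document does not specify the array geometry or the filter design, so the free-field circular array, the delay-and-sum weights, and all function names here are assumptions made for the example.

```python
import cmath
import math
from typing import List, Sequence

SPEED_OF_SOUND = 343.0  # m/s, assumed


def steering_vector(freq_hz: float, azimuth_deg: float,
                    mic_angles_deg: Sequence[float], radius_m: float) -> List[complex]:
    """Free-field plane-wave response of an assumed circular array, one entry per microphone."""
    theta = math.radians(azimuth_deg)
    k = 2.0 * math.pi * freq_hz / SPEED_OF_SOUND
    return [cmath.exp(1j * k * radius_m * math.cos(theta - math.radians(phi)))
            for phi in mic_angles_deg]


def delay_and_sum_weights(freq_hz: float, look_deg: float,
                          mic_angles_deg: Sequence[float], radius_m: float) -> List[complex]:
    """Filter coefficients for one frequency bin that point the directivity at look_deg."""
    a = steering_vector(freq_hz, look_deg, mic_angles_deg, radius_m)
    return [x / len(a) for x in a]


def directional_sound(weights: Sequence[complex], z: Sequence[complex]) -> complex:
    """Apply the filter to the M-channel Fourier coefficients z: y = w^H z."""
    return sum(w.conjugate() * zm for w, zm in zip(weights, z))
```

Calling `delay_and_sum_weights` once per directivity direction gives D weight sets, which is why D need not equal M.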

(Detection of the direction of a point sound source)
In S1, the signal analysis processing unit 13 acquires the M-channel acoustic signals held in the storage unit 12 and applies a Fourier transform to each channel to obtain the frequency-domain data (Fourier coefficients) z(f). Here, z(f) at each frequency is a vector with M elements.
In S2, so that the direction of a point sound source can be detected from the acoustic signals in S3, the signal analysis processing unit 13 calculates a spatial spectrum P(f, θ) that forms a sensitivity peak in the direction of the point sound source. For this calculation, the signal analysis processing unit 13 uses the spatial correlation matrix R(f) of equation (1), a statistic that represents the spatial properties of the acoustic signals, and the array manifold vector a(f, θ), the transfer function between a sound source in each direction (azimuth θ) and each of the microphone elements 3a to 3f.
$$R(f) = E\left[z(f)\,z^{H}(f)\right] \tag{1}$$
Here, E denotes the expected value and the superscript H denotes the complex conjugate transpose. The vector a(f, θ) is frequency-domain data (Fourier coefficients) with M elements.
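Equation (1) can be sketched in code by replacing the expected value with an average over time frames, which is a common estimate but an assumption here, since the document does not say how E is evaluated. The function name `spatial_correlation` is hypothetical.

```python
from typing import List, Sequence


def spatial_correlation(frames: Sequence[Sequence[complex]]) -> List[List[complex]]:
    """Estimate R(f) = E[z(f) z^H(f)] for one frequency bin.

    frames holds one M-element vector z(f) per time frame; the expected
    value is approximated by the mean over frames (an assumption).
    """
    m = len(frames[0])
    r = [[0j] * m for _ in range(m)]
    for z in frames:
        for i in range(m):
            for j in range(m):
                # Outer product z z^H: element (i, j) is z_i times conj(z_j).
                r[i][j] += z[i] * z[j].conjugate()
    n = len(frames)
    return [[value / n for value in row] for row in r]
```

The result is an M x M Hermitian matrix, one per frequency bin.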
For example, the spatial spectrum P_MV(f, θ) based on the minimum variance method is given by equation (2).

Let E_n denote the matrix whose columns are those of the M eigenvectors of the spatial correlation matrix R(f) that correspond to the noise subspace. Considering the orthogonality of the noise subspace to the array manifold vectors a(f, θ) that belong to the signal subspace, the spatial spectrum P_MU(f, θ) of the MUSIC (Multiple Signal Classification) method is given by equation (3).
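The bodies of equations (2) and (3) are not reproduced in this text. For orientation only, the forms in which the minimum variance and MUSIC spatial spectra are commonly written in the array-processing literature are shown below, using the symbols R(f), a(f, θ), and E_n defined above. These are the textbook expressions, not a transcription of the patent's equations, which may differ, for example in normalization (the MUSIC spectrum is also often written with a numerator of 1).

```latex
P_{\mathrm{MV}}(f,\theta) = \frac{1}{a^{H}(f,\theta)\, R^{-1}(f)\, a(f,\theta)}

P_{\mathrm{MU}}(f,\theta) = \frac{a^{H}(f,\theta)\, a(f,\theta)}{a^{H}(f,\theta)\, E_{n} E_{n}^{H}\, a(f,\theta)}
```

In both forms the denominator becomes small when θ coincides with the direction of a point sound source, which produces the sensitivity peak used for detection.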

A spatial spectrum covering all horizontal directions is obtained by evaluating P(f, θ) = P_MV(f, θ) [equation (2)] or P(f, θ) = P_MU(f, θ) [equation (3)] while varying θ in a(f, θ), for example from −180° to 180° in 1° steps. Depending on the structure of the microphone array 3 used to collect the sound, the array manifold vector a(f, θ) can be calculated at an arbitrary resolution from a theoretical formula, such as that for free space or for a rigid sphere.
In S3, the signal analysis processing unit 13 detects the directions of point sound sources from the acoustic signals on the basis of the spatial spectrum calculated in S2. Specifically, the signal analysis processing unit 13 averages the per-frequency spatial spectrum P(f, θ) over a range of, for example, f_min to f_max to calculate the average spatial spectrum P_mean(θ). The signal analysis processing unit 13 then detects the directions at which the average spatial spectrum P_mean(θ) has a peak (local maximum) and takes them as the point sound source directions θ_sq [q = 1 to Q]. Here, f_min and f_max are the lower and upper limit frequencies used for detecting point sound source directions, and Q is the number of detected point sound sources.
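The averaging and peak detection of S3 might be sketched as follows. The helper names and the `min_height` threshold are hypothetical additions for the example; the document only states that peaks (local maxima) of the average spatial spectrum are detected. The angle grid is assumed to run from −180° to 179° so that it wraps around the circle without a duplicated end point.

```python
from typing import List, Sequence


def mean_spectrum(spectra: Sequence[Sequence[float]], freqs: Sequence[float],
                  f_min: float, f_max: float) -> List[float]:
    """Average P(f, theta) over the bins with f_min <= f <= f_max.

    spectra[i][k] is the spatial spectrum at freqs[i] and at the k-th angle.
    """
    rows = [row for f, row in zip(freqs, spectra) if f_min <= f <= f_max]
    if not rows:
        raise ValueError("no frequency bin inside [f_min, f_max]")
    return [sum(column) / len(rows) for column in zip(*rows)]


def find_peaks(p_mean: Sequence[float], angles_deg: Sequence[float],
               min_height: float = 0.0) -> List[float]:
    """Return the angles at which p_mean has a local maximum on a circular grid."""
    n = len(p_mean)
    peaks = []
    for k in range(n):
        left = p_mean[k - 1]          # index -1 wraps around to the last angle
        right = p_mean[(k + 1) % n]   # wrap around to the first angle
        if p_mean[k] > left and p_mean[k] >= right and p_mean[k] >= min_height:
            peaks.append(angles_deg[k])
    return peaks
```

The returned list plays the role of θ_sq, and its length is Q. Without some threshold such as `min_height`, small ripples in the spectrum would also be reported as sources.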

(Optimization of the directivity directions)
In S4 to S12, the signal analysis processing unit 13 optimizes the directivity directions of the directional sounds to be acquired in order to achieve both clear reproduction of the sound of point sound sources and even reproduction of atmospheric sound in all directions.
Among the gain values of the beam patterns that indicate the directivity of the directional sound corresponding to each of the plurality of directivity directions, the signal analysis processing unit 13 makes the ratio of the largest gain value in the detected point sound source direction to the other gain values larger than a predetermined value. The signal analysis processing unit 13 also makes the deviation of the gain of the combined beam pattern, obtained by combining the directional beam patterns corresponding to the plurality of directivity directions, smaller than a predetermined value. The signal analysis processing unit 13 controls each directivity direction so that these two conditions are satisfied.

If the only aim were to reproduce atmospheric sound evenly in all directions, the directivity directions 31 to 36 of the directional sounds could simply be spaced evenly around the full circle (−180° to 180°), as shown for example in FIG. 4. FIG. 4 shows a case where the number of directivity directions D is six, the same as the number M of microphone elements 3a to 3f in FIG. 1. As long as the number of microphone elements M is two or more, directivity can be formed in an arbitrary directivity direction by selecting the filter coefficients in the directivity-forming filter processing, so the number of directivity directions D may differ from the number of microphone elements M.
In FIG. 4, the radii of the outermost solid circle and of the circles concentric with it correspond to the relative gain of the beam patterns. Positions around the circumference of these circles correspond to the azimuth from a predetermined reference direction (0°) of the microphone array 3. The thick dash-dot straight lines 31 to 36 indicate the directivity direction (main lobe direction) of each directional sound. The thick broken-line circles 31a to 36a indicate the beam patterns representing the directivity of the directional sound corresponding to each of the directivity directions 31 to 36.
By spacing the directivity directions 31 to 36 evenly in this way, the combined beam pattern 37, obtained by combining the beam patterns 31a to 36a of the directional sounds, becomes substantially circular, so atmospheric sound can be reproduced evenly in all directions.
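The effect of even spacing on the combined beam pattern can be checked numerically with a simple model. Everything in the sketch below is an assumption for illustration: each beam is modeled as a squared cardioid, the beam patterns are combined by summing their gains, and the deviation is measured as (max − min) / mean. The document does not define any of these.

```python
import math
from typing import List, Sequence


def beam_gain(theta_deg: float, look_deg: float) -> float:
    """Assumed beam pattern: squared cardioid with its main lobe at look_deg."""
    return (0.5 + 0.5 * math.cos(math.radians(theta_deg - look_deg))) ** 2


def combined_pattern(look_dirs: Sequence[float], angles: Sequence[float]) -> List[float]:
    """Combined beam pattern, modeled here as the sum of the individual gains."""
    return [sum(beam_gain(t, d) for d in look_dirs) for t in angles]


def gain_deviation(pattern: Sequence[float]) -> float:
    """Unevenness of a pattern, measured here as (max - min) / mean."""
    mean = sum(pattern) / len(pattern)
    return (max(pattern) - min(pattern)) / mean


angles = list(range(-180, 180))
uniform = [-150, -90, -30, 30, 90, 150]   # six evenly spaced directivity directions
steered = [-150, -90, -30, 60, 90, 150]   # one direction moved by 30 degrees

dev_uniform = gain_deviation(combined_pattern(uniform, angles))
dev_steered = gain_deviation(combined_pattern(steered, angles))
```

Under this model the evenly spaced arrangement gives a deviation of essentially zero, corresponding to a circular combined pattern, while moving a single direction makes the combined pattern visibly uneven.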

If the directivity direction (main lobe direction) of one directivity points toward a point sound source, the energy of the sound captured as the directional sound for that directivity is considerably larger than the energy captured as the directional sounds for the other directivities. If the headphone reproduction signal is generated from directional sounds acquired in this state, the sound of the point sound source is reproduced mainly from the virtual speaker placed in the direction of the source. The sound of the point sound source reproduced from that virtual speaker is therefore louder than the same sound reproduced from the virtual speakers placed in other directions, and the sound of the point sound source is reproduced clearly.

In contrast, if no directivity direction points toward the point sound source and the source lies between two adjacent directivity directions, there is little difference in the energy of the sound captured as the directional sounds for those directivities.
Suppose that point sound sources exist in the point sound source directions 30a and 30b of FIG. 4. In this case, for example, the ratio between the values of beam pattern 32a and beam pattern 33a in the point sound source direction 30a is small. For the sound of the point sound source in direction 30a, there is therefore little difference between the energy 42 of the directional sound for the directivity direction corresponding to beam pattern 32a and the energy 43 of the directional sound for the directivity direction corresponding to beam pattern 33a. If the headphone reproduction signal is generated from directional sounds acquired in this state, the difference between the volume of the virtual speaker placed in directivity direction 32 and that of the virtual speaker placed in directivity direction 33 becomes small.

As a result, the sound reproduced by the headphones 4 also has only a small volume difference between directivity direction 32 and directivity direction 33, and the reproduction of the sound of the point sound source becomes unclear.
In the signal processing system of this embodiment, therefore, under the control of the signal analysis processing unit 13, the directivity directions 32 and 35, which are closest to the point sound source directions 30a and 30b respectively, are turned toward those sources to become directivity directions 32′ and 35′, as shown for example in FIG. 5. As a result, in the point sound source direction 30a, the ratio between the gain value of beam pattern 32a′ and the gain values of the other beam patterns, such as beam pattern 33a, becomes larger than in FIG. 4. That is, in the point sound source direction 30a, the ratio of the largest gain value (that of beam pattern 32a′) to the other gain values (those of the remaining beam patterns) becomes larger than the predetermined value. For the point sound source in direction 30a, the difference between the energy 42′ of the directional sound for the directivity direction corresponding to beam pattern 32a′ and the energy 43 of the directional sound for the directivity direction corresponding to beam pattern 33a is therefore larger than in FIG. 4. When the headphone reproduction signal is generated from directional sounds acquired in this state and reproduced by the headphones 4, the sound of the point sound source is reproduced more clearly than when the headphone reproduction signal is generated from directional sounds acquired in the state shown in FIG. 4.
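The gain-ratio condition can be illustrated with the same kind of simplified model. The squared-cardioid beam pattern, the specific angles, and the name `dominance_ratio` are assumptions for the example; the angles are not the ones drawn in FIG. 4 and FIG. 5.

```python
import math
from typing import Sequence


def beam_gain(theta_deg: float, look_deg: float) -> float:
    """Assumed beam pattern: squared cardioid with its main lobe at look_deg."""
    return (0.5 + 0.5 * math.cos(math.radians(theta_deg - look_deg))) ** 2


def dominance_ratio(source_deg: float, look_dirs: Sequence[float]) -> float:
    """Ratio of the largest beam gain toward the source to the second largest."""
    gains = sorted((beam_gain(source_deg, d) for d in look_dirs), reverse=True)
    return gains[0] / gains[1]


source = 60.0                                # assumed point sound source direction
uniform = [-150, -90, -30, 30, 90, 150]      # source sits midway between 30 and 90
steered = [-150, -90, -30, 60, 90, 150]      # nearest direction turned to the source

ratio_uniform = dominance_ratio(source, uniform)
ratio_steered = dominance_ratio(source, steered)
```

With the evenly spaced directions the two beams adjacent to the source capture it equally, so the ratio is 1. After one beam is turned toward the source the ratio rises above 1, which is the behavior the first condition asks for.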

しかしながら、このような指向方向の配置変更を行った場合には、合成ビームパターン３７’に、膨らみ５１、５３や凹み５２、５４といった乱れが生じてしまっている。つまり、合成ビームパターン３７’は、略円形ではなくなる。このため、全ての方向音に基づいて再生される雰囲気音にもムラが生じている。
また、合成ビームパターン３７’に乱れがある場合、雰囲気音のムラの他に、次のような問題が生じる。例えば点音源方向３０ａに対応する点音源については、ビームパターン３２ａ’に対応する指向方向の方向音として取得した音が、点音源方向３０ａと同じ指向方向３２’に配置される仮想スピーカから再生される。これに加えて、ビームパターン３３ａに対応する指向方向の方向音として取得した音が指向方向３３に配置される仮想スピーカから再生される。この場合、合成ビームパターン３７’の膨らみ５１があるため、点音源の方向が合成ビームパターン３７’の膨らみ５１の方にずれて知覚されてしまう可能性がある。
このため、本実施形態の信号処理システムでは、明瞭な点音源の音の再生と全方位でムラの無い雰囲気音の再生を両立するために、信号解析処理部１３が、式（４）のような最適化問題の解となるように、方向音を取得する指向方向θ_ｄ［ｄ＝１〜Ｄ］を最適化する。 However, when such an arrangement change in the directivity direction is performed, disturbances such as bulges 51 and 53 and dents 52 and 54 occur in the combined beam pattern 37 ′. That is, the combined beam pattern 37 ′ is not substantially circular. For this reason, the atmosphere sound reproduced based on all directional sounds is also uneven.
In addition, when the combined beam pattern 37′ is disturbed, the following problem arises besides the unevenness of the atmospheric sound. For example, for the point sound source in the point sound source direction 30a, the sound acquired as the directional sound for the directivity direction of the beam pattern 32a′ is reproduced from a virtual speaker placed in the directivity direction 32′, which coincides with the point sound source direction 30a. In addition, the sound acquired as the directional sound for the directivity direction of the beam pattern 33a is reproduced from a virtual speaker placed in the directivity direction 33. In this case, because of the bulge 51 in the combined beam pattern 37′, the direction of the point sound source may be perceived as shifted toward the bulge 51.
For this reason, in the signal processing system of this embodiment, in order to achieve both clear reproduction of point sound sources and reproduction of an atmospheric sound that is uniform in all directions, the signal analysis processing unit 13 optimizes the directivity directions θ_d [d = 1 to D] in which the directional sounds are acquired so that they solve an optimization problem such as Equation (4).

ここで、σ_ｂｓｕｍ（θ_ｄ）は合成ビームパターンの乱れの目安である標準偏差である。また、点音源のインデックスｑの関数であるｄｍｉｎ（ｑ）は、式（５）のように点音源方向θ_ｓｑ［ｑ＝１〜Ｑ］に最も近い指向方向を示すインデックスである。

Here, σ _bsum (θ _d ) is a standard deviation that is a measure of the disturbance of the combined beam pattern. Further, dmin (q), which is a function of the point sound source index q, is an index indicating the directivity direction closest to the point sound source direction θ _sq [q = 1 to Q] as shown in Equation (5).

式（４）は、Ｄ個の指向方向θ_ｄ［ｄ＝１〜Ｄ］のうちＱ個の指向方向θ_{ｄｍｉｎ（ｑ）}［ｑ＝１〜Ｑ］を点音源方向θ_ｓｑ［ｑ＝１〜Ｑ］に向けるという制約条件の下、合成ビームパターンの乱れを最小化し、指向方向を最適化することを意味している。合成ビームパターンの乱れは、合成ビームパターンのゲインの偏差、例えばゲインの標準偏差σ_ｂｓｕｍ（θ_ｄ）により評価する。すなわち、この最適化問題では、各指向方向が最適化変数となり、標準偏差σ_ｂｓｕｍ（θ_ｄ）が評価関数となる。信号解析処理部１３は、このように定義された指向方向を最適化変数とする最適化問題を解き、その解を各指向方向として決定する。 Equation (4) means that the directivity directions are optimized by minimizing the disturbance of the combined beam pattern under the constraint that, of the D directivity directions θ_d [d = 1 to D], the Q directivity directions θ_{dmin(q)} [q = 1 to Q] are pointed at the point sound source directions θ_sq [q = 1 to Q]. The disturbance of the combined beam pattern is evaluated by a deviation of its gain, for example the standard deviation σ_bsum(θ_d) of the gain. That is, in this optimization problem the directivity directions are the optimization variables and the standard deviation σ_bsum(θ_d) is the evaluation function. The signal analysis processing unit 13 solves the optimization problem defined in this way and adopts the solution as the directivity directions.
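Equations (4) and (5) are not reproduced in this text. Based on the description above, they can plausibly be written as follows. The exact typesetting is an assumption made for illustration; in particular, the absolute difference in (5) is assumed to be the angular difference wrapped to the range 0° to 180°.

```latex
\begin{aligned}
&\min_{\theta_1,\dots,\theta_D}\ \sigma_{bsum}(\theta_d)
\quad \text{subject to} \quad
\theta_{d_{min}(q)} = \theta_{sq}, \quad q = 1,\dots,Q && (4)\\
&d_{min}(q) = \operatorname*{arg\,min}_{d \in \{1,\dots,D\}}
\left|\theta_d - \theta_{sq}\right| && (5)
\end{aligned}
```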

以上を踏まえた上で、指向方向の最適化を行うＳ４〜Ｓ１２の処理を説明する。なお、指向方向の最適化は、全周波数のうち少なくとも代表周波数（例えば１ｋＨｚ）について考えればよい。以下のＳ４〜Ｓ１２の説明における周波数のインデックスｆは、このような代表周波数を表しているものとする。なお、代表周波数は、例えば音響信号中の強度が高い帯域の中心周波数等としてもよい。 Based on the above, the processing of S4 to S12 for optimizing the pointing direction will be described. Note that the optimization of the directivity direction may be considered for at least the representative frequency (for example, 1 kHz) among all frequencies. The frequency index f in the description of S4 to S12 below represents such a representative frequency. The representative frequency may be, for example, the center frequency of a band with high intensity in the acoustic signal.

Ｓ４では、信号解析処理部１３は、各指向性の指向方向θ_ｄ［ｄ＝１〜Ｄ］を初期化する。まず、複数の指向性で水平全方位をカバーするため、音響を収音したマイクアレイの座標系における正面０°を基準方向として、図４のように指向方向数Ｄ（＝６）で各指向性の指向方向３１〜３６を均等配置する。すなわち、指向方向３１のθ_１＝０°、指向方向３２のθ_２＝６０°、指向方向３３のθ_３＝１２０°、指向方向３４のθ_４＝１８０°、指向方向３５のθ_５＝−１２０°、指向方向３６のθ_６＝−６０°となる。なお、指向方向数Ｄが少ないと均等配置でも合成ビームパターンに凹みが生じるため、少なくとも略円形になり始めるくらいのＤ（例えば図４に示す場合では６程度）を用いるのが好適である。 In S4, the signal analysis processing unit 13 initializes the directivity direction θ_d [d = 1 to D] of each directivity. First, so that the plurality of directivities cover all horizontal directions, the directivity directions 31 to 36 are arranged at equal intervals, with D (= 6) directions as shown in FIG. 4, taking the front (0°) in the coordinate system of the microphone array that picked up the sound as the reference direction. That is, θ_1 = 0° for the directivity direction 31, θ_2 = 60° for the directivity direction 32, θ_3 = 120° for the directivity direction 33, θ_4 = 180° for the directivity direction 34, θ_5 = −120° for the directivity direction 35, and θ_6 = −60° for the directivity direction 36. Note that if the number of directivity directions D is small, dents appear in the combined beam pattern even with an even arrangement, so it is preferable to use a D at least large enough that the combined beam pattern starts to become substantially circular (for example, about 6 in the case shown in FIG. 4).

ここで、Ｓ３で例えばＱ＝２個の点音源が検出され、点音源方向３０ａのθ_ｓ１＝８５°、点音源方向３０ｂのθ_ｓ２＝−１４８°であったとする。この場合、上記均等配置の指向方向θ_ｄ［ｄ＝１〜Ｄ］のいずれも点音源方向θ_ｓｑ［ｑ＝１〜Ｑ］を向いていないため、式（４）の制約条件を満たしていない。そこで、信号解析処理部１３は、点音源方向３０ａ、３０ｂにそれぞれ最も近い指向方向３２、３５を点音源方向に向け、図５のように指向方向３２’、３５’とすることで、式（４）の制約条件を満たすように指向方向θ_ｄ［ｄ＝１〜Ｄ］を初期化する。これにより、指向方向３１のθ_１＝０°、指向方向３２’のθ_２＝８５°、指向方向３３のθ_３＝１２０°、指向方向３４のθ_４＝１８０°、指向方向３５’のθ_５＝−１４８°、指向方向３６のθ_６＝−６０°となる。 Here, assume that, for example, Q = 2 point sound sources are detected in S3, with θ_s1 = 85° for the point sound source direction 30a and θ_s2 = −148° for the point sound source direction 30b. In this case, none of the evenly arranged directivity directions θ_d [d = 1 to D] points at a point sound source direction θ_sq [q = 1 to Q], so the constraint of Equation (4) is not satisfied. The signal analysis processing unit 13 therefore turns the directivity directions 32 and 35, which are closest to the point sound source directions 30a and 30b respectively, toward those directions, making them the directivity directions 32′ and 35′ as shown in FIG. 5, and thereby initializes the directivity directions θ_d [d = 1 to D] so that the constraint of Equation (4) is satisfied. This gives θ_1 = 0° for the directivity direction 31, θ_2 = 85° for the directivity direction 32′, θ_3 = 120° for the directivity direction 33, θ_4 = 180° for the directivity direction 34, θ_5 = −148° for the directivity direction 35′, and θ_6 = −60° for the directivity direction 36.
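The initialization of S4 can be sketched as follows. This is a minimal illustration, not the implementation of the embodiment: the function names `wrap_deg` and `init_directions` are hypothetical, angles are assumed to be expressed in degrees in the range (−180°, 180°], and the sketch assumes that no two point sound sources share the same nearest directivity direction.

```python
import numpy as np

def wrap_deg(a):
    """Wrap angles in degrees to the interval (-180, 180]."""
    return 180.0 - ((180.0 - np.asarray(a, dtype=float)) % 360.0)

def init_directions(num_directions, sources_deg):
    """Evenly arrange D directions, then snap the nearest one to each source.

    Returns the initialized directions and the indices dmin(q) of the
    directions that are locked onto the point sound sources.
    Assumption: distinct sources do not map to the same nearest direction.
    """
    theta = wrap_deg(np.arange(num_directions) * 360.0 / num_directions)
    locked = []
    for s in sources_deg:
        # Nearest direction in the sense of wrapped angular distance.
        d = int(np.argmin(np.abs(wrap_deg(theta - s))))
        theta[d] = s
        locked.append(d)
    return theta, locked
```

With D = 6 and the point sound source directions 85° and −148° from the example, the sketch yields the directions 0°, 85°, 120°, 180°, −148°, −60°, with the second and fifth directions (zero-based indices 1 and 4) locked.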

これに続くＳ５〜Ｓ１１は、反復的な最適化計算に係る処理である。信号解析処理部１３は、最適化ループの中で、Ｓ５〜Ｓ１１の処理を繰り返し実行する。また、Ｓ５〜Ｓ６は、Ｓ４で指向方向を初期化した指向性毎の処理である。信号解析処理部１３は、指向性ループの中で、Ｓ５〜Ｓ６の処理を繰り返し実行する。
Ｓ５では、信号解析処理部１３は、現在の指向性ループで対象としている指向性を形成するためのフィルタ係数を取得する。ここでは、記憶部１２に格納されているフィルタ係数から、指向方向θ_ｄに対応するフィルタ係数ｗ_ｄ（ｆ）を取得する。ここで、フィルタ係数ｗ_ｄ（ｆ）は、周波数領域のベクトルデータ（フーリエ係数）であり、Ｍ個の要素で構成される。なお、マイクアレイ３の構成が異なるとフィルタ係数も異なるため、収音に用いたマイクアレイ３の種別を示す種別ＩＤを音響信号の付加情報として記録しておいてもよい。この場合は、信号解析処理部１３が、種別ＩＤに対応するマイクアレイ３のフィルタ係数を記憶部１２から取得し、本ステップの処理で用いるようにしてもよい。 Subsequent S5 to S11 are processes related to iterative optimization calculation. The signal analysis processing unit 13 repeatedly executes the processes of S5 to S11 in the optimization loop. S5 to S6 are processing for each directivity in which the directivity direction is initialized in S4. The signal analysis processing unit 13 repeatedly executes the processes of S5 to S6 in the directivity loop.
In S5, the signal analysis processing unit 13 acquires a filter coefficient for forming the directivity targeted in the current directivity loop. Here, the filter coefficient w _d (f) corresponding to the directivity direction θ _d is acquired from the filter coefficient stored in the storage unit 12. Here, the filter coefficient w _d (f) is vector data (Fourier coefficient) in the frequency domain, and is composed of M elements. In addition, since the filter coefficients differ when the configuration of the microphone array 3 is different, a type ID indicating the type of the microphone array 3 used for sound collection may be recorded as additional information of the acoustic signal. In this case, the signal analysis processing unit 13 may acquire the filter coefficient of the microphone array 3 corresponding to the type ID from the storage unit 12 and use it in the processing of this step.

指向性形成のフィルタ係数の算出には、アレイ・マニフォールド・ベクトルａ（ｆ，θ）が一般に用いられる。指向方向θ_ｄに指向性のメインローブを形成する方法として、例えば遅延和法ならθ_ｄ方向のアレイ・マニフォールド・ベクトルａ_ｄ（ｆ）を用いて、ｗ_ｄ（ｆ）＝ａ_ｄ（ｆ）／（ａ_ｄ ^Ｈ（ｆ）ａ_ｄ（ｆ））のようにフィルタ係数が得られる。
Ｓ６では、信号解析処理部１３は、Ｓ５で取得した指向性形成のフィルタ係数ｗ_ｄ（ｆ）と、アレイ・マニフォールド・ベクトルａ（ｆ，θ）とを用いて指向性のビームパターンを算出する。ビームパターンの方位角θ方向の値ｂ_ｄ（ｆ，θ）は、式（６）で得られる。
ｂ_ｄ（ｆ，θ）＝ｗ_ｄ ^Ｈ（ｆ）ａ（ｆ，θ）（６） The array manifold vector a(f, θ) is generally used to calculate the directivity-forming filter coefficients. As a method of forming the main lobe of a directivity in the directivity direction θ_d, for example with the delay-and-sum method, the filter coefficients are obtained from the array manifold vector a_d(f) for the θ_d direction as w_d(f) = a_d(f) / (a_d^H(f) a_d(f)).
In S6, the signal analysis processing unit 13 calculates the beam pattern of the directivity using the directivity-forming filter coefficient w_d(f) acquired in S5 and the array manifold vector a(f, θ). The value b_d(f, θ) of the beam pattern in the direction of azimuth angle θ is obtained by Equation (6).
b_d(f, θ) = w_d^H(f) a(f, θ)   (6)
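As an illustration of the delay-and-sum filter and Equation (6), the following sketch evaluates a beam pattern on a 1° grid. The array geometry is an assumption that does not come from the source: a circular array of equally spaced elements, a far-field plane-wave model, a speed of sound of 343 m/s, and the phase sign convention used in `manifold`. All function names are hypothetical.

```python
import numpy as np

C_SOUND = 343.0  # speed of sound in m/s (assumed value)

def manifold(f_hz, theta_deg, mic_angles_deg, radius_m):
    """Array manifold vectors a(f, theta) of a circular array (assumed model).

    Returns an array of shape (number of angles, M).
    """
    k = 2.0 * np.pi * f_hz / C_SOUND
    theta = np.deg2rad(np.atleast_1d(theta_deg))[:, None]
    phi = np.deg2rad(np.asarray(mic_angles_deg, dtype=float))[None, :]
    return np.exp(1j * k * radius_m * np.cos(theta - phi))

def delay_and_sum(f_hz, theta_d_deg, mic_angles_deg, radius_m):
    """w_d(f) = a_d(f) / (a_d^H(f) a_d(f))."""
    a_d = manifold(f_hz, theta_d_deg, mic_angles_deg, radius_m)[0]
    return a_d / np.vdot(a_d, a_d)  # np.vdot conjugates its first argument

def beam_pattern(w_d, f_hz, grid_deg, mic_angles_deg, radius_m):
    """b_d(f, theta) = w_d^H(f) a(f, theta) for every theta on the grid."""
    a = manifold(f_hz, grid_deg, mic_angles_deg, radius_m)
    return a @ np.conj(w_d)
```

By construction the delay-and-sum pattern has unit gain in its own directivity direction, which is a convenient sanity check when experimenting with other geometries.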

アレイ・マニフォールド・ベクトルａ（ｆ，θ）のθを、例えば−１８０°から１８０°まで１°刻みで変えながらｂ_ｄ（ｆ，θ）を計算することで、水平全方位のビームパターンが得られる。なお、円状等間隔マイクアレイ等のように、マイク素子が等方的に配置されている場合は、指向方向が正面０°の場合のビームパターンｂ_１（ｆ，θ）を順次回転させることで、他の指向性のビームパターンｂ_ｄ（ｆ，θ）［ｄ＝２〜］を得ることもできる。
Ｓ７では、信号解析処理部１３は、Ｓ６で算出した各指向性のビームパターンｂ_ｄ（ｆ，θ）［ｄ＝１〜Ｄ］を合成することで、式（７）のように合成ビームパターンｂ_ｓｕｍ（ｆ，θ）を算出する。 By calculating b_d(f, θ) while changing θ of the array manifold vector a(f, θ), for example from −180° to 180° in 1° steps, the beam pattern for all horizontal directions is obtained. When the microphone elements are arranged isotropically, as in a circular equally spaced microphone array, the beam patterns b_d(f, θ) [d = 2 onward] of the other directivities can also be obtained by successively rotating the beam pattern b_1(f, θ) for the front (0°) directivity direction.
In S7, the signal analysis processing unit 13 combines the beam patterns b_d(f, θ) [d = 1 to D] of the directivities calculated in S6 to calculate the combined beam pattern b_sum(f, θ) as in Equation (7).

Ｓ８では、信号解析処理部１３は、合成ビームパターンｂ_ｓｕｍ（ｆ，θ）を、例えばデシベル［ｄＢ］表示に変換して標準偏差σ_ｂｓｕｍ（θ_ｄ）を算出し、式（４）の最適化問題の評価関数とする。ここで、標準偏差は指向方向θ_ｄ［ｄ＝１〜Ｄ］の関数となるためσ_ｂｓｕｍ（θ_ｄ）と表記し、周波数のインデックスｆは省略している。
Ｓ９では、信号解析処理部１３は、最適化ループにおける最適化が収束したかを判定し、収束した場合はＳ１２へ進み、収束していない場合はＳ１０へ進む。収束したか否かの判定は、例えば評価関数値［式（４）の場合では標準偏差σ_ｂｓｕｍ（θ_ｄ）］の前の最適化ループの実行時の値に対する減少量が所定値未満となったか否かの判定で行う。あるいは、最適化変数である指向方向θ_ｄ［ｄ＝１〜Ｄ］について、前の最適化ループで求めた値との差が所定値未満となったか否かで収束の判定を行ってもよい。あるいは、現在の最適化ループの評価関数値が所定値未満となった時点で収束したと判定してもよい。この場合では、評価関数として標準偏差を用いているため、収束するまで最適化ループの処理を実行することにより、合成ビームパターンの標準偏差を所定値より小さくするように指向方向が制御される。 In S8, the signal analysis processing unit 13 converts the combined beam pattern b_sum(f, θ) into, for example, decibels [dB] and calculates its standard deviation σ_bsum(θ_d), which serves as the evaluation function of the optimization problem of Equation (4). Because the standard deviation is a function of the directivity directions θ_d [d = 1 to D], it is written σ_bsum(θ_d), and the frequency index f is omitted.
In S9, the signal analysis processing unit 13 determines whether the optimization in the optimization loop has converged; if it has, the process proceeds to S12, and if not, to S10. Convergence is determined, for example, by whether the decrease of the evaluation function value [the standard deviation σ_bsum(θ_d) in the case of Equation (4)] relative to its value in the previous iteration of the optimization loop has fallen below a predetermined value. Alternatively, convergence may be determined by whether the difference between the directivity directions θ_d [d = 1 to D], which are the optimization variables, and the values obtained in the previous iteration has fallen below a predetermined value. Alternatively, convergence may be declared when the evaluation function value of the current iteration falls below a predetermined value. In that case, since the standard deviation is used as the evaluation function, running the optimization loop until convergence controls the directivity directions so that the standard deviation of the combined beam pattern becomes smaller than the predetermined value.
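The evaluation function of S8 and the first convergence test of S9 can be sketched as below. Equation (7) is not reproduced in this text, so the combination rule used here (summing the power of the individual beam patterns) is an assumption, as are the function names and the default tolerance.

```python
import numpy as np

def sigma_bsum_db(patterns):
    """Standard deviation, in dB, of the combined beam pattern.

    patterns: complex array of shape (D, n_angles) holding b_d(f, theta)
    on a common azimuth grid. The power sum below is an ASSUMED stand-in
    for Equation (7); it must be strictly positive on the whole grid.
    """
    power = np.sum(np.abs(patterns) ** 2, axis=0)
    level_db = 10.0 * np.log10(power)
    return float(np.std(level_db))

def has_converged(sigma_prev, sigma_now, tol_db=1e-3):
    """S9, first criterion: the decrease of the evaluation function
    since the previous iteration is below a threshold (assumed value)."""
    return (sigma_prev - sigma_now) < tol_db
```

A perfectly circular combined pattern gives a standard deviation of zero, and any bulge or dent raises it, which is the property the optimization relies on.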

Ｓ１０では、信号解析処理部１３は、最適化ループにおける指向方向θ_ｄ［ｄ＝１〜Ｄ］の更新回数が所定の上限値に達したかを判定し、達した場合はＳ１２へ進み、達していない場合はＳ１１へ進む。
Ｓ１１では、信号解析処理部１３は、指向性の指向方向を更新する。すなわち、式（４）の制約条件に基づきＱ個の指向方向θ_{ｄｍｉｎ（ｑ）}［ｑ＝１〜Ｑ］を点音源方向θ_ｓｑ［ｑ＝１〜Ｑ］に固定（拘束）した状態で、合成ビームパターンの標準偏差σ_ｂｓｕｍ（θ_ｄ）が小さくなる方向へ（Ｄ−Ｑ）個の指向方向を更新する。なお、式（４）のように最適化問題として数式で定義（以下、「定式化」と称する。）すれば、最適化変数である指向方向の更新には、種々の公知な最適化アルゴリズムを適用することができる。あるいは、最適化アルゴリズムの代わりに全探索やランダム探索によって指向方向の更新を行ってもよい。 In S10, the signal analysis processing unit 13 determines whether or not the number of updates of the directivity direction θ _d [d = 1 to D] in the optimization loop has reached a predetermined upper limit value. If reached, the process proceeds to S12. If not, the process proceeds to S11.
In S11, the signal analysis processing unit 13 updates the directivity directions. That is, with the Q directivity directions θ_{dmin(q)} [q = 1 to Q] fixed (constrained) to the point sound source directions θ_sq [q = 1 to Q] in accordance with the constraint of Equation (4), the remaining (D − Q) directivity directions are updated so that the standard deviation σ_bsum(θ_d) of the combined beam pattern decreases. Once the optimization problem has been defined by a mathematical expression as in Equation (4) (hereinafter, "formulated"), various known optimization algorithms can be applied to update the directivity directions, which are the optimization variables. Alternatively, the directivity directions may be updated by exhaustive search or random search instead of an optimization algorithm.
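One of the options mentioned for S11, a random search, might look like the sketch below. It is an illustration under assumptions rather than the method of the embodiment: the perturbation size, the update limit, and the names `random_search` and `gap_spread` are all hypothetical, and `gap_spread` is only a toy stand-in for σ_bsum(θ_d) so that the example runs without an array model.

```python
import numpy as np

def wrap_deg(a):
    """Wrap angles in degrees to the interval (-180, 180]."""
    return 180.0 - ((180.0 - np.asarray(a, dtype=float)) % 360.0)

def random_search(theta0_deg, locked, objective, step_deg=10.0,
                  max_updates=500, seed=0):
    """Update only the (D - Q) free directions; keep locked ones fixed."""
    rng = np.random.default_rng(seed)
    theta = np.array(theta0_deg, dtype=float)
    locked_set = set(locked)
    free = [d for d in range(len(theta)) if d not in locked_set]
    best = objective(theta)
    if not free:
        return theta, best
    for _ in range(max_updates):  # S10: upper limit on the number of updates
        cand = theta.copy()
        d = rng.choice(free)
        cand[d] = wrap_deg(cand[d] + rng.normal(0.0, step_deg))
        value = objective(cand)
        if value < best:          # accept only improvements
            theta, best = cand, value
    return theta, best

def gap_spread(theta_deg):
    """Toy objective: spread of the angular gaps between neighbours."""
    s = np.sort(wrap_deg(theta_deg))
    gaps = np.diff(np.append(s, s[0] + 360.0))
    return float(np.std(gaps))
```

Because candidates are accepted only when they lower the objective, the returned value never exceeds the initial one, and the locked directions keep pointing at the point sound sources.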

最適化が収束するか最適化ループにおける更新回数が上限値に達すると、信号解析処理部１３は、Ｓ１２において、最適化で評価関数（この場合では、標準偏差σ_ｂｓｕｍ（θ_ｄ））の値が最小となったときの指向方向θ_ｄ［ｄ＝１〜Ｄ］を選択する。これにより、方向音を取得する際の指向方向が最適化される。すなわち、式（４）の制約条件の下、複数の指向性のビームパターンを合成した合成ビームパターンのゲインの偏差が最小化される。
例えば図５のように初期化した指向方向を最適化した結果を、図６に示す。ここで、図５の指向方向３１、３３、３４、３６がそれぞれ、図６の指向方向３１’、３３’、３４’、３６’に最適化されている。すなわち、指向方向３１’のθ_１＝−３１．５°［−３１．５°］、指向方向３３’のθ_３＝２６．９°［−９３．１°］、指向方向３４’のθ_４＝１４８．６°［−３１．４°］、指向方向３６’のθ_６＝−８９．９°［−２９．９°］である。なお、カギ括弧［］内の数値は指向方向の初期値からの更新量であり、図６では矢印で模式的に表現されている。 When the optimization converges or the number of updates in the optimization loop reaches the upper limit, the signal analysis processing unit 13 selects, in S12, the directivity directions θ_d [d = 1 to D] at which the evaluation function (in this case, the standard deviation σ_bsum(θ_d)) took its minimum value during the optimization. The directivity directions used when acquiring the directional sounds are thereby optimized. That is, under the constraint of Equation (4), the deviation of the gain of the combined beam pattern obtained by combining the beam patterns of the plurality of directivities is minimized.
For example, FIG. 6 shows the result of optimizing the directivity directions initialized as in FIG. 5. Here, the directivity directions 31, 33, 34, and 36 in FIG. 5 have been optimized to the directivity directions 31′, 33′, 34′, and 36′ in FIG. 6, respectively. That is, θ_1 = −31.5° [−31.5°] for the directivity direction 31′, θ_3 = 26.9° [−93.1°] for the directivity direction 33′, θ_4 = 148.6° [−31.4°] for the directivity direction 34′, and θ_6 = −89.9° [−29.9°] for the directivity direction 36′. The values in square brackets are the update amounts from the initial values of the directivity directions and are shown schematically by arrows in FIG. 6.

例えば点音源方向３０ａについては、指向性のビームパターンのゲインの値が最も大きいビームパターン３２ａ’のゲインの値と他のビームパターン３３ａ’、３４ａ’のゲインの値との比が図５の場合より大きくなっている。点音源方向３０ａの点音源については、ビームパターン３２ａ’に対応する指向方向の方向音のエネルギー４２’が、ビームパターン３３ａ’、３４ａ’に対応する指向方向の方向音のエネルギー４３’、４４’に比べてかなり大きくなる。すなわち、点音源の方向における指向性のビームパターンのゲインの値が最大のものとそれ以外のものとの比が所定値より大きく設定される。なお、この比を最大化するようにしてもよい。 For example, for the point sound source direction 30a, the ratio between the gain of the beam pattern 32a′, which has the largest gain among the directivity beam patterns, and the gains of the other beam patterns 33a′ and 34a′ is larger than in the case of FIG. 5. For the point sound source in the direction 30a, the energy 42′ of the directional sound for the directivity direction of the beam pattern 32a′ is considerably larger than the energies 43′ and 44′ of the directional sounds for the directivity directions of the beam patterns 33a′ and 34a′. That is, the ratio between the largest beam-pattern gain in the direction of the point sound source and the other gains is set larger than a predetermined value. This ratio may also be maximized.

また、点音源方向３０ｂについても、同様に、指向方向の最適化を行う。これにより、検出された複数の点音源の方向の夫々について、指向性のビームパターンのゲインの値であって、点音源の方向におけるゲインの値が最大のものとそれ以外のものの比が所定値より大きく設定される。図６のような状態で、音響信号から方向音の取得を行い、取得した方向音に応じてヘッドホン再生信号を生成し、ヘッドホン再生信号に応じてヘッドホン４により音を再生すれば、点音源の音を明瞭に再生することができる。 The directivity directions are optimized in the same way for the point sound source direction 30b. As a result, for each of the detected point sound source directions, the ratio between the largest beam-pattern gain in that direction and the other gains is set larger than a predetermined value. If, in the state shown in FIG. 6, directional sounds are acquired from the acoustic signals, a headphone reproduction signal is generated from the acquired directional sounds, and sound is reproduced by the headphones 4 according to the headphone reproduction signal, the point sound sources can be reproduced clearly.

また、図６に示す状態では、合成ビームパターン３７”が略円形となるため、全方位でムラの無い雰囲気音の再生を実現できる。合成ビームパターンの標準偏差σ_ｂｓｕｍ（θ_ｄ）の具体的な値は、例えば図４の合成ビームパターン３７が０．２１ｄＢ、図５の合成ビームパターン３７’が１．６５ｄＢ、図６の合成ビームパターン３７”が０．２９ｄＢである。すなわち、指向方向を最適化した図６の状態では、点音源方向に指向方向を向けつつ、合成ビームパターンの乱れを図４の均等配置の場合と同程度にまで抑制できている。即ち、上述の処理により、点音源の方向に対応する音響信号の方向音を再生する際のゲインと、点音源の方向以外の方向音を再生する際のゲインとの差が所定値より小さくなるように、指向方向を設定する。このような状態で、音響信号から方向音の取得を行い、取得した方向音に応じてヘッドホン再生信号を生成し、ヘッドホン再生信号に応じてヘッドホン４により音を再生すれば、全方位でムラの無い雰囲気音を再生することができる。したがって、明瞭な点音源の音の再生と全方位でムラの無い雰囲気音の再生を両立できる。
ここで、円状等間隔マイクアレイ等のようにマイク素子が等方的に配置されており、点音源の数Ｑ＝１であれば、いずれかの指向方向θ_ｄ［ｄ＝１〜Ｄ］を点音源の方向θ_ｓ１に向け、均等配置の他の指向方向θ_ｄを同じ角度だけ回転させればよい。 In addition, in the state shown in FIG. 6, the combined beam pattern 37″ is substantially circular, so an atmospheric sound that is uniform in all directions can be reproduced. Specific values of the standard deviation σ_bsum(θ_d) of the combined beam pattern are, for example, 0.21 dB for the combined beam pattern 37 in FIG. 4, 1.65 dB for the combined beam pattern 37′ in FIG. 5, and 0.29 dB for the combined beam pattern 37″ in FIG. 6. That is, in the state of FIG. 6, in which the directivity directions have been optimized, the disturbance of the combined beam pattern is suppressed to about the same level as with the even arrangement of FIG. 4 while directivity directions are still pointed at the point sound sources. In other words, the above processing sets the directivity directions so that the difference between the gain with which the directional sound for a point sound source direction is reproduced and the gain with which directional sounds for other directions are reproduced becomes smaller than a predetermined value. If, in this state, directional sounds are acquired from the acoustic signals, a headphone reproduction signal is generated from the acquired directional sounds, and sound is reproduced by the headphones 4 according to the headphone reproduction signal, an atmospheric sound that is uniform in all directions can be reproduced. Clear reproduction of point sound sources and uniform reproduction of atmospheric sound in all directions can therefore both be achieved.
Here, if the microphone elements are arranged isotropically, as in a circular equally spaced microphone array, and the number of point sound sources is Q = 1, it suffices to point one of the directivity directions θ_d [d = 1 to D] at the point sound source direction θ_s1 and rotate the other evenly arranged directivity directions θ_d by the same angle.
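For this simple case (isotropic array, Q = 1), the rotation can be sketched directly. The function names are hypothetical, and the choice of rotating the nearest direction onto the source is an assumption; the text only requires that one of the directions be pointed at the source.

```python
import numpy as np

def wrap_deg(a):
    """Wrap angles in degrees to the interval (-180, 180]."""
    return 180.0 - ((180.0 - np.asarray(a, dtype=float)) % 360.0)

def rotate_uniform(num_directions, source_deg):
    """Rotate an even arrangement so that its nearest direction hits the source."""
    theta = np.arange(num_directions) * 360.0 / num_directions
    diff = wrap_deg(source_deg - theta)   # signed offset of the source from each direction
    d = int(np.argmin(np.abs(diff)))      # nearest direction (assumed choice)
    return wrap_deg(theta + diff[d])      # same rotation applied to all directions
```

For D = 6 and a single source at 85°, the nearest direction is 60°, so every direction is rotated by 25°, giving 25°, 85°, 145°, −155°, −95°, −35°.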

これに対し、マイク素子の配置が等方的でなく、指向方向によって形成可能なビームパターンの形が異なったり、複数の点音源が存在したりする等、指向方向の最適化のための条件が複雑になると、適切な指向方向を導くためには式（４）のような式が必要になる。これは、２つの点音源が存在する図４の例において、図６の最適化された指向方向の初期値からの更新量がそれぞれ異なっており、特に指向方向３３’が指向方向３１’と指向方向３２’の間に入っているといった結果からも分かる。 In contrast, when the conditions for optimizing the directivity directions become complicated, for example because the microphone elements are not arranged isotropically and the shape of the beam pattern that can be formed differs with the directivity direction, or because a plurality of point sound sources exist, a formulation such as Equation (4) is needed to derive appropriate directivity directions. This can also be seen from the result for the example of FIG. 4 with two point sound sources: the update amounts from the initial values to the optimized directivity directions in FIG. 6 differ from one direction to another, and in particular the directivity direction 33′ ends up between the directivity directions 31′ and 32′.

なお、明瞭な点音源の音の再生と全方位でムラの無い雰囲気音の再生を両立するための、最適化問題を定義する式は、式（４）の他にも様々考えられる。例えば式（８）は、点音源方向に最も近い指向方向θ_{ｄｍｉｎ（ｑ）}［ｑ＝１〜Ｑ］と、点音源方向θ_ｓｑ［ｑ＝１〜Ｑ］との差を閾値Δθ_ｑ［ｑ＝１〜Ｑ］以下にする制約条件のもと、合成ビームパターンの乱れを評価関数として最小化する式である。 Besides Equation (4), various other expressions can be considered for defining an optimization problem that achieves both clear reproduction of point sound sources and uniform reproduction of atmospheric sound in all directions. For example, Equation (8) minimizes the disturbance of the combined beam pattern as the evaluation function under the constraint that the difference between the directivity direction θ_{dmin(q)} [q = 1 to Q] closest to each point sound source direction and the point sound source direction θ_sq [q = 1 to Q] is no greater than a threshold Δθ_q [q = 1 to Q].
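Equation (8) is not reproduced in this text. From the description, it can plausibly be written as follows; the use of an absolute angular difference is an assumption.

```latex
\min_{\theta_1,\dots,\theta_D}\ \sigma_{bsum}(\theta_d)
\quad \text{subject to} \quad
\left|\theta_{d_{min}(q)} - \theta_{sq}\right| \le \Delta\theta_q,
\quad q = 1,\dots,Q \qquad (8)
```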

ここで、閾値Δθ_ｑ［ｑ＝１〜Ｑ］は点音源毎に変えてもよく、例えばＳ３で平均空間スペクトルのピーク（極大値）が大きかった点音源ほど優先し、閾値Δθ_ｑを小さく設定するようにしてもよい。これにより、優先度の高い点音源は指向方向を正確に向けることで明瞭にし、優先度の低い点音源は指向方向の多少のずれを許容して、その分合成ビームパターンの乱れを抑えることができる。この結果、式（４）と評価関数は同じでも、より柔軟に制約条件を記述することができる。
また、最適化問題において、制約条件を評価関数に組み込むような定義も可能である。例えば式（９）は、点音源方向に最も近い指向方向θ_{ｄｍｉｎ（ｑ）}［ｑ＝１〜Ｑ］と点音源方向θ_ｓｑ［ｑ＝１〜Ｑ］との差の総和と、合成ビームパターンの乱れとの重み付き和を評価関数として最小化する式である。 Here, the threshold Δθ_q [q = 1 to Q] may be changed for each point sound source. For example, a point sound source whose peak (local maximum) in the average spatial spectrum in S3 was larger may be given higher priority and a smaller threshold Δθ_q. In this way, a high-priority point sound source is made clear by pointing a directivity direction at it accurately, while a low-priority point sound source tolerates a slight deviation of the directivity direction, and the disturbance of the combined beam pattern is reduced accordingly. As a result, the constraint can be described more flexibly while the evaluation function remains the same as in Equation (4).
In the optimization problem, it is also possible to incorporate the constraint into the evaluation function. For example, Equation (9) minimizes, as the evaluation function, a weighted sum of the disturbance of the combined beam pattern and the total of the differences between the directivity directions θ_{dmin(q)} [q = 1 to Q] closest to the point sound source directions and the point sound source directions θ_sq [q = 1 to Q].

ここで、λ_ｑ［ｑ＝１〜Ｑ］は点音源の優先度を表す重みであり、例えばＳ３で平均空間スペクトルのピーク（極大値）が大きかった点音源ほど優先し、λ_ｑを大きく設定するようにしてもよい。また、β_θは式（９）の第１項に係る全方位でムラの無い雰囲気音と、第２項に係る明瞭な点音源の音の再生との間のトレードオフ（優先度）を調整する重みである。なお、例えばシステム制御部１１によって制御される不図示のＧＵＩ（ＧｒａｐｈｉｃａｌＵｓｅｒＩｎｔｅｒｆａｃｅ）部を介して、ユーザがこのトレードオフを間接的に調整できるようにしてもよい。ＧＵＩ部を用いることにより、例えば、両端を方向感重視と包まれ感重視としたスライダバーを表示させ、ユーザの指示を入力する。信号解析処理部１３は、ユーザの指示に応じて、バーの位置が方向感重視に近いほどβ_θを大きくして明瞭な点音源の音の再生を優先してもよい。そして、バーの位置が包まれ感重視に近いほどβ_θを小さくして全方位でムラの無い雰囲気音を優先するようにしてもよい。 Here, λ_q [q = 1 to Q] is a weight representing the priority of each point sound source. For example, a point sound source whose peak (local maximum) in the average spatial spectrum in S3 was larger may be given higher priority and a larger λ_q. In addition, β_θ is a weight that adjusts the trade-off (priority) between the uniform atmospheric sound in all directions, which relates to the first term of Equation (9), and the clear reproduction of point sound sources, which relates to the second term. The user may be allowed to adjust this trade-off indirectly, for example through a GUI (Graphical User Interface) unit (not shown) controlled by the system control unit 11. Using the GUI unit, for example, a slider bar whose two ends represent emphasis on the sense of direction and emphasis on the sense of envelopment is displayed, and the user's instruction is input. In accordance with the user's instruction, the signal analysis processing unit 13 may increase β_θ as the bar position approaches the sense-of-direction end, giving priority to clear reproduction of point sound sources, and may decrease β_θ as the bar position approaches the sense-of-envelopment end, giving priority to an atmospheric sound that is uniform in all directions.
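Equation (9) is not reproduced in this text. One form consistent with the description (first term for the uniformity of the atmospheric sound, second term for the point sound sources, and a larger β_θ favouring clear point sound sources) would be the following. Whether the original uses absolute or squared differences, and exactly where the weights are placed, is an assumption.

```latex
\min_{\theta_1,\dots,\theta_D}\
\sigma_{bsum}(\theta_d)
+ \beta_\theta \sum_{q=1}^{Q} \lambda_q
\left|\theta_{d_{min}(q)} - \theta_{sq}\right| \qquad (9)
```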

なお、式（４）のような等式制約の式では、例えば２つの点音源が近接している場合でも、２つの指向方向をそれぞれの点音源に向けるため、点音源方向の合成ビームパターンは膨らんでしまう。これに対し、式（８）や式（９）のような式を用いて最適化を行えば、指向方向の多少のずれを許容して２つの点音源を１つの指向性でカバーし、その分合成ビームパターンの乱れを抑えるような結果が期待できる。 Note that with an equality-constrained formulation such as Equation (4), even when, for example, two point sound sources are close to each other, two directivity directions are pointed at the respective sources, so the combined beam pattern bulges in the direction of those sources. In contrast, if the optimization uses a formulation such as Equation (8) or Equation (9), a result can be expected in which a slight deviation of the directivity direction is tolerated, the two point sound sources are covered by a single directivity, and the disturbance of the combined beam pattern is reduced accordingly.

（方向音の取得及びヘッドホン再生信号の生成）
Ｓ１３〜Ｓ１６（正確にはこれらを含む周波数ループ内の処理）は、記憶部１２に格納されているＭチャンネルの音響信号から方向音を取得し、ヘッドホン再生信号を生成する処理を示している。
Ｓ１３〜Ｓ１５は、周波数毎の処理であるため、信号解析処理部１３は、周波数ループの中でＳ１３〜Ｓ１５の処理を繰り返し実行する。また、Ｓ１３〜Ｓ１５の処理は、Ｓ１２で指向方向を決定した指向性毎の処理でもあるため、信号解析処理部１３は、指向性ループの中でＳ１３〜Ｓ１５の処理を繰り返し実行する。
Ｓ１３では、信号解析処理部１３は、Ｓ５と同様に、現在の指向性ループで対象としている指向性を形成するためのフィルタ係数ｗ_ｄ（ｆ）を取得する。すなわち、信号解析処理部１３は、記憶部１２に保持されている指向性形成のフィルタ係数から、指向方向θ_ｄに対応するフィルタ係数ｗ_ｄ（ｆ）を取得する。 (Directional sound acquisition and headphone playback signal generation)
S13 to S16 (precisely, processing in a frequency loop including these) indicate processing for acquiring a direction sound from the M channel acoustic signal stored in the storage unit 12 and generating a headphone reproduction signal.
Since S13 to S15 are processes performed for each frequency, the signal analysis processing unit 13 repeatedly executes S13 to S15 in the frequency loop. Since S13 to S15 are also processes performed for each directivity whose directivity direction was determined in S12, the signal analysis processing unit 13 repeatedly executes S13 to S15 in the directivity loop.
In S13, the signal analysis processing unit 13 acquires the filter coefficient w _d (f) for forming the directivity targeted in the current directivity loop, as in S5. That is, the signal analysis processing unit 13 acquires the filter coefficient w _d (f) corresponding to the directivity direction θ _d from the directivity forming filter coefficient held in the storage unit 12.

Ｓ１４では、信号解析処理部１３は、Ｓ１で取得したＭチャンネルの音響信号のフーリエ係数ｚ（ｆ）に、Ｓ１３で取得した指向性形成のフィルタ係数ｗ_ｄ（ｆ）によってフィルタ処理を行う。これにより、信号解析処理部１３は、現在の指向性ループに対応する指向方向θ_ｄの方向音Ｙ_ｄ（ｆ）を式（１０）のように生成する。Ｙ_ｄ（ｆ）は周波数領域のデータ（フーリエ係数）である。
Ｙ_ｄ（ｆ）＝ｗ_ｄ ^Ｈ（ｆ）ｚ（ｆ）（１０）
Ｓ１５では、信号解析処理部１３は、Ｓ１４で取得した指向方向θ_ｄの方向音のフーリエ係数Ｙ_ｄ（ｆ）に、指向方向θ_ｄと同じ方向の左右の耳のＨＲＴＦ［Ｈ_Ｌ（ｆ，θ_ｄ）、Ｈ_Ｒ（ｆ，θ_ｄ）］を乗じる。さらに、信号解析処理部１３は、この乗算の結果を、式（１１）のように左右それぞれのヘッドホン再生信号Ｘ_Ｌ（ｆ）、Ｘ_Ｒ（ｆ）に加算する。 In S14, the signal analysis processing unit 13 performs a filtering process on the Fourier coefficient z (f) of the M-channel acoustic signal acquired in S1 using the directivity-forming filter coefficient w _d (f) acquired in S13. As a result, the signal analysis processing unit 13 generates a direction sound Y _d (f) in the directivity direction θ _d corresponding to the current directivity loop as shown in Expression (10). Y _d (f) is frequency domain data (Fourier coefficient).
Y _d (f) = w _d ^H (f) z (f) (10)
In S15, the signal analysis processing unit 13 multiplies the Fourier coefficient Y_d(f) of the directional sound for the directivity direction θ_d acquired in S14 by the left-ear and right-ear HRTFs [H_L(f, θ_d), H_R(f, θ_d)] for the same direction as θ_d. The signal analysis processing unit 13 then adds the products to the left and right headphone reproduction signals X_L(f) and X_R(f), respectively, as in Equation (11).
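Equation (11) is not reproduced in this text. From the description of S15, the accumulation over the directivity loop can plausibly be written as follows, where the arrow denotes an in-place update (a notational assumption).

```latex
\begin{aligned}
X_L(f) &\leftarrow X_L(f) + H_L(f,\theta_d)\,Y_d(f)\\
X_R(f) &\leftarrow X_R(f) + H_R(f,\theta_d)\,Y_d(f)
\end{aligned}
\qquad (11)
```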

ここで、Ｘ_Ｌ（ｆ）、Ｘ_Ｒ（ｆ）は周波数領域のデータ（フーリエ係数）である。なお、ＨＲＴＦは、記憶部１２に格納されているものを取得して用いればよい。指向性ループの中で本ステップの処理を行うことは、各指向方向の方向音を再生する仮想スピーカをユーザの周囲に順次配置することに相当する。
この後、Ｓ１６において、信号解析処理部１３は、Ｓ１３〜Ｓ１５の処理で生成したヘッドホン再生信号のフーリエ係数Ｘ_Ｌ（ｆ）、Ｘ_Ｒ（ｆ）を各々逆フーリエ変換し、時間波形であるヘッドホン再生信号ｘ_Ｌ（ｔ）、ｘ_Ｒ（ｔ）を生成する。さらに、信号解析処理部１３は、生成したヘッドホン再生信号ｘ_Ｌ（ｔ）、ｘ_Ｒ（ｔ）を音響信号出力部１４に入力する。
なお、Ｓ１３〜Ｓ１５の処理は周波数領域ではなく時間領域で行ってもよく、その場合は本ステップの逆フーリエ変換は不要となる。 Here, X_L(f) and X_R(f) are frequency-domain data (Fourier coefficients). The HRTFs stored in the storage unit 12 may be acquired and used. Performing this step inside the directivity loop corresponds to sequentially placing, around the user, virtual speakers that reproduce the directional sound of each directivity direction.
Thereafter, in S16, the signal analysis processing unit 13 applies an inverse Fourier transform to each of the Fourier coefficients X_L(f) and X_R(f) of the headphone reproduction signals generated in S13 to S15, generating the headphone reproduction signals x_L(t) and x_R(t) as time waveforms. The signal analysis processing unit 13 then inputs the generated headphone reproduction signals x_L(t) and x_R(t) to the acoustic signal output unit 14.
Note that the processing of S13 to S15 may be performed in the time domain instead of the frequency domain, in which case the inverse Fourier transform of this step is unnecessary.
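The frequency-domain processing of S14 to S16 can be sketched compactly with NumPy. This is a simplified illustration under assumptions: the whole signal is transformed as a single block with a real FFT (the source does not specify the framing), and the array shapes and the function name `render_binaural` are hypothetical.

```python
import numpy as np

def render_binaural(z, w, h_left, h_right, n_samples):
    """Directional sounds -> binaural headphone signals.

    z:        (F, M)    Fourier coefficients z(f) of the M-channel signal
    w:        (D, F, M) directivity-forming filter coefficients w_d(f)
    h_left:   (D, F)    left-ear HRTFs H_L(f, theta_d)
    h_right:  (D, F)    right-ear HRTFs H_R(f, theta_d)
    n_samples: length of the time-domain output
    """
    # Equation (10): Y_d(f) = w_d^H(f) z(f) for every direction and frequency.
    y = np.einsum("dfm,fm->df", np.conj(w), z)
    # S15: weight each directional sound by its HRTF and accumulate over d.
    x_left_f = np.sum(h_left * y, axis=0)
    x_right_f = np.sum(h_right * y, axis=0)
    # S16: inverse Fourier transform to time waveforms x_L(t), x_R(t).
    x_left = np.fft.irfft(x_left_f, n=n_samples)
    x_right = np.fft.irfft(x_right_f, n=n_samples)
    return x_left, x_right
```

With a single microphone, a single direction, and unit filters and HRTFs, the output reproduces the input signal, which is a quick way to check the shapes and the transform pair before plugging in real filter coefficients and HRTFs.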

Ｓ１７では、音響信号出力部１４が、Ｓ１６で信号解析処理部から入力されたヘッドホン再生信号ｘ_Ｌ（ｔ）、ｘ_Ｒ（ｔ）にＤＡ変換および増幅を施し、ヘッドホン４に供給する。ヘッドホン４は、供給されたヘッドホン再生信号に応じた音を再生する。
なお、図１において、音響入力部２は、マイクアレイ３から分離して示しているが、マイクアレイ３と一体に設けられてもよい。あるいは、音響入力部２は、信号処理装置１に含まれるように設けてもよい。 In S17, the acoustic signal output unit 14 applies DA conversion and amplification to the headphone reproduction signals x_L(t) and x_R(t) input from the signal analysis processing unit 13 in S16 and supplies them to the headphones 4. The headphones 4 reproduce sound according to the supplied headphone reproduction signals.
In FIG. 1, the acoustic input unit 2 is shown separately from the microphone array 3, but it may be provided integrally with the microphone array 3. Alternatively, the acoustic input unit 2 may be included in the signal processing device 1.

(Effect)
As described above, the signal analysis processing unit 13 determines the directivity directions so as to minimize, in the direction of each detected point sound source, the ratio of the other beam-pattern values to the largest one, and to minimize the disturbance of the combined beam pattern obtained by combining the beam patterns of the directivity directions.
The signal analysis processing unit 13 acquires each direction sound according to the directivity directions determined in this way and generates the headphone reproduction signal from the acquired direction sounds. By reproducing sound through the headphones 4 according to this signal, the signal processing system of this embodiment achieves both clear reproduction of point sound sources and reproduction of ambient sound that is uniform in all directions.

<Embodiment 2>
In Embodiment 1, the direction of a point sound source is detected from the acoustic signals of sound collected by the plurality of microphone elements, but the present invention is not limited to such an embodiment. For example, the direction of a point sound source may be detected from video. An embodiment that does so is described below as Embodiment 2. The description focuses on the differences from Embodiment 1, and components identical to those of Embodiment 1 are given the same reference numerals. The headphones 4 of Embodiment 1 are replaced with, for example, a head-mounted display (HMD).

(Configuration)
In addition to the configuration of the signal processing system of Embodiment 1 shown in FIG. 1, the signal processing system of Embodiment 2 includes an imaging unit (not shown), such as a camera, that captures a subject and outputs a video signal, and a video signal input unit that receives the video signal from the imaging unit and inputs it to the storage unit 12. The video signal input unit applies processing such as AD conversion and encoding to the video signal from the imaging unit and inputs the result to the storage unit 12 as a digital video signal. With the imaging unit and the video signal input unit, a video signal can be acquired at the same time as the acoustic signals and stored in the storage unit 12. This signal processing system detects the direction of a point sound source from the video signal.

In Embodiment 1, optimizing the directivity directions indirectly maximized the ratio between the energy of the sound that, in the direction of a point sound source, is captured as the direction sound of one directivity direction and the energy captured as the direction sounds of the other directivity directions.
In Embodiment 2, by contrast, this energy ratio is maximized directly. To this end, the energy ratio r_q(θ_d) [q = 1 to Q] is first defined, as in equation (12), from the values b_d(f, θ_sq) [d = 1 to D] that the directivity beam patterns of the directivity directions take in the point sound source direction.

Here, because the energy ratio is a function of the directivity directions θ_d [d = 1 to D], it is written r_q(θ_d), and the frequency index f is omitted. dmax(f, θ_sq) is the directivity-direction index given by equation (13), and b_dmax(f, θ_sq) is the maximum of the beam-pattern values b_d(f, θ_sq) [d = 1 to D] in the point sound source direction θ_sq [q = 1 to Q].
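The images of equations (12) and (13) are not reproduced in this text, so the following is only a sketch of one reading that is consistent with the surrounding description. It takes dmax as the index of the beam pattern with the largest magnitude in the point sound source direction, and the energy ratio as the energy of that beam pattern divided by the summed energy of all the others. The exact denominator, the function name `energy_ratio`, and the small `eps` guard are assumptions.

```python
def energy_ratio(beam_values, eps=1e-12):
    """Return (dmax, r) for one point sound source direction.

    beam_values: the D beam-pattern values b_d(f, theta_sq), indexed from 0 here.
    Assumption: r = |b_dmax|^2 / sum of |b_d|^2 over all d other than dmax.
    """
    energies = [abs(b) ** 2 for b in beam_values]
    # Index of the largest beam-pattern energy (one reading of equation (13)).
    dmax = max(range(len(energies)), key=energies.__getitem__)
    others = sum(e for d, e in enumerate(energies) if d != dmax)
    # eps (an assumption) avoids division by zero when all other values vanish.
    return dmax, energies[dmax] / (others + eps)


# Hypothetical beam-pattern values for D = 6 directivities at one source direction.
dmax, r = energy_ratio([0.1, 1.0, 0.5, 0.05, 0.05, 0.1])
```

When one neighbouring beam pattern dominates the remainder, as in the FIG. 4 discussion of energies 42 and 43, this quantity is close to the ratio of the two largest energies.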

The signal analysis processing unit 13 achieves both clear reproduction of point sound sources, by maximizing the energy ratio, and reproduction of ambient sound that is uniform in all directions, by minimizing the disturbance of the combined beam pattern. To this end, it optimizes the directivity directions θ_d [d = 1 to D] so that they solve the optimization problem defined by equation (14).

Equation (14) minimizes, as the evaluation function, a weighted sum of the sign-inverted sum of the energy ratios r_q(θ_d) [q = 1 to Q] in the point sound source directions and the disturbance of the combined beam pattern. The sign is inverted to convert the maximization of the energy ratio into a minimization problem. μ_q [q = 1 to Q] is a weight representing the priority of each point sound source, and μ_q is set larger for a point sound source with higher priority. β_r is a weight that adjusts the trade-off (priority) between the uniform, all-direction ambient sound reproduction of the first term of equation (14) and the clear point sound source reproduction of the second term.
As in Embodiment 1, the user may be allowed to adjust this trade-off indirectly through, for example, a GUI unit (not shown) controlled by the system control unit 11.
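The image of equation (14) is likewise missing, so the evaluation function can only be sketched from the description: a first term given by the disturbance σ_bsum of the combined beam pattern, and a second term given by the sign-inverted, priority-weighted sum of the energy ratios. Placing β_r on the second term, and the function name `evaluation_value`, are assumptions made for illustration.

```python
def evaluation_value(sigma_bsum, ratios, priorities, beta_r):
    """Assumed form: sigma_bsum - beta_r * sum_q(mu_q * r_q); smaller is better.

    sigma_bsum: disturbance (standard deviation) of the combined beam pattern.
    ratios:     energy ratios r_q, one per point sound source.
    priorities: priority weights mu_q, one per point sound source.
    beta_r:     trade-off weight (its placement on this term is an assumption).
    """
    if len(ratios) != len(priorities):
        raise ValueError("need one priority weight per point sound source")
    weighted = sum(mu * r for mu, r in zip(priorities, ratios))
    return sigma_bsum - beta_r * weighted


# Hypothetical numbers: two point sources, the first given higher priority.
j_low_beta = evaluation_value(0.2, [3.0, 2.0], [2.0, 1.0], beta_r=0.01)
j_high_beta = evaluation_value(0.2, [3.0, 2.0], [2.0, 1.0], beta_r=0.1)
```

Under this reading, a larger β_r makes the energy-ratio term dominate, so the optimizer favours clear point sound sources over a uniform combined beam pattern.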

(Signal analysis processing)
The signal analysis processing of this embodiment is described below with reference to the flowchart of FIG. 7. As in Embodiment 1, the processing of this flowchart is performed by the signal analysis processing unit 13 unless otherwise noted.
Before the processing of FIG. 7 starts, the M-channel acoustic signals (six channels in the configuration of FIG. 1) collected by the M microphone elements 3a to 3f (six in FIG. 1) and the video signal input from the video signal input unit are assumed to be stored in the storage unit 12.

The processing of S21 is the same as that of S1 of Embodiment 1 in FIG. 3, so its description is omitted.
In S22, the signal analysis processing unit 13 acquires the video signal held in the storage unit 12 and executes video recognition processing to detect subjects (objects) that can be point sound sources. Specifically, for example, it detects objects that can emit sound, such as people, animals, vehicles, and musical instruments, by executing known processing such as face recognition or mouth recognition (utterance recognition) or by using a known machine learning technique. The signal analysis processing unit 13 may also detect as an object, for example, the ball at the moment of a spike in a sport such as volleyball, from a reversal of the motion vector detected in the video signal.
In S23, the signal analysis processing unit 13 calculates the directions of the point sound sources from the Q objects detected in S22. In a coordinate system whose origin is the center of the video signal (assumed to coincide with the front direction, 0°, of the microphone array coordinate system), let U be the horizontal pixel coordinate of an object (for example, the center of its detection frame). The point sound source direction θ_sq [q = 1 to Q] can then be calculated by equation (15).

Here, V is the horizontal angle of view of the video signal, and B is the number of horizontal pixels of the video signal.
(Priority setting examples)
The signal analysis processing unit 13 may set the priority of each point sound source according to the object detected in S22.
Specifically, the priority of a point sound source may be set according to, for example, the size of the object's detection frame. For instance, a point sound source with a smaller detection frame (fewer horizontal pixels) occupies a narrower horizontal range in the video signal and therefore needs to be rendered more clearly, so it may be given a higher priority.
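The pixel-to-direction conversion of S23 and the frame-size priority can be sketched as follows. Because the image of equation (15) is missing, two plausible mappings are shown, and neither is confirmed by the source: a linear mapping in which the angle is proportional to the pixel offset, and a pinhole-camera mapping for a rectilinear lens. The function names, the sign convention (positive U gives a positive angle), and the particular priority formula are all assumptions.

```python
import math


def direction_linear(u, view_angle_deg, width_px):
    """Assumed linear mapping: theta = U * V / B, in degrees."""
    return u * view_angle_deg / width_px


def direction_pinhole(u, view_angle_deg, width_px):
    """Assumed pinhole mapping: theta = atan((2U / B) * tan(V / 2)), in degrees."""
    half = math.radians(view_angle_deg) / 2.0
    return math.degrees(math.atan((2.0 * u / width_px) * math.tan(half)))


def priority_from_frame(frame_width_px, width_px):
    """Hypothetical priority weight: smaller detection frames get larger weights."""
    return 1.0 - frame_width_px / width_px


# Hypothetical camera: V = 90 degrees, B = 1920 pixels, object at U = 480.
theta_lin = direction_linear(480, 90.0, 1920)
theta_pin = direction_pinhole(480, 90.0, 1920)
mu = priority_from_frame(192, 1920)
```

The two mappings agree at the image center and at the edges of the field of view and differ in between, so the choice matters most for objects midway to the edge.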

When the video signal is, for example, an omnidirectional video signal and only part of it is displayed on a head-mounted HMD according to head movement, the priority of a point sound source may be set according to the display range shown on the HMD. For example, the priority may be raised the closer the point sound source is to the center of the display range. Alternatively, when a point sound source that the user should see in the video lies outside the display range, its priority may be raised to make it clearer, with the aim of guiding the user's gaze.
Because the loudness of the sound also bears on the priority of a point sound source, this approach may be combined with the acoustic-signal-based point sound source detection of Embodiment 1 so that the priority is determined from both sound and video.

In S24, the signal analysis processing unit 13 initializes the directivity directions θ_d [d = 1 to D]. Unlike equation (4) of Embodiment 1, which constrains the directivity directions, equation (14) of this embodiment has no constraint, so the evenly spaced directivity directions 31 to 36 shown in FIG. 4 may be used as the initial values.
S25 to S32 are the processing of the iterative optimization calculation and are executed repeatedly within the optimization loop.
The processing of S25 to S27 is the same as that of S5 to S7 of Embodiment 1, so its description is omitted.

In S28, the signal analysis processing unit 13 calculates the energy ratio r_q(θ_d) [q = 1 to Q] in each point sound source direction according to equation (12). In the case of FIG. 4, for the point sound source [q = 1] corresponding to the point sound source direction 30a, the sound energy 42 captured as the direction sound corresponding to the beam pattern 32a is the largest, followed by the sound energy 43 captured as the direction sound corresponding to the beam pattern 33a. The sound energy captured as the direction sounds corresponding to the beam patterns 31a and 34a to 36a is relatively small. In this case, therefore, the energy ratio r_1(θ_d) for the point sound source direction 30a is approximately the ratio of the sound energy 42 to the sound energy 43.
In S29, the signal analysis processing unit 13 calculates the evaluation function of the optimization problem of equation (14) from the standard deviation σ_bsum(θ_d) of the combined beam pattern calculated in S27 and the energy ratios r_q(θ_d) [q = 1 to Q] in the point sound source directions calculated in S28.
The processing of S30 to S38 is the same as that of S9 to S17 of Embodiment 1, so its description is omitted.

Equation (14) is not the only possible formulation of the optimization problem for achieving both clear reproduction of point sound sources and uniform, all-direction reproduction of ambient sound. For example, the following equation (16) minimizes the disturbance of the combined beam pattern as the evaluation function under the constraint that the energy ratio r_q(θ_d) [q = 1 to Q] in each point sound source direction is at least a threshold Δr_q [q = 1 to Q].

Here, the threshold Δr_q [q = 1 to Q] may differ for each point sound source, and Δr_q may be set larger for a point sound source with higher priority.
As described above, the signal processing system of this embodiment optimizes the directivity directions used to acquire the direction sounds and thereby achieves both clear reproduction of point sound sources and reproduction of ambient sound that is uniform in all directions.
Although the signal processing system of Embodiment 2 uses a head-mounted display, any device other than a head-mounted display may be used as long as it can receive the reproduction signal from the signal processing device 1.
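The constrained formulation of equation (16) described above can be illustrated with a small sketch. It assumes that a handful of candidate sets of directivity directions have already been evaluated, each with its disturbance σ_bsum and its energy ratios r_q, and it simply picks the feasible candidate with the smallest disturbance. The names `is_feasible` and `best_feasible` and all numbers are hypothetical. A real solver would search over continuous θ_d instead of comparing a fixed list.

```python
def is_feasible(ratios, thresholds):
    """True if every energy ratio r_q meets its threshold delta_r_q."""
    return all(r >= t for r, t in zip(ratios, thresholds))


def best_feasible(candidates, thresholds):
    """Pick the feasible candidate with the smallest disturbance.

    candidates: list of (label, sigma_bsum, ratios) tuples (a hypothetical layout).
    Returns the chosen tuple, or None when no candidate satisfies the constraints.
    """
    feasible = [c for c in candidates if is_feasible(c[2], thresholds)]
    return min(feasible, key=lambda c: c[1]) if feasible else None


# Hypothetical candidates for Q = 2 point sources.
candidates = [
    ("A", 0.10, [1.5, 4.0]),  # most uniform, but source 1 is not clear enough
    ("B", 0.15, [3.2, 2.5]),
    ("C", 0.30, [5.0, 5.0]),
]
# The higher-priority source (q = 1) is given the larger threshold.
choice = best_feasible(candidates, thresholds=[3.0, 2.0])
```

Raising a threshold for a high-priority source shrinks the feasible set, which trades uniformity of the combined beam pattern for clarity of that source.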

<Other embodiments>
The present invention can also be realized by processing in which a program that implements one or more functions of the above embodiments is supplied to a system or apparatus via a network or a storage medium, and one or more processors in a computer of that system or apparatus read and execute the program. It can also be realized by a circuit (for example, an ASIC) that implements one or more functions.
In the above embodiments, processing such as input of the acoustic signals and generation of the headphone reproduction signal is implemented in a single signal processing system, but these may be configured as separate signal processing systems or signal processing devices.
The various data described in the above embodiments as being held in advance by the storage unit 12 may instead be input from outside through a data input/output unit (not shown) controlled by the system control unit 11.

A sensor capable of detecting the user's head movement may be provided in, for example, the headphones 4. In that configuration, head tracking processing may be performed in which the HRTF used in S15 or S36 is switched according to the head movement detected by the sensor, for example for each predetermined length of the acoustic signal (hereinafter, an "acoustic frame").
If the processing of the above embodiments is performed for each acoustic frame, moving point sound sources can also be handled. That is, the directivity directions for acquiring the direction sounds are controlled successively so as to track the moving point sound source while minimizing the disturbance of the combined beam pattern. In that case, it is preferable to use the optimization result of the previous acoustic frame as the initial values of the directivity directions for each acoustic frame.
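The head tracking described above can be sketched as a per-frame HRTF lookup. The sketch assumes that HRTFs are stored for a discrete set of horizontal directions and that head yaw is measured in the same angular convention as the directivity directions. Both conventions, and the names `wrap_deg` and `select_hrtf_direction`, are assumptions.

```python
def wrap_deg(angle):
    """Wrap an angle in degrees to the interval [-180, 180)."""
    return (angle + 180.0) % 360.0 - 180.0


def select_hrtf_direction(directivity_deg, head_yaw_deg, stored_dirs_deg):
    """Pick the stored HRTF direction closest to the head-relative direction.

    Assumption: turning the head by head_yaw_deg shifts every virtual speaker
    by the opposite amount relative to the listener.
    """
    relative = wrap_deg(directivity_deg - head_yaw_deg)
    return min(stored_dirs_deg, key=lambda d: abs(wrap_deg(d - relative)))


# Hypothetical HRTF set measured every 30 degrees.
stored = list(range(-180, 180, 30))
# Directivity direction 60 degrees, head turned by 50 degrees in this frame.
hrtf_dir = select_hrtf_direction(60.0, 50.0, stored)
```

Running this lookup once per acoustic frame keeps each virtual speaker fixed in the room while the head moves. Interpolating between neighbouring HRTFs instead of snapping to the nearest one would be a natural refinement.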

Instead of the microphone array 3, a microphone array in which omnidirectional microphone elements are arranged in a matrix may be used. With such a microphone array, an acoustic signal for a desired direction can be generated by applying directivity-forming filter processing to the acoustic signals collected by the microphone elements. The direction sound of each directivity direction can be acquired by adjusting the filter coefficients of that filter processing. Therefore, in this configuration as well, optimizing the directivity directions achieves both clear reproduction of point sound sources and reproduction of ambient sound that is uniform in all directions.

Instead of the headphones 4, the acoustic signals may be reproduced by a plurality of speakers arranged around the user. In that case, instead of generating the headphone reproduction signal in S13 to S15 or S34 to S36, a reproduction signal for each speaker is generated from the acquired direction sounds. The acoustic signals may also be reproduced by a surround speaker system having a mechanism that can control the placement direction of each speaker. In that case, taking the mechanical limitations into account, a constraint that prevents the order of the directivity directions from being swapped may be imposed on the optimization problem.
The signal processing device 1 itself may also have functions such as sound collection (microphone array), image capture (camera), and display. If the capture and sound collection functions are separated from the display and reproduction functions and configured to operate synchronously at remote locations, a remote live system can be realized.

In the above embodiments, clear reproduction of point sound sources and uniform reproduction of ambient sound are both achieved over all horizontal directions, but the target direction range may be set arbitrarily. For example, the target range may cover all directions including elevation as well as the horizontal plane, or it may be limited to the front horizontal half-plane, the angle-of-view range of the captured video signal, or the like. In that case, the standard deviation used as the measure of disturbance of the combined beam pattern, for example, is calculated from the combined beam pattern over the target direction range rather than over all horizontal directions.

A directional microphone array having a mechanism that can control the axis direction of each directional microphone element may also be used. In that case, taking the mechanical limitations into account, a constraint that prevents the order of the directivity directions from being swapped may be imposed on the optimization problem.
The above embodiments evaluate the disturbance of the combined beam pattern by the standard deviation of its gain over the full circumference, but other measures of deviation may be used. For example, the difference between the gain and a predetermined value may be obtained in every direction of the combined beam pattern, and the disturbance may be evaluated by the sum of these differences. As in the embodiments, this makes it possible to keep the deviation of the gain of the combined beam pattern below a predetermined value.
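The two disturbance measures mentioned above can be compared with a short sketch. It assumes the combined beam pattern is sampled as a list of gains at evenly spaced directions, and it takes the "difference" from the predetermined value as an absolute difference so that positive and negative deviations do not cancel. That choice and the function names are assumptions.

```python
import math


def std_dev_disturbance(gains):
    """Population standard deviation of the combined beam-pattern gain."""
    mean = sum(gains) / len(gains)
    return math.sqrt(sum((g - mean) ** 2 for g in gains) / len(gains))


def abs_diff_disturbance(gains, target):
    """Sum of absolute differences from a predetermined gain (assumed form)."""
    return sum(abs(g - target) for g in gains)


# Hypothetical sampled gains of a combined beam pattern.
flat = [1.0] * 8
uneven = [1.0, 1.2, 0.8, 1.0]
sigma_uneven = std_dev_disturbance(uneven)
diff_uneven = abs_diff_disturbance(uneven, target=1.0)
```

Both measures are zero for a perfectly uniform pattern. The second one also penalizes a uniform pattern whose level differs from the predetermined value, which the standard deviation does not.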

DESCRIPTION OF SYMBOLS: 1 ... signal processing device, 2 ... acoustic signal input unit, 3a, 3b, 3c, 3d, 3e, 3f ... microphone elements, 4 ... headphones, 11 ... system control unit, 12 ... storage unit, 13 ... signal analysis processing unit, 14 ... acoustic signal output unit

Claims

1. A signal processing apparatus comprising:
acquisition means for acquiring a direction sound for each of a plurality of directivity directions from acoustic signals of sound collected by a plurality of microphone elements;
detection means for detecting the direction of a point sound source from the acoustic signals; and
control means for controlling each of the directivity directions so that, among the gain values of the beam patterns indicating the directivity of the direction sounds corresponding to the plurality of directivity directions, the ratio of the largest gain value in the direction of the point sound source detected by the detection means to the other values is larger than a predetermined value, and so that the deviation of the gain of a combined beam pattern obtained by combining the plurality of beam patterns is smaller than a predetermined value.

2. The signal processing apparatus according to claim 1, wherein the detection means detects the directions of a plurality of point sound sources from the acoustic signals, and
the control means controls each of the directivity directions so that, for each of the directions of the plurality of point sound sources detected by the detection means, among the gain values of the beam patterns indicating the directivity of the direction sounds corresponding to the plurality of directivity directions, the ratio of the largest gain value in the direction of that point sound source to the other values is larger than a predetermined value.

3. The signal processing apparatus according to claim 1, wherein the acquisition means acquires the direction sound by applying, to the acoustic signals, filter processing that forms directivity corresponding to the directivity direction.

4. The signal processing apparatus according to claim 1, wherein the control means solves an optimization problem having the directivity directions as optimization variables and sets the solution as the directivity directions.

5. The signal processing apparatus according to claim 4, wherein the optimization problem is minimization of the deviation of the gain of the combined beam pattern under a constraint that fixes as many directivity directions as there are point sound sources to the directions of the point sound sources.

6. The signal processing apparatus according to claim 4, wherein the optimization problem is minimization of the deviation of the gain of the combined beam pattern under a constraint that the difference between the direction of the point sound source and the directivity direction closest to it is equal to or less than a first threshold.

7. The signal processing apparatus according to claim 4, wherein the optimization problem is minimization of a weighted sum of the sum of the differences between the direction of each point sound source and the directivity direction closest to it, and the deviation of the gain of the combined beam pattern.

8. The signal processing apparatus according to claim 4, wherein the optimization problem is minimization of the deviation of the gain of the combined beam pattern under a constraint that the ratio in the direction of the point sound source is equal to or greater than a second threshold.

9. The signal processing apparatus according to claim 4, wherein the optimization problem is minimization of a weighted sum of the sign-inverted sum of the ratios in the directions of the point sound sources, and the deviation of the gain of the combined beam pattern.

10. The signal processing apparatus according to claim 7, further comprising adjustment means for adjusting a weight in the weighted sum.

11. The signal processing apparatus according to claim 1, wherein the detection means obtains spatial spectra using a spatial correlation matrix of the acoustic signals, averages them to calculate an average spatial spectrum, and detects the direction of the point sound source from a maximum of the average spatial spectrum.

12. The signal processing apparatus according to claim 11, wherein the control means gives higher priority to a point sound source for which the maximum of the average spatial spectrum is larger.

13. The signal processing apparatus according to claim 1, further comprising input means for inputting a video signal, wherein the detection means detects the direction of the point sound source according to the position of an object detected from the video signal.

14. The signal processing apparatus according to claim 13, wherein the control means sets the priority of the point sound source according to the size of the detected object.

15. The signal processing apparatus according to claim 13, wherein the control means sets the priority of the point sound source according to the display range of the video signal.

16. The signal processing apparatus according to claim 1, further comprising generation means for generating a reproduction signal from the direction sound acquired by the acquisition means.

17. The signal processing apparatus according to claim 16, wherein the generation means generates the reproduction signal using a head-related transfer function.

18. The signal processing apparatus according to claim 1, further comprising head-mounted display means for displaying video.

19. A signal processing apparatus comprising:
detection means for detecting the direction of a point sound source based on acoustic signals collected by a plurality of microphone elements; and
control means for setting a plurality of directivity directions in order to reproduce direction sounds of the acoustic signals by output for each of the plurality of directivity directions, the control means setting the direction of the point sound source detected by the detection means as one of the plurality of directivity directions, and setting the other directivity directions among the plurality of directivity directions so that the difference between the gain with which the direction sound of the acoustic signal corresponding to the direction of the detected point sound source is reproduced and the gain with which the direction sound of the acoustic signal corresponding to a direction different from the direction of the detected point sound source is reproduced is smaller than a predetermined value.

20. A signal processing method comprising:
detecting the direction of a point sound source;
controlling the directivity directions so as to increase the ratio of the largest value to the other values, in the direction of the detected point sound source, of the directivity beam patterns for acquiring the direction sound of each directivity direction, and so as to reduce the disturbance of a combined beam pattern obtained by combining the directivity beam patterns; and
acquiring a direction sound for each directivity direction from acoustic signals collected by a plurality of microphone elements.

21. A computer program for causing a computer to function as the signal processing apparatus according to any one of claims 1 to 19.