JP7435948B2

JP7435948B2 - Sound collection device, sound collection method and sound collection program

Info

Publication number: JP7435948B2
Application number: JP2020043913A
Authority: JP
Inventors: 博基古川; 慎一杠
Original assignee: Panasonic Intellectual Property Corp of America
Current assignee: Panasonic Intellectual Property Corp of America
Priority date: 2019-11-18
Filing date: 2020-03-13
Publication date: 2024-02-21
Anticipated expiration: 2040-03-13
Also published as: JP2021081696A

Description

The present disclosure relates to a technique for collecting a target sound using a plurality of microphone elements.

Beamformers that control directivity using the output signals of at least two microphone elements are known in the art. Some sound collection devices use such a beamformer to suppress ambient noise and to collect a target sound separately from the ambient noise. The noise suppression performance of a beamformer may deteriorate because of sensitivity variations between the at least two microphone elements.

For example, Patent Document 1 discloses a beamformer that combines a generalized sidelobe canceller (hereinafter, GSC) with automatic calibration processing. In Patent Document 1, sensitivity variations among a plurality of microphones are corrected using ambient noise.

Patent Document 1: Specification of Japanese Patent No. 4734070

However, with the above conventional technology, the noise suppression performance in directivity synthesis may deteriorate, so further improvement has been required.

The present disclosure has been made to solve the above problem, and an object thereof is to provide a technique that can improve noise suppression performance in directivity synthesis and can collect a target sound with a high S/N ratio.

A sound collection device according to one aspect of the present disclosure includes: a plurality of microphone elements; a sensitivity correction unit that corrects a sensitivity difference between the plurality of microphone elements by multiplying the output signals of the plurality of microphone elements by a gain; a target sound detection unit that detects the voice of a speaker as a target sound; a gain control unit that controls the gain based on a detection result of the target sound detection unit; and a directivity synthesis unit that emphasizes and collects the target sound arriving from a predetermined direction, using the output signals of the plurality of microphone elements corrected by the sensitivity correction unit. When the target sound detection unit detects the voice of the speaker, the gain control unit updates the gain based on the output signals of the plurality of microphone elements; when the target sound detection unit does not detect the voice of the speaker, the gain control unit does not update the gain.

According to the present disclosure, noise suppression performance in directivity synthesis can be improved, and a target sound can be collected with a high S/N ratio.

FIG. 1 is a block diagram showing the configuration of the sound collection device in Embodiment 1 of the present disclosure.
FIG. 2 is a diagram showing an example of the installation position of the microphone array in Embodiment 1 of the present disclosure.
FIG. 3 is a diagram showing an example of the arrangement of the microphone elements of the microphone array in Embodiment 1 of the present disclosure.
FIG. 4 is a block diagram showing the configuration of the target sound detection unit of the sound collection device in Embodiment 1 of the present disclosure.
FIG. 5 is a block diagram showing the configuration of the voice determination unit of the sound collection device in Embodiment 1 of the present disclosure.
FIG. 6 is a block diagram showing the configuration of the sensitivity correction control unit of the sound collection device in Embodiment 1 of the present disclosure.
FIG. 7 is a flowchart for explaining the operation of the sound collection device in Embodiment 1 of the present disclosure.
FIG. 8 is a block diagram showing the configuration of the sound collection device in Embodiment 2 of the present disclosure.
FIG. 9 is a block diagram showing the configuration of the target sound detection unit of the sound collection device in Embodiment 2 of the present disclosure.
FIG. 10 is a block diagram showing the configuration of the target sound direction determination unit of the sound collection device in Embodiment 2 of the present disclosure.
FIG. 11 is a block diagram showing the configuration of the target sound direction determination unit of the sound collection device in a modification of Embodiment 2 of the present disclosure.

(Findings that formed the basis of the present disclosure)
As described above, in the prior art, sensitivity variations among a plurality of microphones are corrected using ambient noise.

However, when a noise source is located near a microphone array composed of a plurality of microphone elements, the difference in distance between the noise source and each microphone element cannot be ignored, and the noise generated by the noise source appears as a sound pressure difference between the positions of the microphone elements. If sensitivity correction or automatic calibration of the microphone elements is performed using noise generated by such a nearby noise source, the correction or calibration cannot be performed correctly and may instead degrade the output performance of the beamformer in the subsequent stage.

In particular, in a GSC, a blocking matrix creates a noise reference signal that has a sensitivity null (blind spot) in the target sound direction. However, if the sensitivity varies among the microphone elements, the sensitivity null cannot be formed in the target sound direction, and the target sound leaks into the noise reference signal. In this case, the noise reference signal containing the leaked target sound passes through the subsequent adaptive noise cancelling stage and is subtracted from the output of the weighted-sum beamformer, which may distort the target sound in the output signal. To prevent the target sound from leaking into the noise reference signal, at least the sensitivities of the plurality of microphone elements must be equalized.
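The leakage described above can be illustrated with a simplified two-element model. This is only a sketch under assumptions that do not appear in the original text: the target sound s arrives at both elements in phase, the elements have sensitivities a_1 and a_2, n_1 and n_2 are the noise components, and the blocking matrix simply takes the difference of the two outputs.

\[
x_1 = a_1 s + n_1, \qquad x_2 = a_2 s + n_2, \qquad
b = x_1 - x_2 = (a_1 - a_2)\,s + (n_1 - n_2)
\]

Under these assumptions the target sound is removed from b only when a_1 = a_2. If the second output is first multiplied by a correction gain g = a_1 / a_2, the target term cancels:

\[
b' = x_1 - g\,x_2 = (a_1 - g\,a_2)\,s + (n_1 - g\,n_2) = n_1 - g\,n_2
\]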

To solve the above problem, a sound collection device according to one aspect of the present disclosure includes: a plurality of microphone elements; a sensitivity correction unit that corrects a sensitivity difference between the plurality of microphone elements by multiplying the output signals of the plurality of microphone elements by a gain; a target sound detection unit that detects the voice of a speaker as a target sound; a gain control unit that controls the gain based on a detection result of the target sound detection unit; and a directivity synthesis unit that emphasizes and collects the target sound arriving from a predetermined direction, using the output signals of the plurality of microphone elements corrected by the sensitivity correction unit. When the target sound detection unit detects the voice of the speaker, the gain control unit updates the gain based on the output signals of the plurality of microphone elements; when the target sound detection unit does not detect the voice of the speaker, the gain control unit does not update the gain.

According to this configuration, the sensitivity difference between the plurality of microphone elements is corrected by multiplying the output signals of the plurality of microphone elements by a gain. The gain is updated based on the output signals of the plurality of microphone elements when the voice of the speaker is detected, and is not updated when the voice of the speaker is not detected. The target sound arriving from a predetermined direction is then emphasized and collected using the output signals of the plurality of microphone elements whose sensitivity difference has been corrected.

Therefore, because the gain for correcting the sensitivity difference between the plurality of microphone elements is updated when the voice of the speaker, which is the target sound, is detected, the sensitivity difference between the microphone elements can be corrected with respect to the target sound. In the subsequent directivity synthesis, this reduces the amount of target sound that leaks into the noise reference signal, which has a sensitivity null in the target sound direction. As a result, noise suppression performance in directivity synthesis can be improved, and the target sound can be collected with a high S/N ratio.

In the above sound collection device, the target sound detection unit may include a voice determination unit that determines whether the output signal of one of the plurality of microphone elements is the voice or a non-voice sound other than the voice.

According to this configuration, the voice of the speaker can be detected easily by determining whether the output signal of one of the plurality of microphone elements is voice or non-voice.

In the above sound collection device, the target sound detection unit may include a first extraction unit that extracts a signal in a specific band from the output signal of the one microphone element, and the voice determination unit may determine whether the signal extracted by the first extraction unit is the voice or the non-voice.

According to this configuration, whether the signal is voice or non-voice is determined for the specific-band signal extracted from the output signal of the one microphone element, so the voice of the speaker can be detected with higher accuracy.

In the above sound collection device, the target sound detection unit may include: a target sound direction determination unit that determines, using the output signals of the plurality of microphone elements, whether the target sound is arriving from a predetermined target sound direction; and a target sound determination unit that determines that the target sound has been detected when the target sound direction determination unit determines that the target sound is arriving from the target sound direction and the voice determination unit determines that the output signal of the one microphone element is the voice.

If the determination is based only on whether a voice has been detected, an utterance from a direction other than the target sound direction may also be judged as a detected voice. With the above configuration, by contrast, the target sound is determined to have been detected, and the gain is updated, only when a voice is detected and the target sound is arriving from the target sound direction. The sensitivity difference can therefore be corrected using the target sound with higher accuracy.

In the above sound collection device, the target sound detection unit may include a second extraction unit that extracts signals in a specific band from the output signals of the plurality of microphone elements, and the target sound direction determination unit may determine, for the signals extracted by the second extraction unit, whether the target sound is arriving from the target sound direction.

According to this configuration, whether the target sound is arriving from the target sound direction is determined for the specific-band signals extracted from the output signals of the plurality of microphone elements, so this determination can be made with higher accuracy.

In the above sound collection device, the target sound direction determination unit may include: a direction estimation unit that estimates the direction from which the target sound arrives, using the phase difference between the output signals of the plurality of microphone elements; and a direction determination unit that determines whether the direction estimated by the direction estimation unit is the predetermined target sound direction.

The direction from which the target sound arrives can be estimated easily from the phase difference between the output signals of the plurality of microphone elements. Therefore, if the target sound direction is known in advance, whether the target sound is arriving from the target sound direction can be determined easily based on the estimated direction of arrival.
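The following is a minimal sketch of one way such a phase-difference estimate might be computed for two elements. It is not taken from the original disclosure. It assumes a far-field, narrowband source, a speed of sound of 340 m/s, and a single analysis frequency; the function names, the `tolerance_deg` parameter, and the sign convention (positive angles mean the sound reaches the first element earlier) are all hypothetical.

```python
import cmath
import math

SPEED_OF_SOUND = 340.0  # m/s, assumed value (not stated in the source)


def bin_phase(signal, freq_hz, fs_hz):
    """Phase in radians of the single DFT bin of `signal` at freq_hz."""
    acc = sum(v * cmath.exp(-2j * math.pi * freq_hz * n / fs_hz)
              for n, v in enumerate(signal))
    return cmath.phase(acc)


def estimate_direction_deg(x1, x2, freq_hz, fs_hz, spacing_m):
    """Estimate the arrival angle from the phase difference of two elements.

    Uses the far-field relation dphi = 2*pi*f*d*sin(theta)/c.
    """
    dphi = bin_phase(x1, freq_hz, fs_hz) - bin_phase(x2, freq_hz, fs_hz)
    # Wrap the phase difference into [-pi, pi).
    dphi = (dphi + math.pi) % (2.0 * math.pi) - math.pi
    sin_theta = SPEED_OF_SOUND * dphi / (2.0 * math.pi * freq_hz * spacing_m)
    # Clamp so that measurement error cannot push asin out of its domain.
    sin_theta = max(-1.0, min(1.0, sin_theta))
    return math.degrees(math.asin(sin_theta))


def is_target_direction(estimated_deg, target_deg, tolerance_deg=10.0):
    """Direction determination: is the estimate close to the known target?"""
    return abs(estimated_deg - target_deg) <= tolerance_deg
```

With a 2 cm spacing and a frequency of a few hundred hertz the phase difference is small, so a practical implementation would likely average the estimate over several frames and frequencies before making the decision.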

In the above sound collection device, the target sound direction determination unit may include: a first directivity synthesis unit that forms directivity in the target sound direction by emphasizing the signal from the target sound direction using the output signals of the plurality of microphone elements; a second directivity synthesis unit that forms a sensitivity null (blind spot) in the target sound direction using the output signals of the plurality of microphone elements; and a level comparison determination unit that compares the output level of the output signal from the first directivity synthesis unit with the output level of the output signal from the second directivity synthesis unit, and determines whether the target sound is arriving from the target sound direction.

When the target sound is arriving from the target sound direction, the output signal level of the first directivity synthesis unit is higher than the output signal level of the second directivity synthesis unit. Therefore, when the output signal level of the first directivity synthesis unit is higher than that of the second directivity synthesis unit, it can be determined that the target sound is arriving from the target sound direction. On the other hand, when the target sound is not arriving from the target sound direction, the output signals of the first and second directivity synthesis units contain only ambient noise. The output signal level of the first directivity synthesis unit is then approximately equal to, or lower than, the output signal level of the second directivity synthesis unit. Therefore, when the output signal level of the first directivity synthesis unit is equal to or lower than that of the second directivity synthesis unit, it can be determined that the target sound is not arriving from the target sound direction.
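The level comparison can be sketched as follows for the simplest case. This is an illustration under assumptions not found in the original text: there are only two elements, the target direction is broadside so that the target sound reaches both elements in phase, the first directivity synthesis is a plain average, and the second is a plain difference. The function names and the `margin` parameter are hypothetical; `margin=1.0` reproduces the "higher than" rule stated above.

```python
import math


def rms(samples):
    """Root-mean-square level of one frame."""
    return math.sqrt(sum(v * v for v in samples) / len(samples))


def target_from_look_direction(x1, x2, margin=1.0):
    """Compare an enhancing beam with a nulling beam for one frame.

    Returns True when the enhanced level exceeds `margin` times the
    nulled level, i.e. when sound appears to arrive from the target
    direction.
    """
    enhanced = [(a + b) / 2.0 for a, b in zip(x1, x2)]  # emphasizes the target
    nulled = [(a - b) / 2.0 for a, b in zip(x1, x2)]    # null toward the target
    return rms(enhanced) > margin * rms(nulled)
```

Because ambient noise alone can make the two levels nearly equal, a margin somewhat greater than 1.0 may be a reasonable design choice in practice; the source itself only states the comparison.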

In the above sound collection device, the gain control unit may include: a level detection unit that detects the output level of the output signal of each of the plurality of microphone elements; a time average level calculation unit that calculates the time average level of each output level detected by the level detection unit when the target sound detection unit detects the voice of the speaker; and a correction gain calculation unit that calculates, from the time average levels calculated by the time average level calculation unit, a correction gain that is the updated gain.

According to this configuration, when the voice of the speaker is detected, the time average level of the output level of the output signal of each of the plurality of microphone elements is calculated. A correction gain, which is the updated gain, is then calculated from the calculated time average levels, so the sensitivity difference between the plurality of microphone elements can be corrected by multiplying the output signals of the plurality of microphone elements by the calculated correction gains.
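A minimal sketch of a gated time average is shown below. The source does not specify how the time average is computed; the exponential moving average, the RMS level measure, the class name, and the smoothing factor `alpha` are assumptions made for illustration.

```python
import math


def rms(samples):
    """Root-mean-square level of one frame."""
    return math.sqrt(sum(v * v for v in samples) / len(samples))


class GatedLevelAverager:
    """Tracks a per-element time average level, updated only during speech."""

    def __init__(self, n_mics, alpha=0.9):
        self.alpha = alpha            # smoothing factor (hypothetical value)
        self.avg = [None] * n_mics    # None until the first speech frame

    def update(self, frames, speech_detected):
        """frames: one list of samples per microphone element."""
        if not speech_detected:
            # No target sound: keep the previous averages untouched.
            return list(self.avg)
        for i, frame in enumerate(frames):
            level = rms(frame)
            if self.avg[i] is None:
                self.avg[i] = level
            else:
                self.avg[i] = self.alpha * self.avg[i] + (1.0 - self.alpha) * level
        return list(self.avg)
```

Frames that contain only noise, such as air blowing from a nearby outlet, leave the averages unchanged, which is the gating behavior the configuration above describes.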

In the above sound collection device, the correction gain calculation unit may use, as a reference, the time average level of one predetermined microphone element among the plurality of microphone elements, and calculate the correction gain of each of the other microphone elements so that its time average level becomes the same as the time average level of the one microphone element.

According to this configuration, the sensitivity difference between the plurality of microphone elements can be corrected so that the output levels of the other microphone elements match the output level of the one predetermined microphone element.

In the above sound collection device, the correction gain calculation unit may use, as a reference, the average value of the time average levels of at least two predetermined microphone elements among the plurality of microphone elements, and calculate the correction gains of the plurality of microphone elements so that the time average level of each of the plurality of microphone elements becomes the same as that average value.

According to this configuration, the sensitivity difference between the plurality of microphone elements can be corrected so that the output levels of the plurality of microphone elements match the average value of the output levels of the at least two predetermined microphone elements.
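The two reference choices described above, a single predetermined element or the average of at least two predetermined elements, can both be expressed with one small function. The sketch below assumes that the time average levels are linear amplitudes (not decibels) and are strictly positive; the function name and the `reference_mics` parameter are hypothetical.

```python
def correction_gains(avg_levels, reference_mics):
    """Return one correction gain per element.

    avg_levels: time average level of each element (linear, > 0).
    reference_mics: indices of the predetermined reference element(s).
        One index uses that element as the reference; two or more
        indices use the mean of their levels as the reference.
    """
    if not reference_mics:
        raise ValueError("at least one reference element is required")
    if any(level <= 0.0 for level in avg_levels):
        raise ValueError("levels must be positive")
    reference = sum(avg_levels[i] for i in reference_mics) / len(reference_mics)
    # After multiplication by its gain, each level equals the reference.
    return [reference / level for level in avg_levels]
```

For example, with levels `[1.0, 2.0, 0.5, 1.0]`, using element 0 as the reference gives gains `[1.0, 0.5, 2.0, 1.0]`, so the reference element itself is left unchanged.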

In the above sound collection device, the gain control unit may include a third extraction unit that extracts a signal in a specific band from the output signal of each of the plurality of microphone elements, and the level detection unit may detect the output level of each signal extracted by the third extraction unit.

According to this configuration, the output level is detected for each specific-band signal extracted from the output signal of each of the plurality of microphone elements, so the influence of noise other than the target sound can be reduced.

In the above sound collection device, the specific band may be a band from 200 Hz to 500 Hz.

According to this configuration, the output level is detected for each signal in the 200 Hz to 500 Hz band extracted from the output signal of each of the plurality of microphone elements. Removing low-frequency noise at or below 200 Hz reduces the influence of low-frequency noise. Removing the band at or above 500 Hz limits the signal to sounds whose wavelength is sufficiently longer than the size of the microphone array, which reduces the difference in sound pressure that depends on the positions of the microphone elements constituting the microphone array. This enables accurate sensitivity correction.
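As a rough check of the wavelength argument, the following arithmetic assumes a speed of sound of c = 340 m/s (an assumed value, not stated in the original) and uses the 2 cm element spacing given as an example for Embodiment 1.

\[
\lambda = \frac{c}{f}: \qquad
\lambda_{500\,\mathrm{Hz}} = \frac{340}{500} = 0.68\ \mathrm{m}, \qquad
\lambda_{200\,\mathrm{Hz}} = \frac{340}{200} = 1.7\ \mathrm{m}
\]

\[
\Delta\phi_{\max} = \frac{2\pi d}{\lambda_{500\,\mathrm{Hz}}}
= \frac{2\pi \times 0.02}{0.68} \approx 0.185\ \mathrm{rad} \approx 10.6^{\circ}
\]

Under these assumptions, even at the upper edge of the band the wavelength is 34 times the element spacing, and the largest possible phase difference between adjacent elements is only about 10.6 degrees.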

In a sound collection method according to another aspect of the present disclosure, a computer: corrects a sensitivity difference between a plurality of microphone elements by multiplying the output signals of the plurality of microphone elements by a gain; detects the voice of a speaker as a target sound; controls the gain based on the detection result of the target sound; and emphasizes and collects the target sound arriving from a predetermined direction, using the corrected output signals of the plurality of microphone elements. In controlling the gain, the gain is updated based on the output signals of the plurality of microphone elements when the voice of the speaker is detected, and the gain is not updated when the voice of the speaker is not detected.

A sound collection program according to another aspect of the present disclosure causes a computer to function as: a sensitivity correction unit that corrects a sensitivity difference between a plurality of microphone elements by multiplying the output signals of the plurality of microphone elements by a gain; a target sound detection unit that detects the voice of a speaker as a target sound; a gain control unit that controls the gain based on a detection result of the target sound detection unit; and a directivity synthesis unit that emphasizes and collects the target sound arriving from a predetermined direction, using the output signals of the plurality of microphone elements corrected by the sensitivity correction unit. When the target sound detection unit detects the voice of the speaker, the gain control unit updates the gain based on the output signals of the plurality of microphone elements; when the target sound detection unit does not detect the voice of the speaker, the gain control unit does not update the gain.

Embodiments of the present disclosure will be described in detail below with reference to the drawings. The following embodiments are examples that embody the present disclosure and do not limit its technical scope.

(Embodiment 1)
FIG. 1 is a block diagram showing the configuration of the sound collection device in Embodiment 1 of the present disclosure.

The sound collection device 101 shown in FIG. 1 includes a microphone array 1, a sensitivity correction unit 2, a target sound detection unit 3, a sensitivity correction control unit (gain control unit) 4, and a directivity synthesis unit 5.

The microphone array 1 includes n microphone elements 11, 12, ..., 1n (n is a natural number) that convert acoustic signals into electrical signals. That is, the microphone array 1 includes a plurality of microphone elements.

FIG. 2 is a diagram showing an example of the installation position of the microphone array in Embodiment 1 of the present disclosure, and FIG. 3 is a diagram showing an example of the arrangement of the microphone elements of the microphone array in Embodiment 1 of the present disclosure.

As shown in FIG. 2, the microphone array 1 in Embodiment 1 is arranged near a display 201 in a vehicle. The display 201 is a component of a car navigation system. An air outlet 202 of an air conditioner is provided below the display 201. Cooled or warmed air is blown out from the air outlet 202.

The microphone array 1 shown in FIG. 3 includes, for example, four microphone elements 11, 12, 13, and 14. The microphone elements 11, 12, 13, and 14 are arranged at the four corners of a rectangular substrate. The horizontal spacing between the microphone elements 11 and 12, arranged at the lower part of the substrate, is, for example, 2 cm. The horizontal spacing between the microphone elements 13 and 14, arranged at the upper part of the substrate, is, for example, 2 cm. The vertical spacing between the microphone elements 11 and 13 is, for example, 2 cm, and the vertical spacing between the microphone elements 12 and 14 is, for example, 2 cm.

The distance between the microphone array 1 and the air outlet 202 is, for example, 2 cm. The microphone array 1 acquires the voice of a speaker sitting in the driver's seat as the target sound. At this time, the sound of the air blown out from the air outlet 202 is mixed into the target sound as noise. The distance between the air outlet 202 and the microphone element 13, which is closest to the air outlet 202, is 2 cm, and the distance between the air outlet 202 and the microphone element 11, which is farthest from the air outlet 202, is 4 cm. The distance between the microphone element 11 and the air outlet 202 is thus twice the distance between the microphone element 13 and the air outlet 202.

In this case, the difference in distance between the air outlet 202, which is a noise source, and each of the microphone elements 11 and 13 cannot be ignored, and the noise generated at the air outlet 202 appears as a sound pressure difference between the positions of the microphone elements 11 and 13. If the sensitivity of the microphone elements 11 to 14 is corrected using noise generated by such a noise source near the microphone array 1, the sensitivity cannot be corrected correctly, and the output performance of the directivity synthesis unit (beamformer) in the subsequent stage may instead deteriorate. Therefore, the sound collection device 101 in Embodiment 1 corrects the sensitivity of the microphone elements 11 to 14 using the target sound.
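The size of this sound pressure difference can be estimated with an idealized model that is not stated in the original: if the air outlet is treated as a point source in a free field, sound pressure is inversely proportional to distance. With r_{13} = 2 cm and r_{11} = 4 cm:

\[
20\log_{10}\frac{p_{13}}{p_{11}} = 20\log_{10}\frac{r_{11}}{r_{13}}
= 20\log_{10}\frac{4}{2} \approx 6.0\ \mathrm{dB}
\]

For comparison, for a talker at a hypothetical distance of 50 cm from the nearest element and a 2 cm path difference, the same model gives

\[
20\log_{10}\frac{52}{50} \approx 0.34\ \mathrm{dB}
\]

Under these assumptions, a calibration driven by the outlet noise would try to compensate a level difference of about 6 dB that has nothing to do with element sensitivity, whereas the target sound produces a far smaller position-dependent difference.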

なお、マイクロホンアレイ１が備えるマイクロホン素子の数は、４つに限定されない。また、複数のマイクロホン素子の配置位置についても、図３に示す配置位置に限定されない。 Note that the number of microphone elements included in the microphone array 1 is not limited to four. Furthermore, the arrangement positions of the plurality of microphone elements are not limited to the arrangement positions shown in FIG. 3.

マイクロホン素子１１，１２，・・・，１ｎのうちの１つのマイクロホン素子１１の出力信号は、目的音検出部３に入力される。また、マイクロホン素子１１，１２，・・・，１ｎの各出力信号は、感度補正部２及び感度補正制御部４に入力される。 The output signal of one microphone element 11 among the microphone elements 11, 12, . . . , 1n is input to the target sound detection section 3. Further, each output signal of the microphone elements 11, 12, . . . , 1n is input to the sensitivity correction section 2 and the sensitivity correction control section 4.

感度補正部２は、複数のマイクロホン素子１１，１２，・・・，１ｎの出力信号にゲインを掛けることにより複数のマイクロホン素子１１，１２，・・・，１ｎ間の感度差を補正する。感度補正部２は、各マイクロホン素子１１，１２，・・・，１ｎの出力信号に指定されたゲインを乗じることにより各マイクロホン素子１１，１２，・・・，１ｎの感度のばらつきを補正する。感度補正部２は、複数のマイクロホン素子１１，１２，・・・，１ｎ間の感度を揃える。 The sensitivity correction section 2 corrects the sensitivity difference between the plurality of microphone elements 11, 12, . . . , 1n by multiplying the output signals of the plurality of microphone elements 11, 12, . . . , 1n by gain. The sensitivity correction unit 2 corrects variations in sensitivity of each microphone element 11, 12, . . . , 1n by multiplying the output signal of each microphone element 11, 12, . . . , 1n by a specified gain. The sensitivity correction section 2 equalizes the sensitivity among the plurality of microphone elements 11, 12, . . . , 1n.

目的音検出部３は、発話者の音声を目的音として検出する。目的音検出部３は、マイクロホン素子１１，１２，・・・，１ｎのうちの１つのマイクロホン素子１１の出力信号を取得し、マイクロホンアレイ１で収音する目的音の有無を検出する。なお、本実施の形態１では、音声判定部３２は、マイクロホン素子１１の出力信号を用いて、目的音の有無を検出しているが、本開示は特にこれに限定されない。音声判定部３２は、マイクロホン素子１１，１２，・・・，１ｎのうちのいずれか１つの出力信号を用いて目的音の有無を検出してもよい。 The target sound detection section 3 detects the speaker's voice as the target sound. The target sound detection section 3 acquires the output signal of one microphone element 11 among the microphone elements 11, 12, . . . , 1n, and detects the presence or absence of the target sound collected by the microphone array 1. Note that in Embodiment 1, the voice determination section 32 uses the output signal of the microphone element 11 to detect the presence or absence of the target sound, but the present disclosure is not particularly limited to this. The voice determination section 32 may detect the presence or absence of the target sound using the output signal of any one of the microphone elements 11, 12, . . . , 1n.

なお、目的音検出部３の構成については、図４及び図５を用いて更に詳細に説明する。 Note that the configuration of the target sound detection section 3 will be explained in more detail using FIGS. 4 and 5.

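感度補正制御部４は、目的音検出部３の検出結果に基づいてゲインを制御する。感度補正制御部４は、各マイクロホン素子１１，１２，・・・，１ｎの出力信号を取得し、目的音検出部３によって目的音が検出された場合に、感度補正部２における各マイクロホン素子１１，１２，・・・，１ｎからの出力信号に対する感度補正ゲインを算出する。 The sensitivity correction control section 4 controls the gain based on the detection result of the target sound detection section 3. The sensitivity correction control section 4 acquires the output signal of each of the microphone elements 11, 12, . . . , 1n and, when the target sound is detected by the target sound detection section 3, calculates the sensitivity correction gain that the sensitivity correction section 2 applies to the output signal from each of the microphone elements 11, 12, . . . , 1n.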

感度補正制御部４は、目的音検出部３によって発話者の音声が検出された場合、複数のマイクロホン素子の出力信号に基づいてゲインを更新し、目的音検出部３によって発話者の音声が検出されない場合、ゲインを更新しない。なお、感度補正制御部４の構成については、図６を用いて更に詳細に説明する。 When the target sound detection section 3 detects the speaker's voice, the sensitivity correction control section 4 updates the gain based on the output signals of the plurality of microphone elements; when the target sound detection section 3 does not detect the speaker's voice, the sensitivity correction control section 4 does not update the gain. Note that the configuration of the sensitivity correction control section 4 will be explained in more detail using FIG. 6.

指向性合成部（ビームフォーマ）５は、感度補正部２によって補正された複数のマイクロホン素子の出力信号を用いて、所定の方向から到来する目的音を強調して収音する。指向性合成部５は、感度補正部２によって補正された各マイクロホン素子１１，１２，・・・，１ｎの出力信号を取得し、目的音のＳ／Ｎ比を改善する。 A directional synthesis section (beamformer) 5 uses the output signals of the plurality of microphone elements corrected by the sensitivity correction section 2 to emphasize and collect the target sound coming from a predetermined direction. The directivity synthesis section 5 acquires the output signals of the microphone elements 11, 12, . . . , 1n corrected by the sensitivity correction section 2, and improves the S/N ratio of the target sound.

続いて、図１に示す目的音検出部３の構成について更に説明する。 Next, the configuration of the target sound detection section 3 shown in FIG. 1 will be further explained.

図４は、本開示の実施の形態１における収音装置の目的音検出部の構成を示すブロック図である。 FIG. 4 is a block diagram showing the configuration of the target sound detection section of the sound collection device according to Embodiment 1 of the present disclosure.

図４に示す目的音検出部３は、帯域通過フィルタ部（第１抽出部）３１及び音声判定部３２を備える。 The target sound detection section 3 shown in FIG. 4 includes a bandpass filter section (first extraction section) 31 and a voice determination section 32.

帯域通過フィルタ部３１は、複数のマイクロホン素子１１，１２，・・・，１ｎのうちの１つのマイクロホン素子１１の出力信号から特定の帯域の信号を抽出する。帯域通過フィルタ部３１は、マイクロホン素子１１の出力信号から、例えば２００Ｈｚから５００Ｈｚの帯域の信号を抽出する。帯域通過フィルタ部３１は、マイクロホン素子１１の出力信号から、人の発話した音声を抽出可能な帯域の信号を抽出する。 The bandpass filter section 31 extracts a signal in a specific band from the output signal of one microphone element 11 among the plurality of microphone elements 11, 12, . . . , 1n. The bandpass filter section 31 extracts a signal in a band of, for example, 200 Hz to 500 Hz from the output signal of the microphone element 11. In other words, the bandpass filter section 31 extracts, from the output signal of the microphone element 11, a signal in a band from which human speech can be extracted.
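As an illustration of this band extraction, the following sketch builds a single second-order (biquad) bandpass filter for the 200 Hz to 500 Hz band using the widely used audio "cookbook" bandpass formulas. The filter order, the geometric-mean centre frequency, the 16 kHz sampling rate used in the demonstration, and all function names are assumptions; the source does not specify how the bandpass filter section is realized.

```python
import math

def design_bandpass(fs_hz, f_lo_hz, f_hi_hz):
    """Biquad bandpass coefficients (unity gain at the centre frequency)."""
    f0 = math.sqrt(f_lo_hz * f_hi_hz)          # geometric-mean centre (assumed)
    q = f0 / (f_hi_hz - f_lo_hz)               # Q from the desired bandwidth
    w0 = 2.0 * math.pi * f0 / fs_hz
    alpha = math.sin(w0) / (2.0 * q)
    a0 = 1.0 + alpha
    b = (alpha / a0, 0.0, -alpha / a0)
    a = (1.0, -2.0 * math.cos(w0) / a0, (1.0 - alpha) / a0)
    return b, a

def filter_signal(b, a, x):
    """Apply the biquad as a direct-form I difference equation."""
    y = []
    x1 = x2 = y1 = y2 = 0.0
    for xn in x:
        yn = b[0] * xn + b[1] * x1 + b[2] * x2 - a[1] * y1 - a[2] * y2
        x2, x1 = x1, xn
        y2, y1 = y1, yn
        y.append(yn)
    return y

def settled_peak(fs_hz, tone_hz, b, a, n=8000):
    """Peak output amplitude for a unit sine, ignoring the start-up transient."""
    x = [math.sin(2.0 * math.pi * tone_hz * k / fs_hz) for k in range(n)]
    y = filter_signal(b, a, x)
    return max(abs(v) for v in y[n // 2:])

b, a = design_bandpass(16000.0, 200.0, 500.0)
in_band = settled_peak(16000.0, 316.0, b, a)    # tone inside the band
out_band = settled_peak(16000.0, 4000.0, b, a)  # tone well above the band
```

A single biquad has gentle skirts, so a practical design would likely use a higher-order filter; the point here is only that an in-band tone passes almost unchanged while an out-of-band tone is strongly attenuated.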

音声判定部３２は、複数のマイクロホン素子１１，１２，・・・，１ｎのうちの１つのマイクロホン素子１１の出力信号が音声と音声以外の非音声とのいずれであるかを判定する。音声判定部３２は、帯域通過フィルタ部３１によって抽出された信号に対して音声と非音声とのいずれであるかを判定する。 The voice determination section 32 determines whether the output signal of one microphone element 11 among the plurality of microphone elements 11, 12, . . . , 1n is voice or non-voice (anything other than voice). The voice determination section 32 makes this voice/non-voice determination on the signal extracted by the bandpass filter section 31.

続いて、図４に示す音声判定部３２の構成について更に説明する。 Next, the configuration of the voice determining section 32 shown in FIG. 4 will be further explained.

図５は、本開示の実施の形態１における収音装置の音声判定部の構成を示すブロック図である。 FIG. 5 is a block diagram showing the configuration of the audio determination section of the sound collection device according to Embodiment 1 of the present disclosure.

音声判定部３２は、レベル検出部３２１、ノイズレベル検出部３２２、比較部３２３、時間－周波数変換部３２４、音声特徴量抽出部３２５及び判定部３２６を備える。 The voice determination section 32 includes a level detection section 321, a noise level detection section 322, a comparison section 323, a time-frequency conversion section 324, a voice feature amount extraction section 325, and a determination section 326.

レベル検出部３２１は、マイクロホン素子１１の出力信号の信号レベルを検出する。 The level detection section 321 detects the signal level of the output signal of the microphone element 11.

ノイズレベル検出部３２２は、レベル検出部３２１によって検出された信号レベルのミニマム値をホールドすることでノイズレベルを検出する。 The noise level detection section 322 detects the noise level by holding the minimum value of the signal level detected by the level detection section 321.

比較部３２３は、レベル検出部３２１の出力とノイズレベル検出部３２２の出力とを比較して波形レベルでの音声の有無を判定する。例えば、比較部３２３は、ノイズレベル検出部３２２によって検出されたノイズレベルの２倍の値を閾値に設定する。そして、比較部３２３は、レベル検出部３２１によって検出された信号レベルが閾値以上であるか否かを判定する。比較部３２３は、レベル検出部３２１によって検出された信号レベルが閾値以上である場合、マイクロホン素子１１の出力信号に音声が含まれると判定する。一方、比較部３２３は、レベル検出部３２１によって検出された信号レベルが閾値より小さい場合、マイクロホン素子１１の出力信号に音声が含まれないと判定する。 The comparison unit 323 compares the output of the level detection unit 321 and the output of the noise level detection unit 322 to determine the presence or absence of audio at the waveform level. For example, the comparison unit 323 sets a value twice the noise level detected by the noise level detection unit 322 as the threshold value. Then, the comparison unit 323 determines whether the signal level detected by the level detection unit 321 is equal to or higher than the threshold value. The comparison unit 323 determines that the output signal of the microphone element 11 includes audio when the signal level detected by the level detection unit 321 is equal to or higher than the threshold value. On the other hand, when the signal level detected by the level detection section 321 is smaller than the threshold value, the comparison section 323 determines that the output signal of the microphone element 11 does not include audio.
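The level comparison described above can be sketched as follows. The noise level is tracked as the running minimum of the detected signal level, and the threshold is twice that noise level, as in the example in the text. The function name and the plain running minimum are assumptions; the source only says the minimum value is "held", and a practical hold would presumably let the held value recover when the noise floor rises.

```python
def detect_voice_by_level(levels, ratio=2.0):
    """Return one flag per level sample: True when level >= ratio * noise floor.

    levels: sequence of non-negative signal levels from the level detector.
    The noise floor is a simple running minimum (illustrative assumption).
    """
    noise_floor = None
    flags = []
    for level in levels:
        # Hold the smallest level seen so far as the noise level.
        noise_floor = level if noise_floor is None else min(noise_floor, level)
        threshold = ratio * noise_floor
        flags.append(level >= threshold)
    return flags

flags = detect_voice_by_level([1.0, 1.1, 0.9, 3.0, 2.5, 1.0])
```

In this toy sequence only the two samples well above the held noise floor are flagged as containing voice at the waveform level.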

時間－周波数変換部３２４は、マイクロホン素子１１の時間領域の出力信号を周波数領域の出力信号に変換する。 The time-frequency converter 324 converts the time domain output signal of the microphone element 11 into a frequency domain output signal.

音声特徴量抽出部３２５は、周波数領域の出力信号から音声特徴量を抽出する。音声特徴量は、音声を示す特徴量である。音声特徴量抽出部３２５は、特許第５４５０２９８号明細書に示すような音声ピッチを用いて音声特徴量を抽出する方法、又は、特許第３８４９１１６号明細書に示すような調波構造の性質を特徴量として用いて音声特徴量を抽出する方法を用いてもよい。収音装置１０１が車載される場合には、図２に示すように、コンソールに埋め込まれたディスプレイ２０１周辺にマイクロホンアレイ１が組み込まれる。そのため、ノイズ源は、空気調和機の吹き出し口２０２となる。この場合、雑音のスペクトルは比較的単調であるため、音声特徴量抽出部３２５は、振幅スペクトルの交流成分又は振幅スペクトルのピークとディップとの比を音声特徴量として抽出してもよい。これにより、空気調和機の吹き出し口２０２から発生するノイズと音声とを判別することができる。 The voice feature extraction section 325 extracts a voice feature amount from the frequency-domain output signal. The voice feature amount is a feature amount that indicates voice. The voice feature extraction section 325 may use a method that extracts the voice feature amount using voice pitch, as shown in Japanese Patent No. 5450298, or a method that extracts the voice feature amount using the properties of harmonic structure as the feature, as shown in Japanese Patent No. 3849116. When the sound collection device 101 is mounted on a vehicle, the microphone array 1 is installed around a display 201 embedded in a console, as shown in FIG. 2. The noise source is therefore the air outlet 202 of the air conditioner. In this case, since the noise spectrum is relatively monotonous, the voice feature extraction section 325 may extract the AC (fluctuating) component of the amplitude spectrum, or the ratio between the peaks and dips of the amplitude spectrum, as the voice feature amount. This makes it possible to distinguish voice from the noise generated at the air outlet 202 of the air conditioner.

判定部３２６は、比較部３２３によってマイクロホン素子１１の出力信号に音声が含まれると判定され、かつ音声特徴量抽出部３２５によってマイクロホン素子１１の出力信号から音声特徴量が抽出された場合、マイクロホン素子１１の出力信号が音声であると判定する。一方、判定部３２６は、比較部３２３によってマイクロホン素子１１の出力信号に音声が含まれないと判定された場合、又は音声特徴量抽出部３２５によってマイクロホン素子１１の出力信号から音声特徴量が抽出されない場合、マイクロホン素子１１の出力信号が非音声であると判定する。判定部３２６は、音声及び非音声のいずれかを示す判定結果信号Ｏｄｅｔ（ｊ）を感度補正制御部４へ出力する。なお、ｊは時間に対応するサンプル番号を示す。 When the comparison section 323 determines that the output signal of the microphone element 11 includes voice, and the voice feature extraction section 325 extracts a voice feature amount from the output signal of the microphone element 11, the determination section 326 determines that the output signal of the microphone element 11 is voice. On the other hand, when the comparison section 323 determines that the output signal of the microphone element 11 does not include voice, or when the voice feature extraction section 325 does not extract a voice feature amount from the output signal of the microphone element 11, the determination section 326 determines that the output signal of the microphone element 11 is non-voice. The determination section 326 outputs a determination result signal Odet(j) indicating either voice or non-voice to the sensitivity correction control section 4. Note that j indicates a sample number corresponding to time.

その結果、目的音検出部３は、マイクロホン素子１１の出力信号が音声であると判定した場合、判定結果信号Ｏｄｅｔ（ｊ）＝１を出力し、マイクロホン素子１１の出力信号が非音声であると判定した場合、判定結果信号Ｏｄｅｔ（ｊ）＝０を出力する。 As a result, the target sound detection section 3 outputs the determination result signal Odet(j)=1 when it determines that the output signal of the microphone element 11 is voice, and outputs the determination result signal Odet(j)=0 when it determines that the output signal of the microphone element 11 is non-voice.
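The decision logic of the determination section can be sketched as a logical AND of the level comparison and a spectral feature check. The version below uses the ratio of the largest to the smallest amplitude-spectrum value as the feature, loosely following the peak-to-dip ratio mentioned in the text. The exact feature definition, the threshold value `ratio_threshold`, and the function names are assumptions made for illustration.

```python
def peak_dip_ratio(amplitude_spectrum, eps=1e-12):
    """Ratio of the spectral peak to the spectral dip (illustrative definition)."""
    return max(amplitude_spectrum) / (min(amplitude_spectrum) + eps)

def odet(level_says_voice, amplitude_spectrum, ratio_threshold=10.0):
    """Return 1 (voice) only if both the level test and the feature test pass."""
    has_voice_feature = peak_dip_ratio(amplitude_spectrum) >= ratio_threshold
    return 1 if (level_says_voice and has_voice_feature) else 0

flat_noise = [1.0, 1.1, 0.9, 1.0]        # smooth spectrum, e.g. airflow noise
harmonic = [0.1, 5.0, 0.1, 4.0, 0.1]     # pronounced peaks and dips

voice_case = odet(True, harmonic)
noise_case = odet(True, flat_noise)
quiet_case = odet(False, harmonic)
```

Requiring both conditions means loud but spectrally smooth noise, such as airflow from the outlet, does not by itself produce Odet(j)=1.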

続いて、図１に示す感度補正制御部４の構成について更に説明する。 Next, the configuration of the sensitivity correction control section 4 shown in FIG. 1 will be further explained.

図６は、本開示の実施の形態１における収音装置の感度補正制御部の構成を示すブロック図である。 FIG. 6 is a block diagram showing the configuration of the sensitivity correction control section of the sound pickup device according to Embodiment 1 of the present disclosure.

感度補正制御部４は、第１～ｎ帯域通過フィルタ部（第３抽出部）４１１～４１ｎ、第１～ｎレベル検出部４２１～４２ｎ、第１～ｎ平均レベル算出部（時間平均レベル算出部）４３１～４３ｎ及び補正ゲイン算出部４４を備える。第１～ｎ帯域通過フィルタ部４１１～４１ｎ、第１～ｎレベル検出部４２１～４２ｎ及び第１～ｎ平均レベル算出部４３１～４３ｎは、それぞれマイクロホン素子１１～１ｎの数に応じて設けられる。例えば、マイクロホン素子１１の出力信号ｘ（１，ｊ）は、第１帯域通過フィルタ部４１１に入力される。 The sensitivity correction control section 4 includes first to nth bandpass filter sections (third extraction sections) 411 to 41n, first to nth level detection sections 421 to 42n, first to nth average level calculation sections (time average level calculation sections) 431 to 43n, and a correction gain calculation section 44. The first to nth bandpass filter sections 411 to 41n, the first to nth level detection sections 421 to 42n, and the first to nth average level calculation sections 431 to 43n are each provided in a number corresponding to the number of microphone elements 11 to 1n. For example, the output signal x(1,j) of the microphone element 11 is input to the first bandpass filter section 411.

第１～ｎ帯域通過フィルタ部４１１～４１ｎは、複数のマイクロホン素子１１～１ｎそれぞれの出力信号から特定の帯域の信号を抽出する。なお、特定の帯域は、２００Ｈｚから５００Ｈｚの帯域である。 The first to nth bandpass filter sections 411 to 41n extract signals in specific bands from the output signals of the plurality of microphone elements 11 to 1n, respectively. Note that the specific band is a band from 200Hz to 500Hz.

第１～ｎレベル検出部４２１～４２ｎは、複数のマイクロホン素子１１～１ｎそれぞれの出力信号の出力レベルを検出する。 The first to nth level detection units 421 to 42n detect the output levels of the output signals of the plurality of microphone elements 11 to 1n, respectively.

第１～ｎレベル検出部４２１～４２ｎは、各マイクロホン素子の出力信号ｘ（ｉ，ｊ）の出力レベルＬｘ（ｉ，ｊ）を下記の一般的な振幅平滑化の式（１）を用いて検出する。 The first to nth level detection sections 421 to 42n detect the output level Lx(i,j) of the output signal x(i,j) of each microphone element using the following general amplitude smoothing equation (1).

Ｌｘ（ｉ，ｊ）＝ｂｅｔａ１・｜ｘ（ｉ，ｊ）｜＋（１－ｂｅｔａ１）・Ｌｘ（ｉ，ｊ－１）・・・（１） Lx(i,j)=beta1・|x(i,j)|+(1−beta1)・Lx(i,j−1)...(1)

式（１）において、ｉはマイクロホン素子番号を示し、ｊは時間に対応するサンプル番号を示す。また、式（１）において、ｂｅｔａ１は、重み係数を示し、平均化の速度を決めるパラメータである。 In equation (1), i indicates a microphone element number, and j indicates a sample number corresponding to time. Furthermore, in equation (1), beta1 indicates a weighting coefficient and is a parameter that determines the speed of averaging.

また、本実施の形態１では、第１～ｎ帯域通過フィルタ部４１１～４１ｎを通過した出力信号ｘｂｐｆ（ｉ，ｊ）が第１～ｎレベル検出部４２１～４２ｎに入力される。そのため、第１～ｎレベル検出部４２１～４２ｎは、第１～ｎ帯域通過フィルタ部４１１～４１ｎによって抽出された各マイクロホン素子の出力信号ｘｂｐｆ（ｉ，ｊ）の出力レベルＬｘ（ｉ，ｊ）を下記の一般的な振幅平滑化の式（２）を用いて検出する。 Furthermore, in the first embodiment, the output signal xbpf(i,j) that has passed through the first to n band pass filter sections 411 to 41n is input to the first to n level detection sections 421 to 42n. Therefore, the first to nth level detection units 421 to 42n detect the output level Lx(i,j) of the output signal xbpf(i,j) of each microphone element extracted by the first to nth bandpass filter units 411 to 41n. is detected using the following general amplitude smoothing equation (2).

Ｌｘ（ｉ，ｊ）＝ｂｅｔａ１・｜ｘｂｐｆ（ｉ，ｊ）｜＋（１－ｂｅｔａ１）・Ｌｘ（ｉ，ｊ－１）・・・（２） Lx(i,j)=beta1・|xbpf(i,j)|+(1−beta1)・Lx(i,j−1)...(2)
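Equation (2) is a first-order recursive (exponential) smoothing of the rectified band-limited signal. A minimal sketch for one microphone element follows; the function name `detect_level` and the zero initial level are assumptions.

```python
def detect_level(x_bpf, beta1, initial_level=0.0):
    """Smoothed amplitude Lx(j) = beta1*|xbpf(j)| + (1 - beta1)*Lx(j-1).

    x_bpf: band-limited samples of one microphone element.
    initial_level: starting value of Lx (assumed to be 0 here).
    """
    level = initial_level
    levels = []
    for sample in x_bpf:
        level = beta1 * abs(sample) + (1.0 - beta1) * level
        levels.append(level)
    return levels

short_run = detect_level([1.0, -1.0], beta1=0.5)
long_run = detect_level([1.0] * 20000, beta1=0.000625)
```

With a constant-amplitude input the detected level rises toward that amplitude at a speed set by beta1, which is why the text calls beta1 the parameter that determines the averaging speed.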

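第１～ｎ平均レベル算出部４３１～４３ｎは、目的音検出部３によって発話者の音声が検出された場合に、第１～ｎレベル検出部４２１～４２ｎによって検出された各出力レベルＬｘ（ｉ，ｊ）の時間平均レベルＡｖｅｘ（ｉ，ｊ）を算出する。 When the target sound detection section 3 detects the speaker's voice, the first to nth average level calculation sections 431 to 43n calculate the time average level Avex(i,j) of each output level Lx(i,j) detected by the first to nth level detection sections 421 to 42n.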

第１～ｎ平均レベル算出部４３１～４３ｎは、目的音検出部３によって目的音が検出される期間（判定結果信号Ｏｄｅｔ（ｊ）＝１）のみ、各マイクロホン素子の出力レベルＬｘ（ｉ，ｊ）の長時間の平均値（時間平均レベルＡｖｅｘ（ｉ，ｊ））を下記の式（３）を用いて算出する。また、第１～ｎ平均レベル算出部４３１～４３ｎは、目的音検出部３によって目的音が検出されない期間（判定結果信号Ｏｄｅｔ（ｊ）＝０）、時間平均レベルＡｖｅｘ（ｉ，ｊ）を下記の式（４）を用いて算出する。すなわち、第１～ｎ平均レベル算出部４３１～４３ｎは、目的音検出部３によって発話者の音声が検出されなかった場合に、前回算出された時間平均レベルＡｖｅｘ（ｉ，ｊ－１）を今回の時間平均レベルＡｖｅｘ（ｉ，ｊ）として算出する。 Only during periods in which the target sound detection section 3 detects the target sound (determination result signal Odet(j)=1), the first to nth average level calculation sections 431 to 43n calculate the long-term average (time average level Avex(i,j)) of the output level Lx(i,j) of each microphone element using the following equation (3). During periods in which the target sound detection section 3 does not detect the target sound (determination result signal Odet(j)=0), the first to nth average level calculation sections 431 to 43n calculate the time average level Avex(i,j) using the following equation (4). That is, when the target sound detection section 3 does not detect the speaker's voice, the first to nth average level calculation sections 431 to 43n carry over the previously calculated time average level Avex(i,j−1) as the current time average level Avex(i,j).

Ａｖｅｘ（ｉ，ｊ）＝ｂｅｔａ２・｜Ｌｘ（ｉ，ｊ）｜＋（１－ｂｅｔａ２）・Ａｖｅｘ（ｉ，ｊ－１）ｉｆＯｄｅｔ（ｊ）＝１・・・（３） Avex(i,j)=beta2・|Lx(i,j)|+(1−beta2)・Avex(i,j−1) if Odet(j)=1...(3)

Ａｖｅｘ（ｉ，ｊ）＝Ａｖｅｘ（ｉ，ｊ－１）ｉｆＯｄｅｔ（ｊ）＝０・・・（４） Avex(i,j)=Avex(i,j-1) if Odet(j)=0...(4)

式（３）及び式（４）において、ｉはマイクロホン素子番号を示し、ｊは時間に対応するサンプル番号を示す。また、式（３）において、ｂｅｔａ２は、重み係数であり、平均化の速度を決めるパラメータである。また、ｂｅｔａ１＞＞ｂｅｔａ２である。例えば、サンプリング周波数が１６ｋＨｚである場合、ｂｅｔａ１は、１００ｍ秒での平均レベルとなるように０．０００６２５に設定され、ｂｅｔａ２は、５秒での平均となるように０．００００１２５に設定される。マイクロホン素子の感度補正に用いる平均信号レベルに長時間の平均レベルが用いられることで正確に感度補正ゲインを算出することができる。 In equations (3) and (4), i indicates a microphone element number, and j indicates a sample number corresponding to time. Furthermore, in equation (3), beta2 is a weighting coefficient and is a parameter that determines the speed of averaging. Also, beta1>>beta2. For example, if the sampling frequency is 16 kHz, beta1 is set to 0.000625 to be the average level over 100 msec, and beta2 is set to 0.0000125 to be the average level over 5 seconds. By using a long-term average level as the average signal level used for sensitivity correction of the microphone element, it is possible to accurately calculate the sensitivity correction gain.
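The gated update of equations (3) and (4), together with the example coefficient values, can be sketched as below. The relation beta = 1 / (fs × averaging time) is inferred from the numbers given in the text (it reproduces both 0.000625 and 0.0000125 at 16 kHz) rather than stated in the source, and the function names are hypothetical.

```python
def smoothing_coefficient(fs_hz, averaging_time_s):
    """Weighting coefficient for a given averaging time (inferred relation)."""
    return 1.0 / (fs_hz * averaging_time_s)

def update_time_average(avex_prev, lx, odet, beta2):
    """One step of equations (3) and (4) for a single microphone element."""
    if odet == 1:
        # Equation (3): update the long-term average while voice is detected.
        return beta2 * abs(lx) + (1.0 - beta2) * avex_prev
    # Equation (4): hold the previous value while no voice is detected.
    return avex_prev

beta1 = smoothing_coefficient(16000.0, 0.1)   # about 100 ms
beta2 = smoothing_coefficient(16000.0, 5.0)   # about 5 s
held = update_time_average(1.0, 3.0, odet=0, beta2=0.5)
updated = update_time_average(1.0, 3.0, odet=1, beta2=0.5)
```

Holding the average during non-voice periods keeps noise-only segments, such as airflow from the outlet, from influencing the level on which the correction gain is based.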

補正ゲイン算出部４４は、第１～ｎ平均レベル算出部４３１～４３ｎによって算出された時間平均レベルから、ゲインを更新した感度補正ゲインを算出する。 The correction gain calculation unit 44 calculates a sensitivity correction gain with the gain updated from the time average level calculated by the first to nth average level calculation units 431 to 43n.

補正ゲイン算出部４４は、複数のマイクロホン素子１１～１ｎのうちの予め決められている１つのマイクロホン素子１１の時間平均レベルを基準として、１つのマイクロホン素子１１以外の他のマイクロホン素子１２～１ｎの時間平均レベルが１つのマイクロホン素子１１の時間平均レベルと同じになるように他のマイクロホン素子１２～１ｎの感度補正ゲインを算出する。すなわち、補正ゲイン算出部４４は、第１～ｎ平均レベル算出部４３１～４３ｎによって算出された各マイクロホン素子１１～１ｎの時間平均レベルＡｖｅｘ（ｉ，ｊ）と、マイクロホン素子１１の時間平均レベルＡｖｅｘ（１，ｊ）とを用いて、下記の式（５）により感度補正ゲインＧ（ｉ，ｊ）を算出する。 Using the time average level of one predetermined microphone element 11 among the plurality of microphone elements 11 to 1n as a reference, the correction gain calculation section 44 calculates the sensitivity correction gains of the other microphone elements 12 to 1n so that their time average levels become equal to the time average level of the microphone element 11. That is, the correction gain calculation section 44 calculates the sensitivity correction gain G(i,j) by the following equation (5), using the time average level Avex(i,j) of each of the microphone elements 11 to 1n calculated by the first to nth average level calculation sections 431 to 43n and the time average level Avex(1,j) of the microphone element 11.

Ｇ（ｉ，ｊ）＝Ａｖｅｘ（１，ｊ）／Ａｖｅｘ（ｉ，ｊ）・・・（５） G(i,j)=Avex(1,j)/Avex(i,j)...(5)

上記の式（５）の感度補正ゲインが用いられる場合は、マイクロホン素子１１を基準として、その他のマイクロホン素子１２～１ｎの出力レベルが揃うように感度補正が行われることになる。 When the sensitivity correction gain of the above equation (5) is used, sensitivity correction is performed so that the output levels of the other microphone elements 12 to 1n are equalized with respect to the microphone element 11.
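Equation (5) can be sketched for one time step as follows. The function name, the zero-based `reference_index`, and the check for non-positive levels are assumptions added for illustration.

```python
def gains_relative_to_reference(avex, reference_index=0):
    """Equation (5): G(i) = Avex(reference) / Avex(i) for every element i.

    avex: time average levels of all microphone elements at one time step.
    reference_index: index of the reference element (0 corresponds to element 11).
    """
    if any(level <= 0.0 for level in avex):
        raise ValueError("time average levels must be positive")
    reference_level = avex[reference_index]
    return [reference_level / level for level in avex]

avex = [1.0, 0.5, 2.0]
gains = gains_relative_to_reference(avex)
corrected_levels = [g * level for g, level in zip(gains, avex)]
```

After the gains are applied, every element has the same average level as the reference element, whose own gain is 1.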

なお、上記の式（５）では、補正ゲイン算出部４４は、予め決められている１つのマイクロホン素子１１の時間平均レベルを基準として感度補正ゲインを算出しているが、本開示は特にこれに限定されない。補正ゲイン算出部４４は、マイクロホン素子１１とは異なる他の１つのマイクロホン素子の時間平均レベルを基準として感度補正ゲインを算出してもよい。 Note that in the above equation (5), the correction gain calculation section 44 calculates the sensitivity correction gain using the time average level of the one predetermined microphone element 11 as a reference, but the present disclosure is not particularly limited to this. The correction gain calculation section 44 may calculate the sensitivity correction gain using the time average level of another microphone element different from the microphone element 11 as the reference.

また、補正ゲイン算出部４４は、複数のマイクロホン素子１１～１ｎのうちの予め決められている少なくとも２つのマイクロホン素子の時間平均レベルの平均値を基準として、複数のマイクロホン素子１１～１ｎの時間平均レベルが少なくとも２つのマイクロホン素子の時間平均レベルの平均値と同じになるように複数のマイクロホン素子１１～１ｎの感度補正ゲインを算出してもよい。すなわち、補正ゲイン算出部４４は、第１～ｎ平均レベル算出部４３１～４３ｎによって算出された各マイクロホン素子１１～１ｎの時間平均レベルＡｖｅｘ（ｉ，ｊ）と、時間平均レベルＡｖｅｘ（ｉ，ｊ）の平均値とを用いて、下記の式（６）により感度補正ゲインＧ（ｉ，ｊ）を算出してもよい。 Alternatively, using the average of the time average levels of at least two predetermined microphone elements among the plurality of microphone elements 11 to 1n as a reference, the correction gain calculation section 44 may calculate the sensitivity correction gains of the plurality of microphone elements 11 to 1n so that the time average level of each of the microphone elements 11 to 1n becomes equal to that average. That is, the correction gain calculation section 44 may calculate the sensitivity correction gain G(i,j) by the following equation (6), using the time average level Avex(i,j) of each of the microphone elements 11 to 1n calculated by the first to nth average level calculation sections 431 to 43n and the average value of the time average levels Avex(i,j).

Ｇ（ｉ，ｊ）＝｛Ａｖｅｘ（１，ｊ）＋Ａｖｅｘ（２，ｊ）＋・・・＋Ａｖｅｘ（ｎ，ｊ）｝／ｎ／Ａｖｅｘ（ｉ，ｊ）・・・（６） G(i,j)={Avex(1,j)+Avex(2,j)+...+Avex(n,j)}/n/Avex(i,j)...(6)

なお、上記の式（６）では、補正ゲイン算出部４４は、マイクロホン素子１１～１ｎのうちの全てのマイクロホン素子１１～１ｎの時間平均レベルの平均値を基準として感度補正ゲインを算出しているが、本開示は特にこれに限定されない。補正ゲイン算出部４４は、マイクロホン素子１１～１ｎのうちの少なくとも２つのマイクロホン素子の時間平均レベルの平均値を基準として感度補正ゲインを算出してもよい。 Note that in the above equation (6), the correction gain calculation unit 44 calculates the sensitivity correction gain based on the average value of the time average levels of all the microphone elements 11 to 1n among the microphone elements 11 to 1n. However, the present disclosure is not particularly limited thereto. The correction gain calculation unit 44 may calculate the sensitivity correction gain based on the average value of the time average levels of at least two of the microphone elements 11 to 1n.
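The mean-referenced variant of equation (6), including the option of averaging over only a subset of at least two elements, can be sketched as follows. The function name and the `reference_indices` parameter are hypothetical.

```python
def gains_relative_to_mean(avex, reference_indices=None):
    """Equation (6): G(i) = mean(Avex over reference elements) / Avex(i).

    avex: time average levels of all microphone elements at one time step.
    reference_indices: elements whose levels form the reference mean;
        None means all elements, as written in equation (6).
    """
    if reference_indices is None:
        reference_indices = range(len(avex))
    reference_levels = [avex[i] for i in reference_indices]
    target = sum(reference_levels) / len(reference_levels)
    return [target / level for level in avex]

avex = [1.0, 2.0, 3.0]
gains_all = gains_relative_to_mean(avex)             # reference mean = 2.0
gains_subset = gains_relative_to_mean(avex, [0, 1])  # reference mean = 1.5
```

Compared with equation (5), referencing the mean avoids tying the overall output level to a single element, at the cost of applying a non-unity gain to every element.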

感度補正部２は、感度補正制御部４によって算出された各マイクロホン素子１１～１ｎに対応する感度補正ゲインＧ（ｉ，ｊ）を各マイクロホン素子１１～１ｎの出力信号ｘ（ｉ，ｊ）に乗じることで感度補正を行う。 The sensitivity correction section 2 performs sensitivity correction by multiplying the output signal x(i,j) of each of the microphone elements 11 to 1n by the corresponding sensitivity correction gain G(i,j) calculated by the sensitivity correction control section 4.

指向性合成部５は、感度補正部２によって補正された出力信号Ｇ（ｉ，ｊ）・ｘ（ｉ，ｊ）を用いて、特許文献１に示されるＧＳＣにより指向性合成（ビームフォーミング）する。また、指向性合成部５は、ＧＳＣ以外のビームフォーミング処理、例えば、ＭａｘｉｍｕｍＬｉｋｅｌｉｈｏｏｄ法又はＭｉｎｉｍｕｍＶａｒｉａｎｃｅ法などの既存のビームフォーミング処理によりビームフォーミングしてもよい。 The directivity synthesis section 5 uses the output signals G(i,j)・x(i,j) corrected by the sensitivity correction section 2 to perform directivity synthesis (beamforming) with the GSC shown in Patent Document 1. Alternatively, the directivity synthesis section 5 may perform beamforming using an existing beamforming process other than GSC, such as the Maximum Likelihood method or the Minimum Variance method.

続いて、本開示の実施の形態１における収音装置１０１の動作について説明する。 Next, the operation of the sound collection device 101 in Embodiment 1 of the present disclosure will be described.

図７は、本開示の実施の形態１における収音装置の動作について説明するためのフローチャートである。 FIG. 7 is a flowchart for explaining the operation of the sound collection device in Embodiment 1 of the present disclosure.

まず、ステップＳ１において、目的音検出部３は、マイクロホン素子１１から出力信号を取得し、感度補正部２及び感度補正制御部４、各マイクロホン素子１１～１ｎから出力信号を取得する。 First, in step S1, the target sound detection section 3 acquires the output signal from the microphone element 11, and the sensitivity correction section 2 and the sensitivity correction control section 4 acquire the output signals from each of the microphone elements 11 to 1n.

次に、ステップＳ２において、目的音検出部３は、マイクロホン素子１１の出力信号から目的音（音声）が検出されたか否かを判定する。目的音検出部３は、マイクロホン素子１１の出力信号から目的音が検出されたか否かを示す判定結果信号を感度補正制御部４へ出力する。 Next, in step S2, the target sound detection unit 3 determines whether the target sound (voice) is detected from the output signal of the microphone element 11. The target sound detection unit 3 outputs a determination result signal indicating whether or not the target sound has been detected from the output signal of the microphone element 11 to the sensitivity correction control unit 4.

ここで、マイクロホン素子１１の出力信号から目的音が検出されたと判定された場合（ステップＳ２でＹＥＳ）、ステップＳ３において、感度補正制御部４は、複数のマイクロホン素子１１～１ｎの出力信号に基づいて感度補正ゲインを更新する。 Here, if it is determined that the target sound has been detected from the output signal of the microphone element 11 (YES in step S2), then in step S3 the sensitivity correction control section 4 updates the sensitivity correction gain based on the output signals of the plurality of microphone elements 11 to 1n.

一方、マイクロホン素子１１の出力信号から目的音が検出されなかったと判定された場合（ステップＳ２でＮＯ）、感度補正ゲインが更新されずに、ステップＳ４に処理が移行する。 On the other hand, if it is determined that the target sound is not detected from the output signal of the microphone element 11 (NO in step S2), the process proceeds to step S4 without updating the sensitivity correction gain.

次に、ステップＳ４において、感度補正部２は、各マイクロホン素子１１～１ｎの出力信号に感度補正ゲインを掛けることにより各マイクロホン素子間の感度差を補正する。 Next, in step S4, the sensitivity correction section 2 corrects the sensitivity difference between each microphone element by multiplying the output signal of each microphone element 11 to 1n by a sensitivity correction gain.

次に、ステップＳ５において、指向性合成部５は、感度補正部２によって補正された各マイクロホン素子１１～１ｎの出力信号を用いて、指向性を合成する。指向性が合成されることにより、所定の方向から到来する目的音が強調して収音される。 Next, in step S5, the directivity synthesis unit 5 uses the output signals of the microphone elements 11 to 1n corrected by the sensitivity correction unit 2 to synthesize the directivity. By combining the directivity, the target sound coming from a predetermined direction is emphasized and collected.
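The flow of steps S1 to S5 can be summarized per sample in the sketch below. It is a simplification under several stated assumptions: the voice decision of step S2 is passed in as a flag, the bandpass filtering is omitted, the time averages start at a hypothetical value of 1.0 so the initial gains are unity, and a plain channel average stands in for the directivity synthesis (the source uses a GSC or another existing beamformer there). The class and method names are hypothetical.

```python
class SoundCollectorSketch:
    """Per-sample sketch of steps S1 to S5 (not the patented implementation)."""

    def __init__(self, n_mics, beta1=0.000625, beta2=0.0000125):
        self.beta1 = beta1
        self.beta2 = beta2
        self.lx = [0.0] * n_mics     # smoothed levels, equation (2)
        self.avex = [1.0] * n_mics   # time averages; initial value is assumed
        self.gain = [1.0] * n_mics   # sensitivity correction gains

    def process(self, samples, voice_detected):
        # S1: samples holds one output sample per microphone element.
        for i, x in enumerate(samples):
            self.lx[i] = self.beta1 * abs(x) + (1.0 - self.beta1) * self.lx[i]
        # S2 is assumed to have produced voice_detected (Odet).
        if voice_detected:
            # S3: update time averages (equation (3)) and gains (equation (5)).
            for i in range(len(samples)):
                self.avex[i] = (self.beta2 * self.lx[i]
                                + (1.0 - self.beta2) * self.avex[i])
            self.gain = [self.avex[0] / a for a in self.avex]
        # S4: apply the sensitivity correction gains.
        corrected = [g * x for g, x in zip(self.gain, samples)]
        # S5: placeholder for directivity synthesis (plain average, not a GSC).
        return sum(corrected) / len(corrected)

collector = SoundCollectorSketch(2, beta1=0.5, beta2=0.1)
for _ in range(500):
    collector.process([1.0, 0.5], voice_detected=True)
gain_after_voice = list(collector.gain)
collector.process([1.0, 5.0], voice_detected=False)
gain_after_noise = list(collector.gain)
```

In this toy run the second element is half as sensitive as the first, so its gain settles near 2 while voice is present, and the gains stay frozen when a non-voice sample arrives.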

上記のように、複数のマイクロホン素子１１～１ｎの出力信号にゲインを掛けることにより複数のマイクロホン素子１１～１ｎ間の感度差が補正される。このとき、発話者の音声が検出された場合、複数のマイクロホン素子１１～１ｎの出力信号に基づいてゲインが更新され、発話者の音声が検出されない場合、ゲインが更新されない。そして、感度差が補正された複数のマイクロホン素子１１～１ｎの出力信号を用いて、所定の方向から到来する目的音が強調して収音される。 As described above, the sensitivity difference between the plurality of microphone elements 11 to 1n is corrected by multiplying the output signals of the plurality of microphone elements 11 to 1n by a gain. At this time, if the speaker's voice is detected, the gain is updated based on the output signals of the plurality of microphone elements 11 to 1n, and if the speaker's voice is not detected, the gain is not updated. Then, using the output signals of the plurality of microphone elements 11 to 1n whose sensitivity differences have been corrected, the target sound coming from a predetermined direction is emphasized and collected.

したがって、目的音である発話者の音声が検出された場合に、複数のマイクロホン素子１１～１ｎ間の感度差を補正するためのゲインが更新されるので、目的音に対する複数のマイクロホン素子１１～１ｎ間の感度差を補正することができ、後段の指向性合成において、目的音方向に感度の死角を有するノイズ参照信号に目的音が漏れこむ量を低減することができる。その結果、指向性合成におけるノイズ抑圧性能を向上させることができるとともに、目的音を高Ｓ／Ｎ比で収音することができる。 Therefore, because the gain for correcting the sensitivity difference among the plurality of microphone elements 11 to 1n is updated when the speaker's voice, which is the target sound, is detected, the sensitivity difference among the plurality of microphone elements 11 to 1n with respect to the target sound can be corrected. In the subsequent directivity synthesis, this reduces the amount of target sound that leaks into the noise reference signal, which has a sensitivity null (blind spot) in the target sound direction. As a result, the noise suppression performance in directivity synthesis can be improved, and the target sound can be collected with a high S/N ratio.

（実施の形態２） (Embodiment 2)
上記の実施の形態１では、目的音検出部３は、１つのマイクロホン素子の出力信号が音声と非音声とのいずれであるかを判定している。これに対し、実施の形態２では、目的音検出部は、複数のマイクロホン素子の出力信号を用いて予め決められた目的音方向から目的音が到来しているか否かをさらに判定する。 In Embodiment 1 described above, the target sound detection section 3 determines whether the output signal of one microphone element is voice or non-voice. In contrast, in Embodiment 2, the target sound detection section further determines whether or not the target sound is arriving from a predetermined target sound direction, using the output signals of the plurality of microphone elements.

図８は、本開示の実施の形態２における収音装置の構成を示すブロック図である。 FIG. 8 is a block diagram showing the configuration of a sound collection device in Embodiment 2 of the present disclosure.

図８に示す収音装置１０２は、マイクロホンアレイ１、感度補正部２、感度補正制御部４、指向性合成部５及び目的音検出部６を備える。実施の形態１の収音装置１０１と異なる点は、目的音検出部６に複数のマイクロホン素子１１、１２，・・・，１ｎからの出力信号が入力されている点である。なお、本実施の形態２において、実施の形態１と同じ構成については同じ符号が付され、説明が省略される。 The sound collection device 102 shown in FIG. 8 includes a microphone array 1, a sensitivity correction section 2, a sensitivity correction control section 4, a directional synthesis section 5, and a target sound detection section 6. The difference from the sound collection device 101 of the first embodiment is that output signals from a plurality of microphone elements 11, 12, . . . , 1n are input to the target sound detection section 6. Note that in the second embodiment, the same components as in the first embodiment are denoted by the same reference numerals, and the description thereof will be omitted.

図９は、本開示の実施の形態２における収音装置の目的音検出部の構成を示すブロック図である。 FIG. 9 is a block diagram showing the configuration of the target sound detection section of the sound collection device in Embodiment 2 of the present disclosure.

図９に示す目的音検出部６は、帯域通過フィルタ部３１、音声判定部３２、帯域通過フィルタ部（第２抽出部）６３、目的音方向判定部６４及び目的音判定部６５を備える。実施の形態１の目的音検出部３に対して、実施の形態２の目的音検出部６には、帯域通過フィルタ部６３、目的音方向判定部６４及び目的音判定部６５が追加されている。 The target sound detection section 6 shown in FIG. 9 includes a bandpass filter section 31, a voice determination section 32, a bandpass filter section (second extraction section) 63, a target sound direction determination section 64, and a target sound determination section 65. Compared with the target sound detection section 3 of Embodiment 1, the target sound detection section 6 of Embodiment 2 additionally includes the bandpass filter section 63, the target sound direction determination section 64, and the target sound determination section 65.

帯域通過フィルタ部６３は、複数のマイクロホン素子の出力信号から特定の帯域の信号を抽出する。帯域通過フィルタ部６３は、マイクロホン素子１１～１ｎそれぞれの出力信号から、例えば２００Ｈｚから５００Ｈｚの帯域の信号を抽出する。 The bandpass filter section 63 extracts signals in a specific band from the output signals of the plurality of microphone elements. The bandpass filter section 63 extracts signals in a band of, for example, 200 Hz to 500 Hz from the output signals of the microphone elements 11 to 1n.

目的音方向判定部６４は、複数のマイクロホン素子の出力信号を用いて、予め決められた目的音方向から目的音が到来しているか否かを判定する。目的音方向判定部６４は、帯域通過フィルタ部６３によって抽出された信号に対して目的音方向から目的音が到来しているか否かを判定する。ここで、車内に配置された収音装置１０２が、運転者の発話音声を収音する場合、運転者の発話音声がマイクロホンアレイ１に入射する角度は予め決められる。そのため、目的音方向判定部６４は、発話音声の入射角度を予め記憶している。なお、目的音方向判定部６４の構成については、図１０及び図１１を用いて更に詳細に説明する。 The target sound direction determination unit 64 determines whether or not the target sound is coming from a predetermined target sound direction using the output signals of the plurality of microphone elements. The target sound direction determination unit 64 determines whether or not the target sound is coming from the target sound direction with respect to the signal extracted by the bandpass filter unit 63. Here, when the sound collection device 102 placed in the vehicle collects the voice uttered by the driver, the angle at which the voice uttered by the driver enters the microphone array 1 is determined in advance. Therefore, the target sound direction determination unit 64 stores the incident angle of the uttered sound in advance. Note that the configuration of the target sound direction determining section 64 will be explained in more detail using FIGS. 10 and 11.

目的音判定部６５は、音声判定部３２と目的音方向判定部６４との２つの判定結果を用いて、目的音の有無を判定する。目的音判定部６５は、目的音方向判定部６４によって目的音方向から目的音が到来していると判定され、かつ音声判定部３２によって１つのマイクロホン素子の出力信号が音声であると判定された場合、目的音が検出されたと判定する。また、目的音判定部６５は、目的音方向判定部６４によって目的音方向から目的音が到来していないと判定された場合、又は音声判定部３２によって１つのマイクロホン素子の出力信号が音声ではないと判定された場合、目的音が検出されていないと判定する。 The target sound determination section 65 determines the presence or absence of the target sound using the two determination results from the voice determination section 32 and the target sound direction determination section 64. When the target sound direction determination section 64 determines that the target sound is arriving from the target sound direction, and the voice determination section 32 determines that the output signal of one microphone element is voice, the target sound determination section 65 determines that the target sound has been detected. When the target sound direction determination section 64 determines that the target sound is not arriving from the target sound direction, or when the voice determination section 32 determines that the output signal of one microphone element is not voice, the target sound determination section 65 determines that the target sound has not been detected.

Next, the configuration of the target sound direction determination unit 64 shown in FIG. 9 is described further.

FIG. 10 is a block diagram showing the configuration of the target sound direction determination unit of the sound collection device in Embodiment 2 of the present disclosure. For convenience of explanation, FIG. 10 shows an example in which the output signals from two microphone elements 11 and 12 are input to the target sound direction determination unit 64.

The target sound direction determination unit 64 includes a delay-sum directivity synthesis unit (delay-sum beamformer) (first directivity synthesis unit) 641, a gradient-type directivity synthesis unit (gradient-type beamformer) (second directivity synthesis unit) 642, a target sound level detection unit 643, a non-target sound level detection unit 644, and a level comparison determination unit 645.

The delay-sum directivity synthesis unit 641 forms directivity in the target sound direction by emphasizing the signal from the target sound direction using the output signals of the plurality of microphone elements 11 to 1n. The delay-sum directivity synthesis unit 641 has high directional sensitivity in the target sound direction. The directional characteristic 6411 shown in FIG. 10 is the directional characteristic of the delay-sum directivity synthesis unit 641. It has directivity in the target sound direction and emphasizes the signal from that direction.

Let d be the distance between the microphone element 11 and the microphone element 12, and let θ be the incident angle from the target sound direction. The delay-sum directivity synthesis unit 641 delays the output signal from the microphone element 11 by the path difference Δ (Δ = d sin θ). It then adds the delayed output signal from the microphone element 11 to the output signal from the microphone element 12. The distance d and the incident angle θ are stored in advance in a memory (not shown).
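The delay-and-add operation described above can be sketched as follows. This is an illustrative sketch only, not the implementation of the disclosure. It assumes that the path difference Δ is converted to a time delay Δ/c with a speed of sound c of 343 m/s, that the delay is rounded to a whole number of samples, and that the delay is non-negative. The function names, the sampling rate parameter, and the constant are all hypothetical.

```python
import math

SPEED_OF_SOUND_M_S = 343.0  # assumed value; not stated in the disclosure


def path_delay_samples(d_m: float, theta_deg: float, fs_hz: float) -> int:
    """Convert the path difference d*sin(theta) into a whole-sample delay."""
    delta_m = d_m * math.sin(math.radians(theta_deg))  # path difference
    return round(delta_m / SPEED_OF_SOUND_M_S * fs_hz)


def delay(signal, n):
    """Delay a signal by n samples, zero-padding the start and keeping its length."""
    n = max(0, min(n, len(signal)))
    return [0.0] * n + list(signal[:len(signal) - n])


def delay_and_sum(x1, x2, n):
    """Add the delayed output of microphone element 11 (x1) to that of element 12 (x2)."""
    return [a + b for a, b in zip(delay(x1, n), x2)]
```

When the sound arrives from the target sound direction, the delayed x1 lines up with x2 and the two add coherently, so the output level rises relative to sound from other directions.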

The gradient-type directivity synthesis unit 642 forms a sensitivity blind spot in the target sound direction using the output signals of the plurality of microphone elements 11 and 12. The directional characteristic 6421 shown in FIG. 10 is the directional characteristic of the gradient-type directivity synthesis unit 642. It has a blind spot in the target sound direction and emphasizes signals (noise) from the direction perpendicular to the target sound direction.

Let d be the distance between the microphone element 11 and the microphone element 12, and let θ be the incident angle of sound from the target sound direction. The gradient-type directivity synthesis unit 642 delays the output signal from the microphone element 11 by the path difference Δ (Δ = d sin θ). It then subtracts the output signal from the microphone element 12 from the delayed output signal from the microphone element 11. The distance d and the incident angle θ are stored in advance.
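The delay-and-subtract operation can be sketched in the same way. The sketch below assumes a non-negative delay already expressed as a whole number of samples; the function names are hypothetical and the code is illustrative only.

```python
def delay(signal, n):
    """Delay a signal by n samples, zero-padding the start and keeping its length."""
    n = max(0, min(n, len(signal)))
    return [0.0] * n + list(signal[:len(signal) - n])


def gradient_beamform(x1, x2, n):
    """Subtract the output of microphone element 12 (x2) from the delayed output of element 11 (x1)."""
    return [a - b for a, b in zip(delay(x1, n), x2)]
```

For sound arriving from the target sound direction, the delayed x1 matches x2 and the difference cancels, which is what forms the blind spot in that direction.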

The target sound level detection unit 643 detects the output signal level of the delay-sum directivity synthesis unit 641.

The non-target sound level detection unit 644 detects the output signal level of the gradient-type directivity synthesis unit 642.

The level comparison determination unit 645 compares the output level of the output signal from the delay-sum directivity synthesis unit 641 with the output level of the output signal from the gradient-type directivity synthesis unit 642, and determines whether the target sound is arriving from the target sound direction. Specifically, it compares the output signal level detected by the target sound level detection unit 643 with the output signal level detected by the non-target sound level detection unit 644 to make this determination.

The delay-sum directivity synthesis unit 641 has directivity in the target sound direction, so the speaker's voice, which is the target sound, is included in its output. The gradient-type directivity synthesis unit 642, in contrast, has a blind spot in the target sound direction, so the speaker's voice is hardly included in its output. Consequently, when the target sound is arriving from the target sound direction, the output signal level detected by the target sound level detection unit 643 is high and the output signal level detected by the non-target sound level detection unit 644 is low. When the output signal level detected by the target sound level detection unit 643 (the target sound level) is higher than the output signal level detected by the non-target sound level detection unit 644 (the non-target sound level), the level comparison determination unit 645 determines that the target sound is arriving from the target sound direction.

When the target sound is not arriving from the target sound direction, the outputs of the delay-sum directivity synthesis unit 641 and the gradient-type directivity synthesis unit 642 contain only ambient noise. The output signal level detected by the target sound level detection unit 643 is then approximately equal to, or lower than, the output signal level detected by the non-target sound level detection unit 644. When the output signal level detected by the target sound level detection unit 643 (the target sound level) is less than or equal to the output signal level detected by the non-target sound level detection unit 644 (the non-target sound level), the level comparison determination unit 645 determines that the target sound is not arriving from the target sound direction.
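The level comparison can be sketched as follows. The disclosure does not specify how the level detection units 643 and 644 measure the output signal level, so the root-mean-square (RMS) value over a frame is used here purely as an assumption. The function names are hypothetical.

```python
import math


def rms_level(frame):
    """RMS level of one frame (assumed level measure; not specified in the disclosure)."""
    if not frame:
        return 0.0
    return math.sqrt(sum(v * v for v in frame) / len(frame))


def arriving_from_target_direction(delay_sum_out, gradient_out):
    """True only when the target sound level exceeds the non-target sound level."""
    target_level = rms_level(delay_sum_out)      # target sound level detection unit 643
    non_target_level = rms_level(gradient_out)   # non-target sound level detection unit 644
    return target_level > non_target_level
```

Equal levels give False, matching the rule that the target sound is determined not to be arriving when the target sound level is less than or equal to the non-target sound level.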

In Embodiment 1, the target sound is determined to have been detected whenever voice is detected. Speech from a direction other than the target sound direction is therefore also determined to be the target sound, and sensitivity correction is performed. In Embodiment 2, the target sound is determined to have been detected only when voice is detected and the target sound is arriving from the target sound direction. The sound collection device 102 of Embodiment 2 can therefore perform sensitivity correction using the target sound more accurately than the sound collection device 101 of Embodiment 1.

Next, the configuration of the target sound direction determination unit in a modification of Embodiment 2 is described.

FIG. 11 is a block diagram showing the configuration of the target sound direction determination unit of the sound collection device in a modification of Embodiment 2 of the present disclosure. For convenience of explanation, FIG. 11 shows an example in which the output signals from two microphone elements 11 and 12 are input to the target sound direction determination unit 64A. In this modification, the target sound detection unit 6 shown in FIG. 9 includes the target sound direction determination unit 64A shown in FIG. 11 in place of the target sound direction determination unit 64 shown in FIG. 9.

The target sound direction determination unit 64A includes a target sound direction estimation unit (direction estimation unit) 646 and a direction determination unit 647.

The target sound direction estimation unit 646 estimates the direction from which the target sound arrives using the phase difference between the output signals of the plurality of microphone elements. A memory (not shown) stores in advance the distance d between the microphone element 11 and the microphone element 12. The target sound direction estimation unit 646 estimates the incident angle θ of the sound based on the phase difference between the microphone element 11 and the microphone element 12 and the distance d between them.
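One common way to turn a phase difference into an incident angle is sketched below. It is an assumption, not a formula given in the disclosure. For a narrowband component of frequency f, a path difference d sin θ produces a phase difference Δφ = 2πf · d sin θ / c, so θ = arcsin(c Δφ / (2πf d)). The function name, the frequency parameter, and the speed of sound of 343 m/s are hypothetical.

```python
import math


def estimate_incident_angle_deg(phase_diff_rad, freq_hz, d_m, c_m_s=343.0):
    """Estimate the incident angle from an inter-microphone phase difference.

    Assumes a single narrowband component at freq_hz and a far-field source.
    """
    s = c_m_s * phase_diff_rad / (2.0 * math.pi * freq_hz * d_m)
    s = max(-1.0, min(1.0, s))  # clamp so measurement noise cannot leave the asin domain
    return math.degrees(math.asin(s))
```

The estimate is unambiguous only while the phase difference stays within ±π, which limits the usable frequency for a given spacing d.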

The direction determination unit 647 determines whether the direction estimated by the target sound direction estimation unit 646 is the predetermined target sound direction. When the estimated direction falls within a predetermined range that includes the target sound direction stored in advance, the direction determination unit 647 determines that the target sound is arriving from the target sound direction. When the estimated direction does not fall within that range, the direction determination unit 647 determines that the target sound is not arriving from the target sound direction. For example, the direction determination unit 647 may determine whether the incident angle estimated by the target sound direction estimation unit 646 is within -5 degrees to +5 degrees of the angle of the target sound direction stored in advance. A memory (not shown) stores the angle of the target sound direction in advance.
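The range check performed by the direction determination unit 647 can be sketched as a tolerance test. The ±5 degree tolerance follows the example above; treating the range boundaries as inclusive is an assumption, and the function name is hypothetical.

```python
def is_target_direction(estimated_deg, target_deg, tolerance_deg=5.0):
    """True when the estimated angle lies within target_deg +/- tolerance_deg (inclusive)."""
    return abs(estimated_deg - target_deg) <= tolerance_deg
```

For a stored target sound direction of 30 degrees, an estimate of 33 degrees is accepted and an estimate of 36 degrees is rejected.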

In each of the above embodiments, each component may be configured with dedicated hardware or may be realized by executing a software program suited to that component. Each component may be realized by a program execution unit such as a CPU or a processor reading and executing a software program recorded on a recording medium such as a hard disk or a semiconductor memory.

Some or all of the functions of the device according to the embodiments of the present disclosure are typically realized as an LSI (Large Scale Integration), which is an integrated circuit. These functions may each be implemented as an individual chip, or a single chip may include some or all of them. Circuit integration is not limited to LSI and may be realized with a dedicated circuit or a general-purpose processor. An FPGA (Field Programmable Gate Array) that can be programmed after the LSI is manufactured, or a reconfigurable processor in which the connections and settings of circuit cells inside the LSI can be reconfigured, may also be used.

Some or all of the functions of the device according to the embodiments of the present disclosure may also be realized by a processor such as a CPU executing a program.

All numbers used above are examples given to describe the present disclosure specifically, and the present disclosure is not limited to these numbers.

The order in which the steps in the above flowcharts are executed is an example given to describe the present disclosure specifically, and the steps may be executed in a different order as long as the same effect is obtained. Some of the steps may also be executed simultaneously (in parallel) with other steps.

The technology according to the present disclosure can improve noise suppression performance in directivity synthesis and can collect the target sound with a high S/N ratio. It is therefore useful as a technology for collecting a target sound using a plurality of microphone elements.

1 Microphone array
2 Sensitivity correction unit
3, 6 Target sound detection unit
4 Sensitivity correction control unit
5 Directivity synthesis unit
11 to 1n Microphone elements
31 Bandpass filter unit
32 Voice determination unit
44 Correction gain calculation unit
63 Bandpass filter unit
64, 64A Target sound direction determination unit
65 Target sound determination unit
201 Display
202 Air outlet
321 Level detection unit
322 Noise level detection unit
323 Comparison unit
324 Time-frequency conversion unit
325 Voice feature extraction unit
326 Determination unit
411 to 41n First to n-th bandpass filter units
421 to 42n First to n-th level detection units
431 to 43n First to n-th average level calculation units
641 Delay-sum directivity synthesis unit
642 Gradient-type directivity synthesis unit
643 Target sound level detection unit
644 Non-target sound level detection unit
645 Level comparison determination unit
646 Target sound direction estimation unit
647 Direction determination unit

Claims

A sound collection device comprising:
a plurality of microphone elements;
a sensitivity correction unit that corrects sensitivity differences between the plurality of microphone elements by multiplying output signals of the plurality of microphone elements by a gain;
a target sound detection unit that detects a speaker's voice as a target sound;
a gain control unit that controls the gain based on the detection result of the target sound detection unit; and
a directivity synthesis unit that emphasizes and collects the target sound coming from a predetermined direction using the output signals of the plurality of microphone elements corrected by the sensitivity correction unit,
wherein the gain control unit updates the gain based on the output signals of the plurality of microphone elements when the target sound detection unit detects the voice of the speaker, and does not update the gain when the target sound detection unit does not detect the voice of the speaker.

The sound collection device according to claim 1,
wherein the target sound detection unit includes a voice determination unit that determines whether the output signal of one of the plurality of microphone elements is the voice or non-voice other than the voice.

The sound collection device according to claim 2,
wherein the target sound detection unit includes a first extraction unit that extracts a signal in a specific band from the output signal of the one microphone element, and
the voice determination unit determines whether the signal extracted by the first extraction unit is the voice or the non-voice.

The sound collection device according to claim 2 or 3,
wherein the target sound detection unit includes:
a target sound direction determination unit that determines whether the target sound is arriving from a predetermined target sound direction using the output signals of the plurality of microphone elements; and
a target sound determination unit that determines that the target sound has been detected when the target sound direction determination unit determines that the target sound is arriving from the target sound direction and the voice determination unit determines that the output signal of the one microphone element is the voice.

The sound collection device according to claim 4,
wherein the target sound detection unit includes a second extraction unit that extracts a signal in a specific band from the output signals of the plurality of microphone elements, and
the target sound direction determination unit determines, for the signal extracted by the second extraction unit, whether the target sound is arriving from the target sound direction.

The sound collection device according to claim 4 or 5,
wherein the target sound direction determination unit includes:
a direction estimation unit that estimates the direction from which the target sound arrives using a phase difference between the output signals of the plurality of microphone elements; and
a direction determination unit that determines whether the direction estimated by the direction estimation unit is the predetermined target sound direction.

The sound collection device according to claim 4 or 5,
wherein the target sound direction determination unit includes:
a first directivity synthesis unit that forms directivity in the target sound direction by emphasizing a signal from the target sound direction using the output signals of the plurality of microphone elements;
a second directivity synthesis unit that forms a sensitivity blind spot in the target sound direction using the output signals of the plurality of microphone elements; and
a level comparison determination unit that compares the output level of the output signal from the first directivity synthesis unit with the output level of the output signal from the second directivity synthesis unit, and determines whether the target sound is arriving from the target sound direction.

The sound collection device according to any one of claims 1 to 7,
wherein the gain control unit includes:
a level detection unit that detects the output level of the output signal of each of the plurality of microphone elements;
a time average level calculation unit that calculates a time average level of each output level detected by the level detection unit when the voice of the speaker is detected by the target sound detection unit; and
a correction gain calculation unit that calculates, from the time average levels calculated by the time average level calculation unit, a correction gain for updating the gain.

The sound collection device according to claim 8,
wherein the correction gain calculation unit uses, as a reference, the time average level of one predetermined microphone element among the plurality of microphone elements, and calculates the correction gain of each of the other microphone elements so that the time average level of that other microphone element becomes equal to the time average level of the one microphone element.

The sound collection device according to claim 8,
wherein the correction gain calculation unit uses, as a reference, the average value of the time average levels of at least two predetermined microphone elements among the plurality of microphone elements, and calculates the correction gains of the plurality of microphone elements so that the time average level of each of the plurality of microphone elements becomes equal to the average value of the time average levels of the at least two microphone elements.

The sound collection device according to any one of claims 8 to 10,
wherein the gain control unit includes a third extraction unit that extracts a signal in a specific band from the output signal of each of the plurality of microphone elements, and
the level detection unit detects the output level of each signal extracted by the third extraction unit.

The sound collection device according to claim 11,
wherein the specific band is a band from 200 Hz to 500 Hz.

A sound collection method in which a computer:
corrects sensitivity differences between a plurality of microphone elements by multiplying output signals of the plurality of microphone elements by a gain;
detects a speaker's voice as a target sound;
controls the gain based on the detection result of the target sound; and
emphasizes and collects the target sound coming from a predetermined direction using the corrected output signals of the plurality of microphone elements,
wherein, in controlling the gain, the gain is updated based on the output signals of the plurality of microphone elements when the voice of the speaker is detected, and the gain is not updated when the voice of the speaker is not detected.

A sound collection program that causes a computer to function as:
a sensitivity correction unit that corrects sensitivity differences between a plurality of microphone elements by multiplying output signals of the plurality of microphone elements by a gain;
a target sound detection unit that detects a speaker's voice as a target sound;
a gain control unit that controls the gain based on the detection result of the target sound detection unit; and
a directivity synthesis unit that emphasizes and collects the target sound coming from a predetermined direction using the output signals of the plurality of microphone elements corrected by the sensitivity correction unit,
wherein the gain control unit updates the gain based on the output signals of the plurality of microphone elements when the target sound detection unit detects the voice of the speaker, and does not update the gain when the target sound detection unit does not detect the voice of the speaker.