JP2016121957A

JP2016121957A - Target sound section determination device, target sound section determination method, and target sound section determination program

Info

Publication number: JP2016121957A
Application number: JP2014262703A
Authority: JP
Inventors: 克之高橋; Katsuyuki Takahashi
Original assignee: Oki Electric Industry Co Ltd
Current assignee: Oki Electric Industry Co Ltd
Priority date: 2014-12-25
Filing date: 2014-12-25
Publication date: 2016-07-07

Abstract

PROBLEM TO BE SOLVED: To provide a target sound section determination device capable of correctly determining the section containing a target sound when the target sound is an abnormal sound that includes only one or a small number of frequency components. SOLUTION: A target sound section determination device has: directivity forming means 12, 13 for forming, from input sound signals S1(n), S2(n), a plurality of directional signals whose blind spots lie in different predetermined directions; coherence coefficient calculation means 14 for obtaining a coherence coefficient using the plurality of directional signals thus formed; coherence coefficient feature quantity calculation means 15 for treating the coherence coefficient as a time-varying signal and obtaining a coherence coefficient feature quantity representing how often, and by how much, the slope direction of its waveform changes; and target sound section determination means 16 for determining whether a section of the input sound signal is a target sound section based on the range of the coherence coefficient and the range of the coherence coefficient feature quantity. SELECTED DRAWING: Figure 1

Description

The present invention relates to a target sound section determination device, a target sound section determination method, and a target sound section determination program, and can be applied, for example, to determining, from an input acoustic signal, the section occupied by a target sound that includes only one or a small number of frequency components.

Patent Document 1 has already proposed a technique for detecting a section that includes a target sound arriving from the front (a target sound section), based on the arrival direction of the input sound signal.

In recent years, sound source separation processing has also come to be used for sounds other than speech, such as sounds emitted by machines, for purposes such as the inspection of mechanical equipment. A sound emitted by a machine (a mechanical sound) includes, for example, only one or a small number of frequency components (a signal including a single frequency component is a sine wave signal of that frequency, and this specification refers to it as a sine wave signal where appropriate). In other words, it is a stationary sound whose energy is concentrated in specific frequency components.

Sound source separation processing has therefore come to require a technique that can accurately detect a target sound section containing a special sound such as a mechanical sound.

JP 2013-126026 A
JP 2014-106337 A

However, when, for example, an abnormal sine wave sound arrives from the front, the target speech section detection method described in Patent Document 1 may erroneously judge the sine wave abnormal sound to be a sound that is not the target sound (hereinafter, a non-target sound), even though it arrives from the front.

The target speech section detection method of Patent Document 1 uses the per-frequency-bin coherence coefficient coef(f, K) given by equation (3) of Patent Document 1 and the coherence COH(K) given by equation (4) of that document. Both reflect the power and the degree of correlation, at each frequency component, of two signals obtained by giving specific directivities to the signals captured by two microphones (f is an index representing the frequency bin, and K is an index representing the input frame). The sine wave abnormal sound is misjudged because the components of the input sound signal are concentrated at a specific frequency: the coherence coefficient coef(f, K) takes a large value in the frequency bin at the frequency of the sine wave but takes minute values in all other bins, so the coherence COH(K), obtained by averaging coef over the entire band, becomes small and does not reach the determination threshold Θ for the target sound section.
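The averaging problem described above can be illustrated with a few lines of Python. This is only a sketch with made-up numbers: the bin count, the coefficient values, and the threshold `theta` are hypothetical and are not taken from Patent Document 1.

```python
def coherence(coef_bins):
    """Average the per-bin coherence coefficients over the whole band."""
    return sum(coef_bins) / len(coef_bins)

# Hypothetical frame with 256 bins: a frontal sine wave excites a single bin.
num_bins = 256
coef_sine = [0.01] * num_bins   # bins without input: minute values (assumed)
coef_sine[40] = 0.95            # bin at the sine frequency: large value (assumed)

theta = 0.3                     # hypothetical determination threshold
coh = coherence(coef_sine)
is_target = coh >= theta        # the band average hides the single large bin
```

With these illustrative values the band average stays far below the threshold even though one bin is close to 1, which is the misjudgment the text describes.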

A target sound section determination device, a target sound section determination method, and a target sound section determination program are therefore desired that can correctly determine the section containing the target sound when the target sound is an abnormal sound including only one or a small number of frequency components.

The first aspect of the present invention is a target sound section determination device that treats a sound including one or a small number of frequency components as the target sound and determines a target sound section from among the sections of an input sound signal, the device comprising: (1) directivity forming means for applying delay-subtraction processing to the input sound signal to form a plurality of directional signals, each given a directivity characteristic with a blind spot in a predetermined direction, the predetermined directions of the blind spots differing among the signals; (2) coherence coefficient calculation means for obtaining a coherence coefficient using the plurality of directional signals thus formed; (3) coherence coefficient feature quantity calculation means for treating the obtained coherence coefficient as a time-varying signal and obtaining a coherence coefficient feature quantity representing the number of times the slope direction of its waveform changes and the magnitude of those changes; and (4) target sound section determination means for determining whether a section of the input sound signal is a target sound section based on the range of the coherence coefficient and the range of the coherence coefficient feature quantity.

The second aspect of the present invention is a target sound section determination method that treats a sound including one or a small number of frequency components as the target sound and determines a target sound section from among the sections of an input sound signal, wherein: (1) directivity forming means applies delay-subtraction processing to the input sound signal to form a plurality of directional signals, each given a directivity characteristic with a blind spot in a predetermined direction, the predetermined directions of the blind spots differing among the signals; (2) coherence coefficient calculation means obtains a coherence coefficient using the plurality of directional signals thus formed; (3) coherence coefficient feature quantity calculation means treats the obtained coherence coefficient as a time-varying signal and obtains a coherence coefficient feature quantity representing the number of times the slope direction of its waveform changes and the magnitude of those changes; and (4) target sound section determination means determines whether a section of the input sound signal is a target sound section based on the range of the coherence coefficient and the range of the coherence coefficient feature quantity.

The third aspect of the present invention is a target sound section determination program applied to treat a sound including one or a small number of frequency components as the target sound and to determine a target sound section from among the sections of an input sound signal, the program causing a computer to function as: (1) directivity forming means for applying delay-subtraction processing to the input sound signal to form a plurality of directional signals, each given a directivity characteristic with a blind spot in a predetermined direction, the predetermined directions of the blind spots differing among the signals; (2) coherence coefficient calculation means for obtaining a coherence coefficient using the plurality of directional signals thus formed; (3) coherence coefficient feature quantity calculation means for treating the obtained coherence coefficient as a time-varying signal and obtaining a coherence coefficient feature quantity representing the number of times the slope direction of its waveform changes and the magnitude of those changes; and (4) target sound section determination means for determining whether a section of the input sound signal is a target sound section based on the range of the coherence coefficient and the range of the coherence coefficient feature quantity.

According to the present invention, it is possible to realize a target sound section determination device, a target sound section determination method, and a target sound section determination program that can correctly determine the section containing the target sound when the target sound is an abnormal sound including only one or a small number of frequency components.

FIG. 1 is a block diagram showing the configuration of the target sound section determination device according to the first embodiment.
FIG. 2 is an explanatory diagram showing the value of the coherence coefficient for each frequency component (frequency bin) when the abnormal sound contains a single frequency component (when the abnormal sound is a sine wave abnormal sound).
FIG. 3 is a block diagram showing the detailed configuration of the determination unit according to the first embodiment.
FIG. 4 is a flowchart showing the operation of the determination unit according to the first embodiment.

(A) First Embodiment
A first embodiment of the target sound section determination device, target sound section determination method, and target sound section determination program according to the present invention is described below with reference to the drawings.

The target sound section determination device according to the first embodiment treats an abnormal sound including only one or a small number of frequency components as the target sound and determines the sections of that target sound.

(A-1) Parameters Applied and Reasons for Applying Them
In addition to the coherence coefficient coef(f, K) used in the technique of Patent Document 1, the target sound section determination device according to the first embodiment uses the modGI value modGI(f, K) of the coherence coefficient coef(f, K), given by equation (1). In equation (1), the coherence coefficient coef(f, K) is written as s(K) and the modGI value modGI(f, K) is written as modGI.

The modGI value is described briefly here (see Patent Document 2 for details). modGI stands for a modified Gradient Index (the Gradient Index is hereinafter called GI).

The unmodified GI is described in the reference: Naofumi Aoki, "A Band Extension Technique for Narrow-Band Telephony Speech Based on Full Wave Rectification," IEICE Trans. Commun., Vol. E93-B(3), pp. 729-731, 2010.

GI is an index that measures how many times the slope direction of a signal waveform changes and how large those changes are. GI is obtained by summing the absolute differences between consecutive samples at the points where the slope direction changes, and dividing that sum by the square root of the power of the frame. GI therefore tends to be larger the more often the slope changes within one frame, and the larger the change at each slope change. Because of this property, GI can also be said to be directly linked to the amount of high-frequency components contained in the input waveform.
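The verbal description of GI above can be sketched as follows. This is an illustration only: it follows the prose literally, takes "power of the frame" to mean the sum of squared samples (an assumption), and leaves out the ΔΨ(n) weighting used in the referenced paper. The function name `gradient_index` is hypothetical.

```python
import math

def gradient_index(frame):
    """GI of one frame, following the verbal description (no delta-psi weighting)."""
    power = sum(x * x for x in frame)  # frame power, assumed to be the sum of squares
    if power == 0.0:
        return 0.0
    total = 0.0
    prev_sign = 0
    for n in range(1, len(frame)):
        diff = frame[n] - frame[n - 1]
        sign = (diff > 0) - (diff < 0)
        # Accumulate the step size only where the slope flips direction.
        if sign != 0 and prev_sign != 0 and sign != prev_sign:
            total += abs(diff)
        if sign != 0:
            prev_sign = sign
    return total / math.sqrt(power)

# A fast oscillation changes slope direction far more often than a slow one.
slow = [math.sin(2 * math.pi * n / 64) for n in range(64)]
fast = [math.sin(2 * math.pi * 16 * n / 64) for n in range(64)]
gi_slow = gradient_index(slow)
gi_fast = gradient_index(fast)
```

Under these assumptions `gi_fast` comes out much larger than `gi_slow`, in line with the statement that GI tracks the amount of high-frequency content.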

However, GI is computed from the variable ΔΨ(n), which takes only the two values 0 and 2 and therefore jumps sharply and frequently over time. As a result, the GI value itself becomes irregularly large and small (the value "runs wild").

modGI was proposed, in view of this erratic behavior of GI (its large jumps in value), as a new feature quantity to be used in place of GI: it correlates highly with GI while suppressing the large jumps, so that it changes stably. For an arbitrary signal whose feature quantity is to be calculated (in this application, the coherence coefficient), modGI is defined as the power of the second-order difference of that signal, normalized by the power of that signal (constant multiples of this quantity are also included).
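A minimal sketch of this definition is given below. It is not a reproduction of equation (1): the window over which the sums are taken, the absence of any constant factor or square root, and the function name `mod_gi` are all assumptions made for illustration.

```python
def mod_gi(signal):
    """Power of the second-order difference, normalized by the signal power."""
    power = sum(x * x for x in signal)
    if power == 0.0:
        return 0.0
    # Second-order difference s(K) - 2*s(K-1) + s(K-2) over the window.
    second_diff = [signal[k] - 2.0 * signal[k - 1] + signal[k - 2]
                   for k in range(2, len(signal))]
    return sum(d * d for d in second_diff) / power

# A nearly constant (DC-like) sequence gives a minute value,
# while a rapidly alternating sequence gives a large one.
steady = mod_gi([0.9] * 8)
alternating = mod_gi([1.0, -1.0] * 4)
```

In use, `signal` would hold the recent values of coef(f, K) for one frequency bin f over successive frames K, so one modGI value is obtained per bin.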

Because modGI correlates highly with GI, it serves as a stable index of the number and magnitude of changes in the slope direction of the signal waveform, and it also reflects the amount of high-frequency components contained in the input waveform.

Consider now the behavior of the modGI value when an abnormal sine wave sound arrives from the front. When the modGI value modGI(f, K) is calculated from the coherence coefficient coef(f, K) for each frequency component, the coherence coefficient coef(f, K) at the frequency of the sine wave takes an almost constant (and large) value. Because this resembles a DC signal, the modGI value modGI(f, K) is minute. At frequency components other than that of the sine wave there is no input, so the coherence coefficient coef(f, K) is minute and almost constant. This too resembles a DC signal, and the modGI value modGI(f, K) is again minute.

Comparing the behavior of the coherence coefficient coef(f, K) and the modGI value modGI(f, K) between the frequency component contained in the arriving sine wave abnormal sound and the other frequency components shows the following difference: (a1) at the frequency component contained in the sine wave abnormal sound, the coherence coefficient coef(f, K) takes a large steady value and the modGI value modGI(f, K) is minute; (a2) at frequency components not contained in the sine wave abnormal sound, the coherence coefficient coef(f, K) takes an extremely small steady value and the modGI value modGI(f, K) is minute.

The target sound section determination device according to the first embodiment exploits the behaviors (a1) and (a2) in order to determine accurately an abnormal sound arriving from the front.

(A-2) Configuration of the First Embodiment
FIG. 1 is a block diagram showing the configuration of the target sound section determination device 10 according to the first embodiment.

The target sound section determination device 10 of the first embodiment may be built by connecting various hardware components, or some of its components (for example, the parts other than the microphones and the analog/digital (A/D) converters) may be implemented by applying a program execution configuration such as a CPU, ROM, and RAM. Whichever construction is adopted, the detailed functional configuration of the target sound section determination device 10 is as shown in FIG. 1.

In FIG. 1, the target sound section determination device 10 according to the first embodiment has a pair of microphones m1 and m2, an FFT unit 11, a first directivity forming unit 12, a second directivity forming unit 13, a coherence coefficient calculation unit 14, a modGI calculation unit 15, and a determination unit 16.

The pair of microphones m1 and m2 are placed a predetermined (or arbitrary) distance apart, and each captures the surrounding sound. Each microphone is omnidirectional (or has only a very gentle directivity toward the front). The acoustic signals (input sound signals) captured by the microphones m1 and m2 are converted into digital signals s1(n) and s2(n) by corresponding A/D converters (not shown) and supplied to the FFT unit 11. Here n is an index representing the input order of the samples and is a positive integer. In this text, a smaller n denotes an older input sample and a larger n a newer one.

The FFT unit 11 receives the input sound signal sequences s1(n) and s2(n) from the microphones m1 and m2 and applies a fast Fourier transform (or a discrete Fourier transform) to the input sound signals s1 and s2, thereby expressing them in the frequency domain. For the fast Fourier transform, analysis frames FRAME1(K) and FRAME2(K), each consisting of a predetermined number N of samples, are built from the input sound signals s1(n) and s2(n). Equation (2) shows an example of building the analysis frame FRAME1(K) from the input sound signal s1(n). The analysis frame FRAME2(K) is built in the same way.
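Frame construction of this kind can be sketched as below. The text does not say whether consecutive frames overlap, so this illustration assumes consecutive, non-overlapping frames of N samples. The function name `split_into_frames` is hypothetical.

```python
def split_into_frames(samples, frame_len):
    """Cut a sample sequence into consecutive, non-overlapping frames (assumed layout)."""
    num_frames = len(samples) // frame_len
    # Trailing samples that do not fill a whole frame are left for the next call.
    return [samples[k * frame_len:(k + 1) * frame_len] for k in range(num_frames)]

# Ten samples with N = 4 give two complete frames.
frames = split_into_frames(list(range(1, 11)), 4)
```

Each resulting frame would then be passed to the Fourier transform to obtain one spectrum per frame index K.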

Here K is an index representing the order of the frames and is a positive integer. In this text, a smaller K denotes an older analysis frame and a larger K a newer one. In the following description, unless otherwise noted, the index of the latest analysis frame to be analyzed is K.

The FFT unit 11 applies fast Fourier transform processing to each analysis frame to obtain the frequency-domain signals X1(f, K) and X2(f, K), and supplies them to the first directivity forming unit 12 and the second directivity forming unit 13. Here f is an index representing frequency. X1(f, K) is not a single value but, as shown in equation (3), consists of the spectral components at a plurality of frequencies f1 to fm. The same applies to X2(f, K) and to B1(f, K) and B2(f, K) described later.
X1(f, K) = {X1(f1, K), X1(f2, K), ..., X1(fm, K)} ... (3)

The first directivity forming unit 12 forms, from the two frequency-domain signals X1(f, K) and X2(f, K), a signal B1(f, K) with strong directivity in a specific direction. The second directivity forming unit 13 forms, from the same two signals, a signal B2(f, K) with strong directivity in a specific direction different from the first. Existing methods can be used to form the signals B1(f, K) and B2(f, K). For example, equation (4) can be applied to form B1(f, K) with strong directivity to the right, and equation (5) can be applied to form B2(f, K) with strong directivity to the left. The frame index K is omitted from equations (4) and (5) because it plays no part in the calculation.
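Equations (4) and (5) are not reproduced in this text. The sketch below therefore uses a common frequency-domain delay-and-subtract form as an assumption: one spectrum is phase-shifted by a delay and subtracted from the other, which places a blind spot toward one side. The function name, the parameter `delay_samples`, and the exact phase term are hypothetical and may differ from the equations in the specification.

```python
import cmath

def form_directional_signals(x1, x2, delay_samples, fft_len):
    """Return (B1, B2): two delay-and-subtract spectra with blind spots on opposite sides."""
    b1, b2 = [], []
    for i, (a, b) in enumerate(zip(x1, x2)):
        # Phase shift equivalent to delaying a signal by delay_samples at bin i.
        shift = cmath.exp(-2j * cmath.pi * i * delay_samples / fft_len)
        b1.append(b - a * shift)  # cancels a wave that reaches microphone 1 first
        b2.append(a - b * shift)  # cancels a wave that reaches microphone 2 first
    return b1, b2

# A wave that reaches microphone 1 first and microphone 2 one sample later.
fft_len = 8
x1_side = [1 + 0j, 2 + 1j, 0.5 - 1j, -1 + 2j]
x2_side = [v * cmath.exp(-2j * cmath.pi * i / fft_len) for i, v in enumerate(x1_side)]
b1_side, b2_side = form_directional_signals(x1_side, x2_side, 1, fft_len)

# A frontal wave reaches both microphones at the same time.
b1_front, b2_front = form_directional_signals(x1_side, x1_side, 1, fft_len)
```

In this sketch a source on the blind-spot side is removed from B1 but remains in B2, whereas a frontal source yields identical B1 and B2, which is what makes their correlation informative about the arrival direction.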

The coherence coefficient calculation unit 14 obtains the coherence coefficient coef(f, K) by applying the calculation of equation (6) to the two directional signals B1(f) and B2(f) described above. In equation (6), B2(f)* denotes the complex conjugate of B2(f).
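Equation (6) is not reproduced in this text. As a hedged illustration, the sketch below uses a normalized cross-spectrum: the product of B1(f) and the conjugate of B2(f), divided by the mean power of the two signals. Only the use of the conjugate of B2(f) is stated in the text. The normalization, the use of the real part, and the small constant `eps` are assumptions.

```python
def coherence_coefficients(b1, b2, eps=1e-12):
    """Per-bin normalized cross-spectrum of two directional spectra (assumed form)."""
    coefs = []
    for p, q in zip(b1, b2):
        cross = p * q.conjugate()                       # B1(f) times conj(B2(f))
        mean_power = 0.5 * (abs(p) ** 2 + abs(q) ** 2)  # assumed normalization
        coefs.append(cross.real / (mean_power + eps))   # eps avoids division by zero
    return coefs

# Identical spectra (frontal arrival in this sketch) give values close to 1.
same = coherence_coefficients([1 + 1j, 2 - 1j], [1 + 1j, 2 - 1j])
# Empty bins give 0, and opposite-phase bins give values close to -1.
empty = coherence_coefficients([0j], [0j])
opposite = coherence_coefficients([1 + 0j], [-1 + 0j])
```

The list returned for one frame corresponds to coef(f, K) over all frequency bins f.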

Conceptually, the coherence coefficient coef(f, K) represents the correlation, at a given frequency component, between the signal arriving from the right and the signal arriving from the left. A small coherence coefficient coef(f, K) therefore means that the correlation between that frequency component of the two directional signals B1 and B2 is small, and a large coherence coefficient coef(f, K) means that the correlation is large. The correlation is small either when the arrival direction of that frequency component in the input sound signal is strongly biased to the right or to the left, or when, even without such a bias, the component has little clear regularity, like noise, so that correlation hardly appears. Accordingly, when the coherence coefficient coef(f, K) is large, the arrival direction is not biased, and that component of the input sound signal can be said to arrive from the front.

In the case of an abnormal sound, however, only one or a small number of frequency components are present, so when the abnormal sound arrives from the front, only the coherence coefficients coef(f, K) of those frequency components become large. FIG. 2 shows the value of the coherence coefficient coef(f, K) for each frequency component (frequency bin) when the abnormal sound contains a single frequency component, that is, when it is a sine wave abnormal sound (solid line). For reference, FIG. 2 also shows, as a broken line, the value of the coherence coefficient coef(f, K) for each frequency component (frequency bin) of a speech signal.

The coherence coefficient calculation unit 14 supplies the obtained coherence coefficient coef(f, K) to the modGI calculation unit 15 and the determination unit 16.

The modGI calculation unit 15 calculates the modGI value modGI(f, K) of the coherence coefficient coef(f, K) for each frequency component and supplies the obtained modGI value modGI(f, K) to the determination unit 16. The modGI value modGI(f, K) is calculated with equation (1) above, by substituting the coherence coefficient coef(f, K) for the calculation target signal s(K). Equation (1) is the same as equation (13) of Patent Document 2, but the modGI value modGI(f, K) may instead be calculated with equation (5) or equations (10) to (12) of Patent Document 2.

FIG. 3 is a block diagram showing the detailed configuration of the determination unit 16 according to the first embodiment.

In FIG. 3, the determination unit 16 has an input signal receiving unit 21, a coherence coefficient range calculation unit 22, a modGI range calculation unit 23, a target sound section determination unit 24, and a determination result transmission unit 25.

The determination unit 16 determines, based on the coherence coefficient coef(f, K) and the modGI value modGI(f, K), whether the input sound signal is the target sound (abnormal sound).

The input signal receiving unit 21 takes in the coherence coefficient coef(f, K) from the coherence coefficient calculation unit 14 and the modGI value modGI(f, K) from the modGI calculation unit 15.

The coherence coefficient range calculation unit 22 searches the series coef(f, K) for its minimum and maximum values using a known search algorithm and calculates their difference, range_coef.

The modGI range calculation unit 23 searches the series modGI(f, K) for its minimum and maximum values using a known search algorithm and calculates their difference, range_modGI.

The target sound section determination unit 24 uses range_coef calculated by the coherence coefficient range calculation unit 22 and range_modGI calculated by the modGI range calculation unit 23 to determine whether the input sound signal is the target sound (abnormal sound).

The determination result transmission unit 25 transmits the determination result res(K) of the target sound section determination unit 24 to a signal processing unit (not shown). The determination result res(K) is used, for example, by a signal processing unit such as a voice switch processing unit.

(A-3) Operation of the First Embodiment
Next, the operation of the target sound section determination device 10 according to the first embodiment is described with reference to the drawings, first the overall operation and then the operation of the determination unit 16.

The signals s1(n) and s2(n) input from the pair of microphones m1 and m2 are each converted by the FFT unit 11 from the time domain into the frequency-domain signals X1(f, K) and X2(f, K). The first and second directivity forming units 12 and 13 then generate the directional signals B1(f, K) and B2(f, K), each having a blind spot in a predetermined direction. The coherence coefficient calculation unit 14 applies the directional signals B1(f, K) and B2(f, K) to the calculation of equation (6), and the resulting coherence coefficient coef(f, K) is supplied to the modGI calculation unit 15 and the determination unit 16. The modGI calculation unit 15 calculates the modGI value modGI(f, K) of the coherence coefficient coef(f, K), for example according to equation (1), and supplies it to the determination unit 16.

FIG. 4 is a flowchart showing the operation of the determination unit 16 according to the first embodiment. The operation shown in FIG. 4 is executed repeatedly, once each time the processing target frame K switches to a new frame.

The input signal receiving unit 21 receives the coherence coefficient coef(f, K) from the coherence coefficient calculation unit 14 and the modGI value modGI(f, K) from the modGI calculation unit 15 (step S101).

The coherence coefficient range calculation unit 22 searches the coherence coefficients coef(f, K) of all frequency components for the maximum value MAX(coef(f, K)) and the minimum value MIN(coef(f, K)), and calculates the difference range_coef(K) between the maximum and minimum values found (step S102).

The modGI range calculation unit 23 searches the modGI values modGI(f, K) of all frequency components for the maximum value MAX(modGI(f, K)) and the minimum value MIN(modGI(f, K)), and calculates the difference range_modGI(K) between the maximum and minimum values found (step S103).
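The range calculations of steps S102 and S103 can be sketched as follows. This is a minimal Python illustration, not the patent's implementation; the function name `value_range` and the per-bin sample values are hypothetical.

```python
def value_range(values):
    """Return MAX - MIN over the values of all frequency components of one frame."""
    values = list(values)
    if not values:
        raise ValueError("at least one frequency component is required")
    return max(values) - min(values)


# Hypothetical per-bin values for one frame K (four frequency bins)
coef_frame = [0.10, 0.12, 0.95, 0.11]    # coef(f, K)
modgi_frame = [0.30, 0.32, 0.31, 0.29]   # modGI(f, K)

range_coef = value_range(coef_frame)     # step S102: range_coef(K)
range_modgi = value_range(modgi_frame)   # step S103: range_modGI(K)
```

In this made-up frame a single bin has a coherence coefficient far above the others, so range_coef(K) is large while range_modGI(K) stays small.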

The target sound section determination unit 24 compares the calculated difference range_coef(K) with a threshold Ψ (step S104). Likewise, it compares the calculated difference range_modGI(K) with a threshold Φ. The threshold Ψ is determined in advance, for example by simulation, and is chosen so that it separates the difference range_coef(K) of the target sound (abnormal sound) from the difference range_coef(K) of non-target sound. The same applies to the threshold Φ.

When the calculated difference range_coef(K) is equal to or greater than the threshold Ψ and the calculated difference range_modGI(K) is smaller than the threshold Φ, the current processing target frame K is determined to be a frame within a frontal abnormal sound section. When this condition is not satisfied, the current processing target frame K is determined to be a frame within a non-target sound section.

For a frame K determined in step S104 to be a frame within a frontal abnormal sound section, the target sound section determination unit 24 stores "1" in the determination result res(K) (step S105).

For a frame K determined in step S104 to be a frame within a non-target sound section, the target sound section determination unit 24 stores "0" in the determination result res(K) (step S106). The values assigned to the determination result res(K) in steps S105 and S106 ("1" or "0") are examples only; any values may be stored, depending on the intended use.
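The decision of steps S104 to S106 can be sketched as follows. This is a minimal Python illustration under stated assumptions: the function name `determine_frame`, the parameter names, and the threshold values used in the example are hypothetical, since the description says only that Ψ and Φ are fixed in advance (for example by simulation).

```python
def determine_frame(range_coef, range_modgi, psi, phi,
                    target_value=1, non_target_value=0):
    """Return res(K) for one frame.

    psi, phi: thresholds corresponding to Ψ and Φ.
    target_value / non_target_value: the stored values; "1" and "0"
    are only the example values given in the description.
    """
    # Step S104: range_coef(K) >= Ψ and range_modGI(K) < Φ
    if range_coef >= psi and range_modgi < phi:
        return target_value        # step S105: target (abnormal sound) section
    return non_target_value        # step S106: non-target sound section


# Hypothetical thresholds, chosen only for this example
PSI, PHI = 0.5, 0.1
res_target = determine_frame(0.85, 0.03, PSI, PHI)
res_small_coef_range = determine_frame(0.20, 0.03, PSI, PHI)
res_large_modgi_range = determine_frame(0.85, 0.40, PSI, PHI)
```

Exposing the stored values as parameters reflects the remark that values other than "1" and "0" may be used depending on the application.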

The target sound section determination unit 24 increments the parameter K (step S107). A new frame then becomes the processing target frame K, and the operation described above is repeated.

The determination result res(K) is transmitted by the determination result transmission unit 25 to a processing unit not shown (for example, a voice switch processing unit).

(A-4) Effects of the First Embodiment
According to the first embodiment, a sound can be determined to be the target sound even when the target sound is an abnormal sound that contains only one or a small number of frequency components. This makes the technique usable for abnormal sound detection, inspection, and similar tasks that target components other than speech.

(B) Other Embodiments
In addition to the embodiment described above, modified embodiments such as the following are also possible.

(B-1) The first embodiment applies modGI. However, the GI before modification is also an index that measures the number of times the slope direction of a signal waveform changes and the magnitude of those changes, so GI may be applied in place of modGI in the first embodiment.

(B-2) Processing that the first embodiment performs on frequency-domain signals may, where possible, be performed on time-domain signals instead.

(B-3) The present invention is characterized by the configuration that follows acquisition of the coherence coefficient; the configuration before that point is not necessarily limited to that of the first embodiment. For example, signals from a microphone array having three or more microphones may be processed to obtain the coherence coefficient, after which modGI (or GI) is calculated to detect frequency components characteristic of the target sound.

Description of symbols: 10: target sound section determination device; m1, m2: microphones; 11: FFT unit; 12: first directivity forming unit; 13: second directivity forming unit; 14: coherence coefficient calculation unit; 15: modGI calculation unit; 16: determination unit; 21: input signal receiving unit; 22: coherence coefficient range calculation unit; 23: modGI range calculation unit; 24: target sound section determination unit; 25: determination result transmission unit.

Claims

1. A target sound section determination device for determining a target sound section from sections of an input sound signal, the target sound being a sound including one or a small number of frequency components, the device comprising:
directivity forming means for applying delay-subtraction processing to the input sound signal to form a plurality of directional signals, each having a directivity characteristic with a blind spot in a predetermined direction, the predetermined directions of the blind spots differing from one another;
coherence coefficient calculation means for obtaining a coherence coefficient using the plurality of formed directional signals;
coherence coefficient feature amount calculation means for treating the obtained coherence coefficient as a time-varying signal and obtaining a coherence coefficient feature amount that represents the number of times the slope direction of its signal waveform changes and the magnitude of those changes; and
target sound section determination means for determining whether a section of the input sound signal is a target sound section based on the range of the coherence coefficient and the range of the coherence coefficient feature amount.

2. The target sound section determination device according to claim 1, wherein the coherence coefficient feature amount calculation means calculates, as the coherence coefficient feature amount, a value obtained by normalizing the power of the second-order difference of the coherence coefficient by the power of the coherence coefficient.

3. The target sound section determination device according to claim 1 or 2, wherein the target sound section determination means determines that the current section of the input sound signal is a target sound section when the range of the coherence coefficient is equal to or greater than a first threshold and the range of the coherence coefficient feature amount is smaller than a second threshold, and otherwise determines that it is a non-target sound section.

4. A target sound section determination method for determining a target sound section from sections of an input sound signal, the target sound being a sound including one or a small number of frequency components, wherein:
directivity forming means applies delay-subtraction processing to the input sound signal to form a plurality of directional signals, each having a directivity characteristic with a blind spot in a predetermined direction, the predetermined directions of the blind spots differing from one another;
coherence coefficient calculation means obtains a coherence coefficient using the plurality of formed directional signals;
coherence coefficient feature amount calculation means treats the obtained coherence coefficient as a time-varying signal and obtains a coherence coefficient feature amount that represents the number of times the slope direction of its signal waveform changes and the magnitude of those changes; and
target sound section determination means determines whether a section of the input sound signal is a target sound section based on the range of the coherence coefficient and the range of the coherence coefficient feature amount.

5. A target sound section determination program applied to determine a target sound section from sections of an input sound signal, the target sound being a sound including one or a small number of frequency components, the program causing a computer to function as:
directivity forming means for applying delay-subtraction processing to the input sound signal to form a plurality of directional signals, each having a directivity characteristic with a blind spot in a predetermined direction, the predetermined directions of the blind spots differing from one another;
coherence coefficient calculation means for obtaining a coherence coefficient using the plurality of formed directional signals;
coherence coefficient feature amount calculation means for treating the obtained coherence coefficient as a time-varying signal and obtaining a coherence coefficient feature amount that represents the number of times the slope direction of its signal waveform changes and the magnitude of those changes; and
target sound section determination means for determining whether a section of the input sound signal is a target sound section based on the range of the coherence coefficient and the range of the coherence coefficient feature amount.