JP2014123011A

JP2014123011A - Noise detector, method, and program

Info

Publication number: JP2014123011A
Application number: JP2012279013A
Authority: JP
Inventors: Runyu Shi; 潤宇史; Hiroyuki Honma; 弘幸本間; Yuki Yamamoto; 優樹山本; Toru Chinen; 徹知念
Original assignee: Sony Corp
Current assignee: Sony Corp
Priority date: 2012-12-21
Filing date: 2012-12-21
Publication date: 2014-07-03
Also published as: CN103886870A; US20140180682A1

Abstract

PROBLEM TO BE SOLVED: To detect various kinds of sudden noise without increasing the processing load on the device. SOLUTION: A noise detector comprises: a feature variation calculation part that calculates a feature variation, which is the variation of a feature quantity between two temporally adjacent frames, based on either the amplitude feature quantities or the frequency feature quantities held in a holding part that holds the amplitude feature quantities and the frequency feature quantities for a plurality of frames; a section specification part that, by comparing the feature variation with a preset threshold, specifies a section of frames over which the amplitude feature quantities and the frequency feature quantities held in the holding part are to be weighted-averaged; a feature quantity set generation part that generates, as a feature quantity set, a set of the weighted average values of the amplitude feature quantities and of the frequency feature quantities corresponding to the frames of the specified section; and a noise determination part that determines, based on the feature quantity set, whether or not the latest frame of an input signal is a frame including non-stationary noise, which is sudden noise.

Description

The present technology relates to a noise detection apparatus and method, and a program, and in particular to a noise detection apparatus and method, and a program, that can detect various kinds of sudden noise without increasing the processing load on the device.

Recording devices such as IC recorders, smartphones, and video cameras record surrounding sound using a small built-in microphone.

When recording with such a device, sounds such as the operation sound produced when the user operates the device with its operation buttons, or the operation sound of a keyboard located away from the device, are mixed into the recorded audio as noise.

Therefore, techniques have been proposed for recording devices to detect and reduce special noise mixed in during recording, such as the operation sound of a keyboard located at a distance (see, for example, Patent Document 1).

The noise detection method of Patent Document 1 mainly targets the operation sound of a keyboard located away from the recording device.

Keyboard operation sounds generally appear in the recorded audio signal as a set of pulse-like noise signals of relatively long duration. Noise due to operation sounds can therefore be detected easily by comparing the amplitude value (signal level) of such a pulse-like noise signal with a threshold, or by comparing a high-frequency component that is almost absent from speech signals with a threshold.

A technique for determining whether an input signal is speech (for example, conversation) or non-speech has also been proposed (see, for example, Patent Document 2). For example, frames determined to be non-speech by the technique of Patent Document 2 could be treated as noise.

Patent Document 1: JP 2012-027186 A
Patent Document 2: JP 2009-251134 A

However, noise picked up by a recording device is not necessarily a signal whose frequency characteristics resemble a pulse signal, as keyboard operation sounds do. Sudden noises with distinctive frequency characteristics, such as loud laughter from a large group or rubbing sounds, also occur frequently. Such noise has been difficult to detect with conventional techniques such as that of Patent Document 1.

In addition, many sudden noises picked up by a recording device (for example, prolonged applause, coughs, and sneezes) have unstable durations that vary widely and are almost unpredictable. They are therefore also difficult to detect with the detection method using an attenuation feature amount, which is one of the noise detection methods of Patent Document 1.

Furthermore, a detection method using an attenuation feature amount, such as that of Patent Document 1, analyzes the signal over a relatively long time range, so a delay corresponding to that time range occurs.

The technique of Patent Document 2, moreover, is strictly a method for determining whether an input signal is speech, and is not intended for noise detection. For example, even if noise detection were performed using the technique of Patent Document 2, it could not determine whether the noise is sudden noise.

In addition, the method described in Patent Document 2 is computationally complex and is considered difficult to implement on, for example, a mobile device.

The present technology is disclosed in view of such circumstances, and makes it possible to detect various kinds of sudden noise without increasing the processing load on the device.

One aspect of the present technology is a noise detection apparatus including: an amplitude feature amount calculation unit that calculates an amplitude feature amount of the waveform of a predetermined frame of an audio input signal; a frequency feature amount calculation unit that calculates a frequency feature amount of the waveform of the predetermined frame; a feature change amount calculation unit that calculates a feature change amount, which is the amount of change in a feature amount between two temporally adjacent frames, based on any one of the amplitude feature amount and the frequency feature amount held in a holding unit that holds the amplitude feature amount and the frequency feature amount for a plurality of frames; a section specifying unit that, by comparing the feature change amount with a preset threshold, specifies a section of temporally consecutive frames over which the amplitude feature amount and the frequency feature amount held in the holding unit are to be weighted-averaged; a feature amount set generation unit that generates, as a feature amount set, a set of the weighted average values of the amplitude feature amounts and of the frequency feature amounts corresponding to the frames of the specified section; and a noise determination unit that determines, based on the feature amount set, whether or not the latest frame of the input signal is a frame including non-stationary noise, which is sudden noise.
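As a non-limiting illustration, the section specification and weighted averaging described above might be sketched in Python as follows. The rule used to pick the section (start at the most recent frame whose change from its predecessor exceeds the threshold, and run to the latest frame), the uniform default weights, and all function names are assumptions made for this sketch; they are not taken from the specification.

```python
from typing import Dict, List, Optional, Tuple


def feature_changes(values: List[float]) -> List[float]:
    """Change of one feature amount between each pair of adjacent frames."""
    return [values[i] - values[i - 1] for i in range(1, len(values))]


def identify_section(values: List[float], threshold: float) -> Tuple[int, int]:
    """Return inclusive (start, end) frame indices of the section to average.

    Assumption: the section starts at the most recent frame whose change from
    the previous frame exceeds the threshold and ends at the latest frame.
    If no change exceeds the threshold, only the latest frame is used.
    `values` must hold at least one frame.
    """
    changes = feature_changes(values)
    end = len(values) - 1
    for i in range(len(changes) - 1, -1, -1):
        if changes[i] > threshold:
            return i + 1, end  # changes[i] is the step from frame i to i + 1
    return end, end


def weighted_average(values: List[float], start: int, end: int,
                     weights: Optional[List[float]] = None) -> float:
    """Weighted average over frames start..end (uniform weights by default)."""
    section = values[start:end + 1]
    if weights is None:
        weights = [1.0] * len(section)
    return sum(w * v for w, v in zip(weights, section)) / sum(weights)


def build_feature_set(amplitude: List[float], frequency: List[float],
                      threshold: float) -> Dict[str, float]:
    """Build a feature amount set from held per-frame feature amounts."""
    # The change is computed on one feature only (here, the amplitude feature).
    start, end = identify_section(amplitude, threshold)
    return {
        "amplitude": weighted_average(amplitude, start, end),
        "frequency": weighted_average(frequency, start, end),
    }
```

In this sketch the held feature amounts are plain lists with the latest frame last; a fixed-length buffer would serve the same role as the holding unit.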

The amplitude feature amount calculation unit or the frequency feature amount calculation unit may calculate at least two types of amplitude feature amounts from among a plurality of types of amplitude feature amounts or a plurality of types of frequency feature amounts, and the apparatus may further include a feature amount selection unit that selects, based on the zero-crossing rate of the input signal of the predetermined frame, the average value of a plurality of sample values of the input signal of the predetermined frame, or the RMS value of a plurality of sample values of the input signal of the predetermined frame, the amplitude feature amount to be calculated by the amplitude feature amount calculation unit from among the plurality of types of amplitude feature amounts, or the frequency feature amount to be calculated by the frequency feature amount calculation unit from among the plurality of types of frequency feature amounts.

The feature amount selection unit may determine, based on the zero-crossing rate of the input signal of the predetermined frame, whether the input signal of the predetermined frame is closer to a vowel or to a consonant, and may select, according to the determination result, the amplitude feature amount to be calculated by the amplitude feature amount calculation unit and, from among the plurality of types of frequency feature amounts, the frequency feature amount to be calculated by the frequency feature amount calculation unit.
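As a non-limiting illustration, a zero-crossing-rate-based selection of this kind might be sketched as follows. The threshold value, the class labels, and the selection table are hypothetical; the text above does not state which feature amounts are chosen for which class.

```python
from typing import Dict, Sequence

ZCR_THRESHOLD = 0.3  # hypothetical value, not taken from the specification

# Hypothetical selection table: which feature amounts to compute per class.
FEATURES_BY_CLASS: Dict[str, Dict[str, str]] = {
    "vowel-like": {"amplitude": "rms", "frequency": "band_ratio"},
    "consonant-like": {"amplitude": "peak", "frequency": "spectrum_values"},
}


def zero_crossing_rate(frame: Sequence[float]) -> float:
    """Fraction of adjacent sample pairs whose signs differ."""
    if len(frame) < 2:
        return 0.0
    crossings = sum(
        1 for a, b in zip(frame, frame[1:]) if (a >= 0) != (b >= 0)
    )
    return crossings / (len(frame) - 1)


def classify_frame(frame: Sequence[float],
                   threshold: float = ZCR_THRESHOLD) -> str:
    """Label a frame as closer to a consonant (high ZCR) or a vowel (low ZCR)."""
    if zero_crossing_rate(frame) > threshold:
        return "consonant-like"
    return "vowel-like"


def select_features(frame: Sequence[float]) -> Dict[str, str]:
    """Pick the feature amounts to compute for this frame."""
    return FEATURES_BY_CLASS[classify_frame(frame)]
```

The sketch relies on the general observation that unvoiced, consonant-like sounds tend to have a higher zero-crossing rate than voiced, vowel-like sounds.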

The amplitude feature amount calculation unit may calculate, as the amplitude feature amount, at least one of: the peak value among the plurality of sample values of the predetermined frame; the average value of the plurality of sample values of the predetermined frame; and the RMS value of the plurality of sample values of the predetermined frame. The frequency feature amount calculation unit may calculate, as the frequency feature amount, at least one of: the zero-crossing rate of the input signal of the predetermined frame; the ratio of the sound pressure of a specific frequency component to the sound pressure of all frequency components in the input signal of the predetermined frame; the ratio of the sound pressure of the specific frequency component to the sound pressure of a frequency component different from the specific frequency component in the input signal of the predetermined frame; and one or more specific values of the frequency spectrum obtained by Fourier-transforming the input signal of the predetermined frame.
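As a non-limiting illustration, several of the feature amounts listed above might be computed per frame as follows. This sketch assumes that the average value is taken over absolute sample values, that sound pressure is approximated by DFT magnitude, and that a specific frequency component is given as a range of DFT bins; the function names are hypothetical. A naive DFT is used only to keep the example self-contained.

```python
import cmath
import math
from typing import List, Sequence


def peak(frame: Sequence[float]) -> float:
    """Largest absolute sample value in the frame."""
    return max(abs(x) for x in frame)


def mean_abs(frame: Sequence[float]) -> float:
    """Average of absolute sample values (assumed reading of 'average value')."""
    return sum(abs(x) for x in frame) / len(frame)


def rms(frame: Sequence[float]) -> float:
    """Root-mean-square value of the frame."""
    return math.sqrt(sum(x * x for x in frame) / len(frame))


def magnitude_spectrum(frame: Sequence[float]) -> List[float]:
    """Magnitudes of DFT bins 0..N/2, computed with a naive O(N^2) DFT."""
    n = len(frame)
    return [
        abs(sum(x * cmath.exp(-2j * math.pi * k * t / n)
                for t, x in enumerate(frame)))
        for k in range(n // 2 + 1)
    ]


def band_ratio(frame: Sequence[float], lo_bin: int, hi_bin: int) -> float:
    """Ratio of the magnitude in bins [lo_bin, hi_bin) to the total magnitude."""
    mags = magnitude_spectrum(frame)
    total = sum(mags)
    return sum(mags[lo_bin:hi_bin]) / total if total > 0 else 0.0
```

For example, a 16-sample sine wave with exactly two cycles per frame concentrates nearly all of its magnitude in bin 2, so `band_ratio(frame, 2, 3)` is close to 1.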

The noise determination unit may calculate the ratio between the weighted average value of the amplitude feature amount included in the feature amount set and a preset first value, and the ratio between the weighted average value of the frequency feature amount and a preset second value; calculate a noise likelihood based on the calculated ratios; and compare the noise likelihood with a preset threshold to determine whether or not the latest frame of the input signal is a frame including the non-stationary noise.

The noise determination unit may calculate, in a feature vector space using some or all of the weighted average values of the amplitude feature amounts and the weighted average values of the frequency feature amounts included in the feature amount set, and based on a discrimination model learned in advance, a noise likelihood representing the probability that the frame is a non-stationary noise frame from the feature vector corresponding to the feature amount set, and may compare the noise likelihood with a preset threshold to determine whether or not the latest frame of the input signal is a frame including the non-stationary noise.
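As a non-limiting illustration, a determination based on a pre-learned discrimination model might be sketched as follows. The text above does not specify the model type; logistic regression is used here purely as a stand-in, and the weights, bias, and threshold are hypothetical values that would in practice come from prior training.

```python
import math
from typing import Sequence


def noise_likelihood(feature_vector: Sequence[float],
                     weights: Sequence[float], bias: float) -> float:
    """Likelihood in (0, 1) that the frame is non-stationary noise.

    Stand-in model: logistic regression with parameters learned in advance.
    """
    z = bias + sum(w * x for w, x in zip(weights, feature_vector))
    return 1.0 / (1.0 + math.exp(-z))


def is_nonstationary_noise(feature_vector: Sequence[float],
                           weights: Sequence[float], bias: float,
                           threshold: float = 0.5) -> bool:
    """Compare the likelihood with a preset threshold."""
    return noise_likelihood(feature_vector, weights, bias) > threshold
```

Here the feature vector would hold some or all of the weighted average values in the feature amount set, in a fixed order matching the learned weights.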

The apparatus may further include a frequency characteristic correction unit that corrects the frequency characteristics of the signal input device that supplies the input signal.

The apparatus may further include a stationary noise removal unit that removes, from the input signal, stationary noise, which is noise different from the non-stationary noise.

One aspect of the present technology is a noise detection method including the steps of: an amplitude feature amount calculation unit calculating an amplitude feature amount of the waveform of a predetermined frame of an audio input signal; a frequency feature amount calculation unit calculating a frequency feature amount of the waveform of the predetermined frame; a feature change amount calculation unit calculating a feature change amount, which is the amount of change in a feature amount between two temporally adjacent frames, based on any one of the amplitude feature amount and the frequency feature amount held in a holding unit that holds the amplitude feature amount and the frequency feature amount for a plurality of frames; a section specifying unit specifying, by comparing the feature change amount with a preset threshold, a section of temporally consecutive frames over which the amplitude feature amount and the frequency feature amount held in the holding unit are to be weighted-averaged; a feature amount set generation unit generating, as a feature amount set, a set of the weighted average values of the amplitude feature amounts and of the frequency feature amounts corresponding to the frames of the specified section; and a noise determination unit determining, based on the feature amount set, whether or not the latest frame of the input signal is a frame including non-stationary noise, which is sudden noise.

One aspect of the present technology is a program that causes a computer to function as a noise detection apparatus including: an amplitude feature amount calculation unit that calculates an amplitude feature amount of the waveform of a predetermined frame of an audio input signal; a frequency feature amount calculation unit that calculates a frequency feature amount of the waveform of the predetermined frame; a feature change amount calculation unit that calculates a feature change amount, which is the amount of change in a feature amount between two temporally adjacent frames, based on any one of the amplitude feature amount and the frequency feature amount held in a holding unit that holds the amplitude feature amount and the frequency feature amount for a plurality of frames; a section specifying unit that, by comparing the feature change amount with a preset threshold, specifies a section of temporally consecutive frames over which the amplitude feature amount and the frequency feature amount held in the holding unit are to be weighted-averaged; a feature amount set generation unit that generates, as a feature amount set, a set of the weighted average values of the amplitude feature amounts and of the frequency feature amounts corresponding to the frames of the specified section; and a noise determination unit that determines, based on the feature amount set, whether or not the latest frame of the input signal is a frame including non-stationary noise, which is sudden noise.

In one aspect of the present technology, an amplitude feature amount of the waveform of a predetermined frame of an audio input signal is calculated; a frequency feature amount of the waveform of the predetermined frame is calculated; a feature change amount, which is the amount of change in a feature amount between two temporally adjacent frames, is calculated based on any one of the amplitude feature amount and the frequency feature amount held in a holding unit that holds the amplitude feature amount and the frequency feature amount for a plurality of frames; a section of temporally consecutive frames over which the amplitude feature amount and the frequency feature amount held in the holding unit are to be weighted-averaged is specified by comparing the feature change amount with a preset threshold; a set of the weighted average values of the amplitude feature amounts and of the frequency feature amounts corresponding to the frames of the specified section is generated as a feature amount set; and whether or not the latest frame of the input signal is a frame including non-stationary noise, which is sudden noise, is determined based on the feature amount set.

According to the present technology, various kinds of sudden noise can be detected without increasing the processing load on the device.

FIG. 1 is a block diagram illustrating a configuration example of a noise detection apparatus according to an embodiment of the present technology.
FIG. 2 is a diagram showing the relationship between the frequency characteristic curve of the signal input unit and the frequency characteristic linear average.
FIG. 3 is a block diagram illustrating a detailed configuration example of the frame integration unit of FIG. 1.
FIG. 4 is a diagram showing the waveform of an input signal, a waveform showing changes in the amplitude feature amount, and a waveform showing changes in the feature change amount.
FIG. 5 is a flowchart illustrating an example of noise detection processing by the noise detection apparatus of FIG. 1.
FIG. 6 is a flowchart illustrating a detailed example of the integration processing of FIG. 5.
FIG. 7 is a block diagram illustrating a configuration example according to another embodiment of a noise detection apparatus to which the present technology is applied.
FIG. 8 is a block diagram illustrating a detailed configuration example of the feature amount selection unit of FIG. 7.
FIG. 9 is a diagram showing an example of a comparison of the frequency characteristics of a cough and a vowel, and of a cough and a consonant.
FIG. 10 is a diagram showing an example of the distribution of zero-crossing rates in an audio signal.
FIG. 11 is a block diagram illustrating a configuration example according to yet another embodiment of a noise detection apparatus to which the present technology is applied.
FIG. 12 is a block diagram illustrating a configuration example of a personal computer.

Hereinafter, embodiments of the technology disclosed herein will be described with reference to the drawings.

FIG. 1 is a block diagram illustrating a configuration example of a noise detection apparatus according to an embodiment of the present technology. The noise detection apparatus 100 shown in the figure detects, for example, sudden noise (also referred to as non-stationary noise) contained in surrounding sound. Here, sudden noise is, for example, sound such as prolonged applause, coughing, or sneezing.

As shown in FIG. 1, the noise detection apparatus 100 includes a frequency characteristic correction unit 101, a stationary noise reduction unit 102, an amplitude feature amount calculation unit 104, a frequency feature amount calculation unit 105, a frame integration unit 106, a likelihood calculation unit 107, and a noise detection unit 108.

A signal input unit 51 and a signal processing device 52 are also connected to the noise detection apparatus 100.

The signal input unit 51 includes a microphone that picks up surrounding sound, an amplifier that amplifies the audio signal input from the microphone with a gain given by the main control device, and an AD converter that converts the analog signal supplied from the amplifier into a digital signal.

In recent years, modules that integrate an amplifier and an AD converter (in some cases including a DA converter) have become widespread, and such a module may be provided inside the signal input unit 51. The signal input unit 51 may also have a function of reading a digital audio signal directly from a recording medium (for example, a hard disk, a CD, or a semiconductor memory).

The frequency characteristic correction unit 101 includes, for example, a filter that compensates for the inherent frequency characteristic F_id(n) of the signal input unit 51. That is, so that the digital signal supplied from the signal input unit 51 is not affected by the inherent frequency characteristic of the signal input unit 51, this filter removes the influence of that characteristic from the input signal. Details of the processing of the frequency characteristic correction unit 101 will be described later.

The frequency characteristic correction unit 101 supplies the signal from which the influence of the inherent frequency characteristic of the signal input unit 51 has been removed to the stationary noise reduction unit 102.

The stationary noise reduction unit 102 calculates the level of stationary noise. Here, stationary noise means noise contained in the digital signal whose frequency characteristics and amplitude characteristics do not change over a long time interval. For example, the driving sound of the noise detection apparatus 100, the signal input unit 51, or the signal processing device 52, and the sound of air conditioning in a conference room, are stationary noise.

The stationary noise reduction unit 102 removes the stationary noise component of the calculated level from the input signal, and then supplies the signal to the amplitude feature amount calculation unit 104 and the frequency feature amount calculation unit 105. Stationary noise may be reduced using, for example, a commonly used noise reduction method, or some other method.

The amplitude feature amount calculation unit 104 calculates one or more amplitude feature amounts from the input signal supplied from the stationary noise reduction unit 102 and supplies them to the frame integration unit 106. Details of the amplitude feature amounts will be described later.

The frequency feature amount calculation unit 105 calculates one or more frequency feature amounts from the input signal supplied from the stationary noise reduction unit 102 and supplies them to the frame integration unit 106. Details of the frequency feature amounts will be described later.

The frame integration unit 106 collects, for a predetermined number of frames, the amplitude feature amounts and frequency feature amounts calculated for each frame and supplied from the amplitude feature amount calculation unit 104 and the frequency feature amount calculation unit 105, and integrates them into one feature amount set F_pack. Details of the integration method will be described later. The feature amount set F_pack is supplied to the likelihood calculation unit 107.

The likelihood calculation unit 107 calculates, for each feature amount included in the feature amount set F_pack integrated by the frame integration unit 106, its ratio to a predetermined threshold. Based on the calculated ratios, the likelihood calculation unit 107 then estimates a noise likelihood for each feature amount of the feature amount set F_pack, and calculates the weighted average of the estimated per-feature noise likelihoods as the noise likelihood of the input signal. The calculated noise likelihood is supplied to the noise detection unit 108. Details of the noise likelihood calculation method will be described later.

The noise detection unit 108 compares the noise likelihood of the input signal supplied from the likelihood calculation unit 107 with a predetermined threshold, and determines whether or not the input signal is non-stationary noise. The determination result of the noise detection unit 108 is output to the signal processing device 52 as the final detection result of the noise detection apparatus 100.
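As a non-limiting illustration, the processing of the likelihood calculation unit 107 and the noise detection unit 108 might be sketched as follows. The mapping from ratio to per-feature likelihood (clipping the ratio to the range 0 to 1), the feature names, and all numeric values are assumptions made for this sketch; the detailed calculation method is described later in the specification and may differ.

```python
from typing import Dict


def feature_likelihood(value: float, reference: float) -> float:
    """Per-feature noise likelihood from the ratio of value to its threshold.

    Assumption: the ratio is clipped to [0, 1] and used directly.
    """
    ratio = value / reference
    return max(0.0, min(1.0, ratio))


def overall_likelihood(f_pack: Dict[str, float],
                       references: Dict[str, float],
                       weights: Dict[str, float]) -> float:
    """Weighted average of the per-feature likelihoods over F_pack."""
    total_weight = sum(weights[name] for name in f_pack)
    weighted_sum = sum(
        weights[name] * feature_likelihood(f_pack[name], references[name])
        for name in f_pack
    )
    return weighted_sum / total_weight


def detect_noise(f_pack: Dict[str, float], references: Dict[str, float],
                 weights: Dict[str, float], detection_threshold: float) -> bool:
    """True if the noise likelihood of the input exceeds the threshold."""
    return overall_likelihood(f_pack, references, weights) > detection_threshold
```

For instance, with feature values of 0.4 and 0.3 against thresholds of 0.5 and 0.2 and equal weights, the per-feature likelihoods are 0.8 and 1.0 and the overall likelihood is 0.9.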

The signal processing device 52 performs signal processing using the detection result output from the noise detection unit 108. The signal processing device 52 is also provided, as necessary, with a recording unit that records the audio signal on a recording medium such as a hard disk, a CD, or a semiconductor memory.

Specifically, the signal processing device 52 uses the detection result output from the noise detection unit 108 to calculate, for example, a recording sensitivity adapted only to the speech portion of the input signal. For example, it calculates a recording sensitivity suited to recording the speech, excluding the noise, from surrounding sound that contains noise.

The signal processing device 52 also performs adaptive processing using the detection result output from the noise detection unit 108. For example, the signal processing device 52 uses the detection result to execute processing that reduces the noise.

Alternatively, the signal processing device 52 may use the detection result to identify the type of noise (cough, sneeze, laughter, and so on), estimate the recording environment of the input signal from the type of noise, and feed back that information. For example, if the noise type is a cough, it may feed back information indicating that the health of a person in the recording environment is poor; if the noise type is a sneeze, information indicating that the air in the room is not clean; and if the noise type is laughter, information indicating that the remarks are amusing.

Next, details of the processing of the frequency characteristic correction unit 101 will be described. The frequency characteristic correction unit 101 acquires the input signal S(n) corresponding to frame n from the signal input unit 51. Here, the input signal S(n) is defined as in Expression (1).

In Expression (1), L denotes the number of sample values, obtained by sampling in the AD conversion, that are contained in one frame, and Expression (1) gives the set of sample values contained in the n-th frame.
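Expression (1) itself is not reproduced in this text. As a non-limiting illustration of the description above, splitting a sample sequence into frames of L samples might be sketched as follows; the assumptions that frames do not overlap, that frame numbering starts at 0, and that a trailing partial frame is dropped are made only for this sketch.

```python
from typing import List, Sequence


def frame_signal(samples: Sequence[float], L: int) -> List[List[float]]:
    """Split samples into consecutive frames of L samples each.

    Assumption: frame n holds samples n*L .. n*L + L - 1, frames do not
    overlap, and a trailing partial frame is discarded.
    """
    return [list(samples[n * L:(n + 1) * L]) for n in range(len(samples) // L)]
```

With ten samples and L = 4, this yields two full frames and discards the last two samples.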

そして、周波数特性補正部１０１は、予め測定して得られた信号入力部５１の固有周波数特性Ｆ_ｉｄ(ｎ)に基づいて、固有周波数特性Ｆ_ｉｄ(ｎ)を補正するフィルタＨ_ｉｄを生成し、入力信号Ｓ（ｎ）を、フィルタＨ_ｉｄによって処理することで、入力信号Ｓ（ｎ）から固有周波数特性Ｆ_ｉｄ(ｎ)を除去するように補正する。 The frequency characteristic correction unit 101 then generates a filter H_id that corrects the natural frequency characteristic F_id(n) of the signal input unit 51, based on a measurement of F_id(n) made in advance, and processes the input signal S(n) with the filter H_id so that the natural frequency characteristic F_id(n) is removed from the input signal S(n).

図２は、縦軸を音圧、横軸を周波数とし、信号入力部５１の固有周波数特性を表す周波数特性曲線と、理想的な周波数特性である周波数特性線形平均の関係を示す図である。図２に示されるように、周波数特性曲線は、周波数が３kHz,７kHz,１１kHz,１５kHz付近で、それぞれ−６dB,＋１１dB,＋８dB,―１５dBだけ周波数特性線形平均と異なっている。この場合、周波数が３kHz,７kHz,１１kHz,１５kHz付近で、それぞれ＋６dB,―１１dB,−８dB,＋１５dBだけ補正するＨ_ｉｄを生成することにより、入力信号Ｓ（ｎ）から固有周波数特性Ｆ_ｉｄ(ｎ)を除去するように補正することが可能となる。 FIG. 2 shows the relationship between a frequency characteristic curve representing the natural frequency characteristic of the signal input unit 51 and the linear average of the frequency characteristic, which is the ideal frequency characteristic, with sound pressure on the vertical axis and frequency on the horizontal axis. As shown in FIG. 2, the frequency characteristic curve deviates from the linear average by −6 dB, +11 dB, +8 dB, and −15 dB around 3 kHz, 7 kHz, 11 kHz, and 15 kHz, respectively. In this case, generating a filter H_id that applies corrections of +6 dB, −11 dB, −8 dB, and +15 dB around 3 kHz, 7 kHz, 11 kHz, and 15 kHz, respectively, makes it possible to remove the natural frequency characteristic F_id(n) from the input signal S(n).

なお、図２において抽出された周波数である３kHz,７kHz,１１kHz,１５kHz付近は、例えば、音圧が周波数特性線形平均から最も離れており、補正が必要となる周波数として選択された周波数とされる。 Note that the frequencies extracted in FIG. 2, around 3 kHz, 7 kHz, 11 kHz, and 15 kHz, are, for example, the frequencies at which the sound pressure deviates most from the linear average of the frequency characteristic and which were therefore selected as requiring correction.

あるいはまた、周波数特性補正部１０１は、信号入力部５１の固有周波数特性Ｆ_ｉｄ(ｎ)に応じたマッピングテーブルを生成し、後述する振幅特徴量の算出および周波数特徴量の算出の際に、そのマッピングテーブルを振幅特徴量計算部１０４および周波数特徴量計算部１０５に供給するようにしてもよい。例えば、周波数が３kHz,７kHz,１１kHz,１５kHz付近で、それぞれ＋６dB,―１１dB,−８dB,＋１５dBだけ音圧を付加する旨を表す情報をマッピングテーブルとし、振幅特徴量計算部１０４および周波数特徴量計算部１０５に供給する。 Alternatively, the frequency characteristic correction unit 101 may generate a mapping table corresponding to the natural frequency characteristic F_id(n) of the signal input unit 51 and supply it to the amplitude feature quantity calculation unit 104 and the frequency feature quantity calculation unit 105 for use when the amplitude and frequency feature quantities described later are calculated. For example, the mapping table holds information indicating that the sound pressure is to be adjusted by +6 dB, −11 dB, −8 dB, and +15 dB around 3 kHz, 7 kHz, 11 kHz, and 15 kHz, respectively, and is supplied to the amplitude feature quantity calculation unit 104 and the frequency feature quantity calculation unit 105.
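
As a rough illustration of the mapping-table variant, the following Python sketch derives the correction values as the negation of the deviations read from the FIG. 2 example and converts them to linear amplitude factors. The names `MEASURED_DEVIATION_DB`, `build_correction_table`, and `db_to_gain` are hypothetical and do not appear in the source; how units 104 and 105 actually apply the table is not specified there.

```python
# Hypothetical sketch: deviations of the input unit's response from the
# linear average, taken from the FIG. 2 example (frequency in Hz -> dB).
MEASURED_DEVIATION_DB = {3000: -6.0, 7000: 11.0, 11000: 8.0, 15000: -15.0}


def build_correction_table(deviation_db):
    """Return the per-frequency correction in dB (the negated deviation)."""
    return {freq: -dev for freq, dev in deviation_db.items()}


def db_to_gain(db):
    """Convert a sound-pressure correction in dB to a linear amplitude factor."""
    return 10.0 ** (db / 20.0)


correction_db = build_correction_table(MEASURED_DEVIATION_DB)
correction_gain = {freq: db_to_gain(db) for freq, db in correction_db.items()}
```

Under these assumptions the table reproduces the +6 dB, −11 dB, −8 dB, and +15 dB corrections named in the text.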

なお、定常性雑音軽減部１０２においても、周波数特性補正部１０１と同様にマッピングテーブルを作成し、定常性雑音が軽減されるようにしてもよい。 Note that the stationary noise reduction unit 102 may also create a mapping table in the same manner as the frequency characteristic correction unit 101 to reduce the stationary noise.

次に、振幅特徴量の詳細について説明する。 Next, details of the amplitude feature amount will be described.

振幅特徴量計算部１０４では、入力信号Ｓ（ｎ）の振幅特性を解析し、フレームｎの振幅特性を表す振幅特徴量を算出する。ここでは、フレームｎの振幅特徴量として、Ｅ_１（ｎ）、Ｅ_２（ｎ）、およびＥ_３（ｎ）を算出するものとする。 The amplitude feature quantity calculation unit 104 analyzes the amplitude characteristic of the input signal S (n) and calculates an amplitude feature quantity representing the amplitude characteristic of the frame n. Here, it is assumed that E ₁ (n), E ₂ (n), and E ₃ (n) are calculated as the amplitude feature quantities of the frame n.

Ｅ_１（ｎ）は、フレームｎに含まれるＬ個のサンプル値のピーク値を表す振幅特徴量であって、式（２）により算出される。 E ₁ (n) is an amplitude feature amount representing the peak value of the L sample values included in the frame n, and is calculated by Expression (2).

Ｅ_２（ｎ）は、フレームｎに含まれるＬ個のサンプル値の平均値を表す振幅特徴量であって、式（３）により算出される。 E ₂ (n) is an amplitude feature amount that represents an average value of L sample values included in the frame n, and is calculated by Expression (3).

Ｅ_３（ｎ）は、フレームｎに含まれるＬ個のサンプル値のＲＭＳ（Root Mean Square）値を表す振幅特徴量であって、式（４）により算出される。 E ₃ (n) is an amplitude feature amount representing an RMS (Root Mean Square) value of L sample values included in the frame n, and is calculated by Expression (4).

なお、式（３）および式（４）においては、サンプル値の線形平均を算出する例を示したが、例えば、サンプル値の対数平均、または、サンプル値の線形平均と対数平均を重み付けして加算することにより得られた値などを用いるようにしてもよい。 Expressions (3) and (4) show an example that computes a linear average of the sample values; however, a logarithmic average of the sample values, or a value obtained by weighting and adding the linear and logarithmic averages, may be used instead.

さらに、Ｅ_１（ｎ）、Ｅ_２（ｎ）、およびＥ_３（ｎ）を算出する前に、入力信号Ｓ（ｎ）をハイパスフィルタによって処理し、入力信号に含まれるＤＣ成分のノイズが除去されるようにしてもよい。 Furthermore, before E₁(n), E₂(n), and E₃(n) are calculated, the input signal S(n) may be processed by a high-pass filter to remove DC-component noise contained in the input signal.

なお、上述したＥ_１（ｎ）、Ｅ_２（ｎ）、およびＥ_３（ｎ）以外の振幅特徴量が算出されるようにしてもよい。 Note that amplitude feature quantities other than E ₁ (n), E ₂ (n), and E ₃ (n) described above may be calculated.
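
The three amplitude feature quantities can be sketched as follows. Expressions (2) to (4) are not reproduced in this text, so the sketch assumes that the peak and the mean are taken over absolute sample values and that the RMS is the square root of the mean squared sample value; the function name `amplitude_features` is hypothetical.

```python
import math


def amplitude_features(frame):
    """Return (E1, E2, E3) = (peak, mean, RMS) for one frame of L samples.

    Assumption: peak and mean are computed on absolute sample values.
    """
    if not frame:
        raise ValueError("frame must contain at least one sample")
    magnitudes = [abs(x) for x in frame]
    e1 = max(magnitudes)                                    # peak value
    e2 = sum(magnitudes) / len(frame)                       # mean value
    e3 = math.sqrt(sum(x * x for x in frame) / len(frame))  # RMS value
    return e1, e2, e3
```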

次に、周波数特徴量の詳細について説明する。 Next, details of the frequency feature amount will be described.

周波数特徴量計算部１０５では、入力信号Ｓ（ｎ）の周波数特性を解析し、フレームｎの周波数特性を表す周波数特徴量を算出する。ここでは、フレームｎの周波数特徴量として、Ｆ_１（ｎ）、Ｆ_２（ｎ）、Ｆ_３（ｎ）、およびＦ_４（ｎ）を算出するものとする。 The frequency feature amount calculation unit 105 analyzes the frequency characteristic of the input signal S (n) and calculates a frequency feature amount representing the frequency characteristic of the frame n. Here, F ₁ (n), F ₂ (n), F ₃ (n), and F ₄ (n) are calculated as the frequency feature quantities of the frame n.

Ｆ_１（ｎ）は、入力信号のゼロ交差率を表す特徴量であって、式（５）により算出される。 F ₁ (n) is a feature amount that represents the zero-crossing rate of the input signal, and is calculated by Expression (5).

式（５）におけるsymbol（ｉ）は、式（６）により表される。 Symbol (i) in equation (5) is expressed by equation (6).
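
A minimal sketch of a zero-crossing rate follows. Expressions (5) and (6) are not reproduced in this text, so it assumes that symbol(i) is a sign function and that the rate is the fraction of adjacent sample pairs whose signs differ; the normalization by L − 1 and the function name `zero_crossing_rate` are assumptions.

```python
def symbol(x):
    """Assumed sign function: +1 for non-negative samples, -1 otherwise."""
    return 1 if x >= 0 else -1


def zero_crossing_rate(frame):
    """Fraction of adjacent sample pairs whose signs differ (assumed F1)."""
    if len(frame) < 2:
        return 0.0
    crossings = sum(
        1 for prev, cur in zip(frame, frame[1:]) if symbol(prev) != symbol(cur)
    )
    return crossings / (len(frame) - 1)
```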

Ｆ_２（ｎ）は、入力信号の中で全ての周波数成分の音圧に対する特定の周波数成分の音圧の割合を表す特徴量であって、式（７）により算出される。 F ₂ (n) is a feature amount that represents the ratio of the sound pressure of a specific frequency component to the sound pressure of all frequency components in the input signal, and is calculated by Expression (7).

式（７）におけるＥ_３（ｎ）は、式（４）により算出されるＥ_３（ｎ）とされる。 E₃(n) in Expression (7) is the E₃(n) calculated by Expression (4).

また、式（７）に示されるＳｉｇ_{ｂｐｆ＿１}（ｉ）、Ｓｉｇ_{ｂｐｆ＿２}（ｉ）、・・・は、式（８）により算出される。 Sig_{bpf_1}(i), Sig_{bpf_2}(i), ... shown in Expression (7) are calculated by Expression (8).

なお、式（８）におけるＦ_{ｂｐｆ＿ｍ}（ｈ）は、第ｍ番目の周波数成分を抽出するためのフィルタの係数を表すものとする。 Note that F _{bpf_m} (h) in equation (8) represents a filter coefficient for extracting the m-th frequency component.

Ｆ_３（ｎ）は、入力信号の中で特定の周波数成分とは異なる周波数成分の音圧に対する当該特定の周波数成分の音圧の割合を表す特徴量であって、式（９）により算出される。 F₃(n) is a feature quantity representing the ratio of the sound pressure of a specific frequency component of the input signal to the sound pressure of frequency components other than that specific component, and is calculated by Expression (9).

式（９）に示されるｂｐｆ_{ａ１＿ｒｍｓ}（ｎ）、ｂｐｆ_{ａ２＿ｒｍｓ}（ｎ）、ｂｐｆ_{ｂ１＿ｒｍｓ}（ｎ）、ｂｐｆ_{ｂ２＿ｒｍｓ}（ｎ）、・・・のそれぞれは、式（７）の分子として示されたｂｐｆ１_ｒｍｓ（ｎ）、ｂｐｆ２_ｒｍｓ（ｎ）、・・・と同様にして算出される。ただし、ｂｐｆ_{ａ１＿ｒｍｓ}（ｎ）、ｂｐｆ_{ａ２＿ｒｍｓ}（ｎ）、ｂｐｆ_{ｂ１＿ｒｍｓ}（ｎ）、ｂｐｆ_{ｂ２＿ｒｍｓ}（ｎ）、・・・を算出する場合、それぞれの周波数成分に対応するＦ_{ｂｐｆ＿ｍ}（ｈ）が用いられるものとする。 Each of bpf_{a1_rms}(n), bpf_{a2_rms}(n), bpf_{b1_rms}(n), bpf_{b2_rms}(n), ... shown in Expression (9) is calculated in the same way as bpf1_rms(n), bpf2_rms(n), ..., which appear in the numerator of Expression (7). When bpf_{a1_rms}(n), bpf_{a2_rms}(n), bpf_{b1_rms}(n), bpf_{b2_rms}(n), ... are calculated, however, the filter coefficients F_{bpf_m}(h) corresponding to the respective frequency components are used.
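
The band-ratio feature quantities can be sketched as below. Expressions (7) to (9) are not reproduced in this text, so the sketch assumes a causal FIR band-pass filter with coefficients F_{bpf_m}(h), a per-band RMS, F₂ as the summed band RMS divided by E₃(n), and F₃ as the summed RMS of one group of bands divided by that of another group. All function names are hypothetical, and the filter coefficients would have to come from a separate filter design step.

```python
import math


def rms(samples):
    """Root mean square of a non-empty sequence of samples."""
    return math.sqrt(sum(x * x for x in samples) / len(samples))


def fir_filter(samples, coeffs):
    """Causal FIR filter: y[i] = sum over h of coeffs[h] * samples[i - h]."""
    out = []
    for i in range(len(samples)):
        acc = 0.0
        for h, c in enumerate(coeffs):
            if i - h >= 0:
                acc += c * samples[i - h]
        out.append(acc)
    return out


def band_to_total_ratio(frame, band_filters):
    """Assumed F2: summed band RMS divided by the full-band RMS (E3)."""
    total = rms(frame)
    if total == 0.0:
        return 0.0
    return sum(rms(fir_filter(frame, f)) for f in band_filters) / total


def band_to_band_ratio(frame, filters_a, filters_b):
    """Assumed F3: summed RMS of bands 'a' over summed RMS of bands 'b'."""
    denom = sum(rms(fir_filter(frame, f)) for f in filters_b)
    if denom == 0.0:
        return 0.0
    return sum(rms(fir_filter(frame, f)) for f in filters_a) / denom
```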

Ｆ_４（ｎ）は、入力信号をフーリエ変換して得られた周波数スペクトルのうちの特定の１つの値または複数の値から成る特徴量であって、式（１０）により算出される。 F ₄ (n) is a feature quantity composed of a specific value or a plurality of values in the frequency spectrum obtained by Fourier transform of the input signal, and is calculated by Expression (10).
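
One way to obtain selected spectrum values is a direct discrete Fourier transform evaluated only at the bins of interest, as sketched below. Expression (10) is not reproduced in this text, so using the magnitude of each selected bin is an assumption, as is the function name `dft_magnitudes`.

```python
import cmath
import math


def dft_magnitudes(frame, bins):
    """Return |X[k]| for the requested DFT bin indices k of one frame."""
    n = len(frame)
    result = []
    for k in bins:
        acc = sum(
            x * cmath.exp(-2j * math.pi * k * i / n) for i, x in enumerate(frame)
        )
        result.append(abs(acc))
    return result
```

Evaluating only a few bins this way costs on the order of L operations per bin, which may be cheaper than a full FFT when very few spectrum values are needed.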

なお、Ｆ_１（ｎ）、Ｆ_２（ｎ）、Ｆ_３（ｎ）、およびＦ_４（ｎ）を算出する前に、入力信号Ｓ（ｎ）をハイパスフィルタによって処理し、入力信号に含まれるＤＣ成分のノイズが除去されるようにしてもよい。 Before calculating F ₁ (n), F ₂ (n), F ₃ (n), and F ₄ (n), the input signal S (n) is processed by a high-pass filter and included in the input signal. DC component noise may be removed.

ここでは、振幅特徴量計算部１０４がＥ_１（ｎ）、Ｅ_２（ｎ）、およびＥ_３（ｎ）を算出し、周波数特徴量計算部１０５がＦ_１（ｎ）、Ｆ_２（ｎ）、Ｆ_３（ｎ）、およびＦ_４（ｎ）を算出すると説明した。しかし、振幅特徴量計算部１０４がＥ_１（ｎ）、Ｅ_２（ｎ）、およびＥ_３（ｎ）のうち、いずれか１つまたは２つを算出し、周波数特徴量計算部１０５がＦ_１（ｎ）、Ｆ_２（ｎ）、Ｆ_３（ｎ）、およびＦ_４（ｎ）のうち、いずれか１つ乃至３つを算出するようにしてもよい。 The description above assumes that the amplitude feature quantity calculation unit 104 calculates E₁(n), E₂(n), and E₃(n), and that the frequency feature quantity calculation unit 105 calculates F₁(n), F₂(n), F₃(n), and F₄(n). However, the amplitude feature quantity calculation unit 104 may calculate only one or two of E₁(n), E₂(n), and E₃(n), and the frequency feature quantity calculation unit 105 may calculate only one to three of F₁(n), F₂(n), F₃(n), and F₄(n).

なお、上述したＦ_１（ｎ）、Ｆ_２（ｎ）、Ｆ_３（ｎ）、およびＦ_４（ｎ）以外の周波数特徴量が算出されるようにしてもよい。 Note that frequency feature quantities other than F ₁ (n), F ₂ (n), F ₃ (n), and F ₄ (n) described above may be calculated.

次に、フレーム統合部１０６による統合方式の詳細について説明する。 Next, details of the integration method by the frame integration unit 106 will be described.

図３は、フレーム統合部１０６の詳細な構成例を示す図である。同図に示されるように、フレーム統合部１０６は、特徴保持部１２１、統合対象判定部１２２、重み計算部１２３、および統合部１２４により構成されている。 FIG. 3 is a diagram illustrating a detailed configuration example of the frame integration unit 106. As shown in the figure, the frame integration unit 106 includes a feature holding unit 121, an integration target determination unit 122, a weight calculation unit 123, and an integration unit 124.

特徴保持部１２１は、振幅特徴量計算部１０４から供給される振幅特徴量および周波数特徴量計算部１０５から供給される周波数特徴量を、過去の所定数のフレーム分（例えば、ａフレーム分）だけ保持する。 The feature holding unit 121 holds the amplitude feature quantities supplied from the amplitude feature quantity calculation unit 104 and the frequency feature quantities supplied from the frequency feature quantity calculation unit 105 for a predetermined number of past frames (for example, a frames).

統合対象判定部１２２は、特徴保持部１２１に保持された振幅特徴量または周波数特徴量を用いて統合対象となるフレームを次のようにして判定する。 The integration target determination unit 122 determines the frame to be integrated using the amplitude feature quantity or the frequency feature quantity held in the feature holding unit 121 as follows.

統合対象判定部１２２では、特徴保持部１２１に保持されている振幅特徴量または周波数特徴量のうちいずれか１つの特徴量Ｆ_ｄを用いて、この特徴量のフレーム間の特徴量の変化を表す特徴変化量Ｆ_ｄ_diffを算出する。 The integration target determination unit 122 uses one feature quantity F_d, chosen from the amplitude feature quantities and frequency feature quantities held in the feature holding unit 121, to calculate a feature change amount F_d_diff representing the change of that feature quantity between frames.

例えば、特徴保持部１２１に、Ｅ_１（ｎ）、Ｅ_２（ｎ）、Ｅ_３（ｎ）、Ｆ_１（ｎ）、Ｆ_２（ｎ）、Ｆ_３（ｎ）、およびＦ_４（ｎ）が保持されている場合、Ｅ_３（ｎ）を用いて、ｉ−１番目のフレームの振幅特徴量Ｅ_３（ｉ−１）と、ｉ番目のフレームの振幅特徴量Ｅ_３（ｉ）の変化を表す特徴変化量Ｆ_ｄ_diffを算出する。 For example, when E₁(n), E₂(n), E₃(n), F₁(n), F₂(n), F₃(n), and F₄(n) are held in the feature holding unit 121, E₃(n) is used to calculate the feature change amount F_d_diff representing the change between the amplitude feature quantity E₃(i−1) of the (i−1)-th frame and the amplitude feature quantity E₃(i) of the i-th frame.

特徴変化量Ｆ_ｄ_diffは、式（１１）により算出される。 The feature change amount F _d _diff is calculated by Expression (11).

統合対象判定部１２２は、特徴保持部１２１に保持されている全フレーム分の特徴量を用いて各フレーム間の特徴変化量を順次算出する。そして、算出された特徴変化量をそれぞれ予め設定された閾値Ｆ_ｄ_diff_thと比較する。過去のフレームにおいて、最初に特徴変化量Ｆ_ｄ_diffが閾値Ｆ_ｄ_diff_thを超えたフレームを統合対象開始フレームとし、統合対象開始フレームから現在のフレームｎまでのフレーム（例えば、ｂフレーム）の振幅特徴量と周波数特徴量を統合対象として判定する。この判定結果は、重み計算部１２３に供給される。 The integration target determination unit 122 sequentially calculates the feature change amount between frames using the feature quantities of all frames held in the feature holding unit 121, and compares each calculated feature change amount with a preset threshold F_d_diff_th. Among the past frames, the frame at which the feature change amount F_d_diff first exceeds the threshold F_d_diff_th is taken as the integration target start frame, and the amplitude and frequency feature quantities of the frames from the integration target start frame to the current frame n (for example, b frames) are determined to be integration targets. The determination result is supplied to the weight calculation unit 123.

図４を参照してさらに詳細に説明する。図４は、横軸がフレームとされ、図中上から順番に、入力信号の波形、入力信号から算出された振幅特徴量の変化を示す波形、および振幅特徴量に基づいて算出された特徴変化量の変化を示す波形がそれぞれ示されている。図４の場合、例えば、会議の音声の中に咳の音が混入しているものとする。 This will be described in more detail with reference to FIG. In FIG. 4, the horizontal axis is a frame, and in order from the top in the figure, the waveform of the input signal, the waveform indicating the change in the amplitude feature amount calculated from the input signal, and the feature change calculated based on the amplitude feature amount Each of the waveforms showing the change in quantity is shown. In the case of FIG. 4, for example, it is assumed that a coughing sound is mixed in the audio of the meeting.

いま、現在のフレームが第４６０番目のフレームとされ、特徴保持部１２１には、第４４１番目のフレーム乃至第４６０番目のフレームの２０フレーム分の振幅特徴量と周波数特徴量が保持されているものとする。 Assume that the current frame is the 460th frame and that the feature holding unit 121 holds the amplitude and frequency feature quantities for 20 frames, from the 441st frame to the 460th frame.

図４の例では、２０フレーム分の振幅特徴量の中で、第４５２番目のフレームに対応する特徴変化量が最初に閾値Ｆ_ｄ_diff_th（＝１．２）を超えている。従って、第４５２番目のフレームが統合対象開始フレームとされ、第４６０番目のフレームまでの９フレームが統合対象とされることになる。 In the example of FIG. 4, the feature change amount corresponding to the 452nd frame among the amplitude feature amounts for 20 frames first exceeds the threshold value F _d _diff_th (= 1.2). Therefore, the 452nd frame is the integration target start frame, and the nine frames up to the 460th frame are the integration target.

このようにして統合対象となるフレームが判定される。 In this way, a frame to be integrated is determined.
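
The determination of the integration target frames can be sketched as follows. Expression (11) is not reproduced in this text, so the sketch assumes that the feature change amount is the ratio of a frame's feature quantity to that of the preceding frame (a ratio is at least consistent with the example threshold of 1.2). It scans back from the current frame, as in the flowchart of FIG. 6. The fallback to the oldest held frame when no change exceeds the threshold is also an assumption, as are the function names and the sample values.

```python
def feature_changes(values):
    """Assumed feature change: ratio of adjacent frame values.

    values[k] is the feature F_d of the k-th held frame (oldest first).
    The change for the oldest frame is undefined and stored as None.
    """
    changes = [None]
    for prev, cur in zip(values, values[1:]):
        if prev == 0:
            changes.append(float("inf") if cur != 0 else 1.0)
        else:
            changes.append(cur / prev)
    return changes


def find_integration_start(values, threshold):
    """Scan back from the newest frame to the nearest change above threshold.

    Returns the index into values of the integration target start frame.
    If no change exceeds the threshold, index 0 is returned so that all
    held frames are integrated (assumption; the source does not say).
    """
    changes = feature_changes(values)
    for i in range(len(values) - 1, 0, -1):
        if changes[i] > threshold:
            return i
    return 0


# Made-up feature values for held frames 441..460 with a jump at frame 452.
held = [1.0] * 11 + [5.0, 5.0, 4.5, 4.0, 3.5, 3.0, 2.5, 2.0, 1.5]
start = find_integration_start(held, 1.2)
start_frame = 441 + start          # 452
target_count = len(held) - start   # 9 frames, 452 through 460
```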

重み計算部１２３は、特徴保持部１２１に保持されている特徴量のうちの１つの特徴量Ｆ_ｗを用いて、現在のフレームの特徴量Ｆ_ｗと統合対象となる他のフレームの特徴量Ｆ_ｗとの差または比に基づいて重みを計算する。第ｉ番目のフレームの重みＷ（ｉ）は、式（１２）または式（１３）により計算される。 The weight calculation unit 123 uses one feature quantity F_w from among the feature quantities held in the feature holding unit 121 and calculates weights based on the difference or ratio between the feature quantity F_w of the current frame and the feature quantity F_w of each other frame to be integrated. The weight W(i) of the i-th frame is calculated by Expression (12) or Expression (13).

なお、式（１２）は、現在のフレームの特徴量Ｆ_ｗと統合対象となる他のフレームの特徴量Ｆ_ｗとの差に基づいて重みを計算する場合の式を示しており、式（１３）は、現在のフレームの特徴量Ｆ_ｗと統合対象となる他のフレームの特徴量Ｆ_ｗとの比に基づいて重みを計算する場合の式を示している。 Expression (12) is used when the weight is calculated from the difference between the feature quantity F_w of the current frame and the feature quantity F_w of another frame to be integrated, and Expression (13) is used when the weight is calculated from the ratio between the feature quantity F_w of the current frame and the feature quantity F_w of another frame to be integrated.

なお、重み計算部１２３が用いる特徴量Ｆ_ｗは、統合対象判定部１２２が用いる特徴量Ｆ_ｄと同じであってもよいし、異なってもよい。 Note that the feature quantity F_w used by the weight calculation unit 123 may be the same as or different from the feature quantity F_d used by the integration target determination unit 122.

重み計算部１２３で計算された重みは、統合部１２４に供給される。 The weights calculated by the weight calculation unit 123 are supplied to the integration unit 124.

統合部１２４は、重み計算部１２３から供給された重みを用いて振幅特徴量の重み付け平均値Ｅｓ（ｎ）を式（１４）により計算する。 The integration unit 124 uses the weights supplied from the weight calculation unit 123 to calculate the weighted average value Es(n) of the amplitude feature quantity by Expression (14).

式（１４）において、ｎは現在のフレームを表しており、ｂは統合対象となったフレーム数を表している。また、上述したように、複数の振幅特徴量（例えば、Ｅ_１（ｎ）、Ｅ_２（ｎ）、およびＥ_３（ｎ））が特徴保持部１２１に保持されている場合、式（１４）におけるＥ（ｎ）を、Ｅ_１（ｎ）、Ｅ_２（ｎ）、およびＥ_３（ｎ）のそれぞれとし、振幅特徴量の、重み付け平均値Ｅｓ_１（ｎ）乃至重み付け平均値Ｅｓ_３（ｎ）がそれぞれ算出される。 In Expression (14), n represents the current frame and b represents the number of frames to be integrated. As described above, when a plurality of amplitude feature quantities (for example, E₁(n), E₂(n), and E₃(n)) are held in the feature holding unit 121, E(n) in Expression (14) is taken to be each of E₁(n), E₂(n), and E₃(n) in turn, and the weighted average values Es₁(n) to Es₃(n) of the amplitude feature quantities are calculated.

また、統合部１２４は、重み計算部１２３から供給された重みを用いて周波数特徴量の重み付け平均値Ｆｓ（ｎ）を式（１５）により計算する。 The integration unit 124 also uses the weights supplied from the weight calculation unit 123 to calculate the weighted average value Fs(n) of the frequency feature quantity by Expression (15).

式（１５）において、ｎは現在のフレームを表しており、ｂは統合対象となったフレーム数を表している。また、上述したように、複数の周波数特徴量（例えば、Ｆ_１（ｎ）、Ｆ_２（ｎ）、Ｆ_３（ｎ）、およびＦ_４（ｎ））が特徴保持部１２１に保持されている場合、式（１５）におけるＦ（ｎ）を、Ｆ_１（ｎ）、Ｆ_２（ｎ）、Ｆ_３（ｎ）、およびＦ_４（ｎ）のそれぞれとし、周波数特徴量の重み付け平均値Ｆｓ_１（ｎ）乃至Ｆｓ_４（ｎ）がそれぞれ算出される。 In Expression (15), n represents the current frame and b represents the number of frames to be integrated. As described above, when a plurality of frequency feature quantities (for example, F₁(n), F₂(n), F₃(n), and F₄(n)) are held in the feature holding unit 121, F(n) in Expression (15) is taken to be each of F₁(n), F₂(n), F₃(n), and F₄(n) in turn, and the weighted average values Fs₁(n) to Fs₄(n) of the frequency feature quantities are calculated.

そして、統合部１２４は、振幅特徴量の重み付け平均値Ｅｓ（ｎ）および周波数特徴量の重み付け平均値Ｆｓ（ｎ）の集合を特徴量集合Ｆ_packとして尤度計算部１０７に供給する。 Then, the integrating unit 124 supplies a set of the weighted average value Es (n) of the amplitude feature quantity and the weighted average value Fs (n) of the frequency feature quantity to the likelihood calculating unit 107 as a feature quantity set F_pack.
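
The weighting and integration steps can be sketched as below. Expressions (12) to (15) are not reproduced in this text, so both the difference-based weight 1 / (1 + |F_w(n) − F_w(i)|) and the normalization of the weighted average by the sum of the weights are assumptions chosen only to illustrate the idea that frames resembling the current frame count more. All function names are hypothetical.

```python
def difference_weights(f_w, start):
    """Hypothetical difference-based weights for frames start..n.

    f_w holds the weighting feature per held frame (oldest first); the last
    entry is the current frame n. Frames whose value is closer to that of
    the current frame receive weights closer to 1.
    """
    current = f_w[-1]
    return [1.0 / (1.0 + abs(current - v)) for v in f_w[start:]]


def weighted_average(values, weights):
    """Weighted mean, normalized by the sum of the weights (assumed)."""
    total = sum(weights)
    return sum(w * v for w, v in zip(weights, values)) / total


def build_feature_set(amplitude, frequency, f_w, start):
    """Return the pair (Es(n), Fs(n)) computed over the integration span."""
    w = difference_weights(f_w, start)
    return (weighted_average(amplitude[start:], w),
            weighted_average(frequency[start:], w))
```

With several amplitude and frequency feature quantities, `weighted_average` would simply be applied once per feature quantity with the same weights.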

なお、フレーム統合部１０６に、重み計算部１２３が含まれないようにし、統合部１２４では、統合対象判定部１２２で判定した統合対象のフレームの振幅特徴量と周波数特徴量の単純平均のセットを統合して特徴量集合Ｆ_packを生成するようにしてもよい。 Note that the frame integration unit 106 may be configured without the weight calculation unit 123; in that case, the integration unit 124 generates the feature quantity set F_pack from the set of simple averages of the amplitude and frequency feature quantities of the integration target frames determined by the integration target determination unit 122.

また、フレーム統合部１０６に、統合対象判定部１２２が含まれないようにし、重み計算部１２３では、特徴保持部１２１で保持した全フレームの重みを計算し、統合部１２４では、全フレームの振幅特徴量と周波数特徴量の重み付け平均の集合を統合した特徴量集合Ｆ_packを生成するようにしてもよい。 Alternatively, the frame integration unit 106 may be configured without the integration target determination unit 122; in that case, the weight calculation unit 123 calculates weights for all frames held in the feature holding unit 121, and the integration unit 124 generates the feature quantity set F_pack from the set of weighted averages of the amplitude and frequency feature quantities of all frames.

さらに、フレーム統合部１０６に、統合対象判定部１２２および重み計算部１２３が含まれないようにし、統合部１２４では、特徴保持部１２１で保持した全フレームの振幅特徴量と周波数特徴量の単純な平均値のセットを特徴量集合Ｆ_packとして生成するようにしてもよい。 Furthermore, the frame integration unit 106 may be configured without either the integration target determination unit 122 or the weight calculation unit 123; in that case, the integration unit 124 generates, as the feature quantity set F_pack, the set of simple averages of the amplitude and frequency feature quantities of all frames held in the feature holding unit 121.

尤度計算部１０７は、フレーム統合部１０６で統合された特徴量集合Ｆ_packに含まれる特徴量のそれぞれについて、予め定めた閾値との割合を算出する。 The likelihood calculation unit 107 calculates a ratio of each feature amount included in the feature amount set F_pack integrated by the frame integration unit 106 to a predetermined threshold value.

例えば、振幅特徴量に対応する閾値Ｅ_thと周波数特徴量に対応する閾値Ｆ_thが予め定められている。 For example, a threshold value E_th corresponding to the amplitude feature value and a threshold value F_th corresponding to the frequency feature value are determined in advance.

尤度計算部１０７は、特徴量集合Ｆ_packに含まれる振幅特徴量の重み付け平均値についての閾値Ｅ_thの割合Ｒ_Ｅ（ｎ）を式（１６）により計算する。 The likelihood calculating unit 107 calculates the ratio R _E (n) of the threshold value E_th with respect to the weighted average value of the amplitude feature amount included in the feature amount set F_pack, using Expression (16).

また、尤度計算部１０７は、特徴量集合Ｆ_packに含まれる周波数特徴量の重み付け平均値についての閾値Ｆ_thの割合Ｒ_Ｆ（ｎ）を式（１７）により計算する。 In addition, the likelihood calculating unit 107 calculates the ratio R _F (n) of the threshold value F_th with respect to the weighted average value of the frequency feature amounts included in the feature amount set F_pack, using Expression (17).

そして、尤度計算部１０７は、割合Ｒ_Ｅ（ｎ）と割合Ｒ_Ｆ（ｎ）のそれぞれに、予め定められた重みＡ_Ｅと重みＡ_Ｆを乗じて、重み付け加算値を算出する。この重み付け加算値は、式（１８）により算出され、入力信号の第ｎ番目のフレームに対応する雑音尤度Ｒ（ｎ）として雑音検出部１０８に供給される。 The likelihood calculating unit 107 calculates a weighted addition value by multiplying each of the ratio R _E (n) and the ratio R _F (n) by a predetermined weight A _E and weight A _F. This weighted addition value is calculated by Expression (18), and is supplied to the noise detection unit 108 as the noise likelihood R (n) corresponding to the nth frame of the input signal.

雑音検出部１０８は、尤度計算部１０７から供給された入力信号の雑音尤度を予め定めた閾値と比較し、入力信号の第ｎ番目のフレームが非定常性雑音のフレームであるか否かを判定する。例えば、非定常性雑音を判定するための雑音尤度閾値Ｒ_thが予め定められており、雑音尤度Ｒ（ｎ）が雑音尤度閾値Ｒ_thより大きい場合、入力信号の第ｎ番目のフレームが非定常性雑音のフレームであると判定する。一方、雑音尤度Ｒ（ｎ）が雑音尤度閾値Ｒ_th以下である場合、入力信号の第ｎ番目のフレームが非定常性雑音のフレームではないと判定する。 The noise detection unit 108 compares the noise likelihood of the input signal supplied from the likelihood calculation unit 107 with a predetermined threshold and determines whether the n-th frame of the input signal is a non-stationary noise frame. For example, a noise likelihood threshold R_th for determining non-stationary noise is set in advance; when the noise likelihood R(n) is greater than the noise likelihood threshold R_th, the n-th frame of the input signal is determined to be a non-stationary noise frame. When the noise likelihood R(n) is equal to or less than the noise likelihood threshold R_th, the n-th frame of the input signal is determined not to be a non-stationary noise frame.
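
The threshold-based likelihood and the final decision can be sketched as follows. Expressions (16) to (18) are not reproduced in this text, so the sketch assumes that each ratio divides the weighted average by its threshold and that R(n) is the weighted sum A_E·R_E(n) + A_F·R_F(n); the function names are hypothetical.

```python
def noise_likelihood(es, fs, e_th, f_th, a_e, a_f):
    """R(n) = A_E * R_E(n) + A_F * R_F(n).

    Assumption: R_E(n) = Es(n) / E_th and R_F(n) = Fs(n) / F_th.
    """
    r_e = es / e_th
    r_f = fs / f_th
    return a_e * r_e + a_f * r_f


def is_nonstationary_noise(likelihood, r_th):
    """Frame n is flagged only when R(n) is strictly greater than R_th."""
    return likelihood > r_th
```

Because the comparison is strict, a frame whose likelihood equals R_th is treated as not being non-stationary noise, matching the "equal to or less than" branch in the text.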

このようにして、非定常性雑音が検出される。本技術では、上述したように、少なくとも１つの振幅特徴量、および、少なくとも１つの周波数特徴量を用いて非定常性雑音であるか否かの判定が行われるようにしたので、非定常性雑音をより精度高く検出することができる。 Non-stationary noise is detected in this way. As described above, the present technology determines whether a frame is non-stationary noise using at least one amplitude feature quantity and at least one frequency feature quantity, so non-stationary noise can be detected with higher accuracy.

また、フレーム統合部１０６において、統合対象のフレームが特定されるので、特徴量集合Ｆ_packに含まれる特徴量の計算の負荷を軽減することができる。これにより、例えば、小型の省電力機器などにも、雑音検出装置１００を搭載することが可能となる。 In addition, since the frame to be integrated is specified in the frame integration unit 106, it is possible to reduce the calculation load of the feature amount included in the feature amount set F_pack. Thereby, for example, the noise detection apparatus 100 can be mounted on a small power-saving device.

さらに、雑音尤度閾値を、咳を検出するための専用の雑音尤度閾値とすることで、咳のみを非定常性雑音として判定することができ、拍手を検出するための専用の雑音尤度閾値とすることで、拍手のみを非定常性雑音として判定することができる。このように、本技術では、雑音尤度閾値を適切に設定することにより、非定常性雑音の種類を特定することも可能となる。 Furthermore, by setting the noise likelihood threshold as a dedicated noise likelihood threshold for detecting cough, it is possible to determine only cough as non-stationary noise, and a dedicated noise likelihood for detecting applause. By setting the threshold value, only applause can be determined as non-stationary noise. Thus, in the present technology, it is possible to specify the type of non-stationary noise by appropriately setting the noise likelihood threshold.

上述した例では、尤度計算部１０７が、予め設定された振幅特徴量に対応する閾値Ｅ_thと周波数特徴量に対応する閾値Ｆ_thとに基づく閾値比較を行い、式（１６）乃至式（１８）の計算を行って雑音尤度を計算するものとした。 In the example described above, the likelihood calculation unit 107 performs threshold comparisons based on the preset threshold E_th for the amplitude feature quantity and the preset threshold F_th for the frequency feature quantity, and calculates the noise likelihood by evaluating Expressions (16) to (18).

しかしながら、例えば、尤度計算部１０７が、予め学習した識別モデルＭを用いて特徴量集合Ｆ_packから雑音尤度を計算するようにしてもよい。この場合、識別モデルＭとして、例えば、ガウス混合モデル（ＧＭＭ）、隠れマルコフモデル（ＨＭＭ）、サポートベクターマシン（ＳＶＭ）などを採用することができる。 However, for example, the likelihood calculating unit 107 may calculate the noise likelihood from the feature amount set F_pack using the identification model M learned in advance. In this case, for example, a Gaussian mixture model (GMM), a hidden Markov model (HMM), a support vector machine (SVM), or the like can be adopted as the identification model M.

すなわち、特徴量集合Ｆ_packに含まれる振幅特徴量の重み付け平均値および周波数特徴量の重み付け平均値のうち、一部または全部を用いて特徴ベクトル空間が生成される。そして、尤度計算部１０７が、前記特徴ベクトル空間において予め学習した識別モデルに基づいて、特徴量集合Ｆ_packに対応する特徴ベクトルから、当該フレームが非定常性雑音のフレームであることの確からしさを表す雑音尤度を算出する。 That is, a feature vector space is generated using part or all of the weighted average value of amplitude feature values and the weighted average value of frequency feature values included in the feature value set F_pack. Based on the identification model learned in advance in the feature vector space, the likelihood calculation unit 107 determines the certainty that the frame is a frame of nonstationary noise from the feature vector corresponding to the feature amount set F_pack. The noise likelihood to represent is calculated.

なお、これらの識別モデルを用いた尤度の算出方式については従来より一般に採用されているものと同様である。 Note that the likelihood calculation method using these identification models is the same as that generally employed conventionally.

次に、図５のフローチャートを参照して、雑音検出装置１００による雑音検出処理の例について説明する。 Next, an example of noise detection processing by the noise detection apparatus 100 will be described with reference to the flowchart of FIG.

ステップＳ２１において、周波数特性補正部１０１は、信号入力部５１から出力される入力信号Ｓ（ｎ）を取得する。 In step S21, the frequency characteristic correction unit 101 acquires the input signal S(n) output from the signal input unit 51.

ステップＳ２２において、周波数特性補正部１０１は、信号入力部５１の固有周波数特性Ｆ_ｉｄ(ｎ)を補正する。このとき、例えば、図２を参照して上述したような固有周波数特性が補正され、入力信号から信号入力部５１の固有周波数特性の影響が除去される。 In step S22, the frequency characteristic correction unit 101 corrects the natural frequency characteristic F_id(n) of the signal input unit 51. At this time, for example, the natural frequency characteristic described above with reference to FIG. 2 is corrected, and the influence of the natural frequency characteristic of the signal input unit 51 is removed from the input signal.

ステップＳ２３において、定常性雑音軽減部１０２は、定常性雑音を除去する。これにより、例えば、雑音検出装置１００、信号入力部５１、または信号処理装置５２の駆動音、会議室内の空調の音などが除去される。 In step S23, the stationary noise reduction unit 102 removes stationary noise. Thereby, for example, the driving sound of the noise detection device 100, the signal input unit 51, or the signal processing device 52, the sound of air conditioning in the conference room, and the like are removed.

ステップＳ２４において、振幅特徴量計算部１０４は、定常性雑音軽減部１０２から供給された入力信号から振幅特徴量を計算する。このとき、フレームｎの振幅特徴量として、上述したＥ_１（ｎ）、Ｅ_２（ｎ）、およびＥ_３（ｎ）の少なくとも１つが算出される。 In step S24, the amplitude feature quantity calculation unit 104 calculates the amplitude feature quantity from the input signal supplied from the stationary noise reduction unit 102. At this time, at least one of E₁(n), E₂(n), and E₃(n) described above is calculated as the amplitude feature quantity of frame n.

ステップＳ２５において、周波数特徴量計算部１０５は、定常性雑音軽減部１０２から供給された入力信号から周波数特徴量を計算する。このとき、フレームｎの周波数特徴量として、上述したＦ_１（ｎ）、Ｆ_２（ｎ）、Ｆ_３（ｎ）、およびＦ_４（ｎ）の少なくとも１つが算出される。 In step S25, the frequency feature quantity calculation unit 105 calculates the frequency feature quantity from the input signal supplied from the stationary noise reduction unit 102. At this time, at least one of F₁(n), F₂(n), F₃(n), and F₄(n) described above is calculated as the frequency feature quantity of frame n.

ステップＳ２６において、フレーム統合部１０６は、図６を参照して後述する統合処理を実行する。これにより、ステップＳ２４の処理で計算された振幅特徴量、および、ステップＳ２５の処理で計算された周波数特徴量が、所定数フレーム分統合され、振幅特徴量の重み付け平均値Ｅｓ（ｎ）および周波数特徴量の重み付け平均値Ｆｓ（ｎ）が算出される。そして、振幅特徴量の重み付け平均値Ｅｓ（ｎ）および周波数特徴量の重み付け平均値Ｆｓ（ｎ）の集合が特徴量集合Ｆ_packとして出力される。 In step S26, the frame integration unit 106 executes integration processing to be described later with reference to FIG. As a result, the amplitude feature quantity calculated in the process of step S24 and the frequency feature quantity calculated in the process of step S25 are integrated for a predetermined number of frames, and the weighted average value Es (n) and the frequency of the amplitude feature quantity are integrated. A weighted average value Fs (n) of feature amounts is calculated. A set of the weighted average value Es (n) of the amplitude feature quantity and the weighted average value Fs (n) of the frequency feature quantity is output as a feature quantity set F_pack.

ステップＳ２７において、尤度計算部１０７は、入力信号の雑音尤度を計算する。このとき、上述したように、特徴量集合Ｆ_packに含まれる特徴量のそれぞれについて、振幅特徴量に対応する閾値Ｅ_thと周波数特徴量に対応する閾値Ｆ_thとの割合が算出される。そして、割合Ｒ_Ｅ（ｎ）と割合Ｒ_Ｆ（ｎ）のそれぞれに、予め定められた重みＡ_Ｅと重みＡ_Ｆを乗じて、重み付け加算値が算出され、入力信号の第ｎ番目のフレームに対応する雑音尤度Ｒ（ｎ）とされる。 In step S27, the likelihood calculation unit 107 calculates the noise likelihood of the input signal. At this time, as described above, the ratio of the threshold value E_th corresponding to the amplitude feature value and the threshold value F_th corresponding to the frequency feature value is calculated for each feature value included in the feature value set F_pack. Then, each of the ratio R _E (n) and the ratio R _F (n) is multiplied by a predetermined weight A _E and weight A _F to calculate a weighted addition value, and the nth frame of the input signal is calculated. The corresponding noise likelihood is R (n).

ステップＳ２８において、雑音検出部１０８は、雑音尤度Ｒ（ｎ）が雑音尤度閾値Ｒ_thより大きいか否かを判定する。 In step S28, the noise detection unit 108 determines whether or not the noise likelihood R (n) is larger than the noise likelihood threshold R_th.

ステップＳ２８において、雑音尤度Ｒ（ｎ）が雑音尤度閾値Ｒ_thより大きいと判定された場合、処理は、ステップＳ２９に進む。 In Step S28, when it is determined that the noise likelihood R (n) is larger than the noise likelihood threshold R_th, the process proceeds to Step S29.

ステップＳ２９において、雑音検出部１０８は、入力信号の第ｎ番目のフレームが非定常性雑音のフレームであると判定する。 In step S29, the noise detection unit 108 determines that the nth frame of the input signal is a frame of nonstationary noise.

一方、ステップＳ２８において、雑音尤度Ｒ（ｎ）が雑音尤度閾値Ｒ_thより大きくないと判定された場合、処理は、ステップＳ３０に進む。 On the other hand, when it is determined in step S28 that the noise likelihood R (n) is not larger than the noise likelihood threshold R_th, the process proceeds to step S30.

ステップＳ３０において、雑音検出部１０８は、入力信号の第ｎ番目のフレームが非定常性雑音のフレームではないと判定する。 In step S30, the noise detection unit 108 determines that the nth frame of the input signal is not a non-stationary noise frame.

このようにして雑音検出処理が実行される。 In this way, the noise detection process is executed.

次に、図６のフローチャートを参照して、図５のステップＳ２６の統合処理の詳細な例について説明する。 Next, a detailed example of the integration process in step S26 in FIG. 5 will be described with reference to the flowchart in FIG.

ステップＳ５１において、統合対象判定部１２２は、特徴保持部１２１に保持されている振幅特徴量と周波数特徴量を取得する。 In step S51, the integration target determination unit 122 acquires the amplitude feature quantities and the frequency feature quantities held in the feature holding unit 121.

ステップＳ５２において、統合対象判定部１２２は、ステップＳ５１で取得した振幅特徴量または周波数特徴量のうちいずれか１つの特徴量Ｆ_ｄを用いて、この特徴量のフレーム間の特徴量の変化を表す特徴変化量Ｆ_ｄ_diffを算出する。なお、特徴変化量Ｆ_ｄ_diffは、特徴保持部１２１に保持されている振幅特徴量と周波数特徴量に対応する全フレーム分算出される。 In step S52, the integration target determination unit 122 uses one feature quantity F_d, chosen from the amplitude and frequency feature quantities acquired in step S51, to calculate the feature change amount F_d_diff representing the change of that feature quantity between frames. The feature change amount F_d_diff is calculated for all frames whose amplitude and frequency feature quantities are held in the feature holding unit 121.

例えば、特徴保持部１２１に、Ｅ_１（ｎ）、Ｅ_２（ｎ）、Ｅ_３（ｎ）、Ｆ_１（ｎ）、Ｆ_２（ｎ）、Ｆ_３（ｎ）、およびＦ_４（ｎ）が保持されている場合、Ｅ_３（ｎ）を用いて、ｉ−１番目のフレームの振幅特徴量Ｅ_３（ｉ−１）と、ｉ番目のフレームの振幅特徴量Ｅ_３（ｉ）の変化を表す特徴変化量Ｆ_ｄ_diff（ｉ）が算出される。 For example, when E₁(n), E₂(n), E₃(n), F₁(n), F₂(n), F₃(n), and F₄(n) are held in the feature holding unit 121, E₃(n) is used to calculate the feature change amount F_d_diff(i) representing the change between the amplitude feature quantity E₃(i−1) of the (i−1)-th frame and the amplitude feature quantity E₃(i) of the i-th frame.

In step S53, the integration target determination unit 122 sets the variable i to n, the number of the current frame.

In step S54, the integration target determination unit 122 compares the feature change amount F_d_diff(i) with a preset threshold F_d_diff_th and determines whether F_d_diff(i) exceeds F_d_diff_th.

If it is determined in step S54 that the feature change amount F_d_diff(i) does not exceed the threshold F_d_diff_th, the process proceeds to step S55.

In step S55, the variable i is decremented, and the process returns to step S54.

On the other hand, if it is determined in step S54 that the feature change amount F_d_diff(i) exceeds the threshold F_d_diff_th, the process proceeds to step S56.

In step S56, the integration target determination unit 122 determines the frames from the i-th frame (frame i) to the n-th frame (frame n) as the integration targets. In this case, frame i becomes the integration target start frame.

In step S57, the weight calculation unit 123 uses one feature amount F_w from among the feature amounts held in the feature holding unit 121 and calculates weights based on the difference or ratio between F_w of the current frame and F_w of each of the other frames to be integrated. The feature amount F_w used by the weight calculation unit 123 may be the same as or different from the feature amount F_d used by the integration target determination unit 122.

In step S58, the integration unit 124 uses the weights calculated in step S57 to calculate the weighted average Es(n) of the amplitude feature amounts and the weighted average Fs(n) of the frequency feature amounts.

In step S59, the integration unit 124 generates the set of the weighted average Es(n) of the amplitude feature amounts and the weighted average Fs(n) of the frequency feature amounts as a feature amount set F_pack.

The integration process is executed in this way.
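The integration process of steps S51 to S59 can be sketched in Python as follows. This is only an illustrative sketch and not the implementation of the embodiment. The function names, the use of the absolute difference as the feature change amount, the weight formula 1 / (1 + |difference|), and the stop at the oldest held frame are all assumptions introduced here; the description states only that the weights are based on a difference or ratio.

```python
from typing import Dict, List


def find_start_frame(f_d: List[float], threshold: float) -> int:
    """Steps S52 to S56: walk back from the newest frame n until the
    frame-to-frame change of F_d exceeds the threshold."""
    i = len(f_d) - 1                      # S53: i = n (newest held frame)
    while i > 0:                          # assumption: stop at the oldest held frame
        diff = abs(f_d[i] - f_d[i - 1])   # F_d_diff(i); absolute difference is an assumption
        if diff > threshold:              # S54: change exceeds F_d_diff_th
            break                         # S56: frame i is the integration start frame
        i -= 1                            # S55: decrement i and test again
    return i


def frame_weights(f_w: List[float], start: int) -> List[float]:
    """Step S57: one weight per frame in [start, n], larger when the frame's
    F_w is close to F_w of the current frame (hypothetical formula)."""
    n = len(f_w) - 1
    return [1.0 / (1.0 + abs(f_w[n] - f_w[k])) for k in range(start, n + 1)]


def integrate(features: Dict[str, List[float]], d_key: str, w_key: str,
              threshold: float) -> Dict[str, float]:
    """Steps S51 to S59: return the feature amount set F_pack, i.e. the
    weighted average of every held feature over the selected section."""
    start = find_start_frame(features[d_key], threshold)
    weights = frame_weights(features[w_key], start)
    total = sum(weights)
    return {
        name: sum(w * v for w, v in zip(weights, values[start:])) / total
        for name, values in features.items()
    }


held = {
    "E3": [0.1, 0.1, 0.9, 1.0, 1.0],   # amplitude feature per frame
    "F1": [0.2, 0.2, 0.4, 0.5, 0.6],   # frequency feature per frame
}
f_pack = integrate(held, d_key="E3", w_key="E3", threshold=0.5)
```

In this example the jump of E3 between the second and third held frames exceeds the threshold, so only the last three frames are averaged.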

FIG. 7 is a block diagram showing a configuration example according to another embodiment of the noise detection device 100 to which the present technology is applied. Unlike the case of FIG. 1, the noise detection device 100 in the configuration of FIG. 7 is provided with a feature amount selection unit 103. The rest of the configuration of the noise detection device 100 of FIG. 7 is the same as in FIG. 1.

Based on the input signal output from the stationary noise reduction unit 102, the feature amount selection unit 103 specifies the amplitude feature amounts to be calculated by the amplitude feature amount calculation unit 104 and the frequency feature amounts to be calculated by the frequency feature amount calculation unit 105. This reduces the calculation load of the amplitude feature amount calculation unit 104 and the frequency feature amount calculation unit 105.

FIG. 8 is a block diagram showing a detailed configuration example of the feature amount selection unit 103. As shown in the figure, the feature amount selection unit 103 includes a feature calculation unit 131, a feature determination unit 132, and a selection information output unit 133.

The feature calculation unit 131 calculates a feature amount of the input signal and supplies it to the feature determination unit 132. The feature amount calculated by the feature calculation unit 131 is, for example, one of the amplitude feature amounts E_1(n), E_2(n), and E_3(n) described above, or one of the frequency feature amounts F_1(n), F_2(n), F_3(n), and F_4(n) described above.

The feature determination unit 132 compares the feature amount supplied from the feature calculation unit 131 with a threshold, determines the feature type of the input signal of the frame from the result, and supplies the feature type to the selection information output unit 133.

The selection information output unit 133 uses the feature type supplied from the feature determination unit 132 to select the feature selection information corresponding to that feature type, and outputs the feature selection information to the amplitude feature amount calculation unit 104 and the frequency feature amount calculation unit 105. Here, the feature selection information is information that specifies the amplitude feature amounts to be calculated by the amplitude feature amount calculation unit 104 and the frequency feature amounts to be calculated by the frequency feature amount calculation unit 105.

FIG. 9 is a diagram explaining the frequency characteristics of a cough, which is one type of non-stationary noise, and shows an example comparing the frequency characteristics of a cough with those of a vowel and with those of a consonant. In the figure, the horizontal axis represents frequency and the vertical axis represents sound pressure level, and the frequency characteristics of cough sounds and of ordinary speech sounds are plotted as line graphs. The upper part of the figure shows the frequency characteristics of a vowel sound and a cough sound, and the lower part shows the frequency characteristics of a consonant sound and a cough sound.

As shown in the upper part of the figure, when a cough sound is compared with a vowel sound, the sound pressure levels differ greatly in the section at or below 1.4 kHz, the section from 4 kHz to 6.8 kHz, and the section at or above 11.7 kHz. That is, a cough sound can easily be distinguished from a vowel sound by using filters that extract the frequency-band components of these sections (the component at or below 1.4 kHz, the component from 4 kHz to 6.8 kHz, and the component at or above 11.7 kHz) and calculating, for example, a set of parameters representing the ratio of the frequency components in these sections to all frequency components of the input signal.

Also, as shown in the lower part of the figure, when a cough sound is compared with a consonant sound, the sound pressure levels differ greatly in the section at or below 1.8 kHz, the section from 6.5 kHz to 8.8 kHz, and the section at or above 17.7 kHz. That is, as in the comparison between a cough sound and a vowel sound, a cough sound can easily be distinguished from a consonant sound by using filters that extract the frequency-band component of each section.

However, the comparison between a cough and a vowel and the comparison between a cough and a consonant each require different frequency components to be extracted, so detecting a cough with high accuracy requires feature amounts for a total of six frequency components to be calculated. In other words, unless it is known in advance whether the input signal is closer to a vowel or to a consonant, the feature amounts must be calculated for both cases.

If, for example, it is possible to recognize in advance whether the input signal is closer to a vowel or to a consonant, it suffices to calculate feature amounts for a total of three frequency components, so the load of calculating the feature amounts can be reduced.
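The band-ratio feature amounts discussed above can be illustrated with the following Python sketch. It is not the implementation of the embodiment: the function names, the naive DFT used in place of the filters, the use of power rather than sound pressure, and the 48 kHz sampling rate of the example are assumptions introduced here. Only the band edges are taken from the description of FIG. 9.

```python
import cmath
import math
from typing import List, Sequence, Tuple

Band = Tuple[float, float]

# Band edges in Hz taken from the description of FIG. 9.
VOWEL_BANDS: List[Band] = [(0.0, 1400.0), (4000.0, 6800.0), (11700.0, float("inf"))]
CONSONANT_BANDS: List[Band] = [(0.0, 1800.0), (6500.0, 8800.0), (17700.0, float("inf"))]


def power_spectrum(frame: Sequence[float]) -> List[float]:
    """Naive DFT power for bins 0..N/2 (O(N^2), adequate for a short frame)."""
    n = len(frame)
    spectrum = []
    for k in range(n // 2 + 1):
        acc = sum(x * cmath.exp(-2j * math.pi * k * t / n)
                  for t, x in enumerate(frame))
        spectrum.append(abs(acc) ** 2)
    return spectrum


def band_ratios(frame: Sequence[float], sample_rate: float,
                bands: Sequence[Band]) -> List[float]:
    """Ratio of the power inside each band to the power of all components."""
    power = power_spectrum(frame)
    total = sum(power)
    if total == 0.0:                       # silent frame: no meaningful ratio
        return [0.0] * len(bands)
    n = len(frame)
    ratios = []
    for lo, hi in bands:
        in_band = sum(p for k, p in enumerate(power)
                      if lo <= k * sample_rate / n <= hi)
        ratios.append(in_band / total)
    return ratios


# Example: a 1 kHz tone sampled at an assumed 48 kHz, 96 samples per frame.
FS = 48000.0
tone = [math.sin(2 * math.pi * 1000.0 * t / FS) for t in range(96)]
vowel_features = band_ratios(tone, FS, VOWEL_BANDS)          # three ratios
consonant_features = band_ratios(tone, FS, CONSONANT_BANDS)  # three ratios
```

Calling `band_ratios` with only one of the two band lists corresponds to the reduced case in which three feature amounts are calculated instead of six.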

FIG. 10 is a diagram showing an example of the distribution of the zero-crossing rate of audio signals, obtained from an experiment that used a plurality of audio signals as samples. In the figure, the horizontal axis represents the zero-crossing rate, and the vertical axis represents the number of frame-unit samples of the audio signals that have that zero-crossing rate.

As shown in FIG. 10, the distribution of the samples shows two Gaussian-shaped components separated at a zero-crossing rate of 0.05. Most of the samples with a zero-crossing rate of 0.05 or less are known to be vowels. On the other hand, most of the samples with a zero-crossing rate of 0.05 or more are known to be consonants.

That is, by setting a zero-crossing rate of 0.05 as a threshold F_th and comparing the zero-crossing rate of the input signal with the threshold F_th, it is possible to recognize whether the input signal is closer to a vowel or to a consonant.

For example, the feature calculation unit 131 of the feature amount selection unit 103 calculates the zero-crossing rate of the input signal, and the feature determination unit 132 compares the zero-crossing rate of the input signal with the threshold F_th and determines from the result whether the feature type of the input signal of the frame is a vowel or a consonant. As a result, the amplitude feature amounts to be calculated by the amplitude feature amount calculation unit 104 and the frequency feature amounts to be calculated by the frequency feature amount calculation unit 105 are set to the feature amounts for vowels or the feature amounts for consonants.
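The vowel/consonant determination by the feature amount selection unit 103 can be sketched in Python as follows. This is a hedged illustration only. The function names, the normalization of the zero-crossing rate by the number of adjacent sample pairs, the assignment of a rate of exactly 0.05 to the vowel type, and the dictionary used as feature selection information are assumptions; the threshold value and the band edges (in kHz) are taken from the description.

```python
from typing import Dict, List, Sequence, Tuple

F_TH = 0.05  # threshold F_th from the description of FIG. 10

# Hypothetical feature selection information: bands (kHz) to analyze per type.
FEATURE_SELECTION: Dict[str, List[Tuple[float, float]]] = {
    "vowel": [(0.0, 1.4), (4.0, 6.8), (11.7, float("inf"))],
    "consonant": [(0.0, 1.8), (6.5, 8.8), (17.7, float("inf"))],
}


def zero_crossing_rate(frame: Sequence[float]) -> float:
    """Fraction of adjacent sample pairs whose signs differ (assumed definition)."""
    if len(frame) < 2:
        return 0.0
    crossings = sum(1 for a, b in zip(frame, frame[1:])
                    if (a >= 0.0) != (b >= 0.0))
    return crossings / (len(frame) - 1)


def feature_type(frame: Sequence[float], f_th: float = F_TH) -> str:
    """Feature determination: compare the zero-crossing rate with F_th."""
    return "vowel" if zero_crossing_rate(frame) <= f_th else "consonant"


def select_features(frame: Sequence[float]) -> List[Tuple[float, float]]:
    """Selection information output: bands that the calculators should use."""
    return FEATURE_SELECTION[feature_type(frame)]


slow = [1.0] * 50 + [-1.0] * 50      # one sign change in 100 samples
fast = [1.0, -1.0] * 50              # sign changes at every sample
```

With these inputs, `feature_type(slow)` yields the vowel type and `feature_type(fast)` the consonant type, so only three of the six band feature amounts need to be calculated for each frame.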

By providing the feature amount selection unit 103 in this way, the calculation load of the amplitude feature amount calculation unit 104 and the frequency feature amount calculation unit 105 can be reduced.

Although an example has been described here in which the feature amount selection unit 103 determines whether the feature type of the input signal of the frame is a vowel or a consonant, the unit may instead determine, for example, whether the feature type of the input signal of the frame is high sound pressure or low sound pressure. For example, at low sound pressure (when the volume is low), a good S/N characteristic is difficult to obtain, so feature amounts that are less susceptible to stationary noise may be selected.

In this case, instead of the zero-crossing rate, the feature type of the input signal of the frame may be determined by comparing with a threshold either the amplitude feature amount E_2(n), which represents the average of the L sample values included in frame n, or the amplitude feature amount E_3(n), which represents the RMS value of the L sample values included in frame n.
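The sound pressure determination can be sketched in Python in the same manner. This is an illustration under stated assumptions: the function names and the threshold value 0.1 are hypothetical, and only the RMS-based variant corresponding to E_3(n) is shown.

```python
import math
from typing import Sequence


def rms(frame: Sequence[float]) -> float:
    """RMS value of the L sample values of one frame (corresponds to E_3(n))."""
    if not frame:
        return 0.0
    return math.sqrt(sum(x * x for x in frame) / len(frame))


def pressure_type(frame: Sequence[float], threshold: float = 0.1) -> str:
    """Classify a frame as high or low sound pressure.

    The default threshold is a hypothetical value chosen for this example.
    """
    return "high" if rms(frame) > threshold else "low"


loud = [1.0, -1.0, 1.0, -1.0]
quiet = [0.01, -0.01, 0.01, -0.01]
```

A frame classified as low sound pressure would then be routed to feature amounts that are less affected by stationary noise.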

FIG. 11 is a block diagram showing a configuration example according to still another embodiment of the noise detection device 100 to which the present technology is applied. Unlike the case of FIG. 1, the noise detection device 100 in the configuration of FIG. 11 is not provided with the frequency characteristic correction unit 101, the stationary noise reduction unit 102, the frame integration unit 106, or the likelihood calculation unit 107. The rest of the configuration of the noise detection device 100 of FIG. 11 is the same as in FIG. 1.

In the configuration of FIG. 11, the noise detection device 100 calculates the amplitude feature amounts and the frequency feature amounts directly from the input signal supplied from the signal input unit 51, and uses those amplitude feature amounts and frequency feature amounts directly to determine whether the frame is a non-stationary noise frame. In this case, the noise detection unit 108, for example, compares each of the amplitude feature amounts and the frequency feature amounts with a threshold and determines, according to the results, whether the frame is a non-stationary noise frame.

Alternatively, a configuration may be adopted in which any one to three of the frequency characteristic correction unit 101, the stationary noise reduction unit 102, the frame integration unit 106, and the likelihood calculation unit 107 are added to the noise detection device 100 shown in FIG. 11.

The series of processes described above can be executed by hardware or by software. When the series of processes is executed by software, the program constituting the software is installed from a network or a recording medium onto a computer incorporated in dedicated hardware, or onto a computer that can execute various functions when various programs are installed, such as the general-purpose personal computer 700 shown in FIG. 12.

In FIG. 12, a CPU (Central Processing Unit) 701 executes various processes according to a program stored in a ROM (Read Only Memory) 702 or a program loaded from a storage unit 708 into a RAM (Random Access Memory) 703. The RAM 703 also stores, as appropriate, data that the CPU 701 needs in order to execute the various processes.

The CPU 701, the ROM 702, and the RAM 703 are connected to one another via a bus 704. An input/output interface 705 is also connected to the bus 704.

Connected to the input/output interface 705 are an input unit 706 including a keyboard and a mouse, an output unit 707 including a display such as an LCD (Liquid Crystal Display) and a speaker, a storage unit 708 including a hard disk, and a communication unit 709 including a modem and a network interface card such as a LAN card. The communication unit 709 performs communication processing via networks including the Internet.

A drive 710 is also connected to the input/output interface 705 as necessary, and a removable medium 711 such as a magnetic disk, an optical disk, a magneto-optical disk, or a semiconductor memory is mounted on it as appropriate. A computer program read from the removable medium is installed in the storage unit 708 as necessary.

When the series of processes described above is executed by software, the program constituting the software is installed from a network such as the Internet or from a recording medium such as the removable medium 711.

This recording medium is not limited to the removable medium 711 shown in FIG. 12, which is distributed separately from the device main body in order to deliver the program to the user and on which the program is recorded, such as a magnetic disk (including a floppy disk (registered trademark)), an optical disk (including a CD-ROM (Compact Disk-Read Only Memory) and a DVD (Digital Versatile Disk)), a magneto-optical disk (including an MD (Mini-Disk) (registered trademark)), or a semiconductor memory. It also includes media that are delivered to the user already incorporated in the device main body, such as the ROM 702 on which the program is recorded and the hard disk included in the storage unit 708.

In this specification, the series of processes described above includes not only processes performed in time series in the described order but also processes that are executed in parallel or individually without necessarily being processed in time series.

The embodiments of the present technology are not limited to the embodiments described above, and various modifications can be made without departing from the gist of the present technology.

The present technology can also take the following configurations.

(1)
A noise detection device including:
an amplitude feature amount calculation unit that calculates an amplitude feature amount of the waveform of a predetermined frame of an audio input signal;
a frequency feature amount calculation unit that calculates a frequency feature amount of the waveform of the predetermined frame;
a feature change amount calculation unit that calculates, based on any one of the amplitude feature amount and the frequency feature amount held in a holding unit that holds the amplitude feature amount and the frequency feature amount for a plurality of frames, a feature change amount that is the amount of change in that feature amount between two temporally adjacent frames;
a section specifying unit that, by comparing the feature change amount with a preset threshold, specifies a section of temporally consecutive frames over which the amplitude feature amount and the frequency feature amount held in the holding unit are to be weighted-averaged;
a feature amount set generation unit that generates, as a feature amount set, a set of the weighted averages of the amplitude feature amount and of the frequency feature amount corresponding to the frames of the specified section; and
a noise determination unit that determines, based on the feature amount set, whether the latest frame of the input signal is a frame including non-stationary noise, which is sudden noise.
(2)
The noise detection device according to (1), in which
the amplitude feature amount calculation unit or the frequency feature amount calculation unit calculates at least two types of amplitude feature amounts from among a plurality of types of amplitude feature amounts or a plurality of types of frequency feature amounts, and
the noise detection device further includes a feature amount selection unit that selects, based on the zero-crossing rate of the input signal of the predetermined frame, the average of a plurality of sample values of the input signal of the predetermined frame, or the RMS value of a plurality of sample values of the input signal of the predetermined frame, the amplitude feature amount to be calculated by the amplitude feature amount calculation unit from among the plurality of types of amplitude feature amounts, or the frequency feature amount to be calculated by the frequency feature amount calculation unit from among the plurality of types of frequency feature amounts.
(3)
The noise detection device according to (2), in which
the feature amount selection unit determines, based on the zero-crossing rate of the input signal of the predetermined frame, whether the input signal of the predetermined frame is closer to a vowel or to a consonant and, according to the determination result, selects the amplitude feature amount to be calculated by the amplitude feature amount calculation unit and, from among the plurality of types of frequency feature amounts, the frequency feature amount to be calculated by the frequency feature amount calculation unit.
(4)
The noise detection device according to any one of (1) to (3), in which
the amplitude feature amount calculation unit calculates, as the amplitude feature amount, at least one of the peak value among a plurality of sample values of the predetermined frame, the average of the plurality of sample values of the predetermined frame, and the RMS value of the plurality of sample values of the predetermined frame, and
the frequency feature amount calculation unit calculates, as the frequency feature amount, at least one of the zero-crossing rate of the input signal of the predetermined frame, the ratio of the sound pressure of a specific frequency component to the sound pressure of all frequency components in the input signal of the predetermined frame, the ratio of the sound pressure of the specific frequency component to the sound pressure of a frequency component different from the specific frequency component in the input signal of the predetermined frame, and one or more specific values of the frequency spectrum obtained by Fourier transforming the input signal of the predetermined frame.
(5)
The noise detection device according to any one of (1) to (4), in which
the noise determination unit
calculates the ratio between the weighted average of the amplitude feature amount included in the feature amount set and a preset first value, and the ratio between the weighted average of the frequency feature amount and a preset second value,
calculates a noise likelihood based on the calculated ratios, and
determines whether the latest frame of the input signal is a frame including the non-stationary noise by comparing the noise likelihood with a preset threshold.
(6)
The noise detection device according to any one of (1) to (5), in which
the noise determination unit
calculates, in a feature vector space that uses some or all of the weighted averages of the amplitude feature amounts and the weighted averages of the frequency feature amounts included in the feature amount set, and based on a discrimination model learned in advance, a noise likelihood representing the probability that the frame is a non-stationary noise frame from the feature vector corresponding to the feature amount set, and
determines whether the latest frame of the input signal is a frame including the non-stationary noise by comparing the noise likelihood with a preset threshold.
(7)
The noise detection device according to any one of (1) to (6), further including a frequency characteristic correction unit that corrects the frequency characteristic of a signal input device that supplies the input signal.
(8)
The noise detection device according to any one of (1) to (7), further including a stationary noise removal unit that removes, from the input signal, stationary noise, which is noise different from the non-stationary noise.
(9)
A noise detection method including the steps of:
an amplitude feature amount calculation unit calculating an amplitude feature amount of the waveform of a predetermined frame of an audio input signal;
a frequency feature amount calculation unit calculating a frequency feature amount of the waveform of the predetermined frame;
a feature change amount calculation unit calculating, based on any one of the amplitude feature amount and the frequency feature amount held in a holding unit that holds the amplitude feature amount and the frequency feature amount for a plurality of frames, a feature change amount that is the amount of change in that feature amount between two temporally adjacent frames;
a section specifying unit specifying, by comparing the feature change amount with a preset threshold, a section of temporally consecutive frames over which the amplitude feature amount and the frequency feature amount held in the holding unit are to be weighted-averaged;
a feature amount set generation unit generating, as a feature amount set, a set of the weighted averages of the amplitude feature amount and of the frequency feature amount corresponding to the frames of the specified section; and
a noise determination unit determining, based on the feature amount set, whether the latest frame of the input signal is a frame including non-stationary noise, which is sudden noise.
(10)
A program that causes a computer to function as a noise detection device including:
an amplitude feature amount calculation unit that calculates an amplitude feature amount of the waveform of a predetermined frame of an audio input signal;
a frequency feature amount calculation unit that calculates a frequency feature amount of the waveform of the predetermined frame;
a feature change amount calculation unit that calculates, based on any one of the amplitude feature amount and the frequency feature amount held in a holding unit that holds the amplitude feature amount and the frequency feature amount for a plurality of frames, a feature change amount that is the amount of change in that feature amount between two temporally adjacent frames;
a section specifying unit that, by comparing the feature change amount with a preset threshold, specifies a section of temporally consecutive frames over which the amplitude feature amount and the frequency feature amount held in the holding unit are to be weighted-averaged;
a feature amount set generation unit that generates, as a feature amount set, a set of the weighted averages of the amplitude feature amount and of the frequency feature amount corresponding to the frames of the specified section; and
a noise determination unit that determines, based on the feature amount set, whether the latest frame of the input signal is a frame including non-stationary noise, which is sudden noise.

51 signal input unit, 52 signal processing device, 100 noise detection device, 101 frequency characteristic correction unit, 102 stationary noise reduction unit, 103 feature amount selection unit, 104 amplitude feature amount calculation unit, 105 frequency feature amount calculation unit, 106 frame integration unit, 107 likelihood calculation unit, 108 noise detection unit, 121 feature holding unit, 122 integration target determination unit, 123 weight calculation unit, 124 integration unit, 131 feature calculation unit, 132 feature determination unit, 133 selection information output unit, 711 removable media

Claims

A noise detection apparatus comprising:
an amplitude feature amount calculation unit that calculates an amplitude feature amount in a waveform of a predetermined frame of an audio input signal;
a frequency feature amount calculation unit that calculates a frequency feature amount in the waveform of the predetermined frame;
a feature change amount calculation unit that calculates a feature change amount, which is the amount of change in a feature amount between two temporally adjacent frames, based on either the amplitude feature amount or the frequency feature amount held in a holding unit that holds the amplitude feature amounts and the frequency feature amounts for a plurality of frames;
a section specifying unit that specifies, by comparing the feature change amount with a preset threshold value, a section of temporally continuous frames over which the amplitude feature amounts and the frequency feature amounts held in the holding unit are to be weighted and averaged;
a feature amount set generation unit that generates, as a feature amount set, a set of the weighted average values of the amplitude feature amounts and of the frequency feature amounts corresponding to the frames of the specified section; and
a noise determination unit that determines, based on the feature amount set, whether the latest frame of the input signal is a frame including non-stationary noise, which is sudden noise.

The noise detection apparatus according to claim 1, wherein the amplitude feature amount calculation unit or the frequency feature amount calculation unit calculates at least two types of feature amounts from among a plurality of types of amplitude feature amounts or a plurality of types of frequency feature amounts,
the apparatus further comprising a feature amount selection unit that selects, based on a zero-crossing rate of the input signal of the predetermined frame, an average value of a plurality of sample values of the input signal of the predetermined frame, or an RMS value of the plurality of sample values of the input signal of the predetermined frame, the amplitude feature amount to be calculated by the amplitude feature amount calculation unit from among the plurality of types of amplitude feature amounts, or the frequency feature amount to be calculated by the frequency feature amount calculation unit from among the plurality of types of frequency feature amounts.

The noise detection apparatus according to claim 2, wherein the feature amount selection unit
determines, based on the zero-crossing rate of the input signal of the predetermined frame, whether the input signal of the predetermined frame is closer to a vowel or to a consonant, and, according to the determination result, selects the amplitude feature amount to be calculated by the amplitude feature amount calculation unit and the frequency feature amount to be calculated by the frequency feature amount calculation unit from among the plurality of types of amplitude feature amounts and the plurality of types of frequency feature amounts.

The noise detection apparatus according to claim 1, wherein
the amplitude feature amount calculation unit calculates, as the amplitude feature amount, at least one of a peak value among a plurality of sample values of the predetermined frame, an average value of the plurality of sample values of the predetermined frame, or an RMS value of the plurality of sample values of the predetermined frame, and
the frequency feature amount calculation unit calculates, as the frequency feature amount, at least one of: the zero-crossing rate of the input signal of the predetermined frame; the ratio of the sound pressure of a specific frequency component to the sound pressure of all frequency components in the input signal of the predetermined frame; the ratio of the sound pressure of the specific frequency component to the sound pressure of a frequency component different from the specific frequency component in the input signal of the predetermined frame; or one specific value or a plurality of values of the frequency spectrum obtained by Fourier transforming the input signal of the predetermined frame.
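
Some of the feature amounts named in this claim can be sketched as follows. This is an illustrative sketch only: the function names are hypothetical, the "average value" is assumed here to be the mean of the absolute sample values, and the zero-crossing rate is assumed to be normalized by the number of adjacent sample pairs. The sound pressure ratios and spectrum values are omitted.

```python
import math
from typing import Dict, Sequence


def amplitude_features(frame: Sequence[float]) -> Dict[str, float]:
    """Peak, mean absolute value, and RMS of the samples in one frame."""
    if not frame:
        raise ValueError("frame must contain at least one sample")
    n = len(frame)
    return {
        "peak": max(abs(x) for x in frame),
        # Assumption: the average is taken over absolute sample values.
        "mean_abs": sum(abs(x) for x in frame) / n,
        "rms": math.sqrt(sum(x * x for x in frame) / n),
    }


def zero_crossing_rate(frame: Sequence[float]) -> float:
    """Fraction of adjacent sample pairs whose signs differ."""
    if len(frame) < 2:
        raise ValueError("frame must contain at least two samples")
    crossings = sum(
        1 for a, b in zip(frame, frame[1:]) if (a >= 0) != (b >= 0)
    )
    return crossings / (len(frame) - 1)
```

A frame that alternates between 1.0 and -1.0 yields a peak, mean absolute value, and RMS of 1.0 and a zero-crossing rate of 1.0, whereas a constant positive frame yields a zero-crossing rate of 0.0.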

The noise detection apparatus according to claim 1, wherein the noise determination unit
calculates a ratio between the weighted average value of the amplitude feature amount included in the feature amount set and a preset first value, and a ratio between the weighted average value of the frequency feature amount and a preset second value,
calculates a noise likelihood based on the calculated ratios, and
determines whether the latest frame of the input signal is a frame including the non-stationary noise by comparing the noise likelihood with a preset threshold value.
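
The ratio-based determination in this claim can be sketched as below. The claim does not state how the two ratios are combined into a noise likelihood, so the averaging and clipping to the range 0 to 1 used here are assumptions, as are the function names and the default threshold of 0.5.

```python
def noise_likelihood(
    amp_avg: float,
    freq_avg: float,
    first_value: float,
    second_value: float,
) -> float:
    """Combine two ratios into a likelihood in [0, 1].

    amp_avg and freq_avg are the weighted average values from the feature
    amount set; first_value and second_value are the preset reference values.
    """
    amp_ratio = amp_avg / first_value
    freq_ratio = freq_avg / second_value
    # Assumption: the likelihood is the mean of the two ratios, clipped to [0, 1].
    return max(0.0, min(1.0, (amp_ratio + freq_ratio) / 2.0))


def is_nonstationary_noise(likelihood: float, threshold: float = 0.5) -> bool:
    """Compare the noise likelihood with a preset threshold value."""
    return likelihood > threshold
```

For example, weighted averages of 0.8 and 0.4 against reference values of 1.0 and 0.5 give two ratios of 0.8, so the likelihood under these assumptions is 0.8 and the frame is judged to contain non-stationary noise at the default threshold.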

The noise detection apparatus according to claim 1, wherein the noise determination unit
calculates, in a feature vector space that uses some or all of the weighted average values of the amplitude feature amounts and the weighted average values of the frequency feature amounts included in the feature amount set, and based on a previously learned identification model, a noise likelihood representing the probability that the frame is a non-stationary noise frame from the feature vector corresponding to the feature amount set, and
determines whether the latest frame of the input signal is a frame including the non-stationary noise by comparing the noise likelihood with a preset threshold value.
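
As one possible illustration of a previously learned identification model, the sketch below uses a logistic model over the feature vector. The claim does not specify the model type, so the choice of a logistic model, the function name, and the example coefficients are assumptions; in practice the coefficients would be obtained by prior learning on labeled frames.

```python
import math
from typing import Sequence


def model_likelihood(
    feature_vector: Sequence[float],
    weights: Sequence[float],
    bias: float,
) -> float:
    """Probability that the frame is a non-stationary noise frame.

    Assumption: the learned identification model is a logistic model whose
    coefficients (weights, bias) were learned beforehand.
    """
    if len(feature_vector) != len(weights):
        raise ValueError("feature vector and weights must have equal length")
    score = bias + sum(w * x for w, x in zip(weights, feature_vector))
    return 1.0 / (1.0 + math.exp(-score))


# Hypothetical coefficients for a two-dimensional feature vector
# (weighted average amplitude, weighted average frequency feature).
EXAMPLE_WEIGHTS = [4.0, 2.0]
EXAMPLE_BIAS = -3.0
```

The resulting likelihood is then compared with a preset threshold value in the same way as in the ratio-based determination.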

The noise detection apparatus according to claim 1, further comprising a frequency characteristic correction unit that corrects a frequency characteristic of a signal input device that supplies the input signal.

The noise detection apparatus according to claim 1, further comprising a stationary noise removing unit that removes stationary noise that is different from the non-stationary noise from the input signal.

A noise detection method comprising the steps of:
calculating, by an amplitude feature amount calculation unit, an amplitude feature amount in a waveform of a predetermined frame of an audio input signal;
calculating, by a frequency feature amount calculation unit, a frequency feature amount in the waveform of the predetermined frame;
calculating, by a feature change amount calculation unit, a feature change amount, which is the amount of change in a feature amount between two temporally adjacent frames, based on either the amplitude feature amount or the frequency feature amount held in a holding unit that holds the amplitude feature amounts and the frequency feature amounts for a plurality of frames;
specifying, by a section specifying unit, by comparing the feature change amount with a preset threshold value, a section of temporally continuous frames over which the amplitude feature amounts and the frequency feature amounts held in the holding unit are to be weighted and averaged;
generating, by a feature amount set generation unit, as a feature amount set, a set of the weighted average values of the amplitude feature amounts and of the frequency feature amounts corresponding to the frames of the specified section; and
determining, by a noise determination unit, based on the feature amount set, whether the latest frame of the input signal is a frame including non-stationary noise, which is sudden noise.

A program for causing a computer to function as a noise detection apparatus comprising:
an amplitude feature amount calculation unit that calculates an amplitude feature amount in a waveform of a predetermined frame of an audio input signal;
a frequency feature amount calculation unit that calculates a frequency feature amount in the waveform of the predetermined frame;
a feature change amount calculation unit that calculates a feature change amount, which is the amount of change in a feature amount between two temporally adjacent frames, based on either the amplitude feature amount or the frequency feature amount held in a holding unit that holds the amplitude feature amounts and the frequency feature amounts for a plurality of frames;
a section specifying unit that specifies, by comparing the feature change amount with a preset threshold value, a section of temporally continuous frames over which the amplitude feature amounts and the frequency feature amounts held in the holding unit are to be weighted and averaged;
a feature amount set generation unit that generates, as a feature amount set, a set of the weighted average values of the amplitude feature amounts and of the frequency feature amounts corresponding to the frames of the specified section; and
a noise determination unit that determines, based on the feature amount set, whether the latest frame of the input signal is a frame including non-stationary noise, which is sudden noise.