JP6294747B2

JP6294747B2 - Notification sound sensing device, notification sound sensing method and program

Info

Publication number: JP6294747B2
Application number: JP2014089043A
Authority: JP
Inventors: 仲大室; 桂右井本; 尚植松
Original assignee: Nippon Telegraph and Telephone Corp
Current assignee: Nippon Telegraph and Telephone Corp
Priority date: 2014-04-23
Filing date: 2014-04-23
Publication date: 2018-03-14
Anticipated expiration: 2034-04-23
Also published as: JP2015206974A

Description

The present invention relates to technology for sensing speech and other sounds generated in the surroundings, and more specifically to a notification sound sensing technique that senses the occurrence of a notification sound, such as an alarm from a clock or home appliance, a door chime, a fire alarm, a telephone ringtone, a car horn, or a whistle blown to call attention, when such a sound is generated.

When signaling something to a person, it is common to generate a sound whose characteristics differ from the sounds that accompany people's everyday activities and from sounds found in nature. Examples include the "beep-beep" emitted when a fully automatic washing machine finishes a load or a microwave oven finishes cooking, the "ding-dong" of a door chime, and the "whoop-whoop" of a fire alarm. These are collectively referred to as notification sounds.

However, people with hearing impairments cannot hear a notification sound when it occurs, which is not only inconvenient in daily life but can also put them in physical danger.

To address this problem, Non-Patent Document 1 describes a method that presents a notification sound picked up by a microphone as vibration instead. For example, the user wears a smartphone or a dedicated device, and software continuously analyzes the sound captured by the microphone. When it senses a sound for which the power of the signal, band-limited by a high-pass filter for example, is at or above a threshold, it activates a vibrator and conveys the sound information to the user as vibration. The vibration pattern can also be varied according to the pattern in which the sound occurs, so that different types of notification sounds can be distinguished.

Oda, Furuya, Kataoka, "A method of conveying notification sounds by vibration for the purpose of supporting hearing-impaired people and its effectiveness", IEICE Transactions D, Vol. J89-D, No. 12, pp. 2671-2678

While the method of Non-Patent Document 1 offers hearing-impaired people a certain degree of convenience, its sensitivity is difficult to adjust with a threshold. Raising the sensitivity (that is, lowering the threshold) makes the device react to sounds the user does not need and vibrate excessively, whereas lowering the sensitivity (that is, raising the threshold) can cause it to miss sounds the user does need even when they occur nearby.

In view of this situation, an object of the present invention is to provide a notification sound sensing technique that can determine more rigorously whether or not a sound generated in the surroundings is a notification sound.

To solve the above problem, the notification sound sensing device of the present invention outputs a sensing result indicating that a notification sound has been sensed when it detects that the input acoustic signal has at least a predetermined number of frequencies at which power is concentrated within a predetermined frequency band.

According to the notification sound sensing technique of the present invention, whether or not a sound generated in the surroundings is a notification sound can be determined more rigorously. That is, in an environment containing sounds that accompany people's everyday activities and sounds found in nature, the sounding of a notification sound can be sensed accurately. This makes it possible to inform a hearing-impaired person accurately, by means other than sound, that a notification sound has sounded. The technique is not limited to services for hearing-impaired people. For hearing users as well, when the user is away from the source of a notification sound, a notification sound sensing device can be placed near the source and the sensing result can be sent to the distant user by a communication means such as a wireless link.

FIG. 1 is a diagram illustrating the functional configuration of the notification sound sensing device of the first embodiment.
FIG. 2 is a diagram for explaining the characteristics of notification sounds.
FIG. 3 is a diagram for explaining the characteristics of notification sounds.
FIG. 4 is a diagram for explaining the characteristics of notification sounds.
FIG. 5 is a diagram for explaining the characteristics of notification sounds.
FIG. 6 is a diagram illustrating the functional configuration of the notification sound sensing device of the second embodiment.
FIG. 7 is a diagram illustrating the processing flow of the notification sound sensing method of the second embodiment.
FIG. 8 is a diagram illustrating the functional configuration of the notification sound sensing device of the third embodiment.
FIG. 9 is a diagram illustrating the functional configuration of the notification sound sensing device of the fourth embodiment.
FIG. 10 is a diagram for explaining the characteristics of continuous sounds and intermittent sounds.
FIG. 11 is a diagram conceptually showing the determination rule of the fourth embodiment.

Embodiments of the present invention are described in detail below. In the drawings, components having the same function are given the same reference numerals, and duplicate descriptions are omitted.

[First Embodiment]
The first embodiment of the present invention represents the broadest concept of the invention. As shown in FIG. 1, the first embodiment is a notification sound sensing device and method that takes speech or an acoustic signal (hereinafter, an acoustic signal) as input and outputs a sensing result indicating that a notification sound has been sensed (or has not been sensed).

The notification sound sensing device 1 of the first embodiment is, for example, a special device configured by loading a special program into a known or dedicated computer that has a central processing unit (CPU), a main storage device (RAM: random access memory), and the like. The notification sound sensing device 1 executes each process under the control of the central processing unit, for example. Data input to the notification sound sensing device 1 and data obtained in each process are stored, for example, in the main storage device, and are read out as needed for use in other processes. At least some of the processing units of the notification sound sensing device 1 may be implemented in hardware such as an integrated circuit.

The characteristics of the acoustic signals in which notification sounds are to be sensed are described with reference to FIGS. 2 to 5.

FIG. 2 is an example of sound recorded in an office for a certain period. It plots the time-domain waveform, with time on the horizontal axis and amplitude on the vertical axis. In this example, a mobile phone ringtone sounds during the interval indicated by arrow 2-A. Written out, the ringtone sounds something like "pi-pi-pip pi-pi-pip, pi-pi-pip pi-pi-pip, pi-pi-pip pi-pi-pip, pi-pi-pip pi-pi-pip". The rest of the recording is noise that arises in ordinary office work (office noise): some of it is stationary, and some is sudden, loud noise such as that indicated by arrow 2-B.

FIG. 3 is an example of sound recorded in a living room for a certain period. Like FIG. 2, it plots the time-domain waveform. At first glance there appears to be only stationary noise (floor noise) and no notification sound, but in fact a kitchen timer was sounding in the adjacent kitchen at the time, and listening to the recording confirms that the timer was captured as well.

In these situations, the notification sound sensing device 1 is required to output a sensing result indicating detection when the mobile phone ringtone sounds in the office example of FIG. 2 and when the kitchen timer sounds in the living room example of FIG. 3, and not to output such a result (or to output a sensing result indicating non-detection) at other times. With the conventional technique, however, the sudden loud noise in the office example of FIG. 2 would also be sensed, and the kitchen timer in the living room example of FIG. 3 could not be sensed, so a correct sensing result cannot be output in either case.

FIG. 4 (2-A) shows the power spectrum at the time marked 2-A in FIG. 2, and FIG. 4 (2-B) shows the power spectrum at the time marked 2-B in FIG. 2. In each, the horizontal axis is frequency (Hz) and the vertical axis is power (dB). The notification sound in FIG. 4 (2-A) has a feature that clearly distinguishes it from the sudden noise in FIG. 4 (2-B): its spectrum has peaks near 2.8 kHz and 5.6 kHz, indicated by arrows. No such feature is seen in FIG. 4 (2-B).

FIG. 5 (3-A) shows the power spectrum at the time marked 3-A in FIG. 3, and FIG. 5 (3-B) shows the power spectrum at the time marked 3-B in FIG. 3. Although FIG. 3 appears at first glance to contain no notification sound, the spectrum in FIG. 5 (3-B) has a peak near 4.1 kHz, whereas no such feature is seen in FIG. 5 (3-A). In fact, the kitchen timer was sounding at the time marked 3-B in FIG. 3 and was not sounding at the time marked 3-A.

These observations show that, to pick a notification sound out of other ambient noise, it suffices to determine whether the sound has one or more peaks, that is, one or more frequencies at which power is concentrated (hereinafter, "has a peak"). Moreover, notification sounds generally use a relatively high-pitched "beep" rather than a low "buzz". Therefore, when a sound is judged to have at least one and at most N peaks at frequencies between a predetermined first frequency (for example, 1.0 kHz) and a predetermined second frequency (for example, 6.0 kHz), a sensing result indicating detection is output; otherwise, no such result is output, or a sensing result indicating non-detection is output. Setting an upper limit N on the number of peaks is not essential, but it prevents sounds whose power is concentrated at many frequencies, such as the human voice, from being falsely sensed as notification sounds, and so allows notification sounds to be sensed more accurately. A specific value of N is best determined experimentally, but it can be set to about 6, for example.
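The decision rule described above can be sketched as follows. This is only an illustration under stated assumptions: the function name `is_notification_sound` is hypothetical, and the number of peaks between the first and second frequencies is assumed to have been counted already by some other means.

```python
from typing import Optional


def is_notification_sound(peak_count: int, max_peaks: Optional[int] = 6) -> bool:
    """Return True when the in-band peak count is between 1 and N inclusive.

    max_peaks plays the role of N (about 6 in the text). Passing None
    disables the upper limit, which the text describes as optional.
    """
    if peak_count < 1:
        return False
    return max_peaks is None or peak_count <= max_peaks


print(is_notification_sound(2))   # two in-band peaks, e.g. 2.8 kHz and 5.6 kHz
print(is_notification_sound(12))  # many peaks, e.g. a human voice
```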

Accordingly, the notification sound sensing device 1 of the first embodiment outputs a sensing result indicating that a notification sound has been sensed when it detects that the input acoustic signal has a frequency at which power is concentrated within a predetermined frequency band.

[Second Embodiment]
The second embodiment of the present invention is a notification sound sensing device and method that make the functional configuration of the first embodiment concrete.

As shown in FIG. 6, the notification sound sensing device 1 of the second embodiment includes, for example, a cepstrum calculation unit 10, a windowing unit 14, a second FFT unit 15, and a determination unit 16; it takes an acoustic signal s(n) as input and outputs a sensing result a. The cepstrum calculation unit 10 includes, for example, a first FFT unit 11, a logarithmic power spectrum calculation unit 12, and an inverse FFT unit 13.

The notification sound sensing method of the second embodiment is described with reference to FIG. 7.

The input acoustic signal s(n) is in a digital format such as pulse code modulation (PCM) and is divided into fixed-length time segments called frames. Any sampling frequency may be used, but sampling at 16 kHz or higher is appropriate for analyzing frequency characteristics from 1.0 kHz to 6.0 kHz. Unless otherwise noted, the following description assumes a sampling frequency of 16 kHz. Any frame length may be used as well, for example 5 ms (80 samples at 16 kHz), 10 ms (160 samples), 20 ms (320 samples), or 32 ms (512 samples).
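The frame lengths quoted above follow directly from the sampling frequency. A small sketch of the arithmetic, with hypothetical helper names:

```python
def frame_length_in_samples(frame_ms: float, sampling_rate_hz: int = 16000) -> int:
    """Convert a frame length in milliseconds to a number of samples."""
    return round(frame_ms * sampling_rate_hz / 1000)


def nyquist_hz(sampling_rate_hz: int = 16000) -> float:
    """Highest frequency that can be represented at the given sampling rate."""
    return sampling_rate_hz / 2


for ms in (5, 10, 20, 32):
    print(ms, "ms ->", frame_length_in_samples(ms), "samples")
```

At 16 kHz the Nyquist frequency is 8 kHz, which is why that rate is sufficient for analyzing the band up to 6.0 kHz.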

In step S11, the first FFT unit 11, which has a buffer for storing the input acoustic signal, converts the input acoustic signal into a frequency spectrum S(k) using the short-time Fourier transform. The frequency spectrum S(k) is input to the logarithmic power spectrum calculation unit 12. The Fourier transform window length is set to be at least the frame length; when it exceeds the frame length, the time-domain signal in the buffer spanning multiple frames is transformed. For example, with a frame length of 32 ms and a window length of 64 ms, two frames are transformed together. To the extent that the window is longer than the frame, the time-domain segments processed for successive frames overlap. For example, when the acoustic signal of the i-th frame is input, the signals of the (i-1)-th and i-th frames are Fourier transformed, and when the acoustic signal of the (i+1)-th frame is input, the signals of the i-th and (i+1)-th frames are Fourier transformed.
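The two-frame buffering in this example can be sketched as a generator that pairs each incoming frame with the previous one. The name `overlapped_windows` is hypothetical, and padding with zeros before the first frame is an assumption, since the text does not say how the very first frame is handled.

```python
from typing import Iterable, Iterator, List


def overlapped_windows(frames: Iterable[List[float]]) -> Iterator[List[float]]:
    """Yield, for each incoming frame i, the samples of frames i-1 and i.

    This corresponds to a Fourier transform window twice the frame length,
    so consecutive analysis windows overlap by one frame.
    """
    previous = None
    for current in frames:
        if previous is None:
            # Assumption: treat the signal before the first frame as silence.
            previous = [0.0] * len(current)
        yield previous + current
        previous = current


frames = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]]
for window in overlapped_windows(frames):
    print(window)
```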

In step S12, the logarithmic power spectrum calculation unit 12 calculates a power spectrum on a logarithmic scale (hereinafter, the logarithmic power spectrum) from the frequency spectrum S(k). The logarithmic power spectrum is input to the inverse FFT unit 13.

In step S13, the inverse FFT unit 13 applies an inverse Fourier transform to the logarithmic power spectrum output by the logarithmic power spectrum calculation unit 12, returning it to a time-domain signal c(n). The output c(n) of the inverse FFT unit 13 is called the FFT cepstrum coefficients (hereinafter, simply the cepstrum). Any widely known method can be used to calculate the cepstrum; one is described, for example, in Saito and Nakata, "Fundamentals of Speech Information Processing", Ohmsha, 1981, pp. 99-103 (Reference 1). Reference 1 also describes the technique of windowing the cepstrum, which is discussed later.
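Steps S11 to S13 can be sketched with the standard library alone. This is an illustration, not the method of Reference 1: a naive O(N²) discrete Fourier transform stands in for the FFT so that the example has no dependencies, the natural logarithm is assumed for the logarithmic scale, and the small `floor` added before taking the logarithm is an assumption to avoid log(0). All function names are hypothetical.

```python
import cmath
import math
from typing import List, Sequence


def dft(x: Sequence[complex], inverse: bool = False) -> List[complex]:
    """Naive discrete Fourier transform (stand-in for an FFT routine)."""
    n = len(x)
    sign = 1 if inverse else -1
    out = []
    for k in range(n):
        acc = sum(x[t] * cmath.exp(sign * 2j * math.pi * k * t / n) for t in range(n))
        out.append(acc / n if inverse else acc)
    return out


def cepstrum(signal: Sequence[float], floor: float = 1e-12) -> List[float]:
    """Compute c(n) from one analysis window of the signal s(n)."""
    spectrum = dft(signal)                                         # S11: S(k)
    log_power = [math.log(abs(s) ** 2 + floor) for s in spectrum]  # S12
    return [c.real for c in dft(log_power, inverse=True)]          # S13: c(n)


window = [math.sin(2 * math.pi * 3 * t / 32) for t in range(32)]
c = cepstrum(window)
print(len(c))
```

Because the logarithmic power spectrum of a real signal is real and even, the resulting cepstrum is real and symmetric, c(n) = c(N-n).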

In step S14, the windowing unit 14 uses a predetermined window function to output a weighted cepstrum w(n)c(n), obtained by multiplying the cepstrum c(n) by a weight. The window function is a weighting function of the variable n; besides a rectangular window, windows defined with trigonometric functions (for example, a Hamming window or a Hanning window) can be used. How the window function is chosen specifically is described later.

In step S15, the second FFT unit 15 outputs a signal Cw(k) obtained by transforming the weighted cepstrum w(n)c(n) back into the frequency domain. It is generally known that transforming a weighted cepstrum into the frequency domain yields a spectrum in which spectral features are emphasized according to the weights; in the following description, the signal Cw(k) is called the enhanced spectrum.

In step S16, the determination unit 16 compares the value of the enhanced spectrum Cw(k) output by the second FFT unit 15 at each frequency k within a predetermined frequency band with a predetermined threshold Ca, and outputs a sensing result indicating that a notification sound has been sensed. Specifically, with KL a predetermined first frequency and KH a predetermined second frequency, it examines the value of the enhanced spectrum Cw(k) at each frequency k in the range KL≦k≦KH. If any value exceeds the threshold Ca (or is equal to or greater than it; the same applies to every threshold comparison below), it outputs a sensing result a indicating that a notification sound has been sensed; if none does, it does not output such a result. In the latter case it may instead output a sensing result a indicating that no notification sound has been sensed. For example, a=1 is output as the sensing result when some value exceeds the threshold Ca, and a=0 is output when none does.
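The comparison in step S16 can be sketched as below. The function name `detect_notification` is hypothetical, the default threshold `ca=1.0` is a placeholder rather than a value from the text, and the enhanced spectrum is assumed to be a list of real values whose length equals the transform length, so that a frequency in Hz maps to bin index f × N / fs.

```python
import math
from typing import Sequence


def detect_notification(
    enhanced: Sequence[float],
    sampling_rate_hz: float,
    kl_hz: float = 1000.0,
    kh_hz: float = 6000.0,
    ca: float = 1.0,  # placeholder threshold, to be tuned experimentally
) -> int:
    """Return a=1 if Cw(k) exceeds Ca anywhere in KL <= k <= KH, else a=0."""
    n_fft = len(enhanced)
    k_low = math.ceil(kl_hz * n_fft / sampling_rate_hz)
    k_high = math.floor(kh_hz * n_fft / sampling_rate_hz)
    in_band = (enhanced[k] for k in range(k_low, k_high + 1))
    return 1 if any(value > ca for value in in_band) else 0


spectrum = [0.0] * 16
spectrum[3] = 2.0  # a peak at 3 kHz when fs = 16 kHz and N = 16
print(detect_notification(spectrum, 16000))
```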

The first frequency KL, the second frequency KH, and the threshold Ca are all set according to the range in which notification sounds may exist. Certain notification sounds are standardized in JIS guidelines and the like, so the values may be set to match the frequency ranges given there. They may also be set to match independently designed notification sounds. The first frequency KL can be, for example, 1.0 kHz, and the second frequency KH can be, for example, 6.0 kHz.

The windowing unit 14 weights the cepstrum c(n) in order to emphasize the frequency components at which power is concentrated, as in the examples of FIG. 4 (2-A) and FIG. 5 (3-B), so that whether a sound has peaks at one or more and N or fewer frequencies can be determined stably. It is known that, in the cepstrum c(n), the region of small n represents the slope and smooth overall shape of the spectrum, and that larger n represents progressively finer spectral structure. It is also known that multiplying by a weighting coefficient that depends on n makes it possible to emphasize or remove the overall shape of the spectrum, or to emphasize or remove its fine structure.

The aim of the present invention is to sense notification sounds, that is, to sense the presence of a signal whose power is concentrated at specific frequencies. In the examples of FIGS. 4 and 5, the spectral slope and smooth overall shape are considered to be caused by ambient noise other than the notification sound. The second embodiment therefore uses a window function whose weight in the region of small n is smaller than elsewhere. One example is a rectangular window such as
w(n)=0 when n≦Nc
w(n)=1 otherwise
The value of Nc can be, for example, 10.
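The rectangular window above can be sketched as follows. The function names are hypothetical. The sketch applies the definition literally to indices 0 to length-1; the text does not state whether the mirrored upper half of a full-length FFT cepstrum is weighted symmetrically, so that choice is left open here.

```python
from typing import List, Sequence


def highpass_lifter(length: int, nc: int = 10) -> List[float]:
    """w(n) = 0 for n <= Nc, and w(n) = 1 otherwise."""
    return [0.0 if n <= nc else 1.0 for n in range(length)]


def apply_window(cepstrum: Sequence[float], window: Sequence[float]) -> List[float]:
    """Return the weighted cepstrum w(n)c(n)."""
    return [w * c for w, c in zip(window, cepstrum)]


c = [float(n) for n in range(16)]  # dummy cepstrum values for illustration
print(apply_window(c, highpass_lifter(len(c), nc=10)))
```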

As another example, with Nc<Nh, the window function may be defined as
w(n)=0 when n≦Nc
w(n)=0 when n≧Nh
w(n)=1 otherwise
so that the weight is small in the region of large n as well as in the region of small n. The region of large n is excluded on the reasoning that the peak frequencies of a notification sound have a certain width, so removing spectral fine structure finer than that width should allow notification sounds to be sensed stably. With a frame length of 32 ms (512 samples) and a Fourier transform window length of 1024 samples, Nh can be set to, for example, about 100 to 400.
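To give a feel for what Nc and Nh select, the sketch below builds the band-limited rectangular window and converts a cepstrum index into the period of the spectral ripple it represents. The conversion uses the standard relation that a cepstral component at index n corresponds to a ripple in the logarithmic spectrum with period fs/n Hz; the function names and the interpretation of the example values are illustrative, not taken from the text.

```python
from typing import List


def bandpass_lifter(length: int, nc: int = 10, nh: int = 100) -> List[float]:
    """w(n) = 0 for n <= Nc or n >= Nh, and w(n) = 1 otherwise (Nc < Nh)."""
    if not nc < nh:
        raise ValueError("Nc must be smaller than Nh")
    return [1.0 if nc < n < nh else 0.0 for n in range(length)]


def ripple_period_hz(n: int, sampling_rate_hz: float = 16000.0) -> float:
    """Period, in Hz, of the log-spectrum ripple represented by index n."""
    return sampling_rate_hz / n


window = bandpass_lifter(1024, nc=10, nh=100)
print(sum(window))            # number of cepstrum indices that are kept
print(ripple_period_hz(10))   # coarsest ripple at the Nc boundary
print(ripple_period_hz(100))  # finest ripple at the Nh boundary
```

At 16 kHz sampling, Nc=10 corresponds to ripples with a 1600 Hz period and Nh=100 to ripples with a 160 Hz period, so the kept indices describe spectral structure between those two scales.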

The examples above use rectangular windows with w(n)=0 or w(n)=1, but a window function whose value w(n) changes gradually with n may be defined instead. For example, a window can be defined with trigonometric functions so that w(n) is small in the regions of small n and large n. Typical window functions of this kind include the Hamming window and the Hanning window.
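One possible smooth window of this kind is a Hanning-shaped taper placed between Nc and Nh. The text does not specify where the taper sits or how wide it is, so the placement, the function name `smooth_lifter`, and the default values are all assumptions made for illustration.

```python
import math
from typing import List


def smooth_lifter(length: int, nc: int = 10, nh: int = 100) -> List[float]:
    """Hanning-shaped weights: 0 at n <= Nc and n >= Nh, rising to 1 midway."""
    weights = []
    for n in range(length):
        if nc < n < nh:
            phase = 2 * math.pi * (n - nc) / (nh - nc)
            weights.append(0.5 - 0.5 * math.cos(phase))
        else:
            weights.append(0.0)
    return weights


w = smooth_lifter(128, nc=10, nh=110)
print(w[10], w[35], w[60], w[85], w[110])
```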

[Third Embodiment]
The third embodiment of the present invention is a notification sound sensing device and method that make the functional configuration of the determination unit 16 of the second embodiment concrete.

As shown in FIG. 8, the determination unit 16 of the third embodiment includes, for example, a peak detection unit 161, an overall determination unit 162, and a memory unit 163; it takes the enhanced spectrum Cw(k) as input and outputs the sensing result a.

The processing performed by the determination unit 16 of the third embodiment is described below, focusing on the differences from the second embodiment.

The peak detection unit 161 compares the value of the enhanced spectrum Cw(k) at each frequency k in the band KL≦k≦KH with the threshold Ca. If any value exceeds the threshold Ca, it outputs a value of 1 or more as the peak detection result a0; if none does, it outputs 0 as the peak detection result a0. The value of a0 may simply indicate the presence or absence of a peak, or it may be the number of peaks. When it indicates presence or absence, the values of the enhanced spectrum Cw(k) are examined, and a0=1 is output if any value exceeds the threshold Ca and a0=0 is output if none does. The method of obtaining the number of peaks is described in detail in the fourth embodiment. The peak detection result a0 is input to the overall determination unit 162 and the memory unit 163.

The memory unit 163 accumulates the values of the peak detection result a0 over a predetermined time (number of frames) and sends the resulting time series to the overall determination unit 162. The number of frames should be set based on the length of the notification sounds that may occur and on the frame length.

The overall determination unit 162 uses the value of the peak detection result a0 for the current frame together with the time series of past values of a0, and outputs, based on a predetermined rule, a sensing result a indicating whether a notification sound that should be sensed has occurred. Examples of such a rule are:
1. A notification sound is judged to have occurred when the peak detection result a0 is 1 or more for X or more consecutive frames (for example, X=10, which with a 32 ms frame length means 320 ms or more in succession).

or

2. A notification sound is judged to have occurred when the peak detection result a0 is 1 or more in Y or more frames (for example, Y=40) within a predetermined past period T (for example, the past 6 seconds).

In other words, the overall determination unit 162 does not decide whether a notification sound has occurred from the frequency analysis result of a single frame alone; it makes an overall determination based on the analysis results (continuity and frequency of occurrence) over a predetermined past period.
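The two example rules can be sketched over a bounded history of peak detection results. The function names are hypothetical, and using a `deque` with a fixed maximum length to stand in for the memory unit 163 is an assumption of this sketch; the history length of 187 frames is simply 6 seconds divided by the 32 ms frame length, rounded down.

```python
from collections import deque
from typing import Deque


def consecutive_rule(history: Deque[int], x: int = 10) -> bool:
    """Rule 1: a0 >= 1 in each of the most recent X frames."""
    recent = list(history)[-x:]
    return len(recent) == x and all(a0 >= 1 for a0 in recent)


def frequency_rule(history: Deque[int], y: int = 40) -> bool:
    """Rule 2: a0 >= 1 in at least Y frames within the stored period T."""
    return sum(1 for a0 in history if a0 >= 1) >= y


history_frames = (6 * 1000) // 32  # T = 6 s of 32 ms frames -> 187 frames
history: Deque[int] = deque(maxlen=history_frames)

for a0 in [0, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1]:
    history.append(a0)
    print(a0, consecutive_rule(history), frequency_rule(history))
```

Either function can be called once per frame after appending the newest a0, so the sensing result a changes as soon as the chosen rule is satisfied.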

[Fourth Embodiment]
The fourth embodiment of the present invention is a notification sound sensing device and method that make the functional configurations of the peak detection unit 161 and the overall determination unit 162 of the third embodiment concrete.

As shown in FIG. 9, the peak detection unit 161 of the fourth embodiment includes, for example, a peak number detection unit 1611 and a peak frequency detection unit 1612. It takes the enhanced spectrum Cw(k) as input and outputs the peak number an and the peak center frequency ak.

As shown in FIG. 9, the comprehensive determination unit 162 of the fourth embodiment includes, for example, a continuous sound determination unit 1621 and an intermittent sound determination unit 1622. It takes the time series of the peak number an and the time series of the peak center frequency ak as input and outputs the sensing result a.

The processing performed by the determination unit 16 of the fourth embodiment is described below, focusing on the differences from the third embodiment.

The peak number detection unit 1611 compares the value of the enhanced spectrum Cw(k) for each frequency k in the band KL ≦ k ≦ KH with the threshold Ca and, if any value exceeds Ca, obtains the peak number an. The peak number is not the number of frequencies k that exceed the threshold. Instead, the values of Cw(k) are examined in order from the smallest (or largest) k, and the span from the frequency ks at which Cw(k) rises from at or below Ca to above Ca, up to the frequency ke immediately before Cw(k) next falls to Ca or below, is counted as one peak. The peak number is the number of such peaks.

The peak frequency detection unit 1612 compares the value of the enhanced spectrum Cw(k) for each frequency k in the band KL ≦ k ≦ KH with the threshold Ca and, if any value exceeds Ca, obtains the peak center frequency ak. For each peak found by the peak number detection unit 1611, the peak center frequency is either the midpoint between the frequency ks at which the threshold Ca is exceeded and the frequency ke immediately before the value stops exceeding Ca, or the frequency in ks ≦ k ≦ ke at which Cw(k) is largest. When the peak number an is 2 or more, a peak center frequency ak is obtained for each peak.

When no value of the enhanced spectrum Cw(k) exceeds the threshold Ca in the range KL ≦ k ≦ KH, the peak number an is set to 0 and the peak center frequency ak need not be obtained. In a software implementation, for example, ak may simply be set to 0 as well.
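The peak counting and center-frequency steps can be sketched together as a single scan over the band. This is a minimal sketch under assumptions made here: the name `detect_peaks`, the use of bin indices for frequencies, the scan in ascending order of k, the rounding of the midpoint down to an integer bin, and the empty list (rather than 0) returned when there is no peak are all illustrative choices, not taken from the source.

```python
from typing import List, Optional, Sequence, Tuple


def detect_peaks(cw: Sequence[float], kl: int, kh: int, ca: float,
                 use_max: bool = False) -> Tuple[int, List[int]]:
    """Return (an, list of ak) for bins kl..kh (inclusive) of Cw(k)."""
    centers: List[int] = []
    ks: Optional[int] = None  # first bin of the run currently above Ca
    for k in range(kl, kh + 1):
        above = cw[k] > ca
        if above and ks is None:
            ks = k
        # A run ends at the bin before the first bin that does not exceed Ca,
        # or at kh when the run reaches the upper edge of the band.
        if ks is not None and (not above or k == kh):
            ke = k if above else k - 1
            if use_max:
                # Alternative definition: bin with the largest Cw(k) in ks..ke.
                center = max(range(ks, ke + 1), key=lambda i: cw[i])
            else:
                # Midpoint of ks and ke, rounded down (assumed rounding).
                center = (ks + ke) // 2
            centers.append(center)
            ks = None
    return len(centers), centers
```

Counting runs rather than individual bins means a single broad peak spanning several bins still contributes 1 to an.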

The memory unit 163 accumulates the values of the peak number an and the peak center frequency ak over a predetermined period (a number of frames) and sends the resulting time series to the comprehensive determination unit 162.
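One simple way to hold the most recent frames is a fixed-length buffer. The sketch below is an assumption about how such a memory might be written in software; the class name `PeakMemory` and the per-frame tuple layout are hypothetical and do not come from the source.

```python
from collections import deque
from typing import Deque, List, Tuple

# One frame's result: (an, tuple of peak center frequencies ak).
Frame = Tuple[int, Tuple[int, ...]]


class PeakMemory:
    """Keeps the results of the most recent n_frames frames, oldest first."""

    def __init__(self, n_frames: int) -> None:
        self._buf: Deque[Frame] = deque(maxlen=n_frames)

    def push(self, an: int, ak: Tuple[int, ...]) -> None:
        # When the buffer is full, appending drops the oldest frame.
        self._buf.append((an, ak))

    def series(self) -> List[Frame]:
        """Time series handed to the determination step."""
        return list(self._buf)
```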

The continuous sound determination unit 1621 uses the value of the peak number an in the current frame together with the time series of an in past frames, and outputs a sensing result a indicating, according to predetermined rules for detecting continuous sounds, whether a notification sound to be sensed has occurred. A continuous sound is a notification sound in which sound with power concentrated at specific frequencies continues without interruption.

The intermittent sound determination unit 1622 uses the values of the peak number an and the peak center frequency ak in the current frame together with the time series of an and ak in past frames, and outputs a sensing result a indicating, according to predetermined rules for detecting intermittent sounds, whether a notification sound to be sensed has occurred. An intermittent sound is a notification sound in which sound with power concentrated at specific frequencies recurs intermittently.

The comprehensive determination unit 162 may run the continuous sound determination unit 1621 and the intermittent sound determination unit 1622 in parallel. Alternatively, it may run the continuous sound determination unit 1621 first and run the intermittent sound determination unit 1622 only when no notification sound has been determined to have occurred.

When at least one of the continuous sound determination unit 1621 and the intermittent sound determination unit 1622 determines that a notification sound has occurred, the comprehensive determination unit 162 outputs a sensing result indicating that a notification sound has occurred. When neither unit determines that a notification sound has occurred, it outputs a sensing result indicating that no notification sound was sensed.
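The sequential configuration amounts to a short-circuit OR of the two determinations. The following sketch assumes each determination is available as a zero-argument callable; the function name `comprehensive_decision` and this calling convention are illustrative assumptions only.

```python
from typing import Callable


def comprehensive_decision(continuous: Callable[[], bool],
                           intermittent: Callable[[], bool]) -> bool:
    """Return True (notification sound sensed) if either determination fires."""
    # `or` short-circuits, so the intermittent check runs only when the
    # continuous check did not detect a notification sound.
    return continuous() or intermittent()
```

Running both determinations in parallel gives the same result; the sequential form only avoids the second check when the first has already succeeded.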

Different detection rules are used for continuous sounds and intermittent sounds so that notification sounds are detected with fewer errors.

Examples of continuous sounds include notification sounds in which the same "beep" continues for a certain time (for example, the end-of-operation sound of a washing machine or microwave oven), an intercom chime such as "ding-dong" whose pitch changes, a fire alarm whose pitch sweeps continuously, and telephone ringtones with complex timbre. These sounds do not necessarily consist of one unchanging tone, but even when the pitch changes, people recognize them as notification sounds because sound with power concentrated at specific frequencies continues. Here, a change in pitch means that the frequencies at which power is concentrated, or the number of such peaks, vary over time.

Examples of intermittent sounds include the "pi-pi, pi-pi" of an alarm clock, a kitchen timer, or a mobile phone ringtone. The time from when the sound starts to when it stops lasts for a certain period or longer, but each individual burst of sound is very short. Nevertheless, because the same sound recurs regularly and intermittently, people recognize it as a notification sound just as they do a continuous sound.

If the method of the third embodiment is used to detect continuous and intermittent sounds with the same rule, the values of T and Y in the rule "a notification sound is determined to have occurred when the peak detection result a0 is 1 or more in Y or more frames within a predetermined past period T" must be set so that intermittent sounds can be detected. That is, Y must be set sufficiently small relative to T (more precisely, relative to the number of frames corresponding to T). With such a setting, however, everyday sounds may be mistaken for notification sounds and an erroneous sensing result may be output even though no notification sound is sounding. The sounds of tableware and of television programs are especially likely to be mistaken for notification sounds. To prevent this, it is necessary to determine whether the same sound is recurring regularly and intermittently.

Conversely, if continuous sound detection required the same sound to keep sounding, sounds whose pitch changes, such as an intercom chime or a fire alarm, could not be detected. A continuous sound is therefore judged to be a notification sound whenever sound with power concentrated at specific frequencies continues, even if the pitch changes.

Examples of rules for detecting continuous sounds are:
1. A notification sound is determined to have occurred when the peak number an is 1 or more for X or more consecutive frames (for example, X = 10, that is, continuously for 320 milliseconds or more when the frame length is 32 milliseconds).

Or
2. A notification sound is determined to have occurred when the peak number an is 1 or more in Yc or more frames (for example, Yc = 20) within a predetermined past period Tc (for example, the past 1 second).

In other words, the continuous sound determination unit 1621 does not decide whether a continuous sound has occurred from the frequency analysis result of a single frame. It makes the determination based on the analysis results (continuity and frequency of occurrence) within a predetermined past period.

Examples of rules for detecting intermittent sounds are:
1. A notification sound is determined to have occurred when, within a predetermined past period Tb (for example, the past 6 seconds), there are Yb or more frames (for example, Yb = 10) in which the peak number an and the peak center frequency ak have the same non-zero values.

Or
2. Within a predetermined past period Tb (for example, the past 6 seconds), pairs of frames are extracted in which the peak number an and the peak center frequency ak have the same non-zero values, or values that differ by no more than a predetermined tolerance (both cases are hereinafter called "the same"). When the time difference between the frames of a pair (the difference in frame numbers) is equal to or less than a predetermined value, the same sound is regarded as sounding continuously between those frames, and the continuous-sound detection rules described above are then applied to determine whether a notification sound has occurred. Regarding the same sound as sounding continuously between the frames may be implemented, for example, by replacing the peak number an and the peak center frequency ak of every frame between the two frames with those of the two frames. Alternatively, although only the two frames actually have a peak number an of 1 or more, the frame-number difference may be added to the number of those frames when counting toward X and Yc in the continuous-sound detection rules. Strictly, the frame-number difference is added to the number of those frames and 1 is subtracted, because one frame would otherwise be counted twice. The thresholds X, Tc, and Yc used when applying the continuous-sound detection rules here may be set to values different from those used by the continuous sound determination unit 1621.

In other words, the intermittent sound determination unit 1622 does not decide whether an intermittent sound has occurred from the frequency analysis result of a single frame. It makes the determination based on the analysis results (continuity and frequency of occurrence) within a predetermined past period.
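The two intermittent-sound rules can be sketched as below. This is an illustration under several assumptions that are not in the source: frames are represented as `(an, tuple of ak)` with `(0, ())` meaning no peak, the window passed in already covers the period Tb, all names are hypothetical, and in rule 2 only frames belonging to a matched pair (and the frames between them) are counted as sounding, which is one possible reading of the text. Rule 2 is shown with the consecutive-frame rule only.

```python
from collections import Counter
from typing import List, Sequence, Tuple

Frame = Tuple[int, Tuple[int, ...]]  # (an, peak centers ak); (0, ()) if no peak


def same_sound(f1: Frame, f2: Frame, tol: int = 0) -> bool:
    """Both frames have peaks, equal an, and every ak within tol bins."""
    (n1, k1), (n2, k2) = f1, f2
    return (n1 > 0 and n1 == n2 and len(k1) == len(k2)
            and all(abs(a - b) <= tol for a, b in zip(k1, k2)))


def intermittent_rule1(frames: Sequence[Frame], yb: int = 10) -> bool:
    """Rule 1: some identical non-zero (an, ak) occurs in at least yb frames."""
    counts = Counter(f for f in frames if f[0] > 0)
    return any(c >= yb for c in counts.values())


def fill_gaps(frames: Sequence[Frame], max_gap: int, tol: int = 0) -> List[int]:
    """Mark frames i..j as sounding when frames i and j are 'the same'
    and at most max_gap frames apart (first half of rule 2)."""
    active = [0] * len(frames)
    for i, fi in enumerate(frames):
        last = min(i + max_gap, len(frames) - 1)
        for j in range(i + 1, last + 1):
            if same_sound(fi, frames[j], tol):
                for m in range(i, j + 1):
                    active[m] = 1
    return active


def intermittent_rule2(frames: Sequence[Frame], max_gap: int, x: int,
                       tol: int = 0) -> bool:
    """Rule 2: after gap filling, require x or more consecutive active frames."""
    run = best = 0
    for v in fill_gaps(frames, max_gap, tol):
        run = run + 1 if v else 0
        best = max(best, run)
    return best >= x
```

Filling frames i through j inclusive covers j - i + 1 frames, which matches the count described in the text (two frames, plus the frame-number difference, minus one).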

The difference between the third and fourth embodiments is now described in more detail. FIG. 10 illustrates the peak center frequencies of (A) a continuous sound whose pitch changes, (B) an intermittent sound, and (C) a sound other than a notification sound (such as everyday noise). In these examples the peak number an is 1; the horizontal axis shows time and the vertical axis shows the peak center frequency. When an is 2 or more, two or more lines indicating peak center frequencies (the horizontal lines in the figure) appear at the same time, but for simplicity only the case an = 1 is described here. A time span with no line indicates that there is no peak. In the third embodiment, if the peak detection result a0 is a value indicating the presence of a peak (a0 = 1), then in the example of FIG. 10 any setting that determines that a notification sound has occurred for (A) and (B) also determines that one has occurred for (C). This is because, within the fixed period T, (C) has more frames with a0 = 1 than (B). FIG. 11 conceptually shows the determination rule of the intermittent sound determination unit 1622 in the fourth embodiment: (B) is regarded as a continuous sound such as (B'). In the fourth embodiment, the continuous sound determination unit 1621 determines only (A) to be a notification sound, and the intermittent sound determination unit 1622 determines (B) to be a notification sound. Consequently, a notification sound is determined to have occurred for (A) and (B), while (C) is not determined to be a notification sound.

[Fifth embodiment]
The fifth embodiment of the present invention is a modification of the peak detection unit 161 and the comprehensive determination unit 162 of the fourth embodiment.

In general, notification sounds are high-frequency sounds of 1.0 kHz or above. The peak detection unit 161 of the fourth embodiment therefore additionally checks whether any value of the enhanced spectrum Cw(k) exceeds the threshold Ca in the range k < KL, and a frame in which a value in that range exceeds the threshold is excluded from determination. Excluding a frame from determination means, for example, setting the peak number an of that frame to 0.

This further lowers the probability that a sound that is not a notification sound is mistakenly sensed as one.
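The low-band exclusion can be sketched as a gate applied to each frame's peak number. The function names `low_band_exceeds` and `gated_peak_number`, and the use of bin indices for KL, are assumptions made for illustration.

```python
from typing import Sequence


def low_band_exceeds(cw: Sequence[float], kl: int, ca: float) -> bool:
    """True if any bin below kl of the enhanced spectrum exceeds Ca."""
    return any(v > ca for v in cw[:kl])


def gated_peak_number(an: int, cw: Sequence[float], kl: int, ca: float) -> int:
    """Exclude the frame (an = 0) when power is concentrated below kl."""
    return 0 if low_band_exceeds(cw, kl, ca) else an
```

Because the excluded frame simply reports an = 0, the later continuous and intermittent determinations need no changes.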

The present invention is not limited to the embodiments described above, and it goes without saying that modifications may be made as appropriate without departing from the spirit of the invention. The various processes described in the embodiments may be executed in time series in the order described, or may be executed in parallel or individually according to the processing capability of the device that executes them or as needed.

[Program, recording medium]
When the various processing functions of each device described in the above embodiments are realized by a computer, the processing content of the functions that each device should have is described by a program. Executing this program on a computer realizes the various processing functions of each device on that computer.

The program describing this processing content can be recorded on a computer-readable recording medium. The computer-readable recording medium may be of any kind, for example a magnetic recording device, an optical disc, a magneto-optical recording medium, or a semiconductor memory.

The program is distributed, for example, by selling, transferring, or lending a portable recording medium such as a DVD or CD-ROM on which the program is recorded. The program may also be distributed by storing it in a storage device of a server computer and transferring it from the server computer to other computers over a network.

A computer that executes such a program first stores, for example, the program recorded on the portable recording medium or the program transferred from the server computer in its own storage device. When executing a process, the computer reads the program stored in its own recording medium and executes the process according to the program it has read. As another mode of execution, the computer may read the program directly from the portable recording medium and execute processing according to it, or may execute processing according to the received program each time the program is transferred to it from the server computer. The processing described above may also be executed by a so-called ASP (Application Service Provider) type service, in which the program is not transferred from the server computer to the computer and the processing functions are realized only through execution instructions and the retrieval of results. The program in this embodiment includes information that is provided for processing by a computer and is equivalent to a program (such as data that is not a direct command to the computer but has the property of defining the computer's processing).

In this embodiment the device is configured by executing a predetermined program on a computer, but at least part of the processing content may instead be realized in hardware.

DESCRIPTION OF SYMBOLS
1 Notification sound sensing device
10 Cepstrum calculation unit
11 First FFT unit
12 Logarithmic power spectrum calculation unit
13 Inverse FFT unit
14 Windowing unit
15 Second FFT unit
16 Determination unit
161 Peak detection unit
1611 Peak number detection unit
1612 Peak frequency detection unit
162 Comprehensive determination unit
1621 Continuous sound determination unit
1622 Intermittent sound determination unit
163 Memory unit

Claims

A notification sound sensing device comprising:
a first FFT unit that converts an input acoustic signal into a frequency spectrum;
a logarithmic power spectrum calculation unit that calculates a logarithmic power spectrum from the frequency spectrum;
an inverse FFT unit that converts the logarithmic power spectrum into a cepstrum;
a windowing unit that generates a weighted cepstrum by weighting the cepstrum with a predetermined window function;
a second FFT unit that generates an enhanced spectrum by converting the weighted cepstrum into the frequency domain; and
a determination unit that compares the value of the enhanced spectrum corresponding to each frequency within a predetermined frequency band with a predetermined threshold and outputs a sensing result indicating sensing of a notification sound,
wherein the determination unit includes:
a peak number detection unit that compares the value of the enhanced spectrum corresponding to each frequency in the frequency band with the threshold in ascending or descending order of frequency, and obtains a peak number by counting, as one peak, the span from a frequency at which the threshold is exceeded to the frequency immediately before the threshold is no longer exceeded;
a peak frequency detection unit that obtains, for each peak, a peak center frequency from the frequencies between the frequency at which the threshold is exceeded and the frequency immediately before the threshold is no longer exceeded;
a memory unit that accumulates the peak number and the peak center frequency and outputs a time series of the peak number and the peak center frequency;
a continuous sound determination unit that examines the time series of the peak number and the peak center frequency and determines whether a continuous sound, in which sound with power concentrated at a specific frequency continues, has occurred; and
an intermittent sound determination unit that examines the time series of the peak number and the peak center frequency and determines whether an intermittent sound, in which sound with power concentrated at a specific frequency continues intermittently, has occurred,
and wherein the intermittent sound determination unit determines that the intermittent sound has occurred
when the peak number an and the peak center frequency ak have the same non-zero values in a predetermined number or more of frames within a predetermined time,
or
when a set of frames in which the peak number an and the peak center frequency ak have the same non-zero values is extracted within a predetermined time, the time difference between those frames is equal to or less than a predetermined value, and the peak number an is 1 or more in a predetermined number or more of consecutive frames,
or
when a set of frames in which the peak number an and the peak center frequency ak have the same non-zero values is extracted within a predetermined time, the time difference between those frames is equal to or less than a predetermined value, and the peak number an is 1 or more in a predetermined number or more of frames within a predetermined time.

The notification sound sensing device according to claim 1,
wherein the continuous sound determination unit determines that the continuous sound has occurred
when the peak number an is 1 or more in a predetermined number or more of consecutive frames,
or
when the peak number an is 1 or more in a predetermined number or more of frames within a predetermined time.

The notification sound sensing device according to claim 1 or 2,
wherein the determination unit further includes a comprehensive determination unit that, when the continuous sound determination unit has not determined that the continuous sound has occurred, has the intermittent sound determination unit determine whether the intermittent sound has occurred, and outputs the sensing result.

The notification sound sensing device according to claim 3,
wherein the comprehensive determination unit excludes from verification any case in which the value of the enhanced spectrum exceeds the threshold at a frequency outside the frequency band.

A notification sound sensing method comprising:
a first FFT step of converting an input acoustic signal into a frequency spectrum;
a logarithmic power spectrum calculation step of calculating a logarithmic power spectrum from the frequency spectrum;
an inverse FFT step of converting the logarithmic power spectrum into a cepstrum;
a windowing step of generating a weighted cepstrum by weighting the cepstrum with a predetermined window function;
a second FFT step of generating an enhanced spectrum by converting the weighted cepstrum into the frequency domain; and
a determination step of comparing the value of the enhanced spectrum corresponding to each frequency within a predetermined frequency band with a predetermined threshold and outputting a sensing result indicating sensing of a notification sound,
wherein the determination step includes:
a peak number detection step of comparing the value of the enhanced spectrum corresponding to each frequency in the frequency band with the threshold in ascending or descending order of frequency, and obtaining a peak number by counting, as one peak, the span from a frequency at which the threshold is exceeded to the frequency immediately before the threshold is no longer exceeded;
a peak frequency detection step of obtaining, for each peak, a peak center frequency from the frequencies between the frequency at which the threshold is exceeded and the frequency immediately before the threshold is no longer exceeded;
a memory step of accumulating the peak number and the peak center frequency and outputting a time series of the peak number and the peak center frequency;
a continuous sound determination step of examining the time series of the peak number and the peak center frequency and determining whether a continuous sound, in which sound with power concentrated at a specific frequency continues, has occurred; and
an intermittent sound determination step of examining the time series of the peak number and the peak center frequency and determining whether an intermittent sound, in which sound with power concentrated at a specific frequency continues intermittently, has occurred,
and wherein the intermittent sound determination step determines that the intermittent sound has occurred
when the peak number an and the peak center frequency ak have the same non-zero values in a predetermined number or more of frames within a predetermined time,
or
when a set of frames in which the peak number an and the peak center frequency ak have the same non-zero values is extracted within a predetermined time, the time difference between those frames is equal to or less than a predetermined value, and the peak number an is 1 or more in a predetermined number or more of consecutive frames,
or
when a set of frames in which the peak number an and the peak center frequency ak have the same non-zero values is extracted within a predetermined time, the time difference between those frames is equal to or less than a predetermined value, and the peak number an is 1 or more in a predetermined number or more of frames within a predetermined time.

A program for causing a computer to function as the notification sound sensing device according to any one of claims 1 to 4.