JP5619529B2

JP5619529B2 - Scream detection device

Info

Publication number: JP5619529B2
Application number: JP2010192964A
Authority: JP
Inventors: 和義福士
Original assignee: Secom Co Ltd
Current assignee: Secom Co Ltd
Priority date: 2010-08-30
Filing date: 2010-08-30
Publication date: 2014-11-05
Anticipated expiration: 2030-08-30
Also published as: JP2012048173A

Description

The present invention relates to a device that analyzes sound produced in an office, store, or similar premises when a robber breaks in and determines whether an abnormal state has occurred, and more particularly to a scream detection device that detects a scream in that sound.

Conventionally, a scream detection device has been proposed that detects a robbery by analyzing input sound and detecting a scream (Patent Document 1).

Patent Document 1 analyzes a frequency band based on normal speech and a high-range frequency band based on screams, and judges that an abnormal situation exists when the volume in the normal-speech band decreases while the volume in the scream band increases.

Japanese Patent Laid-Open No. 9-251583

In a scream, however, the volumes of the high and low frequency ranges do not necessarily behave in this way. For example, a male scream can remain loud in the low range as well, so it may not be detected as a scream. Conversely, an artificial sound such as a siren that sweeps from a low range to a high range may be erroneously detected as a scream.

A scream is a sound that a person utters on the spur of the moment, so it is the voice a person produces most naturally. That is, it is voiced from whatever shape the mouth takes at that instant, and its content is difficult to specify. On the other hand, screams arise in situations such as being attacked by a robber or seeing something frightening, and the vowel at the end of the utterance is often prolonged, as in "waa", "kyaa", and "awawaa". The inventor also observed that, among the vowels, "a", "e", and "o" are especially common at the end of a scream.

Accordingly, an object of the present invention is to provide a scream detection device that exploits the fact that a scream ends in a vowel sustained for a predetermined time, thereby improving scream detection accuracy in a monitoring space containing various environmental sounds and conversational speech.

To solve this problem, the present invention provides a scream detection device comprising: a microphone unit that collects sound in a monitoring space; a vowel extraction unit that extracts a vowel part of speech from the sound collected by the microphone unit; a scream determination unit that determines that a scream has occurred when the extracted vowel part is at or above a predetermined scream volume and continues for a predetermined time or longer; and an abnormal output unit that outputs an abnormal signal when the scream determination unit determines that a scream has occurred.

In this scream detection device, the vowel extraction unit preferably extracts sound corresponding to "a".

In this scream detection device, the vowel extraction unit preferably extracts sound corresponding to "e".

In this scream detection device, the vowel extraction unit preferably extracts sound corresponding to "o".

In this scream detection device, the scream determination unit preferably analyzes the fundamental frequency or the volume of the vowel part to determine whether fluctuation is present, and does not determine a scream when there is no fluctuation.

In this scream detection device, preferably, the vowel extraction unit further identifies the vowel type of the vowel part, and the scream determination unit determines a scream when a vowel part of the same vowel type continues for the predetermined time or longer.

In this scream detection device, preferably, the scream determination unit further detects, in the collected sound, an excessive volume portion whose volume is high enough that the sound may clip, and makes its determination by treating an excessive volume portion contiguous with a vowel part as part of that vowel part.

With this configuration, a scream can be detected by finding, in the sound of a monitoring space containing various environmental sounds and conversational speech, a vowel part that continues for a predetermined time or longer at high volume. In other words, a scream can be detected accurately without relying on word information that defines a scream and without observing correlated level changes across multiple frequency bands.

FIG. 1 is a diagram illustrating a monitoring device that includes the scream detection device according to the present invention.
FIG. 2 is a diagram illustrating the configuration of the scream detection device.
FIG. 3 is a flowchart illustrating the scream detection process of the scream detection device.

Hereinafter, an example of a monitoring device including the scream detection device according to the present invention will be described with reference to the drawings.
This monitoring device treats as its monitoring space a room in an office or store where a safe (storage) holding money and other valuables is installed. It processes the acoustic signal collected in the monitoring space, detects a scream uttered by an employee threatened by a robber during a break-in robbery, and sends an abnormal signal to the security center.

[Configuration of Monitoring Device 1]
FIG. 1 shows the overall configuration of the monitoring device 1. The monitoring device 1 is configured by connecting a scream detection device 2, a security center device 5, an alarm device 6, and the like to a controller 3.

The scream detection device 2 is installed in the room where the safe 10 is installed, and detects screams uttered by employees or others threatened by a robber near the safe 10. The scream detection device 2 is connected to the controller 3, which is installed in the same room or in another room, and outputs an abnormal signal to the controller 3 when it detects a scream.

The controller 3 is connected to the security center device 5 via a wide area network 4 such as a telephone line or an Internet line. When an abnormal signal is input from the scream detection device 2, the controller 3 transmits the abnormal signal to the security center device 5 installed in a remote security center. An alarm device 6 such as a rotating warning light or a buzzer, installed in the same room and/or another room, is also connected to the controller 3. When an abnormal signal is input from the scream detection device 2, the controller 3 activates the alarm device 6 to alert employees and to deter the robber.

A monitoring camera 7, a recording device 8, and the like may also be connected to the controller 3, so that video of the area around the safe 10 during the period in which a scream was detected can be captured and recorded.

The configuration of the scream detection device 2 will now be described with reference to FIG. 2.

The scream detection device 2 comprises a microphone unit 20, a storage unit 21, and a communication unit 23, each connected to a signal processing unit 22.

The microphone unit 20 is an electric circuit that converts sound generated in the monitoring space into a digital signal (acoustic signal), and is connected to the signal processing unit 22. The microphone unit 20 includes a microphone, an amplifier, an A/D converter, and the like. That is, the microphone unit 20 collects sound generated around the safe 10 with the microphone and converts it into an analog electric signal, amplifies that signal with the amplifier, converts the amplified signal into a digital acoustic signal with the A/D converter, and outputs it to the signal processing unit 22.
The gain of the amplifier is set in advance so that the volume of the acoustic signal output when a scream is uttered in the room where the safe 10 is installed falls within the range of 70 to 100 dB.

The storage unit 21 is a memory device such as a ROM (Read Only Memory) or a RAM (Random Access Memory). It stores various programs and various data, and inputs and outputs this information to and from the signal processing unit 22. The data include the acoustic signal and the vowel feature quantity 211.

The storage unit 21 is provided with an acoustic signal storage unit 210 as a storage area in which acoustic signals are stored cyclically. The signal processing unit 22 of the scream detection device 2 cyclically stores, in time series, the most recent acoustic signal from the microphone unit 20 in the acoustic signal storage unit 210 for a preset length (for example, 10 seconds).
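The cyclic storage described here can be sketched as a fixed-capacity ring buffer. The class name `AcousticSignalStore` and the 16 kHz sampling rate are assumptions made for illustration; the description gives only the 10-second example length.

```python
from collections import deque

SAMPLE_RATE_HZ = 16_000   # assumed; the description does not state a sampling rate
BUFFER_SECONDS = 10       # "for example, 10 seconds" in the description


class AcousticSignalStore:
    """Hypothetical stand-in for the acoustic signal storage unit 210."""

    def __init__(self, sample_rate_hz=SAMPLE_RATE_HZ, seconds=BUFFER_SECONDS):
        # A deque with maxlen discards the oldest samples once it is full.
        self._samples = deque(maxlen=sample_rate_hz * seconds)

    def append(self, new_samples):
        """Store newly digitized samples, overwriting the oldest ones."""
        self._samples.extend(new_samples)

    def latest(self, count):
        """Return the newest `count` samples in time order."""
        if count <= 0:
            return []
        return list(self._samples)[-count:]

    def __len__(self):
        return len(self._samples)
```

With a buffer of this kind, reading the most recent frame is simply `store.latest(frame_length)`.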

The vowel feature quantity 211 is vowel feature data created in advance, and is referenced by the vowel extraction unit 220 (described later) to extract vowels from the acoustic signal. A vowel feature quantity 211 is stored for each vowel type, in association with an identifier representing that vowel type.

The source data for the vowel feature quantity 211 are sample acoustic signals of scream endings collected from many speakers. In advance, spectral envelope parameters representing the frequency feature of the scream ending are extracted from each of these sample acoustic signals, and the distribution of the parameters is learned for each vowel type. In this example, the 8th-order LPC (Linear Predictive Coding) cepstrum is used as the spectral envelope parameter, and a GMM (Gaussian Mixture Model) is used as the distribution. Three vowels, "a", "e", and "o", are used as scream-ending vowels.

That is, a GMM of the LPC cepstrum for "a", a GMM of the LPC cepstrum for "e", and a GMM of the LPC cepstrum for "o" are each stored in the storage unit 21 as the vowel feature quantity 211. The frequency features of scream endings vary considerably even within the same vowel type, so a GMM, which can represent such variation accurately, is well suited to scream detection. For the vowel feature quantity 211 of each vowel type, separate GMMs for male and female voices may also be learned.

The signal processing unit 22 is composed of an arithmetic device such as a CPU (Central Processing Unit), a DSP (Digital Signal Processor), or an MCU (Micro Control Unit), and a program running on that arithmetic device. The signal processing unit 22 includes a vowel extraction unit 220, a scream determination unit 221, an abnormal output unit 222, and the like. By operating according to the program, the arithmetic device functions as the signal processing unit 22 and as each of its units, including the vowel extraction unit 220, the scream determination unit 221, and the abnormal output unit 222.

The vowel extraction unit 220 extracts the vowel part of speech from the acoustic signal collected by the microphone unit 20 (the input acoustic signal) and outputs the extraction result to the scream determination unit 221. That is, the vowel extraction unit 220 calculates the frequency feature of the input acoustic signal for each frame, compares each frequency feature with the vowel feature quantity 211, determines whether each frame has the frequency feature of a vowel, and outputs the determination result to the scream determination unit 221.

Preferably, when a frame is determined to have the frequency feature of a vowel, the vowel extraction unit 220 also outputs the vowel type of that frame ("a", "e", or "o") to the scream determination unit 221.

The vowels "a", "e", and "o" extracted by the vowel extraction unit 220 are the vowels typically uttered at the end of a scream. Typical screams ending in "a" include "kyaa", "gyaa", "waa", "uwaa", and "awawaa". Typical screams ending in "e" include "tasuketee" ("help me"), "yametee" ("stop"), "tasukete kuree" ("help me"), and "yamete kuree" ("stop"). A typical scream ending in "o" is "yameroo" ("stop it").

The frequency feature calculated by the vowel extraction unit 220 is of the same kind as the vowel feature quantity 211, which in this example is the 8th-order LPC cepstrum. The frame length and frame period are preset to values suitable for speech analysis. In this example, the frame length is 20 ms and the frame period is 10 ms.
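As a rough illustration of the framing parameters, the sketch below cuts a sample sequence into 20 ms frames that start every 10 ms, so adjacent frames overlap by half. The function name `frame_signal` and the 16 kHz sampling rate are assumptions; the description does not state a sampling rate.

```python
def frame_signal(samples, sample_rate_hz=16_000, frame_ms=20, hop_ms=10):
    """Split `samples` into overlapping frames.

    frame_ms is the frame length and hop_ms the frame period from the example.
    """
    frame_len = sample_rate_hz * frame_ms // 1000   # 320 samples at 16 kHz
    hop_len = sample_rate_hz * hop_ms // 1000       # 160 samples at 16 kHz
    frames = []
    start = 0
    # Only complete frames are emitted; a trailing partial frame is ignored.
    while start + frame_len <= len(samples):
        frames.append(samples[start:start + frame_len])
        start += hop_len
    return frames
```

Under these assumptions, one second of audio (16,000 samples) yields 99 complete frames.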

Specifically, the vowel extraction unit 220 calculates the distance between the frequency feature of each frame of the input acoustic signal and the vowel feature quantity 211 of each vowel, and compares it with a preset vowel determination threshold. If the distance between the frequency feature of the frame and any vowel feature quantity 211 is at or below the vowel determination threshold, the frame is determined to be a vowel of the vowel type corresponding to that vowel feature quantity 211. Conversely, the vowel extraction unit 220 determines that a frame is not a vowel if no vowel feature quantity 211 has a distance at or below the vowel determination threshold.
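The threshold test can be sketched as follows. The function name `classify_frame` is hypothetical, and choosing the closest vowel when more than one falls within the threshold is an assumption; the description does not say how such ties are resolved.

```python
def classify_frame(distances, vowel_threshold):
    """Return the vowel type for one frame, or None if it is not a vowel.

    `distances` maps a vowel label ("a", "e", "o") to the distance between
    the frame's frequency feature and that vowel's stored feature.
    Assumption: when several vowels are within the threshold, pick the closest.
    """
    label, best = min(distances.items(), key=lambda item: item[1])
    return label if best <= vowel_threshold else None
```

For example, `classify_frame({"a": 3.0, "e": 5.0, "o": 9.0}, 4.0)` returns `"a"`, while the same distances with a threshold of 2.0 return `None`.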

In this example, in which the vowel feature quantity 211 is stored as a GMM, the distance D between each vowel feature quantity 211 and the frequency feature of a frame is calculated by Equation 1.

Here, K is the number of normal distributions making up the vowel feature quantity 211 for which the distance is calculated, and k is their index (k = 1, ..., K); n is the order of the frequency feature, and i is its index (i = 1, ..., n). Further, x_i is the i-th element of the vector representing the frequency feature of the input acoustic signal, m_{k,i} is the i-th element of the mean vector of the k-th normal distribution, σ²_{k,i} is the variance of the i-th vector element in the k-th normal distribution, and w_k is the weight coefficient of the k-th normal distribution.
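Equation 1 itself is not reproduced in this text. One form that is consistent with the symbols defined above is the negative log-likelihood of the frame under a diagonal-covariance GMM, D = -log Σ_k w_k Π_i N(x_i; m_{k,i}, σ²_{k,i}). The sketch below implements that form purely as an assumption; the patent's actual Equation 1 may differ, and the function name `gmm_distance` is hypothetical.

```python
import math


def gmm_distance(x, weights, means, variances):
    """Negative log-likelihood of feature vector x under a diagonal GMM.

    weights[k]      -> w_k
    means[k][i]     -> m_{k,i}
    variances[k][i] -> sigma^2_{k,i}
    Assumed form only; not confirmed to be the patent's Equation 1.
    """
    log_terms = []
    for w_k, m_k, var_k in zip(weights, means, variances):
        log_p = math.log(w_k)
        for x_i, m_ki, var_ki in zip(x, m_k, var_k):
            log_p -= 0.5 * math.log(2.0 * math.pi * var_ki)
            log_p -= 0.5 * (x_i - m_ki) ** 2 / var_ki
        log_terms.append(log_p)
    # log-sum-exp keeps the sum over the K components numerically stable.
    peak = max(log_terms)
    log_likelihood = peak + math.log(sum(math.exp(t - peak) for t in log_terms))
    return -log_likelihood
```

With this form, a smaller D means the frame is a better fit to the vowel model, which matches the "distance at or below the threshold" test described earlier.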

The scream determination unit 221 determines that a vowel part extracted by the vowel extraction unit 220 is the ending of a scream when the vowel part has a volume at or above a preset scream volume threshold and continues for a preset scream determination time or longer, and notifies the abnormal output unit 222 that a scream has been detected.

To this end, the scream determination unit 221 refers to the extraction result of the vowel extraction unit 220 for each frame, calculates the volume (power) of each frame, and compares it with the scream volume threshold. It counts the number of consecutive frames in which a vowel has been extracted and the volume is at or above the scream volume threshold, and detects a scream when that count reaches the scream determination time.

The scream volume threshold and the scream determination time are preset to values suitable for detecting scream endings. In this example, the scream volume threshold is 70 dB and the scream determination time is 200 ms. With the frame period of 10 ms used in this example, the scream determination time corresponds to 20 frames.
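The consecutive-frame count can be sketched as below, using the example values (70 dB, 200 ms, 10 ms frame period, hence 200 / 10 = 20 frames). The function name `detect_scream` and the input format are assumptions for illustration, and the vowel-type, clipping, and fluctuation conditions are left out of this sketch.

```python
SCREAM_LEVEL_DB = 70.0     # scream volume threshold from the example
SCREAM_TIME_MS = 200       # scream determination time from the example
FRAME_PERIOD_MS = 10       # frame period from the example
SCREAM_FRAMES = SCREAM_TIME_MS // FRAME_PERIOD_MS   # 20 frames


def detect_scream(frames):
    """`frames` is an iterable of (is_vowel, level_db) pairs, one per frame.

    Returns the index of the frame at which the run of loud vowel frames
    first reaches SCREAM_FRAMES, or None if it never does.
    """
    run = 0
    for index, (is_vowel, level_db) in enumerate(frames):
        if is_vowel and level_db >= SCREAM_LEVEL_DB:
            run += 1
            if run >= SCREAM_FRAMES:
                return index
        else:
            run = 0  # a quiet or non-vowel frame breaks the run
    return None
```

A single non-vowel frame in the middle of a long vowel resets the count, so two 190 ms stretches separated by one such frame are not reported.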

As described above, the content of a whole scream varies widely, but focusing on the scream ending reduces the utterance content to be extracted to at most three types. This greatly reduces failures to detect a scream whose content was not anticipated.

In addition, requiring a vowel part reduces false detection, as screams, of loud, long-lasting sounds other than screams, such as coughs, sneezes, car horns, doors closing, creaking, and falling cans.

The volume condition on the vowel part reduces false detection of conversation at normal volume as a scream.

The duration condition on the vowel part reduces false detection of loud vowels, such as laughter, as a scream.

If a word in loud conversational speech contains a stretch of consecutive vowels, however, it may be falsely detected as a scream. The scream determination unit 221 therefore refers to the vowel type extracted by the vowel extraction unit 220 and treats each section in which the same vowel continues as a separate vowel part (determination section) when determining whether a scream has occurred. In other words, sections with different vowel types are judged as different vowel parts.

As a result, even if a word in loud conversational speech contains the sequence "ao", in which "a" is followed by "o", and the length of the "ao" section reaches the scream determination time, the "a" section and the "o" section are separate determination sections, which reduces false detection of such loud conversational speech as a scream.
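The "ao" case can be illustrated by grouping per-frame vowel labels into runs of the same vowel. The function names and the label format (a vowel string or `None` per frame) are assumptions for this sketch, and the volume condition is omitted for brevity.

```python
from itertools import groupby


def vowel_runs(labels):
    """Collapse per-frame labels into (label, run_length) pairs.

    `labels` holds "a", "e", "o", or None (not a vowel) for each frame.
    Runs of non-vowel frames are dropped.
    """
    return [(label, len(list(group)))
            for label, group in groupby(labels)
            if label is not None]


def has_scream_run(labels, min_frames=20):
    """True if any single-vowel run lasts at least `min_frames` frames."""
    return any(length >= min_frames for _, length in vowel_runs(labels))
```

Twelve frames of "a" followed by twelve frames of "o" span 240 ms in total, yet neither run reaches 20 frames, so `has_scream_run` returns `False`.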

When an unexpectedly loud scream is input, part of the acoustic signal of the scream may saturate (so-called clipping), so that the frequency feature cannot be analyzed correctly and no vowel is extracted even though the signal is in fact a vowel part.

The scream determination unit 221 therefore compares the volume of the acoustic signal with a preset excessive volume threshold to detect excessive volume portions in which the volume exceeds that threshold, and treats an excessive volume portion contiguous with a vowel part as part of that vowel part, regardless of the extraction result of the vowel extraction unit 220, when determining whether a scream has occurred. The excessive volume threshold is set to a volume that is higher than the scream volume threshold and at which clipping may occur, for example 100 dB.

As a result, even if intermittent clipping occurs within a scream section, failure to detect the scream because of clipping can be prevented.
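One way to picture this handling of clipping is sketched below: frames at or above the excessive volume threshold extend the current run regardless of their vowel label, but a run is only counted once it contains at least one genuine vowel frame. The function name, the input format, and the decision to ignore vowel-type changes are simplifying assumptions, not the patent's stated procedure.

```python
OVERLOAD_DB = 100.0   # excessive volume threshold from the example
SCREAM_DB = 70.0      # scream volume threshold from the example


def longest_scream_run(frames):
    """`frames` holds (label, level_db) per frame; label is a vowel or None.

    Returns the length, in frames, of the longest run made of loud vowel
    frames and over-loud frames that includes at least one vowel frame.
    """
    best = run = 0
    has_vowel = False
    for label, level_db in frames:
        overloaded = level_db >= OVERLOAD_DB
        loud_vowel = label is not None and level_db >= SCREAM_DB
        if overloaded or loud_vowel:
            run += 1
            has_vowel = has_vowel or (loud_vowel and not overloaded)
            if has_vowel:
                best = max(best, run)
        else:
            run, has_vowel = 0, False
    return best
```

Eight vowel frames, six clipped frames, and eight more vowel frames then form a single 22-frame run, whereas a stretch consisting only of clipped frames is never counted as a scream.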

Because clipping may also have causes other than a scream, such as tampering with the microphone, the scream determination unit 221 notifies the abnormal output unit 222 of a sound cracking abnormality when the duration of an excessive volume portion exceeds a preset sound cracking abnormality determination time.

The sound cracking abnormality determination time is set shorter than the scream determination time. For example, with the scream determination time of 200 ms used in this example, the sound cracking abnormality determination time can be set to 100 ms.

Among the many kinds of noise, there may also be artificial sounds that happen to have frequency features similar to a vowel. To distinguish such artificial sounds from a scream, which is a natural human voice, the scream determination unit 221 further analyzes the fundamental frequency (pitch) or the volume of the vowel part to determine whether fluctuation is present. It determines a scream when fluctuation is present and does not determine a scream when there is none.

Specifically, the scream determination unit 221 calculates the fundamental frequency of the acoustic signal for each frame, calculates the variance of the fundamental frequency over the vowel part, and compares it with a preset fluctuation determination threshold. If the variance is at or above the fluctuation determination threshold, it determines that fluctuation is present; otherwise, it determines that there is no fluctuation.
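The variance test can be sketched as follows. The function name `has_fluctuation` is hypothetical, the use of the population variance is an assumption (the description says only "variance"), and the threshold values in the usage note are made up for illustration.

```python
from statistics import pvariance


def has_fluctuation(pitches_hz, threshold):
    """True when the variance of the per-frame fundamental frequency over
    the vowel part is at or above `threshold` (in Hz squared)."""
    if len(pitches_hz) < 2:
        return False  # too few frames to measure any variation
    return pvariance(pitches_hz) >= threshold
```

A perfectly steady 440 Hz tone has zero variance and is rejected for any positive threshold, while a pitch track such as 400, 420, 390, 430 Hz has a variance of 250 Hz² and passes a threshold of, say, 1.0.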

When the scream determination unit 221 determines that a scream has occurred, the abnormal output unit 222 generates an abnormal signal indicating scream detection and outputs it to the communication unit 23. When the scream determination unit 221 determines a sound cracking abnormality, the abnormal output unit 222 generates an abnormal signal indicating that a sound cracking abnormality has occurred and outputs it to the communication unit 23.

The communication unit 23 is an electric circuit that communicates with the controller 3 and is connected to the controller 3 by a predetermined communication line. When an abnormal signal is input from the abnormal output unit 222, the communication unit 23 transmits that signal to the controller 3.

[Operation of Monitoring Device 1]
The operation of the monitoring device 1 will now be described, focusing on the operation of the scream detection device 2.

When the power is turned on, each unit is initialized and begins operating. From then on, in the scream detection device 2, the microphone unit 20 converts sound around the safe 10 into an acoustic signal and outputs it to the signal processing unit 22, and the signal processing unit 22 cyclically stores the acoustic signal from the microphone unit 20 in the acoustic signal storage unit 210. This operation is repeated continuously, independently of the scream detection process.
During initialization, the signal processing unit 22 also initializes the scream counter and the sound cracking counter, both described later, to 0.

The scream detection process performed by the scream detection device 2 will now be described with reference to the flowchart of FIG. 3.

First, the signal processing unit 22 checks whether the frame period has arrived (S1). That is, the signal processing unit 22 checks whether an acoustic signal one frame period in length has been newly added to the acoustic signal storage unit 210. If the frame period has not arrived, the signal processing unit 22 waits until it does (NO in S1, returning to S1).

When the frame period arrives (YES in S1), the signal processing unit 22 reads frame data, consisting of the most recent acoustic signal of the preset frame length, from the acoustic signal storage unit 210, applies a Hamming window to it, and inputs the windowed frame data to the scream determination unit 221 (S2).
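The windowing in step S2 can be sketched with the standard Hamming window, w[n] = 0.54 - 0.46 cos(2πn / (N - 1)). The function names are hypothetical, and the description does not say which Hamming variant (symmetric or periodic) is used; the symmetric form is assumed here.

```python
import math


def hamming_window(length):
    """Symmetric Hamming coefficients for a frame of `length` samples."""
    if length == 1:
        return [1.0]
    return [0.54 - 0.46 * math.cos(2.0 * math.pi * n / (length - 1))
            for n in range(length)]


def apply_window(frame):
    """Multiply each sample by the matching window coefficient."""
    window = hamming_window(len(frame))
    return [sample * w for sample, w in zip(frame, window)]
```

The window tapers the frame edges to 0.08 of their original amplitude while leaving the center nearly unchanged, which reduces spectral leakage in the later frequency analysis.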

On receiving the frame data, the scream determination unit 221 calculates the volume (power) of the frame data (S3) and compares the calculated volume with the scream volume threshold (S4). If the volume does not exceed the scream volume threshold (NO in S4), the scream determination unit 221 determines that there is no scream, and the signal processing unit 22 resets the scream counter and the sound cracking counter (S20) and returns the process to step S1.
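The volume calculation in step S3 might look like the sketch below, which expresses the mean power of a frame in decibels. The function name `frame_level_db` and the `reference` amplitude are assumptions; the description does not state the dB reference, so mapping this value onto the 70 dB and 100 dB thresholds would require calibration against the microphone and amplifier.

```python
import math


def frame_level_db(frame, reference=1.0):
    """Mean power of `frame` in dB relative to `reference` amplitude."""
    if not frame:
        raise ValueError("frame must not be empty")
    mean_square = sum(sample * sample for sample in frame) / len(frame)
    if mean_square == 0.0:
        return float("-inf")   # digital silence
    return 10.0 * math.log10(mean_square / (reference * reference))
```

Because the level is logarithmic, a tenfold increase in amplitude raises the result by 20 dB.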

If, on the other hand, the volume exceeds the scream volume threshold (YES in S4), the scream determination unit 221 judges that a scream may be occurring and advances the process to step S5.

First, in preparation for the fluctuation determination, the scream determination unit 221 calculates the fundamental frequency from the frame data (S5) and stores the calculated fundamental frequency in the storage unit 21.

Next, the scream determination unit 221 compares the volume calculated in step S3 with the excessive volume threshold (S6).

If the volume is at or above the excessive volume threshold (YES in S6), the vowel determination is skipped, because clipping could cause the vowel determination for the frame data to be wrong. The scream determination unit 221 increments the sound cracking counter by 1 (S14) and compares the incremented sound cracking counter with the sound cracking abnormality determination time (S15).

If the sound cracking counter exceeds the sound cracking abnormality determination time (YES in S15), the scream determination unit 221 judges that it is impossible to keep making correct determinations about the event currently taking place, and notifies the abnormal output unit 222 that a sound cracking abnormality has occurred (S19). On receiving this notification, the abnormal output unit 222 generates a sound cracking abnormal signal that includes the 10 seconds of acoustic signal stored in the acoustic signal storage unit 210, and outputs it to the communication unit 23. The sound cracking abnormal signal is transmitted from the communication unit 23 to the security center device 5 via the controller 3 and the wide area network 4.

If the volume is below the excessive-volume threshold (NO in S6), normal vowel determination is possible for the frame data, so the scream determination unit 221 inputs the frame data to the vowel extraction unit 220 and has it perform the vowel determination.

On receiving the frame data, the vowel extraction unit 220 calculates the frequency feature amount of the frame data (S7).

Next, the vowel extraction unit 220 sequentially reads the vowel feature quantity 211 for “a”, the vowel feature quantity 211 for “e”, and the vowel feature quantity 211 for “o” from the storage unit 21, and compares each of them with the frequency feature amount of the frame data to determine whether the frame data is a vowel (S8). If the frame data is determined to be a vowel, the vowel extraction unit 220 also identifies the vowel type.
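As a rough illustration of step S8, the sketch below scores a frame's feature vector against one Gaussian mixture model per vowel (the description states elsewhere that the vowel feature quantities 211 are held as GMMs). The diagonal covariance, the rejection threshold `min_log_likelihood`, and all names are assumptions made for the example; the patent does not specify them here.

```python
import math

def gmm_log_likelihood(x, gmm):
    """Log-likelihood of vector x under a diagonal-covariance GMM.

    gmm: list of (weight, means, variances) tuples, one per mixture component.
    """
    component_logs = []
    for weight, means, variances in gmm:
        ll = math.log(weight)
        for xi, mi, vi in zip(x, means, variances):
            ll += -0.5 * (math.log(2 * math.pi * vi) + (xi - mi) ** 2 / vi)
        component_logs.append(ll)
    peak = max(component_logs)  # log-sum-exp for numerical stability
    return peak + math.log(sum(math.exp(l - peak) for l in component_logs))

def classify_vowel_gmm(x, vowel_models, min_log_likelihood):
    """Return the best-matching vowel label, or None if no model fits well enough."""
    scores = {label: gmm_log_likelihood(x, gmm) for label, gmm in vowel_models.items()}
    best = max(scores, key=scores.get)
    return best if scores[best] >= min_log_likelihood else None
```

Returning `None` corresponds to the "not a vowel" outcome; a label corresponds to the identified vowel type.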

If the frame data is determined not to be a vowel (NO in S9), the vowel extraction unit 220 notifies the scream determination unit 221 that no vowel was extracted. On receiving this notification, the scream determination unit 221 determines that there is no scream, and the signal processing unit 22 resets the scream counter and the sound-cracking counter (S20) and returns the process to step S1.

On the other hand, if the frame data is determined to be a vowel (YES in S9), the vowel extraction unit 220 checks whether the vowel type identified this time is the same as the vowel type identified last time, and stores the current vowel type in the storage unit 21 (S10). At the next check, the vowel type stored here is referred to as the previous vowel type.

If the vowel type identified this time is the same as the one identified last time (YES in S10), the vowel extraction unit 220 notifies the scream determination unit 221 that the same vowel has been extracted continuously. On receiving this notification, the scream determination unit 221 increments the scream counter by 1 (S11).

Conversely, if the vowel type identified this time differs from the one identified last time (NO in S10), the vowel extraction unit 220 notifies the scream determination unit 221 that a new vowel has been extracted. On receiving this notification, the scream determination unit 221 sets the scream counter to 1 (S12) and sets the sound-cracking counter to 0 (S13).

Once the scream counter has been updated in step S11 or step S12, the scream determination unit 221 compares the scream counter with the scream determination time (S16). If the scream counter does not exceed the scream determination time (NO in S16), the scream determination unit 221 treats the determination as still in progress, and the signal processing unit 22 returns the process to step S1 to process the subsequent acoustic signal.
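The counter handling in steps S9 to S16 can be sketched as a small state update. The dictionary keys, the function name, and the expression of the scream determination time as a frame count (`scream_frames`) are assumptions for illustration only.

```python
def update_scream_counter(state, vowel, scream_frames):
    """Apply steps S9-S16 to one frame.

    state: dict with 'scream_counter', 'clip_counter' and 'prev_vowel'.
    vowel: 'a', 'e', 'o', or None when the frame is not a vowel.
    Returns True when the fluctuation determination (S17) should run next.
    """
    if vowel is None:                                  # S9: NO
        state["scream_counter"] = 0                    # S20: reset both counters
        state["clip_counter"] = 0
        state["prev_vowel"] = None
        return False
    if vowel == state["prev_vowel"]:                   # S10: YES, same vowel continues
        state["scream_counter"] += 1                   # S11
    else:                                              # S10: NO, a new vowel started
        state["scream_counter"] = 1                    # S12
        state["clip_counter"] = 0                      # S13
    state["prev_vowel"] = vowel                        # remembered for the next frame
    return state["scream_counter"] > scream_frames     # S16
```

Feeding the function a run of identical vowel labels makes the counter climb until it exceeds `scream_frames`; a change of vowel restarts the count at 1.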

On the other hand, if the scream counter exceeds the scream determination time (YES in S16), the scream determination unit 221 performs the fluctuation determination (S17). Specifically, the scream determination unit 221 reads from the storage unit 21 the fundamental frequency values accumulated in time series in step S5, calculates their variance, and compares the variance with the fluctuation threshold. If the variance is equal to or greater than the fluctuation threshold, it determines that fluctuation is present; otherwise it determines that fluctuation is absent.
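A minimal sketch of the variance-based fluctuation determination in S17 follows. The text does not say whether a population or sample variance is meant; the population variance is used here as an assumption, and the function and parameter names are hypothetical.

```python
from statistics import pvariance

def has_fluctuation(f0_history, fluctuation_threshold):
    """Return True if the accumulated fundamental frequencies fluctuate.

    f0_history: fundamental frequency values (Hz) stored frame by frame in S5.
    """
    if len(f0_history) < 2:
        return False  # a single value cannot show any fluctuation
    return pvariance(f0_history) >= fluctuation_threshold
```

A perfectly steady tone gives a variance of zero and is therefore rejected, while a voice whose pitch wavers produces a larger variance and passes the check.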

If fluctuation is determined to be present (YES in S18), the scream determination unit 221 determines that a scream has occurred and notifies the abnormality output unit 222 that a scream has been detected (S19). On receiving this notification, the abnormality output unit 222 generates an abnormality signal that includes the 10 seconds of acoustic signal stored in the acoustic signal storage unit 210, and outputs it to the communication unit 23. The abnormality signal is transmitted from the communication unit 23 to the security center device 5 via the controller 3 and the wide area network 4. The controller 3 also activates the alarm device 6.

Once the abnormality signal has been output, the signal processing unit 22 resets the scream counter and the sound-cracking counter to 0 so that scream detection can continue (S20), and returns the process to step S1. Along with the counter reset, the signal processing unit 22 also clears the accumulated fundamental frequency values and the previously identified vowel type.

On the other hand, if fluctuation is determined to be absent (NO in S18), the signal processing unit 22 resets the scream counter and the sound-cracking counter to 0 so that scream detection can continue (S20), and returns the process to step S1.

<Modification>
In the above embodiment, the vowel feature quantity 211 was stored in the form of a GMM representing the distribution of each vowel. In another embodiment, the vowel feature quantity 211 may be stored in the form of an average feature vector or a representative feature vector for each vowel. In this case, the vowel extraction unit 220 extracts vowels by calculating the distance between these feature vectors and the frequency feature amount of the input acoustic signal.
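The distance-based variant could look roughly like the following nearest-prototype classifier. Euclidean distance and the rejection threshold `max_distance` are assumptions; the text only says that a distance to the stored feature vectors is calculated.

```python
import math

def classify_vowel_by_distance(feature, prototypes, max_distance):
    """Return the label of the closest vowel prototype, or None if none is close.

    feature:    frequency feature vector of the current frame.
    prototypes: dict mapping a vowel label to its average/representative vector.
    """
    best_label, best_distance = None, float("inf")
    for label, prototype in prototypes.items():
        distance = math.dist(feature, prototype)  # Euclidean distance (assumed)
        if distance < best_distance:
            best_label, best_distance = label, distance
    return best_label if best_distance <= max_distance else None
```

Compared with the GMM form, this stores only one vector per vowel, at the cost of ignoring how widely each vowel's features are spread.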

In the above embodiment, the LPC cepstrum was given as an example of the frequency feature amount. The frequency feature amount is not limited to this; various spectral parameters known in speech analysis, such as MFCC (Mel-Frequency Cepstral Coefficients), can be used.

In the above embodiment, the scream determination unit 221 performed the fluctuation determination based on the variance of the fundamental frequency. In another embodiment, the scream determination unit 221 can instead perform the fluctuation determination based on the fluctuation range of the fundamental frequency. Specifically, the scream determination unit 221 extracts the maximum and minimum values from the fundamental frequencies accumulated in time series in step S5 and calculates their difference as the fluctuation range. If the calculated fluctuation range exceeds a fluctuation determination threshold set in advance for the fluctuation range, it determines that fluctuation is present; otherwise it determines that fluctuation is absent.
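The range-based variant reduces to a maximum-minus-minimum comparison, sketched below with hypothetical names. Note that the text uses "exceeds" here, so the comparison is strict.

```python
def has_fluctuation_by_range(f0_history, range_threshold):
    """Return True if max - min of the accumulated fundamental frequencies
    exceeds the fluctuation determination threshold."""
    if not f0_history:
        return False  # nothing accumulated yet
    return (max(f0_history) - min(f0_history)) > range_threshold
```

This avoids computing a mean and squared deviations, but a single outlier frame is enough to make the range large.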

In yet another embodiment, the scream determination unit 221 can perform the fluctuation determination using the volume instead of the fundamental frequency. Specifically, when the volume is determined in step S4 to exceed the scream volume, the scream determination unit 221 accumulates the volume calculated in step S3 in time series and calculates its variance or fluctuation range. If the calculated variance or fluctuation range exceeds a fluctuation determination threshold set in advance for it, the unit determines that fluctuation is present; otherwise it determines that fluctuation is absent. When the volume is used for the fluctuation determination in this way, step S5 can be omitted, which has the advantage of reducing the processing load.

In the above embodiment, the vowel extraction unit 220 extracted the vowel part by comparison with the vowel feature quantities 211 stored in advance. In another embodiment, the vowel extraction unit 220 extracts the vowel part from the continuity of the frequency feature amount in the input acoustic signal and the presence or absence of fluctuation, without using pre-stored vowel feature quantities 211. Specifically, the vowel extraction unit 220 compares the frequency feature amounts of adjacent frames of the input acoustic signal to determine whether they are similar, and also extracts the fundamental frequency of each frame. It then determines the fluctuation of the fundamental frequency over each continuous section in which similarity has been determined continuously for at least a preset vowel duration (for example, 100 ms or more), and extracts the continuous sections determined to have fluctuation as vowel parts. A section that contains a frame from which no fundamental frequency is extracted is preferably not extracted as a vowel part, or not extracted as a continuous section in the first place. Conversely, the vowel extraction unit 220 does not extract as a vowel part any section in which the same frequency feature amount does not continue, in which no fundamental frequency is extracted, or in which no fluctuation is determined. In other words, consonants are excluded by requiring that the same frequency feature amount continues and that a fundamental frequency is present, and noise is excluded by requiring that the fundamental frequency fluctuates. In this case, the fluctuation determination by the scream determination unit 221 may be omitted. In rare cases the ending of a scream has a fundamental frequency close to 2 kHz, but extracting the vowel part from the continuity of the frequency feature amount and the presence or absence of fluctuation allows even screams with such rare characteristics to be detected.
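The model-free vowel extraction described above can be sketched as a single pass that groups similar, voiced frames into runs and keeps the runs that are long enough and whose pitch fluctuates. Everything specific here is an assumption for illustration: Euclidean distance as the similarity measure, population variance as the fluctuation measure, the vowel duration expressed as a frame count (`min_frames`), and all names.

```python
import math
from statistics import pvariance

def extract_vowel_segments(frames, sim_threshold, min_frames, fluct_threshold):
    """Return (start, end) frame index pairs (end exclusive) judged to be vowels.

    frames: list of (feature_vector, f0) tuples; f0 is None when no
            fundamental frequency could be extracted for that frame.
    """
    segments = []

    def close_run(start, end):
        # Keep the run only if it is long enough and its pitch fluctuates.
        f0_values = [f0 for _, f0 in frames[start:end]]
        if end - start >= min_frames and pvariance(f0_values) >= fluct_threshold:
            segments.append((start, end))

    start = None
    for i, (feature, f0) in enumerate(frames):
        if f0 is None:                       # no pitch: ends any run in progress
            if start is not None:
                close_run(start, i)
            start = None
        elif start is None:                  # first voiced frame of a new run
            start = i
        elif math.dist(feature, frames[i - 1][0]) > sim_threshold:
            close_run(start, i)              # spectrum changed: start a new run
            start = i
    if start is not None:
        close_run(start, len(frames))
    return segments
```

Runs never contain a frame without a fundamental frequency, which corresponds to the stated preference for not forming a continuous section across such frames.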

DESCRIPTION OF SYMBOLS
1 ... Monitoring device
2 ... Scream detection device
20 ... Microphone unit
21 ... Storage unit
210 ... Acoustic signal storage unit
211 ... Vowel feature quantity
22 ... Signal processing unit
220 ... Vowel extraction unit
221 ... Scream determination unit
222 ... Abnormality output unit
23 ... Communication unit
3 ... Controller
4 ... Wide area network
5 ... Security center device
6 ... Alarm device
7 ... Monitoring camera
8 ... Recording device
10 ... Safe

Claims

A scream detection device comprising:
a microphone unit that collects sound in a monitored space;
a vowel extraction unit that extracts a vowel part of speech from the sound collected by the microphone unit;
a scream determination unit that determines that a scream has occurred if the vowel part has a volume equal to or higher than a predetermined scream volume and has continued for a predetermined time or longer; and
an abnormality output unit that outputs an abnormality signal when the scream determination unit determines that a scream has occurred,
wherein the scream determination unit analyzes the fundamental frequency or the volume of the vowel part to determine the presence or absence of fluctuation, and does not determine that a scream has occurred when there is no fluctuation.

The scream detection device according to claim 1, wherein the vowel extraction unit extracts a sound corresponding to “a”.

The scream detection device according to claim 1, wherein the vowel extraction unit extracts a sound corresponding to “e”.

The scream detection device according to any one of claims 1 to 3, wherein the vowel extraction unit extracts a sound corresponding to “o”.

The vowel extraction unit further determines a vowel type of the vowel part,
The scream detection device according to any one of claims 1 to 4, wherein the scream determination unit determines that a scream occurs when the vowel part having the same vowel type continues for the predetermined time or more.

A scream detection device comprising:
a microphone unit that collects sound in a monitored space;
a vowel extraction unit that extracts a vowel part of speech from the sound collected by the microphone unit;
a scream determination unit that determines that a scream has occurred if the vowel part has a volume equal to or higher than a predetermined scream volume and has continued for a predetermined time or longer; and
an abnormality output unit that outputs an abnormality signal when the scream determination unit determines that a scream has occurred,
wherein the scream determination unit further detects, from the collected sound, an excessive-volume part whose volume is high enough to cause clipping, and makes the determination while regarding an excessive-volume part that is continuous with the vowel part as part of the vowel part.