JP2018072878A

JP2018072878A - Determination method of sound related to crime prevention and information processing device

Info

Publication number: JP2018072878A
Application number: JP2016207756A
Authority: JP
Inventors: Shota Fujimaru; Atsuhiro Sakurai; Akihiro Kimura; Akira Suzuki
Original assignee: Animo Ltd
Current assignee: Animo Ltd
Priority date: 2016-10-24
Filing date: 2016-10-24
Publication date: 2018-05-10
Anticipated expiration: 2036-10-24
Also published as: JP6726082B2

Abstract

PROBLEM TO BE SOLVED: To accurately detect the occurrence of sounds that must be identified for crime prevention. SOLUTION: The method includes: a step (A) of calculating, for input sound data, a first parameter value expressing the degree of fluctuation of the spectrum of the sound represented by the sound data, a second parameter value expressing the degree of whiteness of that sound, and a third parameter value expressing the degree of harmonic structure in that sound; and a step (B) of determining, on the basis of the first, second, and third parameter values, whether the sound data includes a predetermined sound that must be identified for crime prevention. SELECTED DRAWING: Figure 1

Description

The present invention relates to a technique for detecting sounds related to crime prevention.

Techniques for detecting screams already exist. Known examples include detection by measuring the duration of vowels, detection based on the power information, harmonic information, and fundamental frequency of a voice, and detection based on the voice volume in two frequency bands. There are also techniques that use pattern matching to detect non-speech vocalizations, destructive sounds, and the like.

However, these conventional techniques cannot accurately detect the occurrence of screams and other sounds that should be identified for crime prevention in a space where various environmental sounds and conversations are present.

Patent literature: JP H9-251583 A; JP 2001-312292 A; JP 2011-53557 A; JP 2012-48173 A

Accordingly, an object of the present invention, according to one aspect, is to provide a technique for accurately detecting the occurrence of sounds that should be identified for crime prevention.

A determination method according to the present invention includes: (A) calculating, for input sound data, a first parameter value representing the degree of fluctuation of the spectrum of the sound represented by the sound data, a second parameter value representing the degree of whiteness of that sound, and a third parameter value representing the degree of harmonic structure in that sound; and (B) determining, on the basis of the first, second, and third parameter values, whether the sound data includes a predetermined sound that should be identified for crime prevention.

According to one aspect, the occurrence of a sound that should be identified for crime prevention can be detected accurately.

FIG. 1 is a diagram illustrating an overview of a system according to an embodiment.
FIG. 2 is a diagram illustrating the relationship between the sounds to be identified for crime prevention and the parameter values.
FIG. 3 is a diagram illustrating a processing flow according to the embodiment.
FIG. 4 is a diagram illustrating a processing flow according to the embodiment.

FIG. 1 shows an example system configuration according to an embodiment of the present invention.

A sound-collecting microphone 1a is connected to the information processing apparatus 100, which executes the main processing of the embodiment. Here, it is assumed that the microphone 1a digitizes the analog signal of the surrounding sound and inputs the resulting sound data to the information processing apparatus 100. Alternatively, the analog sound signal may be input to the information processing apparatus 100, which then digitizes it to generate the sound data.

In some cases, a microphone 1b is connected to a terminal device 300 such as an IoT (Internet of Things) gateway, and the terminal device 300 is connected to the information processing apparatus 100 via a computer network 200 such as the Internet. In this case, the microphone 1b or the terminal device 300 digitizes the analog sound signal to obtain sound data, which is input to the information processing apparatus 100 via the computer network 200.

The microphone 1a or 1b is placed in the area to be guarded. The information processing apparatus 100 may be installed near the area to be guarded or at a remote location. It may be a physical server provided in a cloud or the like, or it may be implemented as a virtual machine.

The information processing apparatus 100 includes a sound data storage unit 102, a calculation unit 103, a second data storage unit 104, a determination unit 105, and an output unit 106.

The sound data storage unit 102 stores sound data from the microphone 1a or 1b. The calculation unit 103 calculates the parameter values that characterize the present embodiment. More specifically, the calculation unit 103 includes a preprocessing unit 1031, a first data storage unit 1032, a first parameter value calculation unit 1033, a second parameter value calculation unit 1034, a third parameter value calculation unit 1035, and a fourth parameter value calculation unit 1036.

The preprocessing unit 1031 executes processing that generates the data used in common by the first to fourth parameter value calculation units 1033 to 1036, and stores the result in the first data storage unit 1032. Use of the fourth parameter value calculation unit 1036 is optional.

The first parameter value calculation unit 1033 calculates a first parameter value representing the degree of fluctuation of the sound spectrum. The second parameter value calculation unit 1034 calculates a second parameter value representing the degree of whiteness of the sound. The third parameter value calculation unit 1035 calculates a third parameter value representing the degree of harmonic structure in the sound. The fourth parameter value calculation unit 1036 calculates a fourth parameter value representing the dominant frequency of the sound. The first to fourth parameter values are described in detail later.

The second data storage unit 104 stores the parameter values calculated by the calculation unit 103. Based on the parameter values stored in the second data storage unit 104, the determination unit 105 determines whether a predetermined sound that should be identified for crime prevention has occurred. When the determination unit 105 determines that such a sound has occurred, the output unit 106 outputs a notification indicating that fact or the type of sound detected. For example, it outputs a warning sound or a spoken warning message. Alternatively, it displays a warning message on the monitor of a terminal device connected to the information processing apparatus 100 directly or via the computer network 200 or the like. Such a terminal device may also output the warning sound or spoken warning message. The spoken warning message and the displayed warning message include information about the type of sound detected.

The first to fourth parameter values used in the present embodiment are described here. The first parameter value is an index representing the degree of fluctuation of the sound spectrum, for example a value representing the fluctuation of the spectral envelope or a value representing the fluctuation of the spectrum itself. The specific calculation method is described together with the processing flow.

The second parameter value is an index representing the degree of whiteness of the sound, for example the information entropy calculated by treating the spectrum of the sound as a probability distribution (also called the spectral entropy in the present embodiment). The specific calculation method is described together with the processing flow.

The third parameter value is an index representing the degree of harmonic structure in the sound, for example the maximum value of the sound's cepstrum within a predetermined range (also called the harmonic structure strength in the present embodiment). The specific calculation method is described together with the processing flow.

The fourth parameter value represents the dominant frequency of the sound, for example the centroid frequency of the sound's spectrum (also called the spectral centroid in the present embodiment). The specific calculation method is described together with the processing flow.

In the present embodiment, the types of sound that should be identified for crime prevention are screams, the sound of someone stepping on security gravel (gravel that produces a distinctive sound when stepped on), the sound of glass breaking, and explosions.

The inventors have found that these sounds relate to the first to third parameter values as shown in FIG. 2.

Specifically, a scream has a “low” first parameter value, a “high” second parameter value, and a “high” third parameter value. In other words, the first to third parameter values identify a sound whose timbre changes little, that resembles a natural sound, and that has a strong harmonic structure.

The sound of stepping on security gravel has a “low” first parameter value, a “high” second parameter value, and a “low” third parameter value. In other words, the first to third parameter values identify a sound whose timbre changes little, that resembles a natural sound, and that has a weak harmonic structure.

Glass-breaking and explosion sounds have a “high” first parameter value, a “high” second parameter value, and a “low” third parameter value. In other words, the first to third parameter values identify a sound whose timbre changes greatly, that resembles a natural sound, and that has a weak harmonic structure.

Screams, the sound of stepping on security gravel, glass breaking, and explosions are only examples; any sound with the properties described above can be detected.

The fourth parameter value, which represents the dominant frequency of the sound, is “high” for glass breaking and “low” for an explosion. The fourth parameter value can therefore be used to distinguish glass breaking from an explosion.

Therefore, once thresholds for the first to fourth parameter values (in general, values that define ranges) have been determined, for example through experiments with various sound samples, the determination unit 105 can detect the occurrence of a predetermined sound that should be identified for crime prevention.

Next, the specific processing executed by the information processing apparatus 100 is described with reference to FIGS. 3 and 4.

The preprocessing unit 1031 reads unprocessed sound data covering a predetermined period from the sound data stored in the sound data storage unit 102 (FIG. 3: step S1). The preprocessing unit 1031 then executes predetermined preprocessing and stores the result in the first data storage unit 1032 (step S2).

The preprocessing in step S2 includes windowing of the sound data for the predetermined period. In this windowing, for example, the predetermined period is divided into a plurality of sub-periods and each sub-period is multiplied by a window function such as a Hanning window. For windowing and window functions, see, for example, <http://www.ni.com/white-paper/4844/ja/>.
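A minimal Python sketch of this windowing step follows. The sub-period length (frame_len), the hop between sub-periods (hop), and whether sub-periods overlap are not specified in the source; they are assumptions here, as is the symmetric form of the Hanning (Hann) window.

```python
import math
from typing import List


def hann_window(n: int) -> List[float]:
    """Symmetric Hann (Hanning) window of length n."""
    if n == 1:
        return [1.0]
    return [0.5 - 0.5 * math.cos(2 * math.pi * k / (n - 1)) for k in range(n)]


def split_and_window(samples: List[float], frame_len: int, hop: int) -> List[List[float]]:
    """Divide one predetermined period into sub-periods and multiply each by a Hann window.

    frame_len and hop are hypothetical parameters; trailing samples that do not
    fill a whole sub-period are dropped.
    """
    window = hann_window(frame_len)
    frames = []
    for start in range(0, len(samples) - frame_len + 1, hop):
        frame = samples[start:start + frame_len]
        frames.append([s * w for s, w in zip(frame, window)])
    return frames


# Example: a 1024-sample period cut into 256-sample sub-periods with 50% overlap.
frames = split_and_window([1.0] * 1024, frame_len=256, hop=128)
print(len(frames))  # 7
```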

The preprocessing also applies an FFT (Fast Fourier Transform) to the windowed sound data of each sub-period and calculates the absolute value of each complex FFT output. This yields a value a[i] for each frequency, where i is the index corresponding to the frequency.
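The magnitude computation can be sketched as below. To stay within the Python standard library, it uses a direct O(N²) DFT in place of an FFT (the outputs agree up to rounding), and it keeps only the non-negative frequency bins i = 0 .. N/2. The one-sided choice is an assumption, since the source does not say which bins are retained.

```python
import cmath
import math
from typing import List


def magnitude_spectrum(frame: List[float]) -> List[float]:
    """Return a[i] = |X[i]| for i = 0 .. N/2, where X is the DFT of the windowed frame.

    A direct DFT is used for clarity; a real system would use an FFT.
    """
    n = len(frame)
    a = []
    for i in range(n // 2 + 1):
        x_i = sum(frame[t] * cmath.exp(-2j * math.pi * i * t / n) for t in range(n))
        a.append(abs(x_i))
    return a


# A pure tone that completes 4 cycles in 32 samples puts all its energy in bin 4.
tone = [math.sin(2 * math.pi * 4 * t / 32) for t in range(32)]
a = magnitude_spectrum(tone)
print(max(range(len(a)), key=a.__getitem__))  # 4
```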

The first parameter value calculation unit 1033 then calculates the first parameter value using the data stored in the first data storage unit 1032 and stores it in the second data storage unit 104 (step S3).

If the first parameter value represents the fluctuation of the spectrum, the first parameter value calculation unit 1033 calculates, for each i (that is, each frequency), the variance of a[i] across all sub-periods included in the predetermined period. Summing the variances calculated for all frequencies yields a value representing the fluctuation of the spectrum. Instead of the variance, another statistic representing dispersion, such as the standard deviation, may be used.
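A sketch of the spectrum-fluctuation variant of the first parameter value, assuming each sub-period's spectrum a[i] is already available as a list of equal length. The source leaves the variance estimator open; population variance (statistics.pvariance) is used here as an assumption.

```python
from statistics import pvariance
from typing import List


def spectral_fluctuation(spectra: List[List[float]]) -> float:
    """First parameter value (spectrum-fluctuation variant).

    spectra[k][i] is a[i] for sub-period k. For each frequency index i, take the
    variance of a[i] across all sub-periods, then sum those variances over i.
    """
    n_bins = len(spectra[0])
    return sum(pvariance([frame[i] for frame in spectra]) for i in range(n_bins))


steady = [[1.0, 2.0, 3.0]] * 4                 # identical spectrum in every sub-period
changing = [[1.0, 2.0, 3.0], [3.0, 2.0, 1.0]]   # energy moves between bins
print(spectral_fluctuation(steady), spectral_fluctuation(changing))  # 0.0 2.0
```

A steady sound gives zero, and the value grows as energy moves between frequency bins from one sub-period to the next.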

If, on the other hand, the first parameter value represents the fluctuation of the spectral envelope, the first parameter value calculation unit 1033 calculates, for each sub-period, the logarithm of the square of a[i] (= log(a[i]²)), treats the result as a signal, and applies an inverse FFT to obtain the cepstrum. The spectral envelope is known to appear in the low-order part of the cepstrum. The cepstral counterpart of frequency is called quefrency; with j as its index, the cepstrum is written b[j]. The first parameter value calculation unit 1033 therefore calculates, across all sub-periods in the predetermined period, the variance of b[j] for each j in the low-order part (for example, up to order 8, excluding order 0, at a sampling frequency of 16000 Hz). Summing the variances calculated for these quefrencies yields a value representing the fluctuation of the spectral envelope. Instead of the variance, another statistic representing dispersion, such as the standard deviation, may be used.
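The cepstrum and the envelope-fluctuation variant can be sketched as follows. Assumptions not in the source: a direct DFT stands in for the FFT and inverse FFT, the full N-bin spectrum is used so that the inverse transform is real, a small constant EPS guards against log(0), and population variance is used.

```python
import cmath
import math
from statistics import pvariance
from typing import List

EPS = 1e-12  # hypothetical floor that keeps log() finite for silent bins


def dft(x: List[complex], sign: int = -1) -> List[complex]:
    """Direct DFT (sign=-1) or unnormalised inverse DFT (sign=+1)."""
    n = len(x)
    return [sum(x[t] * cmath.exp(sign * 2j * math.pi * k * t / n) for t in range(n))
            for k in range(n)]


def real_cepstrum(frame: List[float]) -> List[float]:
    """b[j]: inverse DFT of log(a[i]**2), where a[i] = |X[i]| over all N bins."""
    n = len(frame)
    log_power = [math.log(abs(x) ** 2 + EPS) for x in dft(frame)]
    return [(v / n).real for v in dft(log_power, sign=+1)]


def envelope_fluctuation(frames: List[List[float]], max_order: int = 8) -> float:
    """Sum over quefrency j = 1..max_order (order 0 excluded) of the variance of b[j]."""
    cepstra = [real_cepstrum(f) for f in frames]
    return sum(pvariance([b[j] for b in cepstra]) for j in range(1, max_order + 1))


impulse = [1.0] + [0.0] * 31        # flat spectrum
smoothed = [1.0, 0.9] + [0.0] * 30  # low-pass spectral envelope
print(envelope_fluctuation([impulse, impulse]))        # 0.0
print(envelope_fluctuation([impulse, smoothed]) > 0.1)  # True
```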

The second parameter value calculation unit 1034 likewise calculates the second parameter value using the data stored in the first data storage unit 1032 and stores it in the second data storage unit 104 (step S5).

For example, the second parameter value calculation unit 1034 calculates, for each sub-period, the sum asum of a[i] (asum = a[0] + a[1] + ... + a[max]), and then calculates a[0]/asum, a[1]/asum, a[2]/asum, ..., a[max]/asum. It then calculates the information entropy H obtained by treating these values as a probability distribution:

$$H = -\sum_{i=0}^{\mathit{max}} \frac{a[i]}{\mathit{asum}} \log \frac{a[i]}{\mathit{asum}}$$

This yields the spectral entropy of each sub-period. The spectral entropy of the predetermined period is then calculated as the average of the sub-period spectral entropies. The median or another statistic may be used instead of the average.
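A sketch of the spectral entropy for one sub-period and its average over the predetermined period. It uses the standard definition of information entropy, H = -Σ p log p, with the natural logarithm; the log base, the convention 0·log 0 = 0, and returning 0.0 for an all-zero spectrum are assumptions, not details from the source.

```python
import math
from statistics import mean
from typing import List


def spectral_entropy(a: List[float]) -> float:
    """Second parameter value for one sub-period: entropy of a[i] / asum."""
    asum = sum(a)
    if asum == 0:
        return 0.0  # hypothetical handling of a silent sub-period
    h = 0.0
    for value in a:
        p = value / asum
        if p > 0:  # 0 * log(0) is taken as 0
            h -= p * math.log(p)
    return h


def period_spectral_entropy(spectra: List[List[float]]) -> float:
    """Average of the sub-period spectral entropies (the median could be used instead)."""
    return mean(spectral_entropy(a) for a in spectra)


flat = [1.0] * 8               # white-noise-like: maximal entropy, log(8)
peaked = [0.0, 0.0, 5.0, 0.0]  # a single spectral line: zero entropy
print(round(spectral_entropy(flat), 4), spectral_entropy(peaked))  # 2.0794 0.0
```

A flat (white) spectrum maximizes the value and a single spectral line minimizes it, which is why the entropy serves as a whiteness measure.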

The third parameter value calculation unit 1035 calculates the third parameter value using the data stored in the first data storage unit 1032 and stores it in the second data storage unit 104 (step S7).

For example, the third parameter value calculation unit 1035 calculates the cepstrum for each sub-period as described above. The fine structure of the spectrum is known to appear in the high-order part of the cepstrum. The unit therefore identifies, for each sub-period, the maximum value of the cepstrum within the quefrency range corresponding to, for example, the range of fundamental frequencies of a scream. This quefrency range is, for example, the range of orders corresponding to 70 to 600 Hz; because the order is the sampling frequency divided by the frequency, this is order 27 to order 229 at a sampling frequency of 16000 Hz (16000/600 ≈ 27 and 16000/70 ≈ 229). This maximum cepstral value is the harmonic structure strength of the sub-period. The harmonic structure strength of the predetermined period is then calculated as the average of the sub-period values. The median or another statistic may be used instead of the average.
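The harmonic structure strength can be sketched as below, taking as input a cepstrum b[j] computed as for the spectral envelope. The quefrency index for a fundamental frequency f is fs / f; rounding to the nearest integer is an assumption, and it reproduces the orders 27 and 229 stated for fs = 16000 Hz.

```python
from statistics import mean
from typing import List, Tuple


def quefrency_range(fs: float, f_lo: float = 70.0, f_hi: float = 600.0) -> Tuple[int, int]:
    """Map a fundamental-frequency range [f_lo, f_hi] Hz to cepstral indices j = fs / f."""
    return round(fs / f_hi), round(fs / f_lo)


def harmonic_strength(b: List[float], fs: float) -> float:
    """Third parameter value for one sub-period: max of b[j] over the scream F0 range."""
    j_lo, j_hi = quefrency_range(fs)
    return max(b[j_lo:j_hi + 1])


def period_harmonic_strength(cepstra: List[List[float]], fs: float) -> float:
    """Average over sub-periods (the median could be used instead)."""
    return mean(harmonic_strength(b, fs) for b in cepstra)


print(quefrency_range(16000))  # (27, 229)
b = [0.0] * 512
b[5] = 3.0    # low quefrency: spectral envelope, outside the range
b[40] = 0.7   # peak from a 400 Hz fundamental (16000 / 40)
print(harmonic_strength(b, 16000))  # 0.7
```

The large low-order value at j = 5 belongs to the spectral envelope and is correctly ignored; only the pitch-related peak inside the range counts.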

The fourth parameter value calculation unit 1036 calculates the fourth parameter value using the data stored in the first data storage unit 1032 and stores it in the second data storage unit 104 (step S9).

For example, the fourth parameter value calculation unit 1036 calculates the index cog according to the following formula:

$$\mathit{cog} = \frac{a[0]\cdot 0 + a[1]\cdot 1 + a[2]\cdot 2 + a[3]\cdot 3 + \cdots + a[\mathit{max}]\cdot \mathit{max}}{\mathit{asum}}$$

The index cog is the spectral centroid of the sub-period. The spectral centroid of the predetermined period is then calculated as the average of the sub-period spectral centroids. The median or another statistic may be used instead of the average.
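A sketch of the spectral centroid index cog and its period average. The conversion of the bin index to hertz (index × fs / N for an N-point FFT) is an addition for readability, not part of the source, which works with the index directly.

```python
from statistics import mean
from typing import List


def spectral_centroid_index(a: List[float]) -> float:
    """cog = (a[0]*0 + a[1]*1 + ... + a[max]*max) / asum for one sub-period."""
    asum = sum(a)
    if asum == 0:
        return 0.0  # hypothetical handling of a silent sub-period
    return sum(i * v for i, v in enumerate(a)) / asum


def period_spectral_centroid(spectra: List[List[float]]) -> float:
    """Fourth parameter value: average of the sub-period centroids."""
    return mean(spectral_centroid_index(a) for a in spectra)


def index_to_hz(index: float, fs: float, n_fft: int) -> float:
    """Convert a bin index to a frequency in Hz."""
    return index * fs / n_fft


print(spectral_centroid_index([0.0, 0.0, 1.0, 0.0]))  # 2.0
print(index_to_hz(32, 16000, 512))                    # 1000.0
```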

The calculation of the first to fourth parameter values has been described above; these calculations may be executed in parallel and in any order. The cepstrum may also be calculated by the preprocessing unit 1031. Alternatively, whichever parameter value calculation unit calculates the cepstrum first may pass the result to the other calculation units.

Next, the determination unit 105 determines whether the first to third parameter values stored in the second data storage unit 104 satisfy any of the predetermined conditions (step S11). Given the tendencies shown in FIG. 2, it uses the thresholds set for the first to third parameter values to determine whether the values satisfy the condition for a scream, the sound of stepping on security gravel, an explosion, or glass breaking. Processing then proceeds to FIG. 4 via terminal A.

If the scream condition is satisfied, that is, if the first parameter value is in the “low” range, the second parameter value is in the “high” range, and the third parameter value is in the “high” range (step S13: Yes route), the determination unit 105 causes the output unit 106 to output a notification indicating a scream (step S15). The notification may be a warning sound, a voice message, a displayed message, or a command to another device. Processing then proceeds to step S31.

If the scream condition is not satisfied (step S13: No route) but the condition for an explosion or glass breaking is satisfied, that is, if the first parameter value is in the “high” range, the second parameter value is in the “high” range, and the third parameter value is in the “low” range (step S17: Yes route), the determination unit 105 makes a determination based on the fourth parameter value stored in the second data storage unit 104 (step S19). As described above, it decides between the two using a threshold (in general, a value representing a range) for distinguishing an explosion from glass breaking. If the sound is glass breaking (step S21: Yes route), the determination unit 105 causes the output unit 106 to output a notification indicating glass breaking (step S23). The notification is given in the same manner as in step S15. Processing then proceeds to step S31.

If the sound is not glass breaking (step S21: No route), the determination unit 105 causes the output unit 106 to output a notification indicating an explosion (step S25). The notification is given in the same manner as in step S15. Processing then proceeds to step S31.

If the condition for an explosion or glass breaking is not satisfied (step S17: No route) but the condition for the sound of stepping on security gravel is satisfied, that is, if the first parameter value is in the “low” range, the second parameter value is in the “high” range, and the third parameter value is in the “low” range (step S27: Yes route), the determination unit 105 causes the output unit 106 to output a notification indicating the sound of stepping on security gravel (step S29). The notification is given in the same manner as in step S15. Processing then proceeds to step S31.

If the condition for the sound of stepping on security gravel is not satisfied (step S27: No route), no predetermined sound that should be identified for crime prevention has been detected, and processing proceeds to step S31.
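The decision flow of steps S13 to S29 can be sketched as a single function. The numeric thresholds are hypothetical placeholders (the source says they are set by experiment), and treating a value at or above its threshold as “high” is an assumption; range-based conditions per sound type would follow the same pattern.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Thresholds:
    """Hypothetical thresholds separating "low" from "high" for each parameter value."""
    fluctuation: float  # first parameter value
    entropy: float      # second parameter value
    harmonic: float     # third parameter value
    centroid: float     # fourth parameter value


def classify(p1: float, p2: float, p3: float, p4: float, th: Thresholds) -> Optional[str]:
    """Return the detected sound type, or None if no target sound is detected."""
    high1 = p1 >= th.fluctuation
    high2 = p2 >= th.entropy
    high3 = p3 >= th.harmonic
    if not high1 and high2 and high3:        # S13: scream
        return "scream"
    if high1 and high2 and not high3:        # S17: explosion or glass breaking
        # S19/S21: a high dominant frequency means glass, a low one an explosion
        return "glass_breaking" if p4 >= th.centroid else "explosion"
    if not high1 and high2 and not high3:    # S27: security gravel
        return "security_gravel"
    return None                              # S27 No route: nothing detected


th = Thresholds(fluctuation=1.0, entropy=2.0, harmonic=0.5, centroid=100.0)
print(classify(0.2, 3.0, 0.9, 0.0, th))    # scream
print(classify(2.0, 3.0, 0.1, 150.0, th))  # glass_breaking
```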

In step S31, a unit such as the preprocessing unit 1031 determines whether an administrator or other user has instructed the processing to end. If no such instruction has been given (step S31: No route), processing returns to step S1 via terminal B. If the end of processing has been instructed, processing ends.
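The overall loop of FIGS. 3 and 4 (read a period, analyze it, notify, then check for an end instruction at step S31) can be sketched with a threading.Event standing in for the end instruction. The analyze and notify callables are hypothetical hooks standing in for steps S2 to S29 and for the output unit 106; non-overlapping periods are an assumption.

```python
import threading
from typing import Callable, Iterable, Iterator, List, Optional


def periods(samples: Iterable[float], period_len: int) -> Iterator[List[float]]:
    """S1: yield consecutive, non-overlapping blocks of period_len samples."""
    block: List[float] = []
    for s in samples:
        block.append(s)
        if len(block) == period_len:
            yield block
            block = []


def run(samples: Iterable[float], period_len: int,
        analyze: Callable[[List[float]], Optional[str]],
        notify: Callable[[str], None],
        stop: threading.Event) -> None:
    for block in periods(samples, period_len):
        label = analyze(block)   # S2-S29: parameter values and determination
        if label is not None:
            notify(label)        # S15/S23/S25/S29: output unit 106
        if stop.is_set():        # S31: end of processing instructed?
            break                # Yes route; otherwise back to S1 via terminal B


found: List[str] = []
run(range(10), 4, lambda b: "loud" if max(b) >= 4 else None, found.append, threading.Event())
print(found)  # ['loud']
```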

By performing the processing described above, the occurrence of a predetermined sound that should be identified for crime prevention can be detected accurately.

In the processing flow of FIG. 4, the determinations are made in the order scream, explosion or glass breaking, and then stepping on security gravel, but they need not be made in this order. If the sounds do not need to be distinguished from one another, a notification indicating that a predetermined sound to be identified for crime prevention has been detected may be output as soon as any of the conditions is satisfied.

Furthermore, if explosions do not need to be distinguished from glass breaking, the fourth parameter value need not be calculated or used in the determination.

Although an embodiment of the present invention has been described above, the present invention is not limited to it. For example, the functional block configuration shown in FIG. 1 is an example and may not match the actual program module configuration. Likewise, the processing flows in FIGS. 3 and 4 are examples; the order of processing may be changed, or steps may be executed in parallel, as long as the processing result does not change.

In addition, although the example above sets a threshold for each parameter value, a value range may instead be set for each type of sound that should be identified for crime prevention. Even then, the tendencies shown in FIG. 2 still hold.

Also, although the description above has the determination unit 105 make its determination using thresholds and the like determined by experiment, the determination unit 105 may instead be built by training it, through machine learning or another technique, on combinations of sound types and the three or four types of parameter values described above.
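As one illustration of a learned determination unit, the sketch below uses a nearest-centroid classifier on standardized (p1, p2, p3) vectors. The source does not name a learning method; this algorithm, the training values, and the labels are all hypothetical.

```python
import math
from statistics import mean, pstdev
from typing import Dict, List, Sequence, Tuple

Model = Tuple[List[float], List[float], Dict[str, List[float]]]


def _standardize(x: Sequence[float], mu: List[float], sd: List[float]) -> List[float]:
    return [(v - m) / s for v, m, s in zip(x, mu, sd)]


def fit_nearest_centroid(samples: List[Tuple[Sequence[float], str]]) -> Model:
    """Learn per-feature mean/std and one centroid per sound type."""
    dims = len(samples[0][0])
    mu = [mean(x[d] for x, _ in samples) for d in range(dims)]
    sd = [pstdev([x[d] for x, _ in samples]) or 1.0 for d in range(dims)]  # avoid /0
    grouped: Dict[str, List[List[float]]] = {}
    for x, label in samples:
        grouped.setdefault(label, []).append(_standardize(x, mu, sd))
    centroids = {label: [mean(col) for col in zip(*vecs)] for label, vecs in grouped.items()}
    return mu, sd, centroids


def predict(model: Model, x: Sequence[float]) -> str:
    """Return the label whose centroid is closest to the standardized feature vector."""
    mu, sd, centroids = model
    z = _standardize(x, mu, sd)
    return min(centroids, key=lambda label: math.dist(z, centroids[label]))


training = [
    ((0.1, 3.0, 0.9), "scream"), ((0.2, 3.1, 0.8), "scream"),
    ((0.1, 3.0, 0.1), "gravel"), ((0.2, 2.9, 0.2), "gravel"),
    ((2.0, 3.0, 0.1), "glass"), ((2.1, 3.1, 0.15), "glass"),
]
model = fit_nearest_centroid(training)
print(predict(model, (0.15, 3.0, 0.85)))  # scream
```

Standardizing each feature first keeps a parameter with a large numeric range (such as the spectral centroid, if added as a fourth feature) from dominating the distance.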

The information processing apparatus 100 described above is a computer apparatus in which a memory, a CPU (Central Processing Unit), a hard disk drive (HDD), a display control unit connected to a display device, a drive device for a removable disk, an input device, and a communication control unit for connecting to a network are connected by a bus. An operating system (OS) and an application program for performing the processing of this embodiment are stored in the HDD and are read from the HDD into the memory when executed by the CPU. The CPU controls the display control unit, the communication control unit, and the drive device according to the processing content of the application program to perform predetermined operations. Data being processed is stored mainly in the memory but may also be stored in the HDD. In the embodiment of the present invention, the application program for performing the processing described above is stored on and distributed via a computer-readable removable disk and is installed from the drive device onto the HDD. It may also be installed onto the HDD via a network such as the Internet and the communication control unit. In such a computer apparatus, the hardware described above, such as the CPU and memory, works in concert with programs such as the OS and the application program to implement the various functions described above.

The embodiment described above can be summarized as follows.

The determination method according to this embodiment includes two steps:

(A) A calculation step that calculates three parameter values for the input sound data: a first parameter value representing the degree of fluctuation of the spectrum of the sound, a second parameter value representing the degree of whiteness of the sound (how closely it resembles white noise), and a third parameter value representing the degree of harmonic structure in the sound.

(B) A determination step that determines, based on the first, second, and third parameter values, whether the sound data contains a predetermined sound that should be identified for crime-prevention purposes.
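Steps (A) and (B) can be sketched end to end as follows. This is a minimal illustration that rests on several assumptions:

- The signal has already been split into equal-length frames.
- The first parameter is the Euclidean distance between consecutive log-magnitude spectra.
- The second parameter is the spectral entropy, normalized to [0, 1].
- The third parameter is the largest real-cepstrum value within a quefrency window `[q_min, q_max]`.

The function names and the `decide` callback are hypothetical. The thresholds are determined experimentally and are not given here, so the decision rule is left to the caller.

```python
import cmath
import math
from typing import Callable, List, Optional, Sequence, Tuple

# (first, second, third) parameter values for one frame
Params = Tuple[float, float, float]


def _dft(x: Sequence[float]) -> List[complex]:
    """Naive O(N^2) DFT; adequate for a sketch, use an FFT in practice."""
    n = len(x)
    return [
        sum(v * cmath.exp(-2j * math.pi * k * t / n) for t, v in enumerate(x))
        for k in range(n)
    ]


def calculate_parameters(
    frames: Sequence[Sequence[float]], q_min: int, q_max: int, eps: float = 1e-10
) -> List[Params]:
    """Step (A): compute the three parameter values for every frame."""
    params: List[Params] = []
    prev_log: Optional[List[float]] = None
    for frame in frames:
        n = len(frame)
        mag = [abs(c) for c in _dft(frame)]
        log_mag = [math.log(m + eps) for m in mag]
        half = log_mag[: n // 2 + 1]
        # 1st: spectral fluctuation = distance to the previous frame's log spectrum
        flux = 0.0 if prev_log is None else math.dist(half, prev_log)
        prev_log = half
        # 2nd: spectral entropy of the one-sided power spectrum, normalized to [0, 1]
        power = [m * m for m in mag[: n // 2 + 1]]
        total = sum(power)
        probs = [p / total for p in power if p > 0] if total > 0 else []
        entropy = -sum(p * math.log(p) for p in probs) / math.log(len(power))
        # 3rd: largest cepstral value in the quefrency range [q_min, q_max];
        # log|X| is real and even, so its forward DFT / n equals its inverse DFT
        cepstrum = [c.real / n for c in _dft(log_mag)]
        harmonic = max(cepstrum[q_min : q_max + 1])
        params.append((flux, entropy, harmonic))
    return params


def determine(
    frames: Sequence[Sequence[float]],
    decide: Callable[[float, float, float], bool],
    q_min: int = 8,
    q_max: int = 80,
) -> bool:
    """Step (B): True if `decide` accepts the parameter values of any frame."""
    return any(decide(*p) for p in calculate_parameters(frames, q_min, q_max))
```

At a sampling rate f_s, a quefrency of q samples corresponds to a fundamental frequency of f_s / q. The bounds `q_min` and `q_max` therefore limit the pitch range that is searched for harmonic structure.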

Using these three parameter values together as determination criteria improves the accuracy with which the predetermined crime-prevention-relevant sound is detected.

The determination step may also use the first, second, and third parameter values to determine which of the following sounds the sound data contains: a scream, the sound of someone stepping on security gravel (gravel that makes a loud crunching noise when walked on), or a glass-breaking or explosion sound. This list is not exhaustive. The determination may identify the specific type of crime-prevention-relevant sound, or it may determine only that such a sound is present without identifying its type.
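The two options in the paragraph above, identifying the type or only flagging presence, can be pictured with a rule table keyed by sound type. In the sketch below, `EXAMPLE_RULES` is entirely hypothetical. Its thresholds and comparison directions were invented for illustration and are not the patent's experimentally determined values. Only the structure is the point: `classify` returns a type, and `contains_target` reports presence alone.

```python
from typing import Callable, Dict, Optional

# A rule receives (fluctuation, spectral entropy, harmonic strength).
Rule = Callable[[float, float, float], bool]

# HYPOTHETICAL rules for illustration only: thresholds and comparison
# directions are invented here, not taken from the patent.
EXAMPLE_RULES: Dict[str, Rule] = {
    "scream": lambda flux, ent, harm: harm > 1.0 and ent < 0.6,
    "gravel": lambda flux, ent, harm: harm <= 1.0 and ent >= 0.6 and flux < 5.0,
    "glass_or_explosion": lambda flux, ent, harm: harm <= 1.0 and ent >= 0.6 and flux >= 5.0,
}


def classify(flux: float, ent: float, harm: float, rules: Dict[str, Rule]) -> Optional[str]:
    """Identify the type: the first label whose rule matches, else None."""
    for label, rule in rules.items():  # dicts preserve insertion order
        if rule(flux, ent, harm):
            return label
    return None


def contains_target(flux: float, ent: float, harm: float, rules: Dict[str, Rule]) -> bool:
    """Type-agnostic variant: only report whether any target sound matched."""
    return classify(flux, ent, harm, rules) is not None
```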

The calculation step may also calculate a fourth parameter value representing the dominant frequency of the sound. In that case, the determination step may use the fourth parameter value to decide whether the sound is a glass-breaking sound or an explosion.
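One way to picture this optional second stage is as a threshold on the fourth parameter. The threshold is applied only when the first stage has already found a glass-breaking or explosion sound. The sketch makes three assumptions:

- The fourth parameter is the spectral centroid.
- Glass breaking produces the higher centroid of the two sounds.
- `threshold_hz` is tuned experimentally.

The function name, the labels, and the comparison direction are hypothetical.

```python
def refine_glass_or_explosion(label: str, centroid_hz: float, threshold_hz: float) -> str:
    """Split a combined 'glass_or_explosion' label using the fourth parameter.

    Assumption: glass breaking concentrates its energy higher in frequency
    than an explosion, so a centroid at or above `threshold_hz` is read as
    glass. Any other label passes through unchanged.
    """
    if label != "glass_or_explosion":
        return label
    return "glass_breaking" if centroid_hz >= threshold_hz else "explosion"
```

For example, with a hypothetical threshold of 2000 Hz, a centroid of 4500 Hz would be labeled as glass breaking and a centroid of 300 Hz as an explosion.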

The first parameter value may be, for example, either a value representing the fluctuation of the sound's spectral envelope or a value representing the fluctuation of its spectrum.
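Both variants can be sketched over a sequence of per-frame magnitude spectra, under two assumptions. First, fluctuation is taken as the mean Euclidean distance between consecutive log-magnitude spectra. Second, the spectral envelope is approximated by a moving average across neighbouring bins. This smoothing is a simple stand-in; cepstral smoothing and LPC are common alternatives. The function names and `half_width` are hypothetical.

```python
import math
from typing import List, Sequence


def _log_spectrum(mags: Sequence[float], eps: float = 1e-10) -> List[float]:
    return [math.log(m + eps) for m in mags]


def spectral_fluctuation(spectra: Sequence[Sequence[float]]) -> float:
    """Mean Euclidean distance between consecutive log-magnitude spectra."""
    if len(spectra) < 2:
        return 0.0
    logs = [_log_spectrum(s) for s in spectra]
    dists = [math.dist(prev, cur) for prev, cur in zip(logs, logs[1:])]
    return sum(dists) / len(dists)


def smooth_envelope(mags: Sequence[float], half_width: int = 2) -> List[float]:
    """Crude spectral envelope: moving average over neighbouring bins."""
    out = []
    for i in range(len(mags)):
        window = mags[max(0, i - half_width) : i + half_width + 1]
        out.append(sum(window) / len(window))
    return out


def envelope_fluctuation(spectra: Sequence[Sequence[float]], half_width: int = 2) -> float:
    """The same measure applied to smoothed envelopes instead of raw spectra."""
    return spectral_fluctuation([smooth_envelope(s, half_width) for s in spectra])
```

Because the envelope discards fine spectral detail, the envelope variant responds mainly to broad changes in spectral shape. It responds much less to shifts in fine structure, such as the movement of individual harmonics.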

The second parameter value may be, for example, the information entropy calculated by treating the sound's spectrum as a probability distribution. This quantity is also called spectral entropy.
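Spectral entropy can be computed from a single frame's power spectrum, which the sketch assumes has already been calculated. As an optional choice not stated in the patent, the result is divided by log N. With that normalization, a perfectly flat (white) spectrum scores 1 and a single spectral line scores 0. The function name is hypothetical.

```python
import math
from typing import Sequence


def spectral_entropy(power: Sequence[float], normalise: bool = True) -> float:
    """Shannon entropy of the power spectrum treated as a probability distribution."""
    total = sum(power)
    if total <= 0:
        return 0.0  # silent frame: no distribution to measure
    h = -sum((p / total) * math.log(p / total) for p in power if p > 0)
    if normalise and len(power) > 1:
        h /= math.log(len(power))  # maximum possible entropy for N bins
    return h
```

Noise-like sounds such as trampled gravel spread their power across many bins and score close to 1. A strongly tonal sound concentrates its power in a few bins and scores much lower.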

The third parameter value may be, for example, the maximum value of the sound's cepstrum within a predetermined range. This quantity is also called harmonic structure strength.
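Harmonic structure strength can be sketched as the peak of the real cepstrum within a quefrency window. The sketch assumes a single frame of samples, a naive DFT for self-containment (an FFT would be used in practice), a small `eps` to avoid log(0), and a hypothetical window `[q_min, q_max]` given in samples. A periodic sound such as a sustained scream has harmonics spaced f0 apart, which produce a cepstral peak at quefrency f_s / f0. A click or broadband noise has no such regular spacing.

```python
import cmath
import math
from typing import List, Sequence


def _dft(x: Sequence[float]) -> List[complex]:
    """Naive O(N^2) DFT, kept for self-containment."""
    n = len(x)
    return [
        sum(v * cmath.exp(-2j * math.pi * k * t / n) for t, v in enumerate(x))
        for k in range(n)
    ]


def real_cepstrum(frame: Sequence[float], eps: float = 1e-10) -> List[float]:
    """Inverse DFT of the log-magnitude spectrum."""
    n = len(frame)
    log_mag = [math.log(abs(c) + eps) for c in _dft(frame)]
    # log|X| of a real signal is real and even, so its inverse DFT equals
    # its forward DFT divided by n; .real drops round-off imaginary parts.
    return [c.real / n for c in _dft(log_mag)]


def harmonic_strength(frame: Sequence[float], q_min: int, q_max: int) -> float:
    """Largest cepstral value in the quefrency window [q_min, q_max]."""
    return max(real_cepstrum(frame)[q_min : q_max + 1])
```

An impulse train with a period of 8 samples, a crude model of a voiced sound, produces a large peak at quefrency 8. A single click has a flat spectrum, so its cepstrum is essentially zero inside the window.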

The fourth parameter value may be, for example, the centroid frequency of the sound's spectrum. This quantity is also called the spectral centroid.
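The spectral centroid is the magnitude-weighted mean frequency of the spectrum. The sketch below assumes a one-sided magnitude spectrum with bins 0 to n_fft/2, where bin k sits at k·f_s/n_fft. Some implementations weight by power rather than magnitude, and the text here does not specify which. The function name is hypothetical.

```python
from typing import Sequence


def spectral_centroid(magnitudes: Sequence[float], sample_rate: float, n_fft: int) -> float:
    """Magnitude-weighted mean frequency, in Hz, of a one-sided spectrum."""
    total = sum(magnitudes)
    if total <= 0:
        return 0.0  # silent frame: centroid undefined, report 0 Hz
    bin_hz = sample_rate / n_fft  # frequency spacing between bins
    return sum(k * bin_hz * m for k, m in enumerate(magnitudes)) / total
```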

A program for executing the processing described above can be created. The program is stored in a computer-readable storage medium or storage device, such as a flexible disk, an optical disk (CD-ROM, DVD-ROM, etc.), a magneto-optical disk, a semiconductor memory, or a hard disk. Intermediate processing results are stored temporarily in a storage device such as main memory.

100 Information processing apparatus
102 Sound data storage unit
103 Calculation unit
104 Second data storage unit
105 Determination unit
106 Output unit
1031 Preprocessing unit
1032 First data storage unit
1033 First parameter value calculation unit
1034 Second parameter value calculation unit
1035 Third parameter value calculation unit
1036 Fourth parameter value calculation unit

Claims

A program that causes a computer to execute:
a calculation step of calculating, for input sound data, a first parameter value representing a degree of fluctuation of a spectrum of a sound related to the sound data, a second parameter value representing a degree of whiteness of the sound related to the sound data, and a third parameter value representing a degree of harmonic structure in the sound related to the sound data; and
a determination step of determining, based on the first parameter value, the second parameter value, and the third parameter value, whether or not the sound data includes a predetermined sound to be identified for crime prevention.

The program according to claim 1, wherein, in the determination step,
it is determined, based on the first parameter value, the second parameter value, and the third parameter value, which of at least a scream, a sound of stepping on security gravel, and a glass-breaking or explosion sound is included in the sound data.

The program according to claim 2, wherein
the calculation step comprises calculating a fourth parameter value representing a dominant frequency of the sound related to the sound data, and
the determination step further comprises determining, based on the fourth parameter value, whether the sound is a glass-breaking sound or an explosion sound.

The program according to any one of the preceding claims, wherein the first parameter value is either a value representing a fluctuation of a spectral envelope of the sound related to the sound data or a value representing a fluctuation of the spectrum of the sound related to the sound data.

The program according to any one of claims 1 to 4, wherein the second parameter value is information entropy calculated by treating the spectrum of the sound related to the sound data as a probability distribution.

The program according to any one of claims 1 to 5, wherein the third parameter value is a maximum value within a predetermined range of a cepstrum of the sound related to the sound data.

The program according to claim 3, wherein the fourth parameter value is a centroid frequency of the spectrum of the sound related to the sound data.

A determination method executed by a computer, the method comprising:
a calculation step of calculating, for input sound data, a first parameter value representing a degree of fluctuation of a spectrum of a sound related to the sound data, a second parameter value representing a degree of whiteness of the sound related to the sound data, and a third parameter value representing a degree of harmonic structure in the sound related to the sound data; and
a determination step of determining, based on the first parameter value, the second parameter value, and the third parameter value, whether or not the sound data includes a predetermined sound to be identified for crime prevention.

An information processing apparatus comprising:
a calculation unit that calculates, for input sound data, a first parameter value representing a degree of fluctuation of a spectrum of a sound related to the sound data, a second parameter value representing a degree of whiteness of the sound related to the sound data, and a third parameter value representing a degree of harmonic structure in the sound related to the sound data; and
a determination unit that determines, based on the first parameter value, the second parameter value, and the third parameter value, whether or not the sound data includes a predetermined sound to be identified for crime prevention.