JP2005172548A - Supervisory device and database construction method

Info

Publication number: JP2005172548A
Application number: JP2003411375A
Authority: JP
Inventors: Yasuhisa Kuroda; 靖尚黒田
Original assignee: Nippon Sheet Glass Co Ltd
Current assignee: Nippon Sheet Glass Co Ltd
Priority date: 2003-12-10
Filing date: 2003-12-10
Publication date: 2005-06-30

Abstract

PROBLEM TO BE SOLVED: To provide a monitoring device that requires neither signals from many microphones nor complicated data processing or the accumulation of a huge amount of data.

SOLUTION: A signal from a microphone 4 is input to a sound board 100 installed in a personal computer 2. The input sound data are sampled and stored in a memory 102 as digital data. For the frequency spectrum of the collected sound, a CPU (central processing unit) 104 defines as a phoneme the distribution of the integrated intensity of the spectrum over ranges divided into predetermined frequency bands. The phonemes are accumulated in a database. If the correlation coefficient between the phonemes stored in the sound database and the phoneme of a newly collected sound is at or below a predetermined value, the new sound is judged to be abnormal and a predetermined action is performed.

COPYRIGHT: (C)2005,JPO&NCIPI

Description

The present invention relates to a monitoring device for grasping the situation in an area to be monitored or the state of equipment that is operated, for example, continuously or unattended. It further relates to a database construction method for building a sound database useful for such a monitoring device.

Monitoring devices have long been used to watch for abnormal situations such as illegal entry into a building or the outbreak of a fire. Many of these devices have relied on surveillance cameras, infrared sensors, and the like to detect such situations.

A surveillance camera, for example, requires personnel to watch its images, and unattended monitoring requires complicated image recognition processing and equipment for it. An infrared sensor has the problem that it also reacts to heat sources such as small animals.

Furthermore, detecting illegal entry into a building has required installing many sensors at the building's openings, such as windows and doors. Such sensors may be found by an intruder and deliberately destroyed. If the intruder cuts off the electricity supply to the building, these monitoring devices may also stop functioning.

Japanese Patent Laid-Open No. 2000-268265 proposes a monitoring device using a microphone, namely: "A monitoring device using a microphone, comprising: a microphone for collecting ambient sound; a 'sound database' in which information representing sounds emitted by various objects is recorded in advance in association with 'information about the sound', such as the source of the sound and its attributes; comparison means for comparing information representing the sound from the microphone with the information representing sounds recorded in the 'sound database'; determination means for determining, based on information from the comparison means, whether an abnormality has occurred or may occur in the monitored area; and warning means for giving the user a predetermined warning based on information from the determination means."

Japanese Patent Laid-Open No. 6-160172 proposes the following as an "abnormality detection device": "An abnormality detection device that analyzes sound waves generated by a monitored object to detect an abnormality of the monitored object, comprising: a sound-collecting microphone that collects sound waves generated at a diagnosed part of the monitored object; waveform analysis means that performs power spectrum analysis of the sound waves collected by the microphone and also analyzes the time-series rate of change of the analyzed values; database means that stores, as reference values for each operating condition of the monitored object, the sound feature patterns obtained by the waveform analysis means when the monitored object is normal; comparison operation means that compares the values analyzed by the waveform analysis means with the reference values in the database means corresponding to the current operating condition of the monitored object, and determines whether an abnormality is present; and monitoring means that reports an abnormality based on the comparison by the comparison operation means and predicts the life of the monitored object by managing the trend of the rate of change of the analyzed values."

Japanese Patent Laid-Open No. 9-166483 (Patent Document 3) proposes the following as a "device monitoring method and apparatus": "A device monitoring method in which sound from a monitored object is detected by an acoustic sensor to monitor the monitored device for abnormalities, wherein a frequency spectrum stored in advance for the normal sound of the same monitored object is subtracted from the frequency spectrum of the sound detection signal obtained during monitoring; the similarity between the frequency spectrum pattern obtained by the subtraction and frequency spectrum patterns stored in advance for known abnormal sounds and disturbance sounds is calculated; and, when the similarity is at or above a preset value, the sound is judged to be a known abnormal sound or disturbance sound, so that disturbance sounds are removed or the presence of an abnormal sound is determined."

JP 2000-259222 A proposes the following as a device monitoring and preventive maintenance system: a system "configured such that multidimensional sensors capable of capturing state changes secondarily are arranged on a device; a management device that realizes state-change monitoring and maintenance management of the device based on information from the sensor group is provided; calculation means in the management device for deterioration diagnosis and speed prediction monitors the state of the device and predicts its state changes from the sensor information; the results are presented on display means of the management device; the management device has a data server unit that stores and accumulates these data; and this information can be edited, added to, and registered through input means of the management device."

In addition, the invention of claim 2 of that publication is "... a device monitoring and preventive maintenance system comprising a calculation function (calculation means or algorithm) that creates a reference space for determining the eigenmode of state changes in the operating status of the device."

Furthermore, the invention of claim 4 of that publication is "... a device monitoring and preventive maintenance system wherein the reference space of the device is obtained from the data of the sensor group as follows:
(1) Data for creating the eigenmode space (reference space) of the device are collected; the number of data items is at least the number of sensors attached to the device, and is two to three times that number.
(2) The data of item (1) are normalized for each element (each sensor) by its mean and standard deviation.
(3) From the data of item (2), a two-dimensional correlation matrix determined only by the number of elements is created; this matrix is further processed to obtain the Mahalanobis distances; and the distribution determined by these distances (mean 0, standard deviation approximately 1) is used as the reference space of the device."

Patent Document 1: JP 2000-268265 A
Patent Document 2: JP H06-160172 A
Patent Document 3: JP H09-166483 A
Patent Document 4: JP 2000-259222 A

As noted above, JP 2000-268265 A discloses a monitoring device using a microphone. According to its description of the embodiment, however, "the 'sound database' 5 records an enormous number and variety of sounds as waveform data (digital data)". Moreover, "the waveform data representing each of these sounds is recorded in association with 'related information about the sound', such as the name of the sound source and attribute information of the sound (for example, the situation in which the sound occurred)".

In actual monitoring work, therefore, such an enormous amount of data must be assembled, together with its related information, into a "sound database". The workload this entails is by no means small. For sound capture as well, the publication shows an example using a plurality of microphones.

The "abnormality detection device" described in Japanese Patent Laid-Open No. 6-160172, in order to "determine abnormalities reliably even when the operating condition of the monitored object changes", needs to "perform power spectrum analysis of the collected sound waves and also analyze the time-series rate of change of the analyzed values".

The "device monitoring method and apparatus" described in Japanese Patent Laid-Open No. 9-166483, in order to "monitor for abnormalities by separating abnormal sounds or disturbance sounds from the monitored sound", needs to "subtract, from the frequency spectrum of the sound detection signal obtained during monitoring, a frequency spectrum stored in advance for the normal sound of the same monitored object, and calculate the similarity between the frequency spectrum pattern obtained by the subtraction and frequency spectrum patterns stored in advance for known abnormal sounds and disturbance sounds".

The "device monitoring and preventive maintenance system" described in JP 2000-259222 A, in order to "improve the accuracy of methods and diagnoses that capture signs that a device is becoming abnormal or deteriorating due to external or internal factors", first creates a "reference space" and then "creates a two-dimensional correlation matrix, further processes this matrix, and obtains the Mahalanobis distance of the matrix".

However, the data used to create this "reference space" number "at least the number of sensors attached to the device, and two to three times that number"; the "data" must be "normalized for each element (each sensor) by its mean and standard deviation"; and "a two-dimensional correlation matrix determined only by the number of elements" must be created. In other words, this approach requires many sensors and a great deal of data processing.

The object of the present invention is therefore to provide a monitoring device that, unlike the prior art described above, does not require signals from many microphones, does not require complicated data processing or the accumulation of enormous amounts of data, and can also build its database automatically.

A further object is to provide a database construction method by which such a sound database can be built easily.

The monitoring device according to the first aspect of the present invention is, as the invention described in claim 1, a monitoring device comprising:
a database that accumulates phonemes, a phoneme being defined, for the frequency spectrum of collected sound, as the distribution of the integrated intensity of the frequency spectrum over ranges divided into predetermined frequency bands;
a phoneme determination unit that determines that newly collected sound is abnormal if the correlation coefficient between each phoneme accumulated in the database and the phoneme of the newly collected sound is equal to or less than a predetermined value for abnormality determination; and
an action execution unit that executes a predetermined action when the phoneme determination unit determines that the newly collected sound is abnormal.

The invention described in claim 2 is the monitoring device according to claim 1, wherein, in the database:
the occurrence frequency of the accumulated phonemes has been obtained;
the correlation coefficients between the accumulated phonemes have been obtained;
the accumulated phonemes have been rearranged, taking a phoneme with a high occurrence frequency as the reference, in descending order of correlation coefficient with that phoneme; and
a virtual three-dimensional space of phonemes has been constructed with the phoneme type as the first axis, the reciprocal of the phoneme duration as the second axis, and the reciprocal of the phoneme occurrence frequency as the third axis.
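The ordering and the three axes of claim 2 can be illustrated with a short Python sketch. It is only one possible reading: the record layout (`vector`, `count`, `duration`), the function names, and the use of the sorted position as the "phoneme type" coordinate are assumptions made for illustration, not details given in the source.

```python
import math

def pearson(a, b):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    var_a = sum((x - ma) ** 2 for x in a)
    var_b = sum((y - mb) ** 2 for y in b)
    if var_a == 0 or var_b == 0:
        return 0.0  # assumption: a flat distribution correlates with nothing
    return cov / math.sqrt(var_a * var_b)

def build_space(entries):
    """Order phonemes and return their (axis 1, axis 2, axis 3) coordinates.

    entries -- list of dicts with hypothetical keys:
               'vector' (band intensities), 'count' (occurrences),
               'duration' (seconds the phoneme lasted)
    """
    # The most frequent phoneme is the reference and comes first.
    ref = max(entries, key=lambda e: e["count"])
    others = [e for e in entries if e is not ref]
    # Remaining phonemes follow in descending correlation with the reference.
    others.sort(key=lambda e: pearson(ref["vector"], e["vector"]), reverse=True)
    ordered = [ref] + others
    # Axis 1: position in the ordering; axis 2: 1/duration; axis 3: 1/frequency.
    return [(i, 1.0 / e["duration"], 1.0 / e["count"])
            for i, e in enumerate(ordered)]
```

With this layout, frequent and long-lasting phonemes sit near the origin on the second and third axes, while rare or brief ones lie far from it.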

The invention described in claim 3 is the monitoring device according to claim 2, wherein, in the database, image processing has been applied to adjacent phonemes in the plane defined by the first axis and the second axis of the three-dimensional space, and a feature quantity of the coupling relation with respect to the first axis has been assigned.

The invention described in claim 4 is the monitoring device according to claim 2, further comprising a determination unit that:
constructs a filter from the three-dimensional space of phonemes built only from the normal phonemes accumulated in the database; and
passes the phoneme data of newly collected sound through the filter and determines, from the ratio of the intensities before and after the filter, that the newly collected sound is abnormal.

Furthermore, the database construction method according to the second aspect is, as the invention described in claim 5, a database construction method comprising the steps of:
obtaining the frequency spectrum of sound collected at predetermined time intervals;
obtaining the integrated intensity of the frequency spectrum for each range divided into predetermined frequency bands, defining the distribution of the integrated intensity as a phoneme, and accumulating it in a database;
obtaining the phoneme of newly collected sound;
obtaining the correlation coefficient between the phoneme of the newly collected sound and each phoneme accumulated in the database; and
determining, if the correlation coefficient is equal to or less than a predetermined value for data storage, that the phoneme of the newly collected sound differs from the accumulated phonemes.

The invention described in claim 6 is the database construction method according to claim 5, comprising the step of accumulating the phoneme of the newly collected sound in the database only when that phoneme has been determined to differ from the accumulated phonemes.

The invention described in claim 7 is the database construction method according to claim 5, comprising the steps of:
obtaining the occurrence frequency of the accumulated phonemes;
obtaining the correlation coefficients between the accumulated phonemes;
rearranging the accumulated phonemes, taking a phoneme with a high occurrence frequency as the reference, in descending order of correlation coefficient with that phoneme; and
constructing a three-dimensional space of phonemes with the phoneme type as the first axis, the reciprocal of the phoneme duration as the second axis, and the reciprocal of the phoneme occurrence frequency as the third axis.

The invention described in claim 8 is the database construction method according to claim 7, comprising the step of using the three-dimensional space of phonemes as a filter, passing the phoneme data of newly collected sound through the filter, and making the determination from the ratio of the intensities before and after the filter.

The invention described in claim 9 is the database construction method according to claim 7, comprising the steps of:
applying image processing to adjacent phonemes in the plane defined by the first axis and the second axis of the three-dimensional space in the database; and
assigning a feature quantity of the coupling relation with respect to the first axis.

The invention described in claim 10 is the database construction method according to claim 5, comprising the steps of:
defining, from phoneme data obtained within a predetermined time, a Mahalanobis reference space as a matrix whose vertical direction is the type of phoneme data and whose horizontal direction is the time axis;
calculating, for newly collected sound, the Mahalanobis distance from the Mahalanobis reference space; and
recognizing that the situation of the sound source has changed when the Mahalanobis distance is recognized to have exceeded a predetermined value.
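As an illustration of the distance test in claim 10, the following Python sketch computes a standard Mahalanobis distance from a reference set of phoneme vectors. It is a sketch under stated assumptions, not the procedure of the source: the function names are hypothetical, the reference space is represented simply by the mean vector and the population covariance matrix (divided by n), and the alarm threshold is left to the caller.

```python
import math

def mean_and_cov(rows):
    """Mean vector and population covariance matrix of the reference rows."""
    n, d = len(rows), len(rows[0])
    mean = [sum(r[j] for r in rows) / n for j in range(d)]
    cov = [[sum((r[i] - mean[i]) * (r[j] - mean[j]) for r in rows) / n
            for j in range(d)] for i in range(d)]
    return mean, cov

def solve(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        if abs(m[pivot][col]) < 1e-12:
            raise ValueError("covariance matrix is singular")
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):  # back substitution
        tail = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - tail) / m[i][i]
    return x

def mahalanobis(point, mean, cov):
    """sqrt(d^T C^-1 d), where d = point - mean and C = cov."""
    diff = [p - mu for p, mu in zip(point, mean)]
    y = solve(cov, diff)  # y = C^-1 d without forming the inverse
    return math.sqrt(max(0.0, sum(d * v for d, v in zip(diff, y))))
```

A caller would build `mean` and `cov` once from the phonemes observed during the reference period, then compare the distance of each new phoneme with its chosen limit.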

The invention described in claim 11 is the database construction method according to claim 5, comprising the steps of:
obtaining the correlation coefficients between the phonemes accumulated in the database; and
setting a predetermined value for data reduction that is smaller than the predetermined value for data storage, and determining that phonemes whose correlation coefficient exceeds the predetermined value for data reduction are the same phoneme.

The invention described in claim 12 is the database construction method according to claim 11, comprising the step of re-accumulating, as a single phoneme, the phonemes determined in the determination step to be the same phoneme.
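The data reduction of claims 11 and 12 might look like the following Python sketch. The source does not say how merged phonemes are represented, so keeping the first phoneme of each group as its representative is an assumption, as are the function names and the example threshold of 0.5 (chosen only because it is below the 0.7 suggested for data storage).

```python
import math

def pearson(a, b):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    var_a = sum((x - ma) ** 2 for x in a)
    var_b = sum((y - mb) ** 2 for y in b)
    if var_a == 0 or var_b == 0:
        return 0.0  # assumption: a flat distribution correlates with nothing
    return cov / math.sqrt(var_a * var_b)

def consolidate(database, reduce_threshold=0.5):
    """Return a reduced list in which similar phonemes are merged.

    Two phonemes whose correlation exceeds reduce_threshold are treated as
    the same phoneme; the earlier one is kept as the representative
    (an illustrative choice, not taken from the source).
    """
    kept = []
    for ph in database:
        if not any(pearson(rep, ph) > reduce_threshold for rep in kept):
            kept.append(ph)
    return kept
```

Because the reduction threshold is lower than the storage threshold, phonemes that were different enough to be stored separately can still be merged later when the database is tidied.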

The invention described in claim 13 is the database construction method according to claim 5, comprising the step of, when the phoneme of the newly collected sound changes within a predetermined time, defining the group of changing phonemes as a phoneme string and accumulating it in a phoneme string database.

The invention described in claim 14 is the sound database construction method according to claim 13, comprising the step of leaving in the phoneme string database only those phoneme string data recognized as having occurred more than once within a predetermined time.

According to the present invention, a monitoring device can be realized with a simple configuration, for example a microphone connected to a personal computer, without using surveillance cameras, infrared sensors, or the like.

Because the monitoring device according to the present invention does not need to accumulate enormous amounts of data, it also does not require extensive hardware resources.

The database can also be built automatically. There is therefore no particular need to construct a base database in advance before monitoring begins.

If the monitoring device according to the present invention is built on a computer with an internal power supply, such as a notebook personal computer, it can also be used where no external power is available or during a power failure.

The present invention is described in detail below using an example of a monitoring device.
First, FIG. 1 shows a schematic configuration of the monitoring device according to the present invention. In FIG. 1, a microphone 4 is connected to a personal computer 2 on which the software needed to realize the monitoring function is installed. FIGS. 2 and 3 are functional block diagrams explaining the monitoring function of the personal computer 2.

The method of constructing the database needed to configure the monitoring device according to the present invention is described with reference to FIGS. 1, 2, and 3.

(Acquisition of waveform)
First, the target waveform is captured. The following describes the case where the target waveform is sound. The sound may be collected with the microphone 4; if the signal is already electrical, it may be input directly to the monitoring device.

Referring to FIG. 2, the sound collected by the microphone 4 is first converted into an electrical signal and, if necessary, amplified by the amplifier 6. The amplification factor is controlled by the gain adjustment unit 8 of the amplifier, and is preferably set from a reference value that the noise level acquisition unit 10 obtains by measuring the ambient noise level. The ambient noise level is measured continually, and the noise level is preferably a moving average.

The amplification factor can be determined as follows. First, the electrical signal from the microphone 4 is converted into a digital signal by the A/D converter 12, and the standard deviation calculation unit 14 calculates the standard deviation of this digital signal. In the gain adjustment unit 8, this standard deviation is compared with the reference value from the noise level acquisition unit 10. If the standard deviation is small, the amplification factor is increased; if it is large, the amplification factor is decreased. This control prevents errors caused by an input signal that is extremely large or small relative to the ambient noise.
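The gain rule described above can be sketched in Python as follows. This is a minimal illustration rather than the patent's implementation: the function name `adjust_gain`, the multiplicative step, and the gain limits are assumptions; the source states only that the gain is raised when the standard deviation is small and lowered when it is large relative to the reference value.

```python
import statistics

def adjust_gain(samples, reference, gain, step=1.1, min_gain=0.1, max_gain=100.0):
    """Return an updated amplification factor.

    samples   -- digitized signal values from the A/D converter
    reference -- reference value derived from the ambient noise level
    gain      -- current amplification factor
    step, min_gain and max_gain are illustrative assumptions.
    """
    sigma = statistics.pstdev(samples)  # standard deviation of the digital signal
    if sigma < reference:
        gain *= step   # signal small relative to the reference: raise the gain
    elif sigma > reference:
        gain /= step   # signal large relative to the reference: lower the gain
    return min(max(gain, min_gain), max_gain)
```

Calling the function once per captured frame lets the gain drift toward a level at which the signal's spread matches the noise-derived reference.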

Next, the threshold determination unit 16 determines whether the electrical signal from the microphone 4 is an input that should be measured. This is preferably decided by a threshold test: whether the input electrical signal exceeds, by a predetermined amount or more, the reference value from the noise level acquisition unit 10, which represents the moving-averaged noise level.
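The threshold test against a moving-averaged noise level can be sketched as follows. The class name `NoiseGate`, the window length, and the interpretation of "a predetermined amount" as a multiple of the noise level are assumptions made for illustration; the source does not specify whether the margin is a ratio or a difference.

```python
from collections import deque

class NoiseGate:
    """Tracks a moving-average noise level and flags inputs worth measuring."""

    def __init__(self, window=8, ratio=2.0):
        self.levels = deque(maxlen=window)  # most recent noise measurements
        self.ratio = ratio                  # assumed margin over the noise level

    def update_noise(self, level):
        """Record a new ambient noise measurement."""
        self.levels.append(level)

    def reference(self):
        """Moving average of the recorded noise levels."""
        if not self.levels:
            raise ValueError("no noise level has been measured yet")
        return sum(self.levels) / len(self.levels)

    def should_measure(self, level):
        """True if the input level reaches ratio times the reference."""
        return level >= self.ratio * self.reference()
```

Because the deque has a fixed maximum length, old measurements drop out automatically and the reference follows slow changes in the ambient noise.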

Acquiring the waveform as described above requires only a module consisting of the amplifier 6, the gain adjustment unit 8, and the A/D converter 12. This module can be implemented in hardware; specifically, the sound board normally fitted to a personal computer can be used.

The threshold determination unit described above need only be a module that can acquire the noise level and perform the threshold test. This module is preferably implemented, for example, as software that can process the signal from the sound board.

(Waveform analysis: definition of phonemes)
When the threshold determination unit 16 determines that the input should be measured, the sound data acquisition unit 18 first acquires digital data at predetermined time intervals. The spectrum analysis unit 20 then analyzes the frequency components. The resulting frequency components are assigned to frequency bands that have been divided into regions in advance. For each frequency band, the intensity distribution calculation unit 22 calculates the energy intensity, giving the frequency intensity distribution (see FIG. 4). This frequency intensity distribution is defined here as a "phoneme".
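A phoneme in this sense can be computed with a short Python sketch. The function names, the band edges, and the use of a naive discrete Fourier transform with summed bin power as the "integrated intensity" are assumptions for illustration; a real system would typically use an FFT routine.

```python
import cmath
import math

def power_spectrum(samples):
    """Naive DFT power spectrum for bins 0 .. N/2 (adequate for short frames)."""
    n = len(samples)
    spectrum = []
    for k in range(n // 2 + 1):
        acc = sum(x * cmath.exp(-2j * math.pi * k * t / n)
                  for t, x in enumerate(samples))
        spectrum.append(abs(acc) ** 2)
    return spectrum

def phoneme(samples, sample_rate, band_edges):
    """Sum the spectral power inside each band [lo, hi).

    band_edges -- ascending frequencies in Hz; N edges define N - 1 bands.
    Returns one intensity per band, i.e. the frequency intensity distribution.
    """
    n = len(samples)
    bands = [0.0] * (len(band_edges) - 1)
    for k, p in enumerate(power_spectrum(samples)):
        freq = k * sample_rate / n  # centre frequency of bin k
        for i in range(len(bands)):
            if band_edges[i] <= freq < band_edges[i + 1]:
                bands[i] += p
                break
    return bands
```

For example, a pure tone falls almost entirely into the single band that contains its frequency, so its phoneme has one dominant entry.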

The waveform analysis described above can be performed by a module that can carry out spectrum analysis, acquisition of the frequency intensity distribution, and definition of phonemes. This module is preferably implemented, for example, as software that can process the signal from the sound board.

(Database construction)
Next, to configure the monitoring device of the present invention, a sound database is first built. Using the method described above, sound from the sound source is acquired and analyzed, and the results are accumulated one after another in the phoneme database.

As shown in FIG. 3, when new sound is input, the phoneme acquisition unit 24 first obtains its phoneme, and the correlation coefficient calculation unit 25 compares it for similarity with the phoneme data accumulated in the phoneme database 26.

If the phoneme determination unit 28 finds that the correlation coefficient between the new phoneme and any phoneme in the database 26 exceeds a certain predetermined value, the new phoneme can be judged to be the same as that phoneme. In that case the new phoneme is not stored in the phoneme database. This predetermined value is called the predetermined value for data storage.

一方、この相関係数がデータ蓄積用の所定値以下であれば、新たな音素とデータベース内のいずれかの音素とは、異なる音素と判断できる。新たな音素が、データベース内のいずれかの音素と異なると判断されれば、新たな音素を音素データベース２６に新規なデータとして蓄積される。なおデータ蓄積用の所定値としては、強い相関関係があるとされる０．７以上の数値を適用するとよい。このようにして、音素データベースが構築される。 On the other hand, if this correlation coefficient is equal to or less than a predetermined value for data storage, it can be determined that the new phoneme and any phoneme in the database are different phonemes. If it is determined that the new phoneme is different from any phoneme in the database, the new phoneme is stored as new data in the phoneme database 26. It should be noted that a numerical value of 0.7 or more, which is considered to have a strong correlation, may be applied as the predetermined value for data storage. In this way, a phoneme database is constructed.
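The storage rule described above can be sketched as follows. This is a minimal illustration under stated assumptions: the Pearson correlation coefficient is used as the "correlation coefficient", phonemes are plain lists of band intensities, and the names `correlation`, `register` and `STORAGE_THRESHOLD` are hypothetical.

```python
import math

def correlation(a, b):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    va = sum((x - ma) ** 2 for x in a)
    vb = sum((y - mb) ** 2 for y in b)
    if va == 0 or vb == 0:
        return 0.0  # assumption: a flat distribution is treated as uncorrelated
    return cov / math.sqrt(va * vb)

STORAGE_THRESHOLD = 0.7  # the text suggests a value of 0.7 or more

def register(database, new_phoneme, threshold=STORAGE_THRESHOLD):
    """Store new_phoneme only if no stored phoneme correlates above the threshold."""
    for stored in database:
        if correlation(stored, new_phoneme) > threshold:
            return False  # judged to be the same phoneme: not stored
    database.append(list(new_phoneme))
    return True

db = []
register(db, [0.70, 0.20, 0.05, 0.05])  # stored: the database was empty
register(db, [0.68, 0.22, 0.05, 0.05])  # near-duplicate of the first: skipped
register(db, [0.05, 0.05, 0.20, 0.70])  # clearly different: stored
```

Because near-duplicates are rejected at registration time, the database grows with the number of distinct sounds rather than with the number of observations.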

以上説明した音素データベースの構築は、音素の入力、データベース検索、相関係数の計算、音素の類似性判断、および新規登録を実行できるモジュールで実行されればよく、このようなモジュールは、例えば波形解析モジュールからの信号を処理できるソフトウエアで構成されているとよい。データベースそのものはＨＤＤなどの外部記憶装置上に構築されればよい。 The construction of the phoneme database described above may be executed by a module that can perform phoneme input, database search, correlation coefficient calculation, phoneme similarity determination, and new registration. Such a module may be composed of, for example, software that can process the signal from the waveform analysis module. The database itself may be constructed on an external storage device such as an HDD.

このようにして構築された音素データベースを用いて、新たに収集された音の音素が異常か否かを判断することができる。上述した音素判断部２８で、新たな音の音素とデータベース２６内のいずれかの音素との相関係数が、異常判断用の所定値以下であれば、新たに収集された音の音素が異常であると判断することができる。 Using the phoneme database constructed in this way, it can be determined whether or not the phoneme of a newly collected sound is abnormal. If the phoneme determination unit 28 described above finds that the correlation coefficient between the phoneme of the new sound and any phoneme in the database 26 is equal to or less than a predetermined value for abnormality determination, the phoneme of the newly collected sound can be judged to be abnormal.

なおこのとき、異常判断用の所定値としては、構築された音素データベースと、どの程度異なったときを異常とするかによって、適宜選択するとよい。この所定値としては、例えば、弱い相関関係があるとされる相関係数の下限値である０．４未満の数値が選択されるとよい。具体的には、０．３，０．２あるいは０．１などとすればよい。なお、少しの違いでも以上と判断すべき場合には、この異常判断用の所定値は、０．４以上の値であってもよい。 The predetermined value for abnormality determination may be selected as appropriate according to how far a sound must differ from the constructed phoneme database to be regarded as abnormal. For example, a value less than 0.4, the lower limit of the correlation coefficient range regarded as indicating a weak correlation, may be selected. Specifically, it may be 0.3, 0.2, or 0.1. When even a slight difference should be judged abnormal, the predetermined value for abnormality determination may be 0.4 or more.
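The abnormality judgment can be sketched as follows. This is a minimal illustration, assuming the Pearson correlation coefficient and one reading of the rule: a sound is abnormal when even its best match in the database correlates at or below the threshold. The names and the handling of an empty database are assumptions.

```python
import math

def correlation(a, b):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    va = sum((x - ma) ** 2 for x in a)
    vb = sum((y - mb) ** 2 for y in b)
    if va == 0 or vb == 0:
        return 0.0  # assumption: a flat distribution is treated as uncorrelated
    return cov / math.sqrt(va * vb)

ABNORMAL_THRESHOLD = 0.3  # one of the example values given in the text

def is_abnormal(database, new_phoneme, threshold=ABNORMAL_THRESHOLD):
    """True if the best correlation with any stored phoneme is at or below the threshold."""
    # Assumption: with an empty database nothing can match, so the sound is abnormal.
    best = max((correlation(s, new_phoneme) for s in database), default=-1.0)
    return best <= threshold

normal_db = [[0.70, 0.20, 0.05, 0.05]]
familiar = is_abnormal(normal_db, [0.68, 0.22, 0.05, 0.05])    # close to a stored phoneme
unfamiliar = is_abnormal(normal_db, [0.05, 0.05, 0.20, 0.70])  # unlike anything stored
```

Raising the threshold toward 0.4 or beyond makes the detector stricter, in line with the remark that slight differences can also be treated as abnormal.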

さらに、新たに収集された音の音素が異常であると判断された場合に、本発明による監視装置は、予め決められたアクションを実行するアクション実行部を備えている（図５参照のこと）。 Furthermore, the monitoring device according to the present invention includes an action execution unit that executes a predetermined action when the phoneme of a newly collected sound is judged to be abnormal (see FIG. 5).

具体的なアクションとしては、
・警報メッセージをディスプレイに表示する
・スピーカから警報音を発する
・決まられたメールアドレスに異常発生を知らせるメールを発信する
などが挙げられる。 Specific actions include
- Display an alarm message on the display
- Sound an alarm from a speaker
- Send an e-mail notifying the occurrence of an abnormality to a predetermined e-mail address

（継続時間）
一般的に、音は様々な時間的長さを持っているが、音の継続時間も音素の要素の１つである。この継続時間も特徴量の１つとして、継続時間計測部３０で計測され、音素データベース２６に蓄積されるとよい。 (Duration)
In general, sounds have various lengths of time, but the duration of a sound is one of the elements of phonemes. The duration is also measured by the duration measuring unit 30 as one of the feature quantities and stored in the phoneme database 26.

このような継続時間に関する処理は、継続時間の測定、および特徴量の蓄積を実行できるモジュールで行われるとよい。このモジュールは、例えば波形解析モジュールからの信号を処理できるソフトウエアで構成されているとよい。 Such processing related to the duration time may be performed by a module capable of executing the measurement of the duration time and the accumulation of the feature amount. This module may be configured by software capable of processing a signal from the waveform analysis module, for example.

（発生頻度）
一般的に個々の音は、それぞれある発生頻度で発生している。この発生頻度も音素の要素の１つである。この発生頻度も特徴量の１つとして、発生頻度計測部３２で計測され、音素データベース２６に蓄積されるとよい。 (Frequency of occurrence)
In general, each sound is generated at a certain frequency. This frequency of occurrence is also one of the elements of phonemes. This occurrence frequency is also preferably measured by the occurrence frequency measurement unit 32 as one of the feature quantities and stored in the phoneme database 26.

このような発生頻度に関する処理は、発生頻度の測定、および特徴量の蓄積を実行できるモジュールで行われるとよい。このモジュールは、例えば波形解析モジュールからの信号を処理できるソフトウエアで構成されているとよい。 Such processing relating to the occurrence frequency may be performed by a module capable of executing occurrence frequency measurement and feature amount accumulation. This module may be configured by software capable of processing a signal from the waveform analysis module, for example.

（データベースの整理）
さて所定の時間が経過したところで、蓄積された音素同士の類似性を再度チェックして、データベースの整理を行うとよい。データベース整理部３４で、蓄積された音素相互間の相関係数を求めて、データ整理用の所定値を超えた場合は、同じ音素として再認識する。 (Organization of database)
Now, when a predetermined time has passed, it is preferable to check the similarity between the accumulated phonemes again and organize the database. The database organizing unit 34 obtains the correlation coefficient between the accumulated phonemes, and if it exceeds a predetermined value for data organizing, it is re-recognized as the same phoneme.

同じ音素として認識された場合には、いずれか一方を代表する音素とし、他方は削除するとよい。こうして、同じ音素であると判断された音素同士を１つの音素として、蓄積し直すことができる。 If they are recognized as the same phoneme, one of them may be used as a representative phoneme and the other may be deleted. In this way, phonemes determined to be the same phoneme can be stored again as one phoneme.

ここで用いるデータ整理用の所定値は、先のデータ蓄積用の所定値より小さな値を用いる必要がある。一般に、相関係数が０．４から０．７の範囲では弱い相関があるとされる。このデータベースの整理では、データ整理用の所定値として、０．４以上０．７未満の数値を用いるとよい。 It is necessary to use a smaller value than the predetermined value for data storage as the predetermined value for data arrangement used here. Generally, it is assumed that there is a weak correlation when the correlation coefficient is in the range of 0.4 to 0.7. In this database organization, it is preferable to use a numerical value of 0.4 or more and less than 0.7 as a predetermined value for data organization.
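The database organization step can be sketched as follows. This is a minimal illustration under stated assumptions: the Pearson correlation coefficient is used, the first phoneme of each similar group is kept as the representative (the text allows either one), and the threshold of 0.5 is simply one value in the suggested 0.4 to 0.7 range. All names are hypothetical.

```python
import math

def correlation(a, b):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    va = sum((x - ma) ** 2 for x in a)
    vb = sum((y - mb) ** 2 for y in b)
    if va == 0 or vb == 0:
        return 0.0  # assumption: a flat distribution is treated as uncorrelated
    return cov / math.sqrt(va * vb)

def consolidate(database, threshold=0.5):
    """Keep one representative per group of phonemes correlating above the threshold."""
    kept = []
    for candidate in database:
        # A candidate survives only if it is dissimilar to every representative so far.
        if all(correlation(rep, candidate) <= threshold for rep in kept):
            kept.append(candidate)
    return kept

a = [0.60, 0.30, 0.05, 0.05]
b = [0.30, 0.60, 0.05, 0.05]  # correlates with a at about 0.56
c = [0.05, 0.05, 0.20, 0.70]
merged = consolidate([a, b, c])
```

In this example `a` and `b` were distinct under the storage threshold of 0.7 but are merged under the looser organization threshold, which is why the organization threshold has to be the smaller of the two.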

上述したデータベース構築方法では、新たな音が入力された場合、既に蓄積されているデータとの相関を求め、異なる音と判定された音の音素のみを蓄積するようにしている。このため、データベースに膨大なデータ量を蓄積する必要がない。つまりデータ蓄積のために、多くのハードウエア資源を必要とすることがない。さらにデータベースの整理も行うと、よりいっそう多くのハードウエア資源を必要としなくなる。このような手法は、一種のデータ圧縮とも考えられる。 In the database construction method described above, when a new sound is input, its correlation with the data already stored is obtained, and only the phonemes of sounds judged to be different are stored. For this reason, it is not necessary to store a huge amount of data in the database; that is, data storage does not require many hardware resources. Organizing the database as well reduces the hardware resources required even further. Such a method can also be regarded as a kind of data compression.

以上説明したデータベースの整理は、相互相関係数の計算、音素の再認識、および同じ音素とされたデータの統合を実行できるモジュールで行われるとよい。このモジュールは、ソフトウエアで実現することができる。 The database organization described above may be performed by a module that can execute calculation of cross-correlation coefficients, re-recognition of phonemes, and integration of data made into the same phonemes. This module can be realized by software.

（音素列）
継続的に入力される音は、同じではない場合があり、途中で変化する場合が多い。この場合当然、音素も変化することになる。ここで変化していく音素を、新たに音素列として定義する。 (Phoneme sequence)
Sounds that are input continuously are not necessarily constant and often change partway through. In that case, the phoneme naturally changes as well. This succession of changing phonemes is newly defined as a phoneme string.

この音素列も、音素と同様の方法で音素列データベース３６を構築するとよい。音素列取得部３８で音素列を取得し、相関係数計算部３９で、音素データベース３６に蓄積されている音素列データと類似性の比較を行う。 For phoneme strings as well, a phoneme string database 36 may be constructed in the same manner as for phonemes. The phoneme string acquisition unit 38 acquires a phoneme string, and the correlation coefficient calculation unit 39 compares it for similarity with the phoneme string data stored in the phoneme string database 36.

音素列判断部４０で、新たな音素列とデータベース３６内のいずれかの音素列との相関係数が、ある所定値以上であれば、同じ音素列と判断できる。この新たな音素列がデータベース３６内のいずれかの音素列と同じ音素列と判断されれば、この新たな音素列は音素列データベース３６には蓄積されない。 If the correlation coefficient between the new phoneme sequence and any phoneme sequence in the database 36 is equal to or greater than a predetermined value, the phoneme sequence determination unit 40 can determine that the phoneme sequence is the same. If this new phoneme string is determined to be the same phoneme string as any one of the phoneme strings in the database 36, the new phoneme string is not stored in the phoneme string database 36.

相関係数がある所定値未満であれば、新たな音素列とデータベース３６内のいずれかの音素列は、異なる音素列と判断できる。新たな音素列が、データベース３６内のいずれかの音素列と異なると判断されれば、この新たな音素列を音素列データベース３６に新規なデータとして蓄積される。なおこの所定値としては、強い相関関係があるとされる０．７以上の数値を適用するとよい。 If the correlation coefficient is less than a predetermined value, it can be determined that the new phoneme string and any phoneme string in the database 36 are different phoneme strings. If it is determined that the new phoneme string is different from any of the phoneme strings in the database 36, the new phoneme string is stored as new data in the phoneme string database 36. In addition, as this predetermined value, it is good to apply the numerical value of 0.7 or more considered that there exists a strong correlation.

さらに上述したデータベースの整理と同様の方法で、音素列データベースの整理を行い、所定の発生頻度の音素列のみをデータベース化するとよい。 Furthermore, it is preferable to organize the phoneme string database by a method similar to the above-described database arrangement, and to create a database of only phoneme strings having a predetermined frequency of occurrence.

音素列データベースは、音素列の取得、データベース検索、相関係数の計算、音素列の類似性判断、および新規登録を実行できるモジュールで実行されればよく、このようなモジュールは、例えば波形解析モジュールからの信号を処理できるソフトウエアで構成されているとよい。データベースそのものはＨＤＤなどの外部記憶装置上に構築されればよい。 The phoneme string database may be built by a module that can perform phoneme string acquisition, database search, correlation coefficient calculation, phoneme string similarity determination, and new registration. Such a module may be composed of, for example, software that can process the signal from the waveform analysis module. The database itself may be constructed on an external storage device such as an HDD.

このような装置は、マイクロフォンが接続され、上述したデータベース構築方法を実行できるソフトウエアがインストールされたパーソナルコンピュータにて、容易に具現化できる。 Such an apparatus can be easily realized by a personal computer to which a microphone is connected and software that can execute the above-described database construction method is installed.

以上のような処理を継続的に行うことによって、例えば監視対象となる領域内で発生する音を認識するために必要な、音のデータベースを構築することができる。 By continuously performing the processing as described above, it is possible to construct a sound database necessary for recognizing sound generated in a region to be monitored, for example.

上述したように、本発明によるデータベース構築方法によると、音のデータベースの構築は、このような機能を有する装置を、例えば監視すべき環境に置いておくだけで、自動的に行うことができる。 As described above, according to the database construction method of the present invention, the construction of the sound database can be automatically performed only by placing a device having such a function in an environment to be monitored, for example.

このように本発明によると、音のデータベースの構築に際して、操作者に大きなの負担を強いる必要がない、という大きな特徴がある。 As described above, according to the present invention, there is a great feature that it is not necessary to place a heavy burden on the operator when constructing a sound database.

（音素空間）
新たに入力された音が正常か異常かの判断は、音素空間を用いて行うことができる。ここで、この音素空間とは、図６に示すように、音素の種類、逆継続時間、逆発生頻度の３つの次元で形成される空間として定義する。 (Phoneme space)
Whether a newly input sound is normal or abnormal can be determined using the phoneme space. Here, the phoneme space is defined as the space formed by three dimensions, namely phoneme type, inverse duration, and inverse occurrence frequency, as shown in FIG. 6.

ここで、継続時間という特徴量に関しては、継続時間の逆数をとることで、音のマスキング効果を考慮することができる。また発生頻度という特徴量に関しても、その逆数をとることも、音のマスキング効果を考慮することになる。一般に、音を連続的に聞かされた場合や、聞かされる頻度が多い場合には、その音に対する人間の感度は、だんだんと低下していくことが知られている。これが、マスキング効果と呼ばれている現象である。 Here, with respect to the feature quantity of duration, the masking effect of sound can be taken into consideration by taking the reciprocal of duration. Also, regarding the feature quantity of occurrence frequency, taking the reciprocal thereof also takes into account the sound masking effect. In general, it is known that when a sound is heard continuously or when it is heard frequently, human sensitivity to the sound gradually decreases. This is a phenomenon called a masking effect.

なお、逆継続時間については、連続量として扱うのではなく、所定のセグメントを設けておき、そのセグメントに割り当てるとよい。 Note that the inverse duration is preferably not handled as a continuous quantity; instead, predetermined segments are provided and each value is assigned to one of them.

図７は、音素空間の作成を説明する機能ブロック図である。このような音素空間の形成は、音素軸の構築、逆継続時間のセグメント化、逆発生頻度の算出、を実行できるモジュールで行うことができる。このモジュールは、例えば音素データベースに蓄積されたデータを用い、ソフトウエアで処理することで実現できる。以下、図７を参照しながら説明する。 FIG. 7 is a functional block diagram explaining the creation of the phoneme space. The phoneme space can be formed by a module that can perform construction of the phoneme axis, segmentation of the inverse duration, and calculation of the inverse occurrence frequency. This module can be realized, for example, by software that processes the data stored in the phoneme database. A description is given below with reference to FIG. 7.

まず、音素データベース２６を構成している音素の相関係数を相関係数計算部５０で計算し、次に発生頻度の逆数を発生頻度逆数化部５２で計算し、続いて音素の継続時間の逆数を継続時間逆数化部５４で計算する。 First, the correlation coefficient of the phoneme constituting the phoneme database 26 is calculated by the correlation coefficient calculation unit 50, then the reciprocal of the occurrence frequency is calculated by the occurrence frequency reciprocalization unit 52, and then the duration of the phoneme is calculated. The reciprocal is calculated by the duration reciprocalization unit 54.

さらに、ソート部５６で最も発生頻度の高い音素を基準とし、相関係数の大きな順番に音素を並べ替える。 Further, the sorting unit 56 takes the phoneme with the highest occurrence frequency as a reference and rearranges the phonemes in descending order of correlation coefficient.

一方、セグメント化部５８で上述したように、計算された逆継続時間を所定のセグメントに割り当てることで領域化する。 Meanwhile, the segmentation unit 58 divides the calculated inverse durations into regions by assigning them to the predetermined segments, as described above.

以上のようなソートされた音素、逆継続時間、逆発生頻度を、３次元化部６０で３次元化して、音素空間を作成する。 The three-dimensionalization unit 60 then combines the sorted phonemes, the inverse duration, and the inverse occurrence frequency into three dimensions to create the phoneme space.
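The steps of FIG. 7 can be sketched as follows. This is a minimal illustration under several assumptions: each database entry is a dict with hypothetical keys `dist` (band intensities), `duration` (seconds) and `count` (number of occurrences); the segment edges are arbitrary; and the space is stored as a grid indexed by sorted phoneme and inverse-duration segment, with the inverse occurrence frequency as the cell value.

```python
import math
from bisect import bisect_right

def correlation(a, b):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    va = sum((x - ma) ** 2 for x in a)
    vb = sum((y - mb) ** 2 for y in b)
    if va == 0 or vb == 0:
        return 0.0  # assumption: a flat distribution is treated as uncorrelated
    return cov / math.sqrt(va * vb)

SEGMENT_EDGES = [0.5, 1.0, 2.0, 5.0]  # hypothetical edges for 1/duration, in 1/s

def build_phoneme_space(entries, edges=SEGMENT_EDGES):
    """Return (sorted entries, grid[phoneme][segment] = inverse occurrence frequency)."""
    # The most frequent phoneme is the reference for sorting.
    reference = max(entries, key=lambda e: e["count"])
    ordered = sorted(entries,
                     key=lambda e: correlation(reference["dist"], e["dist"]),
                     reverse=True)
    space = [[0.0] * (len(edges) + 1) for _ in ordered]
    for row, entry in enumerate(ordered):
        segment = bisect_right(edges, 1.0 / entry["duration"])  # segment the inverse duration
        space[row][segment] = 1.0 / entry["count"]              # inverse occurrence frequency
    return ordered, space

entries = [
    {"name": "B", "dist": [0.05, 0.05, 0.20, 0.70], "duration": 0.25, "count": 2},
    {"name": "A", "dist": [0.70, 0.20, 0.05, 0.05], "duration": 2.0, "count": 10},
    {"name": "C", "dist": [0.60, 0.30, 0.05, 0.05], "duration": 1.0, "count": 5},
]
ordered, space = build_phoneme_space(entries)
order = [e["name"] for e in ordered]
```

Sorting by correlation places similar phonemes next to each other along the phoneme axis, which is what later allows the grid to be treated as an image.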

（音の認識：音素空間の画像的処理）
例えば音声認識に関する技術では、母音や子音をそれぞれの音素として分離して処理するので、非常に複雑なパターンマッチング処理を行う必要があった。しかし本発明の特徴の１つは、上述の音素空間において相関係数によるソートを行うことである。すなわち、音素データベースを構築している音素データにおいて、相互の相関係数を求め、最も発生頻度の高い音素を基準として、相関係数の大きな順に音素を並べ替えるのである。 (Sound recognition: Image processing of phoneme space)
For example, in the technology related to speech recognition, since vowels and consonants are processed separately as phonemes, it is necessary to perform very complicated pattern matching processing. However, one of the features of the present invention is that the sorting is performed by the correlation coefficient in the above phoneme space. That is, in the phoneme data constructing the phoneme database, the correlation coefficient is obtained, and the phonemes are rearranged in descending order of the correlation coefficient with the most frequently occurring phoneme as a reference.

つまりこのような処理をすることで、音素データベースを１つの画像情報として取り扱うことができるようになる。この場合、逆発生頻度を画像の強度とするとよい。 In other words, this processing allows the phoneme database to be handled as a single piece of image information. In this case, the inverse occurrence frequency may be used as the image intensity.

また、３次元化部６０で、ソートされた音素の種類軸と逆継続時間軸のなす面において、スムージング処理を施すことによって、各音素の間に結合情報を与えることができる。 In addition, by applying a smoothing process in the three-dimensionalization unit 60 to the plane formed by the sorted phoneme type axis and the inverse duration axis, coupling information can be given between the phonemes.

図８は、スムージング処理の施された２次元画像を示す。これは、ソートされた音素の種類軸および逆継続時間軸で定義される画像面に、逆発生頻度を画像の強度として表されている状態を示す。ここで、スムージング処理としては、画像全体の濃度の平均化を行っている。このことは、脳の記憶における、記憶の固定化を助ける事象間の連携に、例えることができる。 FIG. 8 shows a two-dimensional image that has been subjected to the smoothing process. It shows the inverse occurrence frequency represented as image intensity on the image plane defined by the sorted phoneme type axis and the inverse duration axis. Here, the smoothing process averages the density over the entire image. This can be likened to the linkage between events that helps consolidate memories in the brain.

さらにスムージング処理のほかに、微分や画像強調の処理などによっても、各音素の間に結合情報を与えることができる。 Furthermore, in addition to the smoothing process, it is also possible to give coupling information between the phonemes by a process of differentiation or image enhancement.

このような音素空間の画像的処理は、音素空間における相関係数によるソート、さらには音素の種類軸と逆継続時間軸のなす面におけるスムージング処理を実行できるモジュールで構成されればよい。このモジュールは、例えば音素データベースに蓄積されたデータを処理できるソフトウエアで構成されているとよい。 Such image processing of the phoneme space may be configured by a module that can perform sorting based on the correlation coefficient in the phoneme space and smoothing processing on the plane formed by the phoneme type axis and the inverse duration time axis. This module may be composed of software that can process data stored in a phoneme database, for example.

このような処理を継続的に行っていくことによって、音素のデータベースに音の知識を蓄積することができる。ここで、例えば正常な音のみを監視装置に入力していき、音素空間を得る。この音素空間は、正常な音素空間として定義することができる。この正常な音素空間は、判断フィルタ作成部６２において、新たに入力された音が、正常な音か異常な音かを判断する際に用いるフィルタとすることができる。 By continuously performing such processing, it is possible to accumulate sound knowledge in a phoneme database. Here, for example, only normal sounds are input to the monitoring device to obtain a phoneme space. This phoneme space can be defined as a normal phoneme space. This normal phoneme space can be a filter used when the judgment filter creation unit 62 judges whether the newly input sound is a normal sound or an abnormal sound.

（フィルタリング処理）
このフィルタを用いたフィルタリング処理は、フィルタリング処理部６４において、以下のようにして行うことができる。まず、音素空間の画像の強度を光の透過率と読み替える。次に新たな音について、それから求められる音素の種別と逆継続時間の値にて、音素空間においてマッピングする。続いて、マッピングされた位置における透過率を掛け合わせて、正常な音素空間によるフィルタリングの前後における強度比を算出することができる。この強度比が所定の値より大きなとき、つまり新たな音がよく透過される場合、新たな音は正常な音素空間に蓄積された音素とは異なる、と判断することができる。 (Filtering process)
The filtering process using this filter can be performed in the filtering processing unit 64 as follows. First, the intensity of the phoneme space image is reinterpreted as light transmittance. Next, the new sound is mapped into the phoneme space using the phoneme type and inverse duration obtained from it. The transmittance at the mapped position is then multiplied in, which gives the intensity ratio before and after filtering by the normal phoneme space. When this intensity ratio is larger than a predetermined value, that is, when the new sound is transmitted well, the new sound can be judged to differ from the phonemes accumulated in the normal phoneme space.

このように本発明は、複雑なパターンマッチング処理を必要せず、簡単な処理によって、入力された音が正常か異常かを判断すること可能となる。 As described above, the present invention does not require complicated pattern matching processing, and can determine whether the input sound is normal or abnormal by simple processing.

さらにこのフィルタリング処理は、ソフトウエアのみならずハードウエアでも実現することができる。具体的には、図９に示すように、新たな入力された音を表示するための２次元発光ダイオードアレイ７０と、フィルタとしての液晶パネル７２と、透過光を検出するための２次元フォトディテクタアレイ７４と、２次元マイクロレンズアレイ７６，７８とを用いて構成することができる。このように音素空間を定義することによって、ハードウエアでもフィルタリング処理を行うことができる。 Further, this filtering process can be realized not only by software but also by hardware. Specifically, as shown in FIG. 9, it can be configured using a two-dimensional light emitting diode array 70 for displaying a newly input sound, a liquid crystal panel 72 serving as the filter, a two-dimensional photodetector array 74 for detecting the transmitted light, and two-dimensional microlens arrays 76 and 78. Defining the phoneme space in this way allows the filtering process to be performed by hardware as well.

新たに入力された音における音素は、上述した画像的処理がなされて電気信号に変換され、２次元に配列された発光ダイオードアレイ７０によって画像情報として表示される。表示された音素は、One-to-Manyの光学系を構成するマイクロレンズアレイ７６によって、複数のイメージに複製され、液晶パネル７２に照射される。 The phonemes in the newly input sound are subjected to the above-described image processing and converted into electrical signals, and are displayed as image information by the light-emitting diode array 70 arranged two-dimensionally. The displayed phonemes are duplicated into a plurality of images by the microlens array 76 constituting the one-to-many optical system, and irradiated onto the liquid crystal panel 72.

このとき、液晶パネル７２がフィルタとして機能する。つまり、液晶パネルには、複数の監視領域の正常な音素空間からなるフィルタ情報を、透過率パターンを変化させることによって表示されている。このフィルタを通過した光は、マイクロレンズアレイ７８を経て、個々のフィルタに対応したフォトディテクタアレイ７４に照射される。フォトディテクタアレイに照射された光は、電気信号に変換される。 At this time, the liquid crystal panel 72 functions as a filter. That is, on the liquid crystal panel, filter information including normal phoneme spaces in a plurality of monitoring areas is displayed by changing the transmittance pattern. The light that has passed through the filter passes through the microlens array 78 and irradiates the photodetector array 74 corresponding to each filter. The light irradiated to the photodetector array is converted into an electrical signal.

このようにして得られたフィルタリング前後の信号強度比から、新たに入力された音が正常か異常かを瞬時に判断することができる。例えば、この信号強度比が−３ｄＢよりも大きい場合には、新たに入力された音が異常な音であると判断することができる。例えば、この「−３ｄＢよりも大きい場合」とは、新たに入力された音による画像の光が、正常な音素空間によって構成されるフィルタを、よく透過することを意味する。 From the signal intensity ratio before and after filtering thus obtained, it is possible to instantaneously determine whether the newly input sound is normal or abnormal. For example, when the signal intensity ratio is larger than −3 dB, it can be determined that the newly input sound is an abnormal sound. For example, “when larger than −3 dB” means that the light of the image by the newly input sound is well transmitted through a filter constituted by a normal phoneme space.
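A software version of this filtering judgment might look like the following sketch. Several points are assumptions rather than statements from the text: the normal phoneme space is held as a grid of occurrence counts, the transmittance of a cell is the inverse of its count, cells never observed in normal operation transmit fully, and the intensity ratio is expressed as a power ratio in decibels. All names are hypothetical.

```python
import math

def transmittance(counts, row, segment):
    """Transmittance of one cell of the normal phoneme space, in the range (0, 1]."""
    c = counts[row][segment]
    # Assumption: a cell never seen in normal operation blocks nothing.
    return 1.0 if c == 0 else 1.0 / c

def intensity_ratio_db(counts, row, segment):
    """Intensity ratio after/before filtering, as a power ratio in dB."""
    return 10.0 * math.log10(transmittance(counts, row, segment))

def is_abnormal_by_filter(counts, row, segment, limit_db=-3.0):
    """True if the new sound passes the filter with a ratio larger than limit_db."""
    return intensity_ratio_db(counts, row, segment) > limit_db

# counts[row][segment]: rows are sorted phonemes, columns are inverse-duration segments.
counts = [
    [0, 40, 0],
    [0, 0, 8],
    [1, 0, 0],
]
frequent = is_abnormal_by_filter(counts, 0, 1)  # heard 40 times: strongly attenuated
unseen = is_abnormal_by_filter(counts, 0, 0)    # never heard: passes unattenuated
```

With these assumptions a sound observed 40 times is attenuated by about 16 dB and is treated as normal, while a sound mapped to an empty cell passes at 0 dB and is flagged.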

さらに、ハードウエアで上述のようなフィルタ処理ユニットを構成すると、複数の処理ユニットを同一平面上に構築することができる。このような構成によると、複数の判断を、同時にしかも瞬間的に処理することが可能となる。 Furthermore, if the above-described filter processing unit is configured by hardware, a plurality of processing units can be constructed on the same plane. According to such a configuration, a plurality of determinations can be processed simultaneously and instantaneously.

（状況の変化）
ところで監視する領域において、その状況は時間の経過とともに変化する。つまり、例えば人が出入りするような領域では、日中と夜中ではその状況は変化しており、正常か異常かの判断基準も変化することになる。ここで、一定の標準値を元に判断していたのでは、正しい判断を行うことができない。また平日と休日との違いや、季節による状況変化も当然存在することになる。そこで、クロック情報やカレンダー情報を元に、いくつかの正常な音素空間を構築しておくとよい。 (Change in situation)
In the area to be monitored, the situation changes with time. In other words, for example, in an area where people go in and out, the situation changes during the day and at night, and the criteria for judging whether it is normal or abnormal will also change. Here, if a determination is made based on a certain standard value, a correct determination cannot be made. Naturally, there will be differences between weekdays and holidays, and seasonal changes in the situation. Therefore, it is advisable to construct some normal phoneme spaces based on clock information and calendar information.

（マハラビノス空間および距離）
さらに本発明では、時間の経過による状況変化に応じて的確な判断を行うために、マハラビノス空間およびマハラビノス距離という概念を導入することを特徴とする。マハラビノス空間とは、正常と認識される情報を行列化し、その非分散の相関情報を抽出した空間である。この行列においては、例えば縦行列成分として特徴項目を、横行列成分として時間軸や空間軸を用いるとよい。 (Mahalanobis space and distance)
Furthermore, the present invention is characterized by introducing the concepts of a Mahalanobis space and a Mahalanobis distance in order to make accurate judgments in response to changes in the situation over time. The Mahalanobis space is a space obtained by arranging information recognized as normal into a matrix and extracting its non-dispersed correlation information. In this matrix, for example, feature items may be used as the vertical matrix components, and a time axis or a space axis may be used as the horizontal matrix components.

このようにマハラビノス空間は、特徴項目と時間軸や空間軸の情報から行列化を行うことによって構築することができる。 As described above, the Mahalanobis space can be constructed by forming a matrix from the feature items and the time-axis or space-axis information.

またマハラビノス距離とは、マハラビノス空間において正常とされる中心位置から、被検査情報をマッピングした座標までの距離をいう。この距離が所定値より大きくなると、正常な状態から離れていると判断することができる。通常この所定値には、４以上の数値が適用される。 The Mahalanobis distance refers to the distance from the normal center position in the Mahalanobis space to the coordinates where the information to be inspected is mapped. When this distance becomes larger than the predetermined value, it can be determined that the distance is away from the normal state. Usually, a numerical value of 4 or more is applied to this predetermined value.

このようなマハラビノス空間およびマハラビノス距離の概念を導入した正常空間の構築を、図１０に示した機能ブロック図を参照しながら説明する。 The construction of a normal space that incorporates these concepts of Mahalanobis space and Mahalanobis distance will be described with reference to the functional block diagram shown in FIG. 10.

（状況変化の判断）
本発明においては、縦行列成分にソートされた音素を、横行列成分に時間軸を当てはめる。まず、マハラビノス参照配列生成部８０において、現在からある所定時間さかのぼって収集した音素データを用いて、マハラビノス参照配列を生成する。次に、マハラビノス距離計算部８２において、この参照配列と新たな音素とのマハラビノス距離を算出する。状況変換判断部８４において、得られたマハラビノス距離が所定の値以上であり、それが所定時間以上連続するようであれば、状況が変化したと判断する。 (Judgment of situation change)
In the present invention, the sorted phonemes are assigned to the vertical matrix components and the time axis to the horizontal matrix components. First, the Mahalanobis reference array generation unit 80 generates a Mahalanobis reference array from the phoneme data collected over a predetermined period going back from the present. Next, the Mahalanobis distance calculation unit 82 calculates the Mahalanobis distance between this reference array and a new phoneme. If the obtained Mahalanobis distance is equal to or greater than a predetermined value and remains so continuously for a predetermined time or longer, the situation change determination unit 84 determines that the situation has changed.
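The final decision step can be sketched as follows, assuming the Mahalanobis distances have already been computed for successive frames. The function name, the limit of 4.0 (taken from the "4 or more" guideline given earlier) and the use of a frame count in place of "a predetermined time" are assumptions.

```python
def situation_changed(distances, limit=4.0, min_run=3):
    """True if at least min_run consecutive distances are all >= limit."""
    run = 0
    for d in distances:
        run = run + 1 if d >= limit else 0  # any frame below the limit resets the run
        if run >= min_run:
            return True
    return False

brief_spike = situation_changed([1.0, 5.0, 6.0, 2.0, 5.0, 5.0])  # runs of only two frames
sustained = situation_changed([1.0, 5.0, 6.0, 7.0])              # three frames in a row
```

Requiring a sustained run keeps a single unusual sound from being mistaken for a change of situation such as the transition from daytime to nighttime.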

このように状況変化の判断は、マハラビノス参照配列の作成、マハラビノス距離の算出・判定を実行できるモジュールで行われるとよい。このモジュールは、例えば音素データベース２６に蓄積されたデータを用い、ソフトウエアで処理することで実現できる。 Thus, the situation change may be judged by a module that can create the Mahalanobis reference array and calculate and evaluate the Mahalanobis distance. This module can be realized, for example, by software that processes the data stored in the phoneme database 26.

このようにして状況の変化を自動的に認識ながら、日中や夜中、さらには平日や休日といった区切りを設けて、音素データベースを構築するとよい。つまり、それぞれの状況における正常音素空間を構築していくのである。 In this way, it is preferable to construct a phoneme database while automatically recognizing a change in the situation, and providing divisions such as daytime, midnight, weekdays, and holidays. That is, the normal phoneme space in each situation is constructed.

（特別な音素の学習）
なおこのとき、例えば電話の着信音や来訪を告げるチャイム音など、正常な音素空間とはマハラビノス距離が離れているが、通常起こりうる音に対しては、異常と判断しないように、特別な音素として登録しておくとよい。このように、特別な音素として登録しておくと、次に電話の着信音やチャイム音が入力された場合には、そのことを認識できるようになる。 (Special phoneme learning)
At this time, sounds that can ordinarily occur, such as a telephone ringtone or a chime announcing a visitor, may lie at a large Mahalanobis distance from the normal phoneme space. Such sounds are preferably registered as special phonemes so that they are not judged abnormal. Once registered as a special phoneme, a ringtone or chime can be recognized as such the next time it is input.

さらにこの監視装置に、操作者の声により、例えば何かを命令するような言葉を意味づけして登録しておくと、この監視装置を用いて音声コントロールを実現することも可能である。 Furthermore, if words spoken by the operator, for example words that give a command, are registered in this monitoring device together with their meanings, voice control can also be realized using this monitoring device.

（基本構成例）
以下に、本発明をパーソナルコンピュータ上で実現される基本構成例を説明する。 (Basic configuration example)
Hereinafter, a basic configuration example in which the present invention is realized on a personal computer will be described.

図１１は、パーソナルコンピュータ２内の基本ハードウエア構成を示す。マイクロフォン４からの信号が、パーソナルコンピュータ２にインストールされているサウンドボード１００に入力される。またウインドウズ（登録商標）付属のマルチメディアＡＰＩにより、入力された音データはサンプリングされ、ディジタルデータとしてメモリ１０２に格納される。また、ミキサーコントロールを行うことで、ゲイン調整もソフトウエアで実現することができる。なおマイクロフォンは１つとした。 FIG. 11 shows a basic hardware configuration in the personal computer 2. A signal from the microphone 4 is input to the sound board 100 installed in the personal computer 2. The input sound data is sampled by a multimedia API attached to Windows (registered trademark) and stored in the memory 102 as digital data. Also, gain control can be realized by software by controlling the mixer. One microphone was used.

マルチメディアＡＰＩの命令をＣＰＵ（中央演算装置）１０４で実行することで、マイクロフォンから音データを所定時間取得することができる。取得されたデータに対して高速フーリエ変換（ＦＦＴ）を行うと、スペクトル解析を行うことができる。また高速フーリエ変換の代わりに、Maximum Entropy Method（ＭＥＭ）を適用してもよい。 By executing a multimedia API command by a CPU (Central Processing Unit) 104, sound data can be acquired from a microphone for a predetermined time. When fast Fourier transform (FFT) is performed on the acquired data, spectrum analysis can be performed. Further, instead of the fast Fourier transform, Maximum Entropy Method (MEM) may be applied.

サンプリング条件としては、一例として、データ数は１０２４とし、サンプリングレートは２２．０５ｋＨｚ、８ｂｉｔで、モノラルとした。このとき、１サンプルのデータを取得するのに、４６．４ｍＳかかることになる。 As an example of the sampling conditions, the number of data points was 1024, and the sampling rate was 22.05 kHz at 8 bits, monaural. Under these conditions, acquiring one block of 1024 data points takes 46.4 ms.
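The 46.4 ms figure follows directly from the stated conditions, as the short calculation below shows. The frequency resolution and block size are derived values added for illustration and are not stated in the text; the variable names are arbitrary.

```python
SAMPLES_PER_BLOCK = 1024   # number of data points per acquisition
SAMPLE_RATE_HZ = 22050     # 22.05 kHz
BYTES_PER_SAMPLE = 1       # 8-bit, monaural

block_ms = SAMPLES_PER_BLOCK / SAMPLE_RATE_HZ * 1000  # time to fill one block, about 46.4 ms
bin_hz = SAMPLE_RATE_HZ / SAMPLES_PER_BLOCK           # spacing of FFT bins, about 21.5 Hz
block_bytes = SAMPLES_PER_BLOCK * BYTES_PER_SAMPLE    # memory per block
```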

音素データベースや音素列データベースは、例えばＣ言語の構造体で構成されるとよい。データベース構築に関するモジュールは、Ｃ＋＋によってコーディングを行った。またタイマー機能は、割り込みタイマーにより実現し、そこで呼び出される関数において実行手続きを作成しておくとよい。 The phoneme database and the phoneme string database may be composed of, for example, C language structures. The modules related to database construction were coded in C++. The timer function may be realized with an interrupt timer, with the execution procedure written in the function that the timer calls.

音素空間の作成については、まずグローバルメモリを確保すれば、リニアなメモリ空間内に実現することができる。ここで、状況判断に必要な量の空間を確保しておくとよい。 The phoneme space can be created in a linear memory space by first securing a global memory. Here, it is preferable to secure an amount of space necessary for situation determination.

また一定時間にわたる測定などは、割り込みタイマーを用いて制御するか、別途スレッドを作成して行うとよい。 Measurements over a certain period of time may be controlled by using an interrupt timer or by creating a separate thread.

マハラビノス空間の計算には、掃き出し法による逆行列演算によっている。もし逆行列演算ができない場合には、シュミット展開を利用するとよい。 The Mahalanobis space is calculated by matrix inversion using the sweep-out method (Gauss-Jordan elimination). If the inverse matrix cannot be computed, Schmidt expansion (Gram-Schmidt orthogonalization) may be used instead.
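The matrix inversion by the sweep-out method, and its use in a Mahalanobis distance, can be sketched as follows. This is a minimal illustration, assuming the conventional definition of the distance as the square root of the quadratic form with the inverse covariance matrix; the text does not state which normalisation it uses. The function names and the singularity tolerance are assumptions.

```python
import math

def invert(matrix, eps=1e-12):
    """Invert a square matrix by Gauss-Jordan elimination with partial pivoting."""
    n = len(matrix)
    # Augment the matrix with the identity: [A | I] is reduced to [I | A^-1].
    aug = [list(row) + [1.0 if i == j else 0.0 for j in range(n)]
           for i, row in enumerate(matrix)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        if abs(aug[pivot][col]) < eps:
            raise ValueError("matrix is singular; inversion is not possible")
        aug[col], aug[pivot] = aug[pivot], aug[col]
        p = aug[col][col]
        aug[col] = [v / p for v in aug[col]]
        for r in range(n):
            if r != col:
                f = aug[r][col]
                aug[r] = [v - f * w for v, w in zip(aug[r], aug[col])]
    return [row[n:] for row in aug]

def mahalanobis(x, mean, cov_inv):
    """Mahalanobis distance of x from mean, given the inverse covariance matrix."""
    d = [a - b for a, b in zip(x, mean)]
    n = len(d)
    return math.sqrt(sum(d[i] * cov_inv[i][j] * d[j] for i in range(n) for j in range(n)))

cov = [[4.0, 0.0], [0.0, 1.0]]
distance = mahalanobis([2.0, 0.0], [0.0, 0.0], invert(cov))  # 2 units at std dev 2
```

The `ValueError` branch corresponds to the case where inversion is not possible, which is where the text suggests falling back to the Schmidt approach.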

そこでまず上述した方法にて、監視すべき環境の音を取得し解析を行って、音素データベースに蓄積し、音のデータベースを構築していく。さらに、継続時間の測定、データベースの整理を行って、音素データベースが構築される。また音素列データベースも上述のようにして構築しておく。 Therefore, first, the sound of the environment to be monitored is acquired and analyzed by the above-described method, stored in the phoneme database, and the sound database is constructed. Furthermore, the phoneme database is constructed by measuring the duration and organizing the database. The phoneme string database is also constructed as described above.

新たに入力された音が正常か異常かの判断するために、まず音素空間を定義し、続いてフィルタリング前後の信号強度比から、新たに入力された音が正常か異常かを判断する。 In order to determine whether the newly input sound is normal or abnormal, first, a phoneme space is defined, and then it is determined from the signal intensity ratio before and after filtering whether the newly input sound is normal or abnormal.

新たに入力された音が異常であると判断した場合は、予め決められたアクションを実行することになる。このアクションは、例えば、スピーカら１０６から警報を発したり、電子メールを発信したりすることなどである。このようなアクションを行う場合には、その制御に必要なモジュールを追加することになる。例えば、電子メールを発信するような場合には、ＰＯＰサーバへ接続するための制御モジュールを追加するとよい。 When the newly input sound is judged to be abnormal, a predetermined action is executed. This action is, for example, sounding an alarm from the speaker 106 or sending an e-mail. To perform such an action, a module required for its control is added. For example, to send an e-mail, a control module for connecting to the POP server may be added.

なお、図１１において、アクションを実行するハードウエアは、スピーカ１０６のみを示している。図中、１０８はディスプレイである。 In FIG. 11, only the speaker 106 is shown as hardware for executing the action. In the figure, reference numeral 108 denotes a display.

このようにして、本発明では１つのマイクロフォン４からの信号だけでも、その環境の状況を把握することのできる監視装置をパーソナルコンピュータ２上に構成することができる。本発明による監視装置において、簡単な構成とするために、マイクロフォンを１つにすることが好ましい。 In this way, according to the present invention, a monitoring device that can grasp the state of the environment with only a signal from one microphone 4 can be configured on the personal computer 2. In the monitoring apparatus according to the present invention, it is preferable to use one microphone for a simple configuration.

(Specific Example 1)
As described in the basic configuration example, the monitoring device of the present invention can be configured on a personal computer. In Specific Example 1, the monitoring device was therefore configured on a notebook personal computer. FIG. 12 shows a monitoring device configured by connecting the microphone 4 to a notebook personal computer 112.

Under normal conditions, the program that controls the monitoring device is run; the noise level is measured first, and phoneme data and phoneme string data are then accumulated. Once a certain amount of data has been accumulated, a Mahalanobis reference matrix is created, and the Mahalanobis distance is calculated for the phoneme data that arrives continually, so that the data can be classified while changes in the situation are tracked over time. The phoneme space is then created. In this case, all of the accumulated data is recorded as sound from a normal situation.
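The Mahalanobis distance calculation can be sketched as follows. This is an illustration under stated assumptions: each row of `rows` is taken to be one feature vector of phoneme data observed in a time window, the reference space is summarized by the sample mean and sample covariance of those rows, and all function names are hypothetical. How the embodiment actually lays out its reference matrix is not specified here.

```python
import math

def mean_and_covariance(rows):
    """Sample mean vector and sample covariance matrix of the reference data."""
    n, d = len(rows), len(rows[0])
    mean = [sum(r[j] for r in rows) / n for j in range(d)]
    cov = [[sum((r[i] - mean[i]) * (r[j] - mean[j]) for r in rows) / (n - 1)
            for j in range(d)] for i in range(d)]
    return mean, cov

def solve(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        if abs(m[pivot][col]) < 1e-12:
            raise ValueError("covariance matrix is singular")
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        x[i] = (m[i][n] - sum(m[i][j] * x[j] for j in range(i + 1, n))) / m[i][i]
    return x

def mahalanobis(sample, mean, cov):
    """Distance sqrt((x - mu)^T cov^-1 (x - mu)) of sample from the reference space."""
    diff = [s - mu for s, mu in zip(sample, mean)]
    weighted = solve(cov, diff)  # cov^-1 (x - mu) without forming the inverse
    return math.sqrt(sum(d * w for d, w in zip(diff, weighted)))
```

A distance that exceeds a chosen threshold would then be read as a change in the state of the sound source; the threshold itself has to be tuned to the environment.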

Once the sound database has been built in this way, it becomes possible to judge whether a newly input sound is normal or abnormal. The device may also ask the operator, through the user interface, whether its judgment was right or wrong, so that it learns more accurately. If a sound is judged abnormal, some predetermined action is executed.

When the monitoring device was configured on the notebook personal computer 112 in this way and installed in a room, it did not stand out as an ordinary monitoring device does and did not spoil the atmosphere of the room.

In Specific Example 1, the monitoring device is configured on a notebook personal computer, so it can be used even where no external power supply is available. Moreover, when it is installed where external power is available, even if the power is cut off by an outage or the like, the built-in battery keeps it running for at least several hours, so it can still detect an abnormality and take an action such as sending a notification.

(Specific Example 2)
The control in the monitoring device described above was implemented in software, but this is not a requirement. For example, it can also be realized by a system using an FPGA (Field Programmable Gate Array), an ASIC (Application Specific IC), or a DSP (Digital Signal Processor). The device can therefore be incorporated into pet robots, home appliances, and mobile phones.

When the monitoring device is built into a pet robot or home appliance in this way, it does not spoil the atmosphere of the room and is therefore well suited to such use. When it is built into a mobile phone, an action such as a notification can be carried out even if something unexpected happens to the owner.

(Application Example 1)
Next, a case in which the monitoring device according to the present invention is applied to a factory is described. For example, the monitoring device can monitor the operating sound of continuously running equipment, issue an alarm when an unusual sound is detected, and trigger an action that keeps the trouble from spreading.

The present invention is not limited to sound; any waveform data that can be digitally sampled can be monitored. The monitoring device can also be made to learn transient response waveforms for use in failure analysis.

(Application Example 2)
A case in which the monitoring device according to the present invention is applied to monitoring a home is described next. As described in Specific Example 1, the monitoring device of the present invention was configured on a notebook personal computer.

When the monitoring device according to the present invention is operated as described above, it can judge whether a newly input sound is normal or abnormal while keeping track of the situation, such as whether someone is in the room or the occupants are away. If the device is also set up to recognize the telephone ringtone and the door chime that announces a visitor, it can keep a history of incoming calls and visitors.

Furthermore, if a function for responding to e-mail inquiries from outside is added, the state of the home can be checked even while the occupants are away.

Because the monitoring device according to the present invention can grasp the situation in the monitored area, it can also take an action such as sending a notification when something goes wrong with, for example, a sick person, a baby, or a person who needs nursing care.

(Phoneme Data Example)
In Application Example 2, the state of a room was measured for half a day. FIG. 13 shows an example in which the phonemes sorted from the resulting data are displayed on the personal computer screen.

This phoneme pattern is a two-dimensional image corresponding to FIG. 8. On the image plane defined by the sorted phoneme type axis and the reciprocal duration axis, the reciprocal of the occurrence frequency is represented as image intensity.

An example of a normal phoneme space is shown in FIG. 14, which displays only the normal phoneme space from FIG. 13.

FIG. 1 is a diagram illustrating an example in which the monitoring device according to the present invention is implemented on a personal computer.
FIG. 2 is a functional block diagram illustrating the functions of the monitoring device shown in FIG. 1.
FIG. 3 is a functional block diagram illustrating the functions of the monitoring device shown in FIG. 1.
FIG. 4 is a diagram illustrating the result of obtaining a frequency intensity distribution from an input waveform.
FIG. 5 is a functional block diagram illustrating the functions of the monitoring device shown in FIG. 1.
FIG. 6 is a diagram conceptually illustrating the phoneme space.
FIG. 7 is a diagram illustrating an example in which phonemes are sorted in the phoneme space.
FIG. 8 is a diagram illustrating a phoneme space to which smoothing has been applied.
FIG. 9 is a diagram schematically illustrating the filtering process.
FIG. 10 is a block diagram illustrating the configuration of a normal phoneme space into which the concepts of Mahalanobis space and Mahalanobis distance are introduced.
FIG. 11 is a diagram showing the hardware configuration inside the personal computer.
FIG. 12 is a diagram showing an example in which the monitoring device of the present invention is implemented on a notebook personal computer.
FIG. 13 is a diagram showing an example in which sorted phonemes are displayed on the personal computer screen.
FIG. 14 is a diagram showing a normal phoneme space displayed on the personal computer screen.

Explanation of symbols

2 Personal computer
4 Microphone
6 Amplifier
8 Gain adjustment unit
10 Noise level acquisition unit
12 A/D converter
14 Standard deviation calculation unit
16 Threshold calculation unit
18 Sound data acquisition unit
20 Spectrum analysis unit
22 Intensity distribution calculation unit
24 Phoneme acquisition unit
26 Phoneme database
28 Phoneme determination unit
29 Action execution unit
30 Duration measurement unit
32 Occurrence frequency measurement unit
34 Database organization unit
36 Phoneme string database
38 Phoneme string acquisition unit
39 Correlation calculation unit
40 Phoneme string determination unit
42 Database organization unit
50 Calculation unit
52 Occurrence frequency reciprocal unit
54 Duration reciprocal unit
58 Segmentation unit
60 Three-dimensionalization unit
62 Determination filter creation unit
64 Filtering processing unit
70 Two-dimensional light-emitting diode array
72 Two-dimensional liquid crystal panel
74 Two-dimensional photodetector array
76, 78 Two-dimensional microlens array
80 Mahalanobis reference array generation unit
82 Mahalanobis distance calculation unit
84 Situation determination unit
100 Sound board
102 Memory
104 CPU
106 Speaker
108 Display
112 Notebook personal computer

Claims

A monitoring device comprising:
a database that stores phonemes, a phoneme being defined, for the frequency spectrum of collected sound, as the distribution of the integrated intensity of the frequency spectrum over ranges divided into predetermined frequency bands;
a phoneme determination unit that determines that newly collected sound is an abnormal sound if the correlation coefficient between each phoneme stored in the database and the phoneme of the newly collected sound is equal to or less than a predetermined value for abnormality determination; and
an action execution unit that executes a predetermined action when the phoneme determination unit determines that the newly collected sound is an abnormal sound.

The monitoring device according to claim 1, wherein, in the database,
the occurrence frequency of the accumulated phonemes is calculated,
correlation coefficients between the accumulated phonemes are obtained,
the accumulated phonemes are rearranged in descending order of correlation coefficient with a phoneme having a high occurrence frequency taken as the reference, and
a virtual three-dimensional phoneme space is constructed with the phoneme type as the first axis, the reciprocal of the phoneme duration as the second axis, and the reciprocal of the phoneme occurrence frequency as the third axis.

The monitoring device according to claim 2,
wherein image processing is performed on adjacent phonemes in the plane defined by the first axis and the second axis of the three-dimensional space in the database, so as to give a feature value of the coupling relation with respect to the first axis.

The monitoring device according to claim 2, wherein
a filter is constructed from the three-dimensional phoneme space built only from normal phonemes stored in the database, and
the monitoring device comprises a determination unit that passes phoneme data of newly collected sound through the filter and determines, based on the intensity ratio before and after the filter, that the newly collected sound is an abnormal sound.

A database construction method comprising the steps of:
obtaining a frequency spectrum of sound collected at predetermined time intervals;
obtaining an integrated intensity of the frequency spectrum for each range divided into predetermined frequency bands, defining the distribution of the integrated intensity as a phoneme, and storing it in a database;
obtaining the phoneme for newly collected sound;
obtaining a correlation coefficient between the phoneme of the newly collected sound and each phoneme stored in the database; and
determining that the newly collected phoneme is different from the stored phonemes if the correlation coefficient is equal to or less than a predetermined value for data storage.

The database construction method according to claim 5,
further including a step of storing the newly collected phoneme in the database only when it is determined that the newly collected phoneme is different from the stored phonemes.

The database construction method according to claim 5, further including the steps of:
determining the occurrence frequency of the accumulated phonemes;
obtaining correlation coefficients between the accumulated phonemes;
rearranging the accumulated phonemes in descending order of correlation coefficient with a phoneme having a high occurrence frequency taken as the reference; and
constructing a three-dimensional phoneme space with the phoneme type as the first axis, the reciprocal of the phoneme duration as the second axis, and the reciprocal of the phoneme occurrence frequency as the third axis.

The database construction method according to claim 7,
further including a step of using the three-dimensional phoneme space as a filter, passing the phoneme data of newly collected sound through the filter, and making the determination based on the intensity ratio before and after the filter.

The database construction method according to claim 7, further including the steps of:
performing image processing on adjacent phonemes in the plane defined by the first axis and the second axis of the three-dimensional space in the database; and
giving a feature value of the coupling relation with respect to the first axis.

The database construction method according to claim 5, further including the steps of:
defining, from phoneme data obtained within a predetermined time, a Mahalanobis reference space as a matrix with the types of the phoneme data arranged in the vertical direction and the time axis in the horizontal direction;
calculating, for newly collected sound, the Mahalanobis distance relative to the Mahalanobis reference space; and
recognizing that the state of the sound source has changed when the Mahalanobis distance exceeds a predetermined value.

The database construction method according to claim 5, further including the steps of:
obtaining correlation coefficients between the phonemes stored in the database; and
setting a predetermined value for data reduction that is smaller than the predetermined value for data storage, and determining that phonemes whose correlation coefficient exceeds the predetermined value for data reduction are the same phoneme.

The database construction method according to claim 11,
further including a step of re-accumulating, as a single phoneme, the phonemes determined to be the same phoneme in the determining step.

The database construction method according to claim 5,
further including a step of, when the phoneme of the newly collected sound changes within a predetermined time, defining the group of changing phonemes as a phoneme string and storing it in a phoneme string database.

The database construction method according to claim 13,
further including a step of retaining, among the phoneme string data in the phoneme string database, only the phoneme string data recognized as having occurred within a predetermined time.