JP6881207B2

JP6881207B2 - Learning device, program

Info

Publication number: JP6881207B2
Application number: JP2017196976A
Authority: JP
Inventors: 祐太河内; 悠馬小泉; 登原田
Original assignee: Nippon Telegraph and Telephone Corp; NTT Inc
Current assignee: Nippon Telegraph and Telephone Corp; NTT Inc
Priority date: 2017-10-10
Filing date: 2017-10-10
Publication date: 2021-06-02
Anticipated expiration: 2037-10-10
Also published as: JP2019070965A

Description

The present invention relates to a learning device that learns normal and abnormal sounds from, for example, the operating sounds of a machine; a learning device that learns normal data and abnormal data of a machine or the like; a learning method; and a program.

Detecting a failure of a machine or the like before it occurs, or quickly after it occurs, is important for business continuity. To reduce the labor this requires, there is a field called anomaly detection, in which an electric circuit or a program uses sensors or the like to find an "anomaly", that is, a deviation from the normal state. In particular, anomaly detection that uses a sensor converting sound into an electrical signal, such as a microphone, is called abnormal sound detection.

A typical conventional technique collects data such as sound waveforms that are considered to represent a normal state in which no failure has occurred (normal data), derives the distribution that the normal data follow by using a regression model or the like, and judges a sample whose state is unknown as normal or abnormal, taking a small generation probability or a large regression error as the degree of abnormality. For example, there is a conventional technique (Non-Patent Document 2) that uses the reconstruction probability or the reconstruction error of a variational autoencoder (Non-Patent Document 1).

When an abnormal sound detection system is put into actual operation, a very small amount of abnormal data, corresponding to failures and the like, can sometimes be collected in addition to the normal data. However, because the amounts of normal data and abnormal data are heavily imbalanced, it is difficult to use the abnormal data for learning by simply treating the task as a classification problem. One technique that addresses this performs manifold learning using both kinds of data (Non-Patent Document 3).

Non-Patent Document 1: Kingma, Diederik P., and Max Welling. "Auto-encoding variational bayes." arXiv preprint arXiv:1312.6114 (2013).
Non-Patent Document 2: An, Jinwon, and Sungzoon Cho. Variational Autoencoder based Anomaly Detection using Reconstruction Probability. Technical Report, 2015.
Non-Patent Document 3: Du, Bo, and Liangpei Zhang. "A discriminative metric learning based anomaly detection method." IEEE Transactions on Geoscience and Remote Sensing 52.11 (2014): 6844-6857.

Because a variational autoencoder outputs a probability distribution, it allows anomaly detection with an inter-distribution distance measure that also reflects the uncertainty of the estimate. Despite this advantage, no method had been proposed that introduces a probability distribution for abnormal data into a variational autoencoder.

An object of the present invention is therefore to provide a learning device that can introduce a probability distribution for abnormal data and use it for learning.

The learning device of the present invention includes a learning unit. Taking data based on normal sounds and data based on abnormal sounds as training data, the learning unit uses a variational autoencoder to learn an encoder that encodes the data based on normal sounds into a first latent variable probability distribution and encodes the data based on abnormal sounds into a second latent variable probability distribution different from the first, together with a decoder that decodes the encoding result.

According to the learning device of the present invention, a probability distribution for abnormal data can be introduced and used for learning.

FIG. 1 is a diagram illustrating the first latent variable probability distribution and the second latent variable probability distribution.
FIG. 2 is a block diagram showing the configuration of the learning device of Example 1.
FIG. 3 is a flowchart showing the learning operation of the learning device of Example 1.
FIG. 4 is a flowchart showing the abnormality detection operation of the learning device of Example 1.
FIG. 5 is a diagram explaining the encoder and the decoder learned by the learning device of Example 1.
FIG. 6 is a diagram showing the encoding results of data based on normal sounds and data based on abnormal sounds.

Embodiments of the present invention are described in detail below. Components having the same function are given the same reference number, and duplicate descriptions are omitted.

In the learning device according to Example 1 below, different probability distributions for normal and abnormal data are set as the latent variable prior in variational autoencoder training. For example, the encoder and decoder are trained with a constraint that normal data gather at a predetermined origin in the latent space (become dense at and around the origin) and a constraint that abnormal data move away from that origin (become sparse at and around the origin). An abnormality can then be determined by applying the encoder to unknown test data (detection target data) and calculating a measure of how close the output latent variable is to the predetermined origin. In addition, using the latent variable space of the variational autoencoder to calculate the degree of abnormality makes it possible to use, for example, the difference or ratio of the KL divergences from the normal and abnormal probability distributions. Because this takes the variance of the latent variable estimate into account, an improvement in abnormality determination accuracy can be expected.

<Definition of the abnormal distribution>
Consider a space in which both normal and abnormal data, such as device operating sounds or their latent representation vectors, exist as points. For example, if acoustic features are used, one sample can be represented as one point in a Euclidean space whose dimension equals the number of dimensions of the acoustic feature.

From the viewpoint that everything other than a normal pattern is abnormal, assume that in this space all normal samples lie inside a finite normal region D and that abnormal samples lie only outside the normal region D.

As a distribution representing the degree of normality of data z, consider the uniform distribution U_D(z) over the normal region D. This simply treats data as normal if it lies within region D; that is, U_D(z) = c_D > 0 means normal, where c_D is the probability density value within the support of the uniform distribution. For abnormality, it would be natural to use a uniform distribution U_D̄(z) that has positive density on the region D̄ outside D, and to define U_D̄(z) = c_D̄ = α(c_D − U_D(z)) > 0 as abnormal (α is a suitable positive constant).
However,

[equation omitted in source]

is a contradiction, so no such probability density function U_D̄(z) exists. To satisfy this condition, another Gaussian distribution R(z) is therefore used, and

[equation omitted in source]

is taken instead as the probability density function representing abnormal samples. Y is the normalization constant that makes the integral over z equal to 1. Because an analytical solution is difficult to derive with a uniform distribution, here the uniform distribution U_D(z) is replaced with a Gaussian distribution with mean 0 and variance 1, R(z) is likewise a Gaussian distribution with mean 0 and variance s², and

[equation omitted in source]

is shown as an example of the abnormal distribution (see the broken line in FIG. 1). The value of s is a hyperparameter that is usually determined experimentally. In the example of FIG. 1, the normal distribution p(z) (the distribution of normal data) is a Gaussian distribution with mean 0 and variance 1 and is dense at and around the origin O, whereas the abnormal distribution p̄(z) is sparse at and around the origin O.
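The expression for the abnormal distribution is not reproduced in this text, so the following Python sketch uses an assumed form purely for illustration: the difference of two unnormalized zero-mean Gaussian kernels, exp(−z²/(2s²)) − exp(−z²/2) with s > 1, divided by Y = √(2π)(s − 1). This choice is non-negative, vanishes at the origin, and integrates to 1, which matches the qualitative description of FIG. 1, but it may differ from the expression in the original patent. The function names and the value s = 3 are hypothetical.

```python
import numpy as np


def normal_density(z):
    """Prior for normal data: Gaussian with mean 0 and variance 1."""
    return np.exp(-0.5 * z**2) / np.sqrt(2.0 * np.pi)


def abnormal_density(z, s=3.0):
    """ASSUMED prior for abnormal data (not taken from the patent text).

    (exp(-z^2 / (2 s^2)) - exp(-z^2 / 2)) / Y with Y = sqrt(2 pi) (s - 1).
    For s > 1 the density is non-negative and equals zero at the origin.
    """
    if s <= 1.0:
        raise ValueError("s must be greater than 1")
    y = np.sqrt(2.0 * np.pi) * (s - 1.0)  # normalization constant Y
    return (np.exp(-0.5 * z**2 / s**2) - np.exp(-0.5 * z**2)) / y


# Numerical check on a wide grid: the assumed density should integrate to about 1.
z = np.linspace(-40.0, 40.0, 400001)
dz = z[1] - z[0]
total_mass = float(np.sum(abnormal_density(z)) * dz)
```

Under this assumed form, a larger s spreads the abnormal mass further from the origin, which is one way to read the role of s as an experimentally tuned hyperparameter.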

The configuration of the learning device 1 of this example is described with reference to FIG. 2. As shown in the figure, the learning device 1 includes an AD conversion unit 11, a preprocessing unit 12, a normal data storage unit 12a, an abnormal data storage unit 12b, a detection target data storage unit 12c, a learning unit 13, an encoder/decoder storage unit 13a, an abnormality degree calculation unit 14, and an abnormality determination unit 15. The operation of each component is described below with reference to FIGS. 3 and 4.

<AD conversion (S11-1)>
First, the AD conversion unit 11 performs AD (analog-to-digital) conversion of the sound waveform of a normal sound (for example, the normal operating sound of a machine) at an appropriate sampling frequency and prepares quantized waveform data (S11-1). The AD conversion unit 11 AD-converts the sound waveform of an abnormal sound (for example, the abnormal operating sound of a machine) in the same way as the normal sound (S11-1).

<Data preprocessing (S12-1)>
The preprocessing unit 12 applies preprocessing to the AD-converted normal sound data and abnormal sound data (S12-1). Examples include feature extraction that expands the data to multiple dimensions by concatenating several samples, applying a discrete Fourier transform, applying filter bank processing, and the like, and normalization of the value range by computing the mean and variance of the data. The preprocessing unit 12 is not an essential component. The normal sound data and abnormal sound data may instead be handled without preprocessing, as data in which one-dimensional values are arranged in time series.
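The preprocessing options named above (sample concatenation, discrete Fourier transform, mean/variance normalization) could be combined as in the following minimal Python sketch. The frame length, hop size, context width, Hann window, and log-magnitude spectrum are all assumptions made for illustration; the source does not specify them, and filter bank processing is left out for brevity.

```python
import numpy as np


def extract_features(waveform, frame_len=256, hop=128, context=2):
    """Frame the waveform, take log-magnitude DFT spectra, and concatenate
    each frame with `context` neighbouring frames on either side."""
    n_frames = 1 + (len(waveform) - frame_len) // hop
    frames = np.stack(
        [waveform[i * hop : i * hop + frame_len] for i in range(n_frames)]
    )
    window = np.hanning(frame_len)  # assumed analysis window
    spectra = np.abs(np.fft.rfft(frames * window, axis=1))
    log_spec = np.log(spectra + 1e-8)  # small offset avoids log(0)
    width = 2 * context + 1
    # Concatenate `width` consecutive spectra into one feature vector.
    return np.stack(
        [log_spec[i : i + width].reshape(-1) for i in range(n_frames - width + 1)]
    )


def normalize(features, mean=None, std=None):
    """Normalize each feature dimension to zero mean and unit variance.

    Statistics computed on training data can be passed in for test data."""
    if mean is None:
        mean = features.mean(axis=0)
    if std is None:
        std = features.std(axis=0) + 1e-8
    return (features - mean) / std, mean, std


# Hypothetical input: one second of noise standing in for a 16 kHz recording.
rng = np.random.default_rng(0)
waveform = rng.standard_normal(16000)
features = extract_features(waveform)
normalized, mean, std = normalize(features)
```

Returning the mean and standard deviation lets the same statistics be reused when the detection target data are preprocessed in S12-2.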

Data based on normal sounds after step S12-1 is called normal data, and data based on abnormal sounds after step S12-1 is called abnormal data. When step S12-1 is omitted, data based on normal sounds after step S11-1 is called normal data, and data based on abnormal sounds after step S11-1 is called abnormal data. Normal data is stored in the normal data storage unit 12a. Abnormal data is stored in the abnormal data storage unit 12b.

<Variational autoencoder learning (S13)>
Taking the normal data and the abnormal data as training data, the learning unit 13 uses a variational autoencoder to learn an encoder that encodes the normal data into the first latent variable probability distribution (normal distribution) and encodes the abnormal data into a second latent variable probability distribution (abnormal distribution) different from the first, together with a decoder that decodes the encoding result (S13).

An example of learning the encoder and decoder with a variational autoencoder is given below. The description mainly covers the case where the latent variable is one-dimensional, but the same applies to the multidimensional case by summing the optimization criterion over the dimensions. An ordinary variational autoencoder is used for normal data. With it, the learning unit 13 learns an encoder 131 (q(z|x;φ)) with parameter φ, which converts data x (a vector or a scalar) into the latent variable distribution N(z; μ_x, σ²_x) corresponding to x, and a decoder 132 (p(x|z;θ)) with parameter θ, which reconstructs x from the latent variable z (see FIG. 5). For simplicity, the subscript x of μ_x and σ²_x is omitted from here on.

The optimization criterion is the marginal likelihood maximization criterion for data x under the model with parameter θ:

[equation omitted in source]

For this criterion, the learning unit 13 obtains the variational lower bound

[equation omitted in source]

and instead optimizes

[equation omitted in source]

in which the distribution q(z|x;φ) is approximated by the empirical distribution of L samples drawn from it. The concrete optimization criterion is the sum of a reconstruction error term and an expression for the Kullback-Leibler divergence of the encoder q(z|x;φ) from the first latent variable probability distribution (normal distribution p(z)):

[equation omitted in source]

The first half can be regarded as a reconstruction error and the second half as a latent variable constraint term. The coefficient C_n balances the two terms. The learning unit 13 trains the encoder and the decoder using this as the learning criterion for normal data.

For abnormal data, the abnormal distribution p̄(z) described above is substituted for the prior distribution in the variational autoencoder expression. That is, the expression for the Kullback-Leibler divergence of the encoder q(z|x;φ) from the second latent variable probability distribution (abnormal distribution p̄(z)) becomes

[equation omitted in source]

To decompose the addition inside the log of this expression, an approximation that uses only the terms up to first order of its Taylor expansion is introduced. Solving it gives

[equation omitted in source]

and the criterion used in optimization is

[equation omitted in source]

Here the coefficient C_a is introduced as a balancing term in the same way. To decompose the addition inside the log, an approximation that uses the terms of the Taylor expansion up to n-th order (n is a natural number of 2 or more) may be used instead. The learning unit 13 trains the encoder and the decoder using this as the learning criterion for abnormal data. When the latent variable z is multidimensional and p̄(z) is independent across dimensions, the criterion can be written as the sum of the above criterion over the dimensions, so with J denoting the number of latent dimensions,

[equation omitted in source]

may be used as the criterion for the multidimensional case. Training may use only some of the terms joined by addition in this criterion, and each term may be multiplied by an arbitrary constant to change the balance of weights between the terms. Constant coefficients may also be removed.
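The equations for the normal-data criterion are not reproduced in this text. For orientation, the standard variational autoencoder expressions from Non-Patent Document 1, written with the symbols used above, are sketched below. They are the textbook forms and are only assumed to correspond to the patent's equations; in particular, the source does not show where the coefficient C_n is placed, so it is left out here.

```latex
\begin{align}
% Marginal log-likelihood of data x under the model with parameter \theta
\log p(x;\theta) &= \log \int p(x \mid z;\theta)\, p(z)\, dz \\
% Variational lower bound
\log p(x;\theta) &\ge \mathbb{E}_{q(z \mid x;\phi)}\!\left[\log p(x \mid z;\theta)\right]
  - D_{\mathrm{KL}}\!\left(q(z \mid x;\phi)\,\|\,p(z)\right) \\
% Approximation of the expectation with L samples z^{(l)} \sim q(z \mid x;\phi)
&\approx \frac{1}{L}\sum_{l=1}^{L} \log p\!\left(x \mid z^{(l)};\theta\right)
  - D_{\mathrm{KL}}\!\left(q(z \mid x;\phi)\,\|\,p(z)\right) \\
% Closed-form KL term for q = N(z;\mu,\sigma^2) and p(z) = N(z;0,1), one dimension
D_{\mathrm{KL}}\!\left(\mathcal{N}(z;\mu,\sigma^2)\,\|\,\mathcal{N}(z;0,1)\right)
  &= \tfrac{1}{2}\left(\mu^2 + \sigma^2 - \log \sigma^2 - 1\right)
\end{align}
```

The closed-form KL term for the abnormal distribution p̄(z) depends on the omitted definition of p̄(z) and on the Taylor approximation described above, so it is not reconstructed here.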

When the latent variable has many dimensions, the concentration-on-the-sphere phenomenon is expected to make samples near the center unlikely. In such cases, a Gaussian prior with a larger variance than the normal distribution may simply be used as the abnormal distribution. That is, the abnormal distribution may be set to

[equation omitted in source]

and the abnormal data may be learned with

[equation omitted in source]

Training on normal data and on abnormal data may alternate, each for an arbitrary number of iterations. When a stochastic gradient method is used, separate batches may be created for normal and abnormal data, or the errors for normal and abnormal data may be computed at each error calculation of the gradient method and averaged to give the overall error. To account for the imbalance between the amounts of normal and abnormal data, each gradient may also be multiplied by a constant, for example.
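For the simplified variant in which the abnormal distribution is a zero-mean Gaussian with a larger variance, the KL term has a well-known closed form. The Python sketch below shows it together with one of the batching options mentioned above, namely averaging the normal and abnormal errors. The prior N(0, s²), the value s = 3, the placement of C_n and C_a as multipliers on the KL term, and all function names are assumptions for illustration; the reconstruction error is taken as a precomputed input.

```python
import numpy as np


def kl_to_zero_mean_gaussian(mu, var, prior_var):
    """KL( N(mu, var) || N(0, prior_var) ), summed over latent dimensions."""
    mu = np.asarray(mu, dtype=float)
    var = np.asarray(var, dtype=float)
    per_dim = 0.5 * (np.log(prior_var / var) + (var + mu**2) / prior_var - 1.0)
    return per_dim.sum(axis=-1)


def sample_loss(recon_error, mu, var, is_abnormal, s=3.0, c_n=1.0, c_a=1.0):
    """Reconstruction error plus a weighted latent constraint term.

    Normal data are pulled toward N(0, 1); abnormal data toward the assumed
    wider prior N(0, s^2)."""
    if is_abnormal:
        return recon_error + c_a * kl_to_zero_mean_gaussian(mu, var, s**2)
    return recon_error + c_n * kl_to_zero_mean_gaussian(mu, var, 1.0)


def combined_loss(normal_losses, abnormal_losses):
    """Average the mean normal loss and the mean abnormal loss, so the small
    abnormal set is not swamped by the much larger normal set."""
    return 0.5 * (np.mean(normal_losses) + np.mean(abnormal_losses))
```

Averaging the two per-class means, rather than pooling all samples, is one simple way to compensate for the data imbalance; multiplying each gradient by a constant, as the text suggests, would have a similar effect.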

<Abnormality degree calculation (S14)>
The detection target data is data for which it is unknown whether it is normal or abnormal and in which an abnormality is to be detected; it has already undergone the AD conversion of S11-2 and the preprocessing of S12-2 and is stored in the detection target data storage unit 12c. The abnormality degree calculation unit 14 encodes the detection target data with the encoder of the variational autoencoder and calculates the degree of abnormality using the resulting posterior probability distribution of the latent variable, q(z|x;φ) = N(z; μ, σ²) (S14).

There are several possible ways to calculate a value that reflects the degree of abnormality (a degree of abnormality or a degree of normality). As shown in FIG. 6, the encoding results 131a of normal data appear inside the normal distribution (broken line in the figure), and the encoding results 131b of abnormal data appear outside the normal distribution (inside the abnormal distribution). Step S14 uses this property to calculate the degree of abnormality.

That is, the abnormality degree calculation unit 14 may calculate the degree of abnormality using only the estimated mean vector μ of the latent variable, for example with any of the following:
(1) a first likelihood, which is the likelihood of the estimated mean vector of the latent variable obtained by encoding the detection target data with respect to the first latent variable probability distribution (normal distribution);
(2) a second likelihood, which is the likelihood of the estimated mean vector of the latent variable obtained by encoding the detection target data with respect to the second latent variable probability distribution (abnormal distribution);
(3) the likelihood ratio of the first likelihood and the second likelihood.
The abnormality degree calculation unit 14 may also use both the estimated mean vector μ and the estimated variance vector σ² of the latent variable and, as an inter-distribution distance measure, calculate the degree of abnormality with any measure based on the posterior probability distribution q(z|x;φ), such as the following:
(4) a first KL divergence, which is the Kullback-Leibler divergence of the latent variable obtained by encoding the detection target data from the first latent variable probability distribution (normal distribution);
(5) a second KL divergence, which is the Kullback-Leibler divergence of the latent variable obtained by encoding the detection target data from the second latent variable probability distribution (abnormal distribution);
(6) the ratio of the first KL divergence and the second KL divergence.
An operation that does not change the relative ordering of the values, such as taking the negative logarithm, or one that completely reverses the ordering may be applied. These measures may also be combined, for example by addition, with the reconstruction error arising from encoding and decoding, as in the value of the variational lower bound. Alternatively, the method of introducing a probability distribution for abnormal data may be applied only to model training, with the degree of abnormality calculated from the reconstruction error alone.
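A few of these measures can be sketched in Python as follows, under the assumption that the abnormal distribution is the simplified zero-mean Gaussian N(0, s²) with s > 1 rather than the abnormal distribution of FIG. 1. With that wide-Gaussian prior, measures (2) and (5) on their own do not grow with distance from the origin, so the sketch computes only measure (1) as a negative log-likelihood, measure (3) as a log-likelihood ratio, measure (4), and the difference form of measure (6), which the overview of Example 1 mentions alongside the ratio. All names and the value s = 3 are hypothetical, and every score is oriented so that larger means more abnormal.

```python
import numpy as np


def gaussian_log_likelihood(mu, prior_var):
    """Log-density of the point mu under N(0, prior_var), summed over dimensions."""
    mu = np.asarray(mu, dtype=float)
    return np.sum(-0.5 * (np.log(2.0 * np.pi * prior_var) + mu**2 / prior_var), axis=-1)


def kl_to_zero_mean_gaussian(mu, var, prior_var):
    """KL( N(mu, var) || N(0, prior_var) ), summed over dimensions."""
    mu = np.asarray(mu, dtype=float)
    var = np.asarray(var, dtype=float)
    return np.sum(0.5 * (np.log(prior_var / var) + (var + mu**2) / prior_var - 1.0), axis=-1)


def anomaly_scores(mu, var, s=3.0):
    """Scores from the encoder output N(z; mu, var); larger means more abnormal."""
    ll_normal = gaussian_log_likelihood(mu, 1.0)
    ll_abnormal = gaussian_log_likelihood(mu, s**2)  # assumed abnormal prior
    kl_normal = kl_to_zero_mean_gaussian(mu, var, 1.0)
    kl_abnormal = kl_to_zero_mean_gaussian(mu, var, s**2)
    return {
        "neg_log_likelihood_normal": -ll_normal,          # based on measure (1)
        "log_likelihood_ratio": ll_abnormal - ll_normal,  # based on measure (3)
        "kl_normal": kl_normal,                           # measure (4)
        "kl_difference": kl_normal - kl_abnormal,         # difference form of (6)
    }


# Hypothetical encoder outputs: one latent near the origin, one far from it.
near = anomaly_scores([0.1, -0.2], [1.0, 1.0])
far = anomaly_scores([4.0, -5.0], [1.0, 1.0])
```

The two KL-based scores use the estimated variance as well as the mean, which is how the variance of the latent variable estimate enters the degree of abnormality.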

<Abnormality determination (S15)>
The abnormality determination unit 15 determines an abnormality from the degree of abnormality calculated in step S14, using a method employed in anomaly detection techniques (S15). For example, the abnormality determination unit 15 may make the determination based on whether the degree of abnormality exceeds a predetermined threshold.
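The threshold comparison can be sketched in Python as follows. The source does not say how the predetermined threshold is chosen; taking a high quantile of the scores observed on normal data is an assumption made here for illustration only, as are the function names and the default quantile.

```python
import numpy as np


def choose_threshold(normal_scores, quantile=0.99):
    """ASSUMED rule: use a high quantile of the degrees of abnormality
    observed on known-normal data as the predetermined threshold."""
    return float(np.quantile(normal_scores, quantile))


def is_abnormal(score, threshold):
    """Judge the sample abnormal when its degree of abnormality exceeds the threshold."""
    return bool(score > threshold)


# Hypothetical degrees of abnormality measured on held-out normal data.
normal_scores = np.arange(101.0)
threshold = choose_threshold(normal_scores)
```

A higher quantile lowers the false alarm rate on normal data at the cost of missing milder abnormalities, so in practice the value would be tuned to the application.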

<Modification>
Example 1 above discloses a learning device that focuses on normal and abnormal "sounds" and learns an encoder and decoder used to detect abnormal "sounds", but the target of the normal/abnormal determination in the present invention is not limited to sound. The determination target of the present invention covers any data that can be judged as either normal or abnormal. Accordingly, Example 1 may be modified so that the learning unit takes normal data and abnormal data (not limited to sound data) as training data and uses a variational autoencoder to learn an encoder that encodes the normal data into the first latent variable probability distribution (normal distribution) and encodes the abnormal data into a second latent variable probability distribution (abnormal distribution) different from the first, together with a decoder that decodes the encoding result. The learning device according to this modification can be applied to anomaly detection on data other than sound, such as arbitrary sensor data (temperature, pressure, displacement, and so on) and traffic data such as network communication volume.

<Effect>
In abnormal sound detection using a variational autoencoder, abnormalities can be detected with high accuracy by learning from both normal and abnormal data. The computational cost of model training hardly increases, because only the constraint term is changed.

<Supplementary note>
The device of the present invention has, for example as a single hardware entity, an input unit to which a keyboard or the like can be connected, an output unit to which a liquid crystal display or the like can be connected, a communication unit to which a communication device capable of communicating outside the hardware entity (for example, a communication cable) can be connected, a CPU (Central Processing Unit, which may include cache memory, registers, and the like), RAM and ROM as memory, an external storage device such as a hard disk, and a bus that connects the input unit, output unit, communication unit, CPU, RAM, ROM, and external storage device so that data can be exchanged among them. If necessary, the hardware entity may also be provided with a device (drive) that can read from and write to a recording medium such as a CD-ROM. A general-purpose computer is one example of a physical entity equipped with such hardware resources.

The external storage device of the hardware entity stores the program needed to realize the functions described above and the data needed for the processing of this program (the program is not limited to the external storage device and may, for example, be stored in ROM, which is a read-only storage device). Data obtained by the processing of these programs is stored as appropriate in the RAM, the external storage device, or the like.

In the hardware entity, each program stored in the external storage device (or ROM or the like) and the data needed to process it are read into memory as needed and are interpreted, executed, and processed by the CPU as appropriate. As a result, the CPU realizes the predetermined functions (the components referred to above as "... unit", "... means", and so on).

The present invention is not limited to the embodiments described above and can be modified as appropriate without departing from the spirit of the invention. The processes described in the above embodiments need not be executed in time series in the order described; they may be executed in parallel or individually, depending on the processing capacity of the device executing them or as needed.

As already stated, when the processing functions of the hardware entity described in the above embodiments (the device of the present invention) are realized by a computer, the processing content of the functions the hardware entity should have is described by a program. Executing this program on the computer realizes the processing functions of the hardware entity on the computer.

The program describing this processing content can be recorded on a computer-readable recording medium. The computer-readable recording medium may be of any kind, for example a magnetic recording device, an optical disc, a magneto-optical recording medium, or a semiconductor memory. Specifically, for example, a hard disk device, a flexible disk, or magnetic tape can be used as the magnetic recording device; a DVD (Digital Versatile Disc), DVD-RAM (Random Access Memory), CD-ROM (Compact Disc Read Only Memory), or CD-R (Recordable)/RW (ReWritable) as the optical disc; an MO (Magneto-Optical disc) as the magneto-optical recording medium; and an EEP-ROM (Electronically Erasable and Programmable-Read Only Memory) as the semiconductor memory.

The program is distributed, for example, by selling, transferring, or lending a portable recording medium, such as a DVD or CD-ROM, on which the program is recorded. Alternatively, the program may be stored in the storage device of a server computer and distributed by transferring it from the server computer to other computers via a network.

A computer that executes such a program, for example, first stores the program recorded on the portable recording medium, or the program transferred from the server computer, in its own storage device. When executing the processing, the computer reads the program stored in its own recording medium and executes the processing in accordance with the program it has read. As another form of execution, the computer may read the program directly from the portable recording medium and execute the processing in accordance with it, or it may execute the processing in accordance with the received program sequentially each time the program is transferred to it from the server computer. The processing described above may also be executed by a so-called ASP (Application Service Provider) type service, in which the program is not transferred from the server computer to the computer and the processing functions are realized solely through execution instructions and acquisition of results. The program in this embodiment includes information that is provided for processing by an electronic computer and is equivalent to a program (for example, data that is not a direct command to the computer but has the property of defining the computer's processing).

In this embodiment the hardware entity is configured by executing a predetermined program on a computer, but at least part of this processing content may instead be realized in hardware.

Claims

A learning device comprising: an encoder that takes data based on normal sounds and data based on abnormal sounds as training data and, using a variational autoencoder, encodes the data based on normal sounds into a probability distribution of a first latent variable and encodes the data based on abnormal sounds into a probability distribution of a second latent variable that differs from the probability distribution of the first latent variable; and a decoder that decodes the encoded result, wherein the encoder learns using, as a learning criterion for the data based on abnormal sounds, an approximate expression that uses only the terms up to first order of the Taylor expansion of an expression representing the Kullback-Leibler divergence with respect to the probability distribution of the second latent variable.

A learning device comprising: an encoder that takes data based on normal sounds and data based on abnormal sounds as training data and, using a variational autoencoder, encodes the data based on normal sounds into a probability distribution of a first latent variable and encodes the data based on abnormal sounds into a probability distribution of a second latent variable that differs from the probability distribution of the first latent variable; and a decoder that decodes the encoded result, wherein the encoder learns using, as a learning criterion for the data based on abnormal sounds, an approximate expression that uses only the terms up to n-th order (n being a natural number of 2 or more) of the Taylor expansion of an expression representing the Kullback-Leibler divergence with respect to the probability distribution of the second latent variable.

A program that causes a computer to function as the learning device according to claim 1 or 2.