WO2022269690A1 - Anomaly detection device, anomaly detection method, and anomaly detection program - Google Patents
- Publication number: WO2022269690A1 (application PCT/JP2021/023416)
- Authority: WIPO (PCT)
- Prior art keywords: time, contribution, feature, anomaly, abnormality
Classifications
- G06F11/0769: Readable error formats, e.g. cross-platform generic formats, human understandable formats
- G06F18/213: Feature extraction, e.g. by transforming the feature space; Summarisation; Mappings, e.g. subspace methods
- G06F11/07: Responding to the occurrence of a fault, e.g. fault tolerance
- G06N20/00: Machine learning
- G06F2123/02: Data types in the time domain, e.g. time-series data
Definitions
- the present invention relates to an anomaly detection device, an anomaly detection method, and an anomaly detection program.
- When anomalies occur very infrequently, anomaly detection based on machine learning builds a model by unsupervised learning on normal data. An anomaly score representing the deviation from the normal state is then calculated, and a threshold set on that score determines whether a sample is abnormal or normal.
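The thresholding step described above can be sketched as follows (a minimal illustration; the scores and threshold are made-up values, not from the patent):

```python
def judge(scores, threshold):
    """Label each sample abnormal (True) when its anomaly score
    exceeds the preset anomaly-judgment threshold."""
    return [s > threshold for s in scores]

# Illustrative scores: the third sample deviates strongly from normal.
scores = [0.12, 0.08, 0.95, 0.10]
print(judge(scores, threshold=0.5))  # -> [False, False, True, False]
```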
- Machine-learning anomaly detection includes methods that treat each sample independently, and can therefore be applied whether or not the data are time series, and methods that set a time window and take into account the order of the samples within that window (hereinafter referred to as "time-series anomaly detection").
- the time window used in time-series anomaly detection divides the time-series data into intervals of fixed length.
- the behavior of the data within the time window is learned while the window is shifted along the time direction.
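A toy sketch of such a sliding time window (window length and data are illustrative):

```python
def sliding_windows(series, w):
    """Divide a series into length-w windows, shifting one step at a
    time in the time direction."""
    return [series[i:i + w] for i in range(len(series) - w + 1)]

print(sliding_windows([10, 20, 30, 40, 50], w=3))
# -> [[10, 20, 30], [20, 30, 40], [30, 40, 50]]
```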
- Time-series anomaly detection learns the behavior of time-series data in normal times and calculates an anomaly score from the prediction error, that is, the difference between the predicted value and the measured value. Samples whose behavior resembles the normal time-series data learned during model training have small prediction errors, while samples unlike the training data have large prediction errors, so anomalies can be detected from the magnitude of the error.
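The prediction-error score can be sketched as follows; mean squared error is used here as one possible error measure (the values are illustrative):

```python
def anomaly_score(y_true, y_pred):
    """Anomaly score as the mean squared prediction error over the
    d feature dimensions."""
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)

print(anomaly_score([1.0, 2.0], [1.0, 2.0]))  # -> 0.0 (behaves like normal data)
print(anomaly_score([1.0, 2.0], [3.0, 2.0]))  # -> 2.0 (poorly predicted sample)
```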
- anomaly detection using machine learning technology judges that an anomaly has occurred based on whether the anomaly score of a predicted sample exceeds a preset anomaly judgment threshold, and identifies the time at which the anomaly occurred.
- additional analysis, such as checking the behavior before and after the sample that exceeds the threshold, is required.
- the degree of contribution indicates the degree of influence on the result output by the machine learning model; the greater the contribution, the more likely the corresponding input is the cause of the abnormality.
- There is also a technique that outputs a degree of contribution in consideration of the temporal relationship of the input data and indicates which time period of the input data contributed to the classification result (see, for example, Non-Patent Document 1).
- in conventional contribution-output techniques, each sample is handled independently in the time direction, so the contribution is output without considering the context of the data in the time direction.
- the above technology that outputs the degree of contribution considering the temporal relationship of the input data is a method for supervised learning and classification problems, and therefore cannot be applied as-is to anomaly detection, which has no supervised data.
- an anomaly detection device includes: an acquisition unit that acquires time-series data of a detection target in which an anomaly is detected at a predetermined time; a first extraction unit that extracts, from the time-series data, features in the feature amount direction in a time interval before the predetermined time; a second extraction unit that extracts features in the time direction in that time interval from the features in the feature amount direction; and a calculation unit that calculates an anomaly score at the predetermined time based on the features in the feature amount direction and the features in the time direction, and calculates the contribution in the feature amount direction and the contribution in the time direction to the anomaly score before the predetermined time.
- an anomaly detection method executed by an anomaly detection device includes: an acquisition step of acquiring time-series data of a detection target in which an anomaly is detected at a predetermined time; a first extraction step of extracting, from the time-series data, features in the feature amount direction in a time interval before the predetermined time; a second extraction step of extracting features in the time direction in that time interval from the features in the feature amount direction; and a calculation step of calculating an anomaly score at the predetermined time based on the features in the feature amount direction and the features in the time direction, and calculating the contribution in the feature amount direction and the contribution in the time direction to the anomaly score before the predetermined time.
- an anomaly detection program causes a computer to execute: an acquisition step of acquiring time-series data of a detection target in which an anomaly is detected at a predetermined time; a first extraction step of extracting, from the time-series data, features in the feature amount direction in a time interval before the predetermined time; a second extraction step of extracting features in the time direction in that time interval from the features in the feature amount direction; and a calculation step of calculating an anomaly score at the predetermined time based on the features in the feature amount direction and the features in the time direction, and calculating the contribution in the feature amount direction and the contribution in the time direction to the anomaly score before the predetermined time.
- FIG. 1 is a diagram showing an example of an anomaly detection system according to the first embodiment.
- FIG. 2 is a block diagram showing a configuration example of the abnormality detection device according to the first embodiment.
- FIG. 3 is a diagram illustrating an example of architecture of a learning model according to the first embodiment.
- FIG. 4 is a diagram illustrating an example of feature extraction processing according to the first embodiment.
- FIG. 5 is a diagram showing an example of learning data according to the first embodiment.
- FIG. 6 is a diagram showing an example of evaluation data according to the first embodiment.
- FIG. 7 is a diagram illustrating an example of data processing according to the first embodiment.
- FIG. 8 is a diagram illustrating an example of data processing according to the first embodiment.
- FIG. 9 is a diagram illustrating an example of data processing according to the first embodiment.
- FIG. 10 is a diagram illustrating an example of the flow of evaluation processing for abnormality detection accuracy according to the first embodiment.
- FIG. 11 is a diagram illustrating an example of an abnormality detection accuracy evaluation process according to the first embodiment.
- FIG. 12 is a diagram illustrating an example of an abnormality detection accuracy evaluation process according to the first embodiment.
- FIG. 13 is a diagram illustrating an example of the flow of contribution degree evaluation processing according to the first embodiment.
- FIGS. 14A and 14B are diagrams illustrating an example of calculation processing of the degree of contribution in the feature amount direction according to the first embodiment.
- FIG. 15 is a diagram illustrating an example of calculation processing of contribution in the time direction according to the first embodiment.
- FIG. 16 is a diagram illustrating an example of contribution evaluation processing according to the first embodiment.
- FIG. 17 is a diagram illustrating an example of an evaluation result of effectiveness of the learning model architecture according to the first embodiment.
- FIG. 18 is a diagram illustrating an example of an evaluation result of effectiveness of the learning model architecture according to the first embodiment.
- FIG. 19 is a diagram illustrating an example of an evaluation result of effectiveness of the learning model architecture according to the first embodiment.
- FIG. 20 is a diagram explaining an evaluation result of effectiveness of the learning model architecture according to the first embodiment.
- FIG. 21 is a diagram illustrating an example of an evaluation result of effectiveness of the learning model architecture according to the first embodiment.
- FIG. 22 is a diagram illustrating an example of an evaluation result of effectiveness of the learning model architecture according to the first embodiment.
- FIG. 23 is a diagram illustrating an example of an evaluation result of effectiveness of the learning model architecture according to the first embodiment.
- FIG. 24 is a diagram illustrating an example of an evaluation result of effectiveness of the learning model architecture according to the first embodiment.
- FIG. 25 is a diagram illustrating an example of an evaluation result of the effectiveness of the learning model architecture according to the first embodiment.
- FIG. 26 is a flowchart illustrating an example of the flow of overall processing according to the first embodiment.
- FIG. 27 is a diagram showing a computer executing a program.
- Embodiments of an anomaly detection device, an anomaly detection method, and an anomaly detection program according to the present invention will be described in detail below with reference to the drawings. The present invention is not limited by the embodiments described below.
- FIG. 1 is a diagram showing an example of an anomaly detection system according to the first embodiment. This system has an anomaly detection device 10. Note that the abnormality detection system shown in FIG. 1 may include a plurality of abnormality detection devices 10.
- time-series data 20 is the data acquired by the anomaly detection device 10.
- the time-series data 20 is data that considers the order of each sample and includes time-series information.
- in this embodiment, a single time-series anomaly detection model can be used to calculate anomaly scores and, regardless of whether the anomaly scores are high or low, to calculate the contributions used to identify the time and feature value thought to be the cause of anomalies.
- An example of anomaly detection processing based on a convolutional neural network (hereinafter referred to as "CNN") will be described.
- the anomaly detection device 10 acquires time-series data 20. At this time, it is desirable that the processing of the anomaly detection device 10 not only detect an anomaly from the anomaly score, but also consider the influence from the time before the anomaly score rises and find the feature amount and time that contributed to the anomaly score (see (1) in FIG. 1).
- the anomaly detection device 10 identifies the cause of the anomaly by tracing back a specific time period from the time of occurrence of the anomaly (see (2) in FIG. 1).
- the cause of the abnormality at the abnormality occurrence time t is identified in the interval from time t−w to t−1, going back w time steps from the abnormality occurrence time t.
- the anomaly detection device 10 calculates the degree of contribution in the feature amount direction based on the time-series data 20 (see (3) in FIG. 1).
- the feature amounts that affected the abnormality score at time t are those related to sensor A and sensor E.
- the anomaly detection device 10 also calculates the degree of contribution in the time direction based on the time-series data 20 (see (4) in FIG. 1). In the example of FIG. 1, it can be seen from the calculated contribution that the time that affected the abnormality score at time t is the latter half of the interval from time t−w to t−1.
- the process of the anomaly detection device 10 makes it possible to grasp not only the feature quantity that causes an anomaly, but also the temporal relevance (see (5) in FIG. 1).
- the contributions in the feature amount direction and the time direction are calculated over a fixed interval (w time steps in FIG. 1) back from the time at which the anomaly is to be identified. This makes it easier to identify the cause in time-series anomaly detection. That is, this system can perform cause identification that takes time-series properties into account, in addition to time-series anomaly detection.
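A hypothetical sketch of this cause identification, assuming the two contribution maps have already been computed (the function name, sensors, and values are illustrative, not the patent's actual procedure):

```python
def identify_cause(feature_contrib, time_contrib, sensors):
    """feature_contrib: per-sensor contributions; time_contrib:
    per-step contributions over the window [t-w, t-1]. Returns the
    sensor and the window offset with the largest contribution."""
    sensor = sensors[feature_contrib.index(max(feature_contrib))]
    offset = time_contrib.index(max(time_contrib))
    return sensor, offset

# Sensor A and the latter part of the window dominate in this toy example.
print(identify_cause([0.9, 0.1, 0.2], [0.1, 0.2, 0.8], ["A", "B", "E"]))
# -> ('A', 2)
```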
- RNN: Recurrent Neural Network
- LSTM: Long Short-Term Memory
- the reconstruction error is a value calculated for each feature amount from the difference between the input layer and the output layer of a model that has an input layer, an intermediate layer, and an output layer.
- the reconstruction error can be calculated by any technique that obtains a compressed representation of the data in the intermediate layer, such as an autoencoder or principal component analysis.
- for samples that behave like normal data, reconstruction in the output layer succeeds and the reconstruction error of each feature value is small; for samples that behave differently from normal data, reconstruction in the output layer fails and the reconstruction error increases. Therefore, the reconstruction error is visualized or its statistics are calculated, and feature values with large errors are estimated to be the cause of the abnormality.
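The per-feature reconstruction error described above can be sketched as follows (the input and reconstructed values are illustrative; a real model would produce the reconstruction from its compressed intermediate representation):

```python
def reconstruction_errors(x, x_hat):
    """Per-feature reconstruction error: squared difference between
    the input-layer values x and the output-layer values x_hat."""
    return [(a - b) ** 2 for a, b in zip(x, x_hat)]

# The feature with the largest error is estimated as the anomaly cause.
errs = reconstruction_errors([1.0, 2.0, 3.0], [1.1, 2.0, 0.5])
print(errs.index(max(errs)))  # -> 2
```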
- both the anomaly cause identification technique using the reconstruction error and the technique that outputs the degree of contribution from a trained model, described above, handle each sample independently in the time direction, so the contribution is output without considering the context of the data in the time direction. They are therefore insufficient as cause-estimation techniques for time-series anomaly detection.
- MTEX-CNN (see, for example, Non-Patent Document 1) can be cited as a technique for outputting the degree of contribution in consideration of the temporal relationship of data.
- MTEX-CNN creates a sequence classification model by supervised learning and outputs contributions using Grad-CAM, which can present the decision basis using the values output by the last convolutional layer of the CNN.
- MTEX-CNN can perform time-series classification and contribution output with the same model, and it can output the degree of contribution in the time direction, which indicates which time of the input data contributed to the classification result.
- because MTEX-CNN is a method for supervised learning and classification problems, some adaptation is required to apply it to unsupervised anomaly detection.
- FIG. 2 is a block diagram showing a configuration example of the abnormality detection device according to the first embodiment.
- the anomaly detection device 10 has an input unit 11, an output unit 12, a communication unit 13, a storage unit 14, and a control unit 15.
- the input unit 11 controls input of various information to the abnormality detection device 10 .
- the input unit 11 is implemented by a mouse, a keyboard, or the like, and receives input such as setting information to the abnormality detection device 10 .
- the output unit 12 controls output of various information from the abnormality detection device 10 .
- the output unit 12 is realized by a display or the like, and outputs setting information or the like stored in the abnormality detection device 10 .
- the communication unit 13 manages data communication with other devices. For example, the communication unit 13 performs data communication with each communication device. Further, the communication unit 13 can perform data communication with an operator's terminal (not shown).
- the storage unit 14 stores various information referred to when the control unit 15 operates and various information acquired when the control unit 15 operates.
- the storage unit 14 can be realized by, for example, a RAM (Random Access Memory), a semiconductor memory device such as a flash memory, or a storage device such as a hard disk or an optical disk.
- the storage unit 14 is installed inside the anomaly detection device 10, but it may instead be installed outside the anomaly detection device 10, and a plurality of storage units may be installed.
- the control unit 15 controls the entire abnormality detection device 10 .
- the control unit 15 has an acquisition unit 15a, a first extraction unit 15b, a second extraction unit 15c, a calculation unit 15d, and an identification unit 15e.
- the control unit 15 is, for example, an electronic circuit such as a CPU (Central Processing Unit) or an MPU (Micro Processing Unit) or an integrated circuit such as an ASIC (Application Specific Integrated Circuit) or an FPGA (Field Programmable Gate Array).
- the acquisition unit 15a acquires time-series data of a detection target in which an abnormality is detected at a predetermined point in time. For example, the acquisition unit 15a acquires data including sensor values transmitted from a plurality of sensors at each time. The acquisition unit 15a then outputs the acquired time-series data to the first extraction unit 15b, and may also store the acquired time-series data in the storage unit 14.
- the first extraction unit 15b extracts features in the feature amount direction in time intervals before a predetermined point in time from the time-series data. For example, the first extraction unit 15b performs two-dimensional convolution on each feature amount of the time-series data to extract features in the feature amount direction. The first extraction unit 15b also outputs a first feature map (feature map 1) as a feature in the feature direction.
- the first extraction unit 15b performs two-dimensional convolution twice on each feature amount of the d-dimensional time-series data with time window w, compressing it into a (w/4) × d-dimensional feature map. The first extraction unit 15b performs the convolutions with 64 filters the first time and 128 filters the second time, thereby extracting features in the feature amount direction. Note that the feature extraction processing in the feature amount direction by the first extraction unit 15b will be described later in [Details of processing] (2. Feature extraction processing).
- the second extraction unit 15c extracts features in the time direction in a predetermined time interval from the features in the feature amount direction. For example, the second extraction unit 15c performs one-dimensional convolution on each feature amount of the features in the feature amount direction, and extracts the features in the time direction. The second extraction unit 15c also outputs a second feature map (feature map 2) as a feature in the time direction.
- the second extraction unit 15c performs one-dimensional convolution on the d-dimensional first feature map so as to use all d feature dimensions, thereby extracting features in the time direction over the entire input data. Note that the feature extraction processing in the time direction by the second extraction unit 15c will be described later in [Details of processing] (2. Feature extraction processing).
- the calculation unit 15d calculates an anomaly score at a predetermined point in time based on the features in the feature amount direction and the features in the time direction, and calculates the contribution in the feature amount direction and the contribution in the time direction to the anomaly score before the predetermined point in time. For example, the calculation unit 15d uses an unsupervised learning model to calculate the anomaly score and the two contributions.
- the calculation unit 15d performs learning using a loss function composed of penalties for at least one of the prediction error for the anomaly score, the contribution in the feature value direction, and the contribution in the time direction.
- the calculation unit 15d performs backpropagation using the predicted values on the final layer that has undergone convolution in the feature value direction, and calculates weights from the obtained gradient values. Then, the calculation unit 15d uses an activation function for a matrix obtained by multiplying the obtained weight and the first feature map to output the contribution in the feature quantity direction. The calculation unit 15d also performs backpropagation using the predicted values on the last layer that has undergone the convolution in the time direction, and calculates weights from the obtained gradient values. Then, the calculation unit 15d outputs the degree of contribution in the time direction by using an activation function for the matrix obtained by multiplying the obtained weight and the second feature map. Contribution degree calculation processing by the calculation unit 15d will be described later in [Details of processing] (3. Contribution degree calculation processing).
- When the identifying unit 15e detects an abnormality based on the abnormality score, it identifies the cause of the abnormality using the degree of contribution in the feature amount direction or the degree of contribution in the time direction. For example, the identifying unit 15e uses the degree of contribution in the feature amount direction to identify the type of sensor as a feature that influenced the anomaly score at the anomaly occurrence time, and uses the degree of contribution in the time direction to identify the time that affected that anomaly score. Furthermore, the identifying unit 15e may store the identified information in the storage unit 14.
- FIG. 3 is a diagram illustrating an example of architecture of a learning model according to the first embodiment.
- the architecture of a learning model that outputs the anomaly score, the contribution in the feature value direction, and the contribution in the time direction from the same model will be described below.
- time-series anomaly detection is performed by using a CNN to create a model that predicts the actual measured values at a certain point in time from input data with d-dimensional feature values and a time window of length w.
- the certain point in time may be k time steps before or after the window.
- two-stage feature extraction is performed on the input data by the CNN. That is, in this architecture, feature extraction in the feature amount direction (see (1) in FIG. 3) is performed in the first stage, and feature extraction in the time direction (see (2) in FIG. 3) is performed in the second stage. After that, the features pass through a fully connected layer (see (3) in FIG. 3), the predicted value ŷ is output (see (4) in FIG. 3), and the abnormality score is calculated as the error (e.g., mean squared error) between ŷ and the measured value y (see (5) in FIG. 3).
- FIG. 4 is a diagram illustrating an example of feature extraction processing according to the first embodiment. Below, the feature extraction process in the feature amount direction and the feature extraction process in the time direction will be described in this order.
- the abnormality detection device 10 performs two-dimensional convolution multiple times for each feature amount (see FIG. 4(1)).
- the anomaly detection device 10 obtains a feature map 1 of size c × d by transposing the matrix after performing the final two-dimensional convolution in the feature extraction in the feature quantity direction (see FIG. 4(2)).
- c must be a value smaller than the time window w.
- the filter size w′ used for convolution must be w′ × 1, with the restriction 1 ≤ w′ ≤ w.
- the number of filters used for convolution can be set to any value. For example, the anomaly detection device 10 sets the number of filters for the first time to 64 and the number of filters for the second time to 128, and performs convolution. Further, the anomaly detection device 10 may use half padding for convolution in the feature amount direction.
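The shape bookkeeping of this first stage can be sketched as follows. The stride-2 averaging stand-in below is an assumption for illustration only; the patent's actual convolutions use 64 and then 128 learned filters:

```python
import numpy as np

def halve_time(x):
    """Stand-in for one convolution step that halves the time axis
    (mean over adjacent time steps); the feature axis d is untouched."""
    w, d = x.shape
    return x.reshape(w // 2, 2, d).mean(axis=1)

w, d = 16, 5
x = np.arange(w * d, dtype=float).reshape(w, d)
fm1 = halve_time(halve_time(x))  # two convolutions in the first stage
print(fm1.shape)                 # -> (4, 5), i.e. (w/4) x d
```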
- in the second stage, the abnormality detection device 10 performs one-dimensional convolution on feature map 1 obtained in the first stage so as to use all d-dimensional feature amounts, thereby performing feature extraction in the time direction over the entire input data (see FIG. 4(3)) and obtaining feature map 2 (see FIG. 4(4)).
- the filter size used in this convolution must be c′ × d, with the restriction 1 ≤ c′ ≤ c.
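A minimal sketch of this time-direction convolution with a c′ × d kernel spanning all feature dimensions at once (the kernel values here are illustrative constants, not learned weights):

```python
import numpy as np

def time_direction_conv(fm1, kernel):
    """Slide a (c', d) kernel along the time axis of a (c, d) feature
    map; each output value uses all d feature dimensions at once."""
    c, d = fm1.shape
    cp = kernel.shape[0]
    return np.array([np.sum(fm1[i:i + cp] * kernel) for i in range(c - cp + 1)])

fm1 = np.ones((4, 3))                         # c = 4, d = 3
out = time_direction_conv(fm1, np.ones((2, 3)))  # c' = 2
print(out)  # -> [6. 6. 6.]
```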
- the abnormality detection device 10 performs the first extraction process and the second extraction process described above, obtains a fully connected layer (see (5) in FIG. 4), and outputs the predicted value ŷ (see (6) in FIG. 4).
- the anomaly detection device 10 back-propagates the value output by the learning model to a convolutional layer selected using that output value to obtain gradient values, and outputs weights by computing the global average pooling of those gradients. Then, the anomaly detection device 10 calculates the degree of contribution by applying an activation function (e.g., the ReLU function) to the matrix obtained by multiplying the feature map from the selected convolutional layer by the obtained weights.
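This Grad-CAM-style computation can be sketched as follows (array shapes and values are illustrative assumptions):

```python
import numpy as np

def grad_cam(feature_map, gradients):
    """feature_map, gradients: (n_channels, length) arrays. Weights are
    the global average pooling of each channel's gradients; the
    contribution is the ReLU of the weighted sum of channels."""
    weights = gradients.mean(axis=1)                  # global average pooling
    cam = np.tensordot(weights, feature_map, axes=1)  # weighted channel sum
    return np.maximum(cam, 0.0)                       # ReLU activation

fm = np.array([[1.0, -2.0], [0.5, 1.0]])     # 2 channels, length 2
grads = np.array([[1.0, 1.0], [2.0, 2.0]])   # back-propagated gradients
print(grad_cam(fm, grads))  # -> [2. 0.]
```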
- for feature map 1 (see FIG. 4(2)), the output of the layer that performs feature extraction in the feature amount direction, and feature map 2 (see FIG. 4(4)), the output of the layer that performs feature extraction in the time direction, the anomaly detection apparatus 10 performs contribution calculation processing using the predicted value ŷ k points ahead (k is an arbitrary variable), and outputs the contribution in the feature quantity direction and the contribution in the time direction to the predicted value k points ahead.
- the anomaly detection device 10 performs backpropagation using the predicted value ŷ_l on the final layer that performed the convolution in the feature amount direction, and calculates the weights by dividing the gradient values obtained there by c. Then, the anomaly detection device 10 applies an activation function to the matrix obtained by multiplying the obtained weights by feature map 1, and outputs the degree of contribution.
- the contribution for feature map 1 is c × d-dimensional, so the output contribution cannot be interpreted as-is because its dimensions do not match the input data.
- therefore, the anomaly detection device 10 resizes it from (w/4) × d dimensions to w × d dimensions and outputs the contribution in the feature amount direction.
- the anomaly detection device 10 performs backpropagation using the predicted value ŷ_l on the final layer on which the convolution in the time direction was performed, and calculates weights by dividing the gradient values obtained here by n. Then, the anomaly detection device 10 outputs the degree of contribution by applying an activation function to the matrix obtained by multiplying the obtained weights by feature map 2.
- the degree of contribution to feature map 2 has n×m dimensions, and its size does not match the time window w of the input data; the anomaly detection device 10 therefore resizes it and outputs the contribution in the time direction.
- the term L_feature that constitutes the loss function Loss is expressed as the following equation (3).
- the contribution penalties (L_feature, L_time) add regularization so that the contributions approach 0 during training. As a result, an effect is expected in which the contribution for normal samples is small and the contribution for abnormal samples is large.
- the loss function Loss in equation (1) above does not necessarily have to include the penalties for the contributions, and may contain only the prediction-error penalty L_ad, or the penalty for only one of the contributions (L_feature or L_time).
- the contribution penalties are not limited to the above expressions (3) and (4), as long as they are regularizations that produce similar effects.
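A regularized loss of this form can be sketched as follows. This is a hedged illustration: the exact forms of equations (1), (3), and (4) are not reproduced here, and the L1-style penalties and the weights lam_f and lam_t are assumptions, not values from the publication.

```python
import numpy as np

def loss_with_contribution_penalty(y_true, y_pred, contrib_feature, contrib_time,
                                   lam_f=0.01, lam_t=0.01):
    """Prediction error L_ad (mean squared error) plus regularization
    terms that push the contribution maps toward zero during training.
    lam_f / lam_t are assumed weighting hyperparameters."""
    l_ad = np.mean((y_true - y_pred) ** 2)
    l_feature = np.mean(np.abs(contrib_feature))  # assumed L1-style penalty
    l_time = np.mean(np.abs(contrib_time))        # assumed L1-style penalty
    return l_ad + lam_f * l_feature + lam_t * l_time

loss = loss_with_contribution_penalty(
    np.array([1.0, 2.0]), np.array([1.5, 2.0]),
    np.array([[0.2, 0.0]]), np.array([[0.1, 0.3]]))
```

Dropping either penalty term (setting lam_f or lam_t to 0) mirrors the variants of equation (1) described above.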
- FIG. 5 is a diagram showing an example of learning data according to the first embodiment.
- FIG. 6 is a diagram showing an example of evaluation data according to the first embodiment.
- both the learning data and the evaluation data are generated according to the same rules, and there is no difference between them in normal sections. That is, as shown in FIG. 5, the learning data shows a waveform without large fluctuations over the entire interval. Also, as shown in FIG. 6(2), the evaluation data shows the same waveform as the learning data in the normal dimensions.
- learning data is generated by combining trigonometric functions and uniform distribution.
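The data-generation rule described here (trigonometric functions plus uniform noise, with a significantly large value added periodically for the evaluation data) can be sketched as follows; all constants (period, spike size, noise range, dimensions) are illustrative assumptions, not the publication's settings.

```python
import numpy as np

rng = np.random.default_rng(0)

def make_series(n_times=1000, n_features=5, anomaly_period=None, spike=10.0):
    """Generate pseudo sensor data from trigonometric functions plus
    uniform noise; if anomaly_period is set, add a large value
    periodically to one dimension to mimic the abnormal evaluation data."""
    t = np.arange(n_times)[:, None]
    phase = rng.uniform(0, 2 * np.pi, n_features)
    data = np.sin(0.1 * t + phase) + rng.uniform(-0.1, 0.1, (n_times, n_features))
    labels = np.zeros(n_times, dtype=int)
    if anomaly_period:
        data[::anomaly_period, 0] += spike   # periodic pseudo-abnormal values
        labels[::anomaly_period] = 1
    return data, labels

train, _ = make_series()                         # normal learning data
evald, labels = make_series(anomaly_period=100)  # periodic pseudo anomalies
```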
- the evaluation data is generated according to the same rule as the learning data, and a significantly large value is periodically added to create a pseudo abnormal state. That is, as shown in FIG. 6(1), data is generated so that an abnormal waveform appears periodically.
- all rectangular portions indicated by hatching are treated as abnormal sections.
- FIGS. 7 to 9 are diagrams showing an example of data processing according to the first embodiment.
- time-series data is extracted by time window, converted into a data format that can be input to the model, and labeled.
- although the time window w is set to 20 in FIG. 7, it is not particularly limited.
- when an abnormal value is included at even one time in a window, an abnormal label is given.
- as in FIG. 8(1), when no abnormal value is included at any time, a normal label is given.
- as in FIG. 8(2), when an abnormal value is included at some times, an abnormal label is given.
- as in FIG. 8(3), when all times are abnormal values, an abnormal label is naturally given.
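The windowing and labeling rules of FIGS. 7 and 8 can be sketched as follows; the non-overlapping stride is an assumption, while w = 20 follows FIG. 7.

```python
import numpy as np

def window_and_label(data, point_labels, w=20):
    """Cut the series into length-w windows and label each window
    abnormal (1) if it contains at least one abnormal time point,
    otherwise normal (0), as in FIG. 8."""
    windows, labels = [], []
    for start in range(0, len(data) - w + 1, w):  # non-overlapping (assumed)
        windows.append(data[start:start + w])
        labels.append(int(point_labels[start:start + w].any()))
    return np.stack(windows), np.array(labels)

data = np.arange(100).reshape(100, 1).astype(float)
point_labels = np.zeros(100, dtype=int)
point_labels[25] = 1                              # one abnormal time point
X, y = window_and_label(data, point_labels, w=20)
```

Only the second window (times 20–39) contains the abnormal time point, so only it receives the abnormal label.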
- FIG. 10 is a diagram illustrating an example of the flow of evaluation processing for abnormality detection accuracy according to the first embodiment.
- FIGS. 11 and 12 are diagrams illustrating an example of an abnormality detection accuracy evaluation process according to the first embodiment.
- with reference to FIG. 10, the flow of calculating and evaluating the anomaly detection accuracy by comparing the abnormal/normal labels of the evaluation data with the anomaly determination results will be described.
- in the learning process, first, normal learning data is input and the learning model is trained (see FIG. 10(1)). Next, the learning model calculates anomaly scores and outputs anomaly scores within the normal range (see FIG. 10(2)). A threshold is then determined using the output anomaly scores. Note that the determination of the threshold will be described later with reference to FIG. 11.
- in the evaluation process, first, the evaluation data is input to the learning model and prediction is performed (see FIG. 10(4)). Next, the learning model calculates the anomaly score and outputs it (see FIG. 10(5)). Then, the output anomaly score is compared with the determined threshold (see FIG. 10(6)), abnormality or normality is determined, and the determination result is output (see FIG. 10(7)). Finally, the accuracy of anomaly detection is evaluated by judging correctness from the labels of the evaluation data and the determination results (see FIG. 10(8)). The precision, recall, F1 score, and ROC-AUC (Receiver Operating Characteristic - Area Under the Curve) are used as evaluation indicators, and the average value over 5 trials is calculated in the numerical computation.
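The evaluation indicators named above can be computed as follows; this is a minimal numpy-only sketch of precision, recall, and F1 (ROC-AUC is omitted), not the publication's evaluation code.

```python
import numpy as np

def precision_recall_f1(y_true, y_pred):
    """Precision, recall, and F1 from binary labels (1 = abnormal)."""
    tp = np.sum((y_true == 1) & (y_pred == 1))
    fp = np.sum((y_true == 0) & (y_pred == 1))
    fn = np.sum((y_true == 1) & (y_pred == 0))
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1

# toy labels: one true positive, one false positive, one false negative
p, r, f = precision_recall_f1(np.array([1, 1, 0, 0]), np.array([1, 0, 1, 0]))
```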
- the determination of the threshold value and the determination of abnormality by the threshold value will be described with reference to FIGS. 11 and 12.
- for the threshold used for abnormality determination, the anomaly score is calculated for all the learning data, and the 95th percentile value is set as the threshold (see FIG. 11). Evaluation data whose anomaly score exceeds the determined threshold is determined to be abnormal (see FIG. 12).
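The threshold rule of FIGS. 11 and 12 (the 95th percentile of the anomaly scores on the learning data, with evaluation-time scores above it judged abnormal) can be sketched as follows; the toy scores are illustrative.

```python
import numpy as np

def decide_threshold(train_scores, percentile=95):
    """Threshold = 95th percentile of anomaly scores on the learning
    data (FIG. 11)."""
    return np.percentile(train_scores, percentile)

train_scores = np.linspace(0.0, 1.0, 101)   # toy anomaly scores on learning data
threshold = decide_threshold(train_scores)
is_abnormal = 1.2 > threshold               # an evaluation-time score (FIG. 12)
```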
- FIG. 13 is a diagram illustrating an example of the flow of contribution degree evaluation processing according to the first embodiment.
- 14 to 16 are diagrams showing an example of contribution degree evaluation processing according to the first embodiment.
- FIG. 13 the flow of contribution degree evaluation will be explained.
- the evaluation data is input to the learning model and prediction is performed (see FIG. 13(1)).
- the learning model calculates the degree of contribution, and outputs the degree of contribution in the feature value direction and the time direction (see FIG. 13(2)).
- the maximum value of the output contribution is calculated and a histogram is drawn (see FIG. 13(3)).
- the contribution is evaluated from the label of the evaluation data and the drawn histogram (see FIG. 13(4)).
- FIG. 14 the process of calculating the contribution in the feature amount direction and then calculating the maximum value will be described.
- the numerical values calculated as the contribution in the direction of the feature amount are shown in tabular form.
- the maximum value "7.6" is output when drawing the histogram of the degree of contribution in the feature amount direction.
- FIG. 15 the process of calculating the contribution in the time direction and then calculating the maximum value will be described.
- numerical values calculated as contributions in the time direction are shown in tabular form.
- the maximum value "6.8" is output when plotting the contribution degree histogram in the time direction.
- a maximum histogram of abnormal labels is drawn from a plurality of contributions in the time direction to which abnormal labels are assigned (see (1) and (2) in FIG. 16).
- the maximum value histogram of the anomaly labels to be drawn has a contribution degree distribution with heavy tails. That is, in the case of an anomaly, a high contribution to the cause of the anomaly should be obtained.
- a maximum value histogram of normal labels is drawn from a plurality of contributions in the time direction to which normal labels are assigned (see FIGS. 16(3) and 16(4)).
- the maximum-value histogram of the normal labels to be drawn concentrates at a contribution of 0. That is, in the normal case, the contribution should be low because there is no anomaly cause.
- the maximum value histogram of the abnormal labels and the maximum value histogram of the normal labels are compared to evaluate whether the degree of contribution is appropriately reflected (see (2) and (3) of FIG. 16).
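The maximum-value histogram evaluation of FIG. 16 can be sketched as follows: per-window maxima of the time-direction contribution maps are collected separately for abnormal and normal labels and then compared. The toy contribution maps are illustrative.

```python
import numpy as np

def max_contributions(contrib_maps):
    """For each window's contribution map, take the maximum value;
    these maxima are what the histograms in FIG. 16 are drawn from."""
    return np.array([m.max() for m in contrib_maps])

# toy contribution maps: abnormal windows should show large maxima,
# normal windows maxima near zero
abnormal_maps = [np.array([[0.1, 6.8], [0.0, 2.0]]), np.array([[7.6, 0.2]])]
normal_maps = [np.zeros((2, 2)), np.array([[0.0, 0.1]])]
abn_max = max_contributions(abnormal_maps)
nor_max = max_contributions(normal_maps)
```

A clear separation between the two sets of maxima is what indicates that the contribution appropriately reflects the anomaly cause.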
- FIGS. 17 to 19 are diagrams showing examples of evaluation results of the effectiveness of the learning model architecture according to the first embodiment.
- FIG. 20 is a diagram explaining an evaluation result of effectiveness of the learning model architecture according to the first embodiment. Below, the evaluation results of the anomaly detection accuracy and the contribution evaluation results will be described in this order.
- the AUC is 0.885 (average of 5 trials); the anomaly detection accuracy is therefore judged to be effective, and the model can be fully used for anomaly detection.
- FIGS. 21 to 25 are diagrams showing examples of evaluation results of the effectiveness of the learning model architecture according to the first embodiment. Below, the evaluation results of the anomaly detection accuracy and the contribution evaluation results will be described in this order.
- in the architecture of the learning model by the method with regularization, the loss function Loss includes L_feature (see equation (3)) and L_time (see equation (4)) in addition to L_ad (see equation (2)), which uses the mean squared error.
- the evaluation of the anomaly detection accuracy is performed based on (5-1-3. Evaluation of anomaly detection accuracy) described above. At this time, since regularization is an operation that makes optimization difficult, it is only necessary to confirm that the anomaly detection accuracy does not deteriorate.
- the learning model architecture by the method with regularization described above has an AUC of 0.948 (average of 5 trials), which exceeds the AUC of 0.885 of the method without regularization, and adverse effects of regularization on the anomaly detection accuracy are therefore assessed as non-existent.
- FIG. 22 is a maximum-value histogram of normal labels drawn by the method without regularization, in which the maximum value of the contribution takes values other than zero.
- FIG. 23 is a maximum-value histogram of normal labels drawn by the method with regularization, in which the maximum value of the contribution is almost zero. Therefore, it can be seen that regularization increases the proportion of windows whose maximum contribution is 0 in the normal-label maximum-value histogram, avoiding the output of misleading anomaly causes.
- FIG. 24 is a maximum-value histogram of abnormal labels drawn by the method without regularization, in which the maximum value of the contribution falls within the range of 0 to 10.
- FIG. 25 is a maximum-value histogram of the abnormal labels drawn by the method with regularization, in which the maximum value of the contribution takes larger values. Therefore, with regularization, the maximum-value histogram of the abnormal labels has a larger maximum contribution compared to the method without regularization, which emphasizes the anomaly cause, that is, makes it easier to identify the anomaly cause.
- FIG. 26 is a flowchart illustrating an example of the flow of overall processing according to the first embodiment. Below, while showing the flow of the whole abnormality detection process, the outline of each process is demonstrated.
- the acquisition unit 15a of the anomaly detection device 10 executes time-series data acquisition processing (step S101).
- the first extraction unit 15b of the abnormality detection device 10 executes feature extraction processing (first extraction processing) in the feature amount direction (step S102).
- the second extraction unit 15c of the anomaly detection device 10 also executes feature extraction processing (second extraction processing) in the time direction (step S103).
- the calculation unit 15d of the anomaly detection device 10 executes contribution degree calculation processing (step S104).
- the identification unit 15e of the abnormality detection device 10 executes abnormality cause identification processing (step S105), and ends the processing.
- the above steps S101 to S105 can also be performed in a different order. Also, some of the above steps S101 to S105 may be omitted.
- first, the time-series data acquisition processing by the acquisition unit 15a will be described.
- the acquisition unit 15a acquires time-series data of a detection target for detecting an abnormality.
- the first extraction unit 15b performs two-dimensional convolution a plurality of times for each feature, and after performing the last two-dimensional convolution in the feature extraction in the feature direction, transposes the matrix and outputs feature map 1.
- the second extraction unit 15c performs one-dimensional convolution on feature map 1 output in the process of step S102 so as to use all the features, thereby extracting features in the time direction from the entire input data, and outputs feature map 2.
- the calculation unit 15d back-propagates the value output from the learning model to the convolutional layer selected using the output value of the learning model to obtain gradient values, and then outputs weights. Then, the calculation unit 15d calculates contributions by transforming the matrices obtained by multiplying the feature maps output in the processing of steps S102 and S103 by the obtained weights using an activation function. At this time, the calculation unit 15d outputs the contribution in the feature direction and the contribution in the time direction.
- the specifying unit 15e specifies the time and the feature amount considered to be the cause of the abnormality based on the contribution in the feature amount direction and the contribution in the time direction output in the process of step S104.
- time-series data of a detection target in which an anomaly at a predetermined point in time is detected is acquired; features in the feature direction in a time interval before the predetermined point in time are extracted from the time-series data; features in the time direction are extracted from the features in the feature direction; an anomaly score at the predetermined point in time is calculated based on the features in the feature direction and the features in the time direction; and the contribution in the feature direction and the contribution in the time direction before the predetermined point in time to the anomaly score are calculated. For this reason, in unsupervised anomaly detection, this processing facilitates cause identification that takes the time series into consideration.
- an anomaly score at a predetermined point in time is calculated by an unsupervised learning model, and the contribution in the feature value direction and the contribution in the time direction are calculated.
- the cause of the abnormality is specified using the contribution in the feature value direction or the contribution in the time direction. Therefore, in the present process, in unsupervised anomaly detection, it is possible to easily identify the cause in consideration of the time series, and to identify the characteristic or the influence of time that is the cause.
- two-dimensional convolution is performed on each feature of the time-series data to extract features in the feature direction, and one-dimensional convolution is performed on each of those features to extract features in the time direction. Therefore, in unsupervised anomaly detection, this processing makes it possible to easily and efficiently identify the cause in consideration of the time-series characteristics.
- a loss function composed of a penalty for at least one of the prediction error for the anomaly score, the contribution in the feature direction, and the contribution in the time direction is used for training. The anomaly score is calculated by the unsupervised learning model trained in this way, and the contribution in the feature direction and the contribution in the time direction are calculated. Therefore, in unsupervised anomaly detection, this processing makes it possible to easily and accurately identify the cause in consideration of the time-series characteristics.
- each component of each device shown in the drawings according to the above embodiment is functionally conceptual, and does not necessarily need to be physically configured as shown in the drawing.
- the specific form of distribution and integration of each device is not limited to that illustrated, and all or part of them can be functionally or physically distributed or integrated in arbitrary units according to various loads, usage conditions, and the like.
- each processing function performed by each device may be implemented in whole or in part by a CPU and a program analyzed and executed by the CPU, or implemented as hardware based on wired logic.
- (Program) It is also possible to create a program in which the processing executed by the anomaly detection device 10 described in the above embodiment is written in a computer-executable language. In this case, the same effects as those of the above embodiment can be obtained by having a computer execute the program. Furthermore, such a program may be recorded on a computer-readable recording medium, and processing similar to that of the above embodiment may be realized by having a computer read and execute the program recorded on this recording medium.
- FIG. 27 is a diagram showing a computer that executes a program.
- the computer 1000 includes, for example, a memory 1010, a CPU 1020, a hard disk drive interface 1030, a disk drive interface 1040, a serial port interface 1050, a video adapter 1060, and a network interface 1070, and these units are connected by a bus 1080.
- the memory 1010 includes a ROM (Read Only Memory) 1011 and a RAM 1012, as illustrated in FIG.
- the ROM 1011 stores a boot program such as BIOS (Basic Input Output System).
- Hard disk drive interface 1030 is connected to hard disk drive 1090 as illustrated in FIG.
- Disk drive interface 1040 is connected to disk drive 1100 as illustrated in FIG.
- a removable storage medium such as a magnetic disk or optical disk is inserted into the disk drive 1100 .
- the serial port interface 1050 is connected to, for example, a mouse 1110 and a keyboard 1120, as illustrated in FIG.
- Video adapter 1060 is connected to display 1130, for example, as illustrated in FIG.
- the hard disk drive 1090 stores an OS 1091, application programs 1092, program modules 1093, and program data 1094, for example. That is, the above program is stored in, for example, the hard disk drive 1090 as a program module in which instructions to be executed by the computer 1000 are described.
- the various data described in the above embodiments are stored as program data in the memory 1010 or the hard disk drive 1090, for example. Then, the CPU 1020 reads the program modules 1093 and program data 1094 stored in the memory 1010 and the hard disk drive 1090 to the RAM 1012 as necessary, and executes various processing procedures.
- the program module 1093 and program data 1094 related to the program are not limited to being stored in the hard disk drive 1090. For example, they may be stored in a removable storage medium and read by the CPU 1020 via a disk drive or the like. Alternatively, the program module 1093 and program data 1094 related to the program may be stored in another computer connected via a network (LAN (Local Area Network), WAN (Wide Area Network), etc.) and read by the CPU 1020 via the network interface 1070.
Landscapes
- Engineering & Computer Science (AREA)
- Theoretical Computer Science (AREA)
- Physics & Mathematics (AREA)
- Computer Vision & Pattern Recognition (AREA)
- Data Mining & Analysis (AREA)
- General Physics & Mathematics (AREA)
- General Engineering & Computer Science (AREA)
- Artificial Intelligence (AREA)
- Evolutionary Computation (AREA)
- Evolutionary Biology (AREA)
- Bioinformatics & Computational Biology (AREA)
- Bioinformatics & Cheminformatics (AREA)
- Life Sciences & Earth Sciences (AREA)
- Quality & Reliability (AREA)
- Software Systems (AREA)
- Medical Informatics (AREA)
- Computing Systems (AREA)
- Mathematical Physics (AREA)
- Testing And Monitoring For Control Systems (AREA)
Abstract
Description
Below, the processing of the anomaly detection system according to the first embodiment (hereinafter referred to as the present embodiment, as appropriate), a comparison between the prior art and the present embodiment, the configuration of the anomaly detection device 10, the details of the processing, and the flow of the processing will be described in this order, and finally the effects of the present embodiment will be described.
The processing of the anomaly detection system according to the present embodiment (hereinafter referred to as the present system, as appropriate) will be described with reference to FIG. 1. FIG. 1 is a diagram showing an example of the anomaly detection system according to the first embodiment. The present system includes the anomaly detection device 10. Note that the anomaly detection system shown in FIG. 1 may include a plurality of anomaly detection devices 10.
Here, as reference technology, technology related to conventional anomaly detection processing that is commonly performed will be described.
Next, the configuration of the anomaly detection device 10 according to the present embodiment will be described in detail with reference to FIG. 2. FIG. 2 is a block diagram showing a configuration example of the anomaly detection device according to the first embodiment. The anomaly detection device 10 includes an input unit 11, an output unit 12, a communication unit 13, a storage unit 14, and a control unit 15.
The input unit 11 controls the input of various kinds of information to the anomaly detection device 10. For example, the input unit 11 is implemented by a mouse, a keyboard, or the like, and receives input of setting information and the like to the anomaly detection device 10.
The output unit 12 controls the output of various kinds of information from the anomaly detection device 10. For example, the output unit 12 is implemented by a display or the like, and outputs setting information and the like stored in the anomaly detection device 10.
The communication unit 13 controls data communication with other devices. For example, the communication unit 13 performs data communication with each communication device. The communication unit 13 can also perform data communication with an operator's terminal (not shown).
The storage unit 14 stores various kinds of information referred to when the control unit 15 operates and various kinds of information acquired when the control unit 15 operates. Here, the storage unit 14 can be implemented by, for example, a semiconductor memory element such as a RAM (Random Access Memory) or a flash memory, or a storage device such as a hard disk or an optical disk. In the example of FIG. 2, the storage unit 14 is installed inside the anomaly detection device 10, but it may be installed outside the anomaly detection device 10, or a plurality of storage units may be installed.
The control unit 15 controls the entire anomaly detection device 10. The control unit 15 includes an acquisition unit 15a, a first extraction unit 15b, a second extraction unit 15c, a calculation unit 15d, and an identification unit 15e. Here, the control unit 15 is, for example, an electronic circuit such as a CPU (Central Processing Unit) or an MPU (Micro Processing Unit), or an integrated circuit such as an ASIC (Application Specific Integrated Circuit) or an FPGA (Field Programmable Gate Array).
The acquisition unit 15a acquires time-series data of a detection target in which an anomaly at a predetermined point in time is detected. For example, the acquisition unit 15a acquires data including sensor values transmitted from a plurality of sensors at each time. The acquisition unit 15a outputs the acquired time-series data to the first extraction unit 15b. The acquisition unit 15a may also store the acquired time-series data in the storage unit 14.
The first extraction unit 15b extracts, from the time-series data, features in the feature direction in a time interval before the predetermined point in time. For example, the first extraction unit 15b performs two-dimensional convolution on each feature of the time-series data to extract features in the feature direction. The first extraction unit 15b also outputs a first feature map (feature map 1) as the features in the feature direction.
The second extraction unit 15c extracts features in the time direction in the time interval from the features in the feature direction. For example, the second extraction unit 15c performs one-dimensional convolution on each feature of the features in the feature direction to extract features in the time direction. The second extraction unit 15c also outputs a second feature map (feature map 2) as the features in the time direction.
The calculation unit 15d calculates an anomaly score at the predetermined point in time based on the features in the feature direction and the features in the time direction, and also calculates the contribution in the feature direction and the contribution in the time direction before the predetermined point in time to the anomaly score. For example, the calculation unit 15d calculates the anomaly score at the predetermined point in time and the contributions in the feature direction and the time direction using an unsupervised learning model.
When an anomaly is detected based on the anomaly score, the identification unit 15e identifies the cause of the anomaly using the contribution in the feature direction or the contribution in the time direction. For example, the identification unit 15e uses the contribution in the feature direction to identify the type of sensor as the feature that influenced the anomaly score at the time the anomaly occurred. The identification unit 15e also uses the contribution in the time direction to identify the time that influenced the anomaly score at the time the anomaly occurred. Furthermore, the identification unit 15e may store the identified information in the storage unit 14.
The details of the processing according to the present embodiment will be described with reference to FIGS. 3 to 25 and mathematical expressions. Below, the outline of the learning model architecture, the feature extraction processing, the contribution calculation processing, the loss function, and the evaluation processing of the learning model will be described in this order.
The outline of the learning model architecture according to the present embodiment (hereinafter referred to as the present architecture, as appropriate) will be described with reference to FIG. 3. FIG. 3 is a diagram showing an example of the learning model architecture according to the first embodiment. Below, the architecture of a learning model that outputs the anomaly score, the contribution in the feature direction, and the contribution in the time direction from the same model will be described.
The details of the feature extraction processing will be described with reference to FIG. 4. FIG. 4 is a diagram showing an example of the feature extraction processing according to the first embodiment. Below, the feature extraction processing in the feature direction and the feature extraction processing in the time direction will be described in this order.
In the first-stage feature extraction processing in the feature direction (first extraction processing), the anomaly detection device 10 first performs two-dimensional convolution a plurality of times for each feature (see FIG. 4(1)). Next, the anomaly detection device 10 transposes the matrix after performing the last two-dimensional convolution in the feature extraction in the feature direction, thereby obtaining feature map 1 of size c×d (see FIG. 4(2)).
In the second-stage feature extraction processing in the time direction (second extraction processing), the anomaly detection device 10 performs one-dimensional convolution on feature map 1 obtained in the first stage so as to use all d-dimensional features, thereby extracting features in the time direction from the entire input data (see FIG. 4(3)) and obtaining feature map 2 (see FIG. 4(4)).
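The two-stage feature extraction above can be sketched without a deep-learning framework as follows; the kernels, sizes, and the simple summation across features are placeholder assumptions, not the publication's architecture.

```python
import numpy as np

def conv1d_valid(x, kernel):
    """Minimal 1-D valid convolution, used to sketch both stages."""
    n = len(x) - len(kernel) + 1
    return np.array([np.dot(x[i:i + len(kernel)], kernel) for i in range(n)])

def extract_features(window, k_feat, k_time):
    """Stage 1: convolve each feature (column) independently along
    time -> feature map 1. Stage 2: convolve along time using all
    d features at once -> feature map 2."""
    # stage 1: per-feature convolution (feature-direction extraction)
    fmap1 = np.stack([conv1d_valid(window[:, j], k_feat)
                      for j in range(window.shape[1])], axis=1)
    # stage 2: combine all d features, then convolve over time
    mixed = fmap1 @ np.ones(fmap1.shape[1])
    fmap2 = conv1d_valid(mixed, k_time)
    return fmap1, fmap2

window = np.arange(20.0).reshape(10, 2)   # w=10 time points, d=2 features
fmap1, fmap2 = extract_features(window, np.array([0.5, 0.5]), np.array([1.0, -1.0]))
```

The key design point sketched here is the ordering: the first stage never mixes features, so the feature-direction contribution stays interpretable per sensor, while the second stage deliberately uses all features to capture temporal behavior of the window as a whole.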
As the processing following the feature extraction processing, the details of the contribution calculation processing will be described. Below, the outline of the contribution calculation processing, the contribution calculation processing in the feature direction, and the contribution calculation processing in the time direction will be described in this order.
First, the anomaly detection device 10 back-propagates the value output from the learning model to the convolutional layer selected using the output value of the learning model to obtain gradient values, and computes the global average pooling of the gradient values to obtain weights. Then, the anomaly detection device 10 calculates the contribution by transforming the matrix obtained by multiplying the feature map obtained from the selected convolutional layer by the obtained weights using an activation function (such as the ReLU function).
The anomaly detection device 10 performs backpropagation using the predicted value ŷ_l on the final layer on which the convolution in the feature direction was performed, and calculates weights by dividing the gradient values obtained here by c. Then, the anomaly detection device 10 outputs the contribution by applying an activation function to the matrix obtained by multiplying the obtained weights by feature map 1.
The anomaly detection device 10 performs backpropagation using the predicted value ŷ_l on the final layer on which the convolution in the time direction was performed, and calculates weights by dividing the gradient values obtained here by n. Then, the anomaly detection device 10 outputs the contribution by applying an activation function to the matrix obtained by multiplying the obtained weights by feature map 2.
The details of the loss function used to train the learning model according to the present embodiment will be described. First, the loss function Loss is expressed as the following equation (1).
The details of the evaluation processing of the learning model according to the present embodiment will be described with reference to FIGS. 5 to 25. Below, the outline of the evaluation processing of the learning model, the evaluation processing by the method without regularization, the evaluation processing by the method with regularization, and the effectiveness of the learning model will be described in this order. Note that the evaluation processing of the learning model according to the present embodiment is not limited to the processing described below.
The outline of the evaluation processing of the learning model according to the present embodiment will be described with reference to FIGS. 5 to 16. Below, creation of the data used for the learning model, processing of the data, evaluation of the anomaly detection accuracy, and evaluation of the contribution will be described in this order.
Creation of the data used for the learning model will be described with reference to FIGS. 5 and 6. FIG. 5 is a diagram showing an example of the learning data according to the first embodiment. FIG. 6 is a diagram showing an example of the evaluation data according to the first embodiment.
Processing of the data used for the learning model will be described with reference to FIGS. 7 to 9. FIGS. 7 to 9 are diagrams showing an example of the data processing according to the first embodiment.
Evaluation of the anomaly detection accuracy will be described with reference to FIGS. 10 to 12. FIG. 10 is a diagram showing an example of the flow of the evaluation processing of the anomaly detection accuracy according to the first embodiment. FIGS. 11 and 12 are diagrams showing an example of the evaluation processing of the anomaly detection accuracy according to the first embodiment.
Evaluation of the contribution will be described with reference to FIGS. 13 to 16. FIG. 13 is a diagram showing an example of the flow of the contribution evaluation processing according to the first embodiment. FIGS. 14 to 16 are diagrams showing an example of the contribution evaluation processing according to the first embodiment.
The evaluation processing of the learning model according to the present embodiment by the method without regularization will be described with reference to FIGS. 17 to 20. FIGS. 17 to 19 are diagrams showing an example of the evaluation results of the effectiveness of the learning model architecture according to the first embodiment. FIG. 20 is a diagram explaining the evaluation results of the effectiveness of the learning model architecture according to the first embodiment. Below, the evaluation results of the anomaly detection accuracy and the evaluation results of the contribution will be described in this order.
First, the evaluation results of the effectiveness of the anomaly detection accuracy of the learning model architecture will be described. Below, the outline of the learning model architecture by the method without regularization is explained, followed by the evaluation results of its effectiveness.
Next, the evaluation results of the effectiveness of the contribution of the learning model architecture will be described. Below, based on the contribution in the time direction, the evaluation of the maximum-value histogram of the abnormal labels and the evaluation of the maximum-value histogram of the normal labels will be described in this order.
The evaluation processing of the learning model according to the present embodiment by the method with regularization will be described with reference to FIGS. 21 to 25. FIGS. 21 to 25 are diagrams showing an example of the evaluation results of the effectiveness of the learning model architecture according to the first embodiment. Below, the evaluation results of the anomaly detection accuracy and the evaluation results of the contribution will be described in this order.
First, the evaluation results of the effectiveness of the anomaly detection accuracy of the learning model architecture will be described. Below, the outline of the learning model architecture by the method with regularization is explained, followed by the evaluation results of its effectiveness.
Next, the evaluation results of the effectiveness of the contribution of the learning model architecture will be described. Below, based on the contribution in the time direction, the evaluation of the maximum-value histogram of the normal labels and the evaluation of the maximum-value histogram of the abnormal labels will be described in this order.
From the above, it can be judged that the learning model architecture according to the present embodiment has performance that can be used for anomaly detection. In addition, by applying regularization to the loss function used for training the learning model according to the present embodiment, identification of the cause of an anomaly becomes easier.
The flow of the processing according to the present embodiment will be described in detail with reference to FIG. 26. FIG. 26 is a flowchart showing an example of the flow of the overall processing according to the first embodiment. Below, the flow of the overall anomaly detection processing is shown, and the outline of each process is described.
First, the acquisition unit 15a of the anomaly detection device 10 executes the time-series data acquisition processing (step S101). Next, the first extraction unit 15b of the anomaly detection device 10 executes the feature extraction processing in the feature direction (first extraction processing) (step S102). The second extraction unit 15c of the anomaly detection device 10 also executes the feature extraction processing in the time direction (second extraction processing) (step S103). Subsequently, the calculation unit 15d of the anomaly detection device 10 executes the contribution calculation processing (step S104). Finally, the identification unit 15e of the anomaly detection device 10 executes the anomaly cause identification processing (step S105) and ends the processing. Note that steps S101 to S105 described above can also be executed in a different order. In addition, some of steps S101 to S105 described above may be omitted.
First, the time-series data acquisition processing by the acquisition unit 15a will be described. In this processing, the acquisition unit 15a acquires time-series data of a detection target for detecting an anomaly.
First, in the anomaly detection processing according to the present embodiment described above, time-series data of a detection target in which an anomaly at a predetermined point in time is detected is acquired; features in the feature direction in a time interval before the predetermined point in time are extracted from the time-series data; features in the time direction are extracted from the features in the feature direction; an anomaly score at the predetermined point in time is calculated based on the features in the feature direction and the features in the time direction; and the contribution in the feature direction and the contribution in the time direction before the predetermined point in time to the anomaly score are calculated. Therefore, in unsupervised anomaly detection, this processing facilitates cause identification that takes the time series into consideration.
Each component of each device illustrated according to the above embodiment is functionally conceptual and does not necessarily need to be physically configured as illustrated. That is, the specific form of distribution and integration of each device is not limited to that illustrated, and all or part of it can be functionally or physically distributed or integrated in arbitrary units according to various loads, usage conditions, and the like. Furthermore, each processing function performed by each device may be implemented, in whole or in arbitrary part, by a CPU and a program analyzed and executed by the CPU, or may be implemented as hardware by wired logic.
It is also possible to create a program in which the processing executed by the anomaly detection device 10 described in the above embodiment is written in a computer-executable language. In this case, the same effects as those of the above embodiment can be obtained by having a computer execute the program. Furthermore, such a program may be recorded on a computer-readable recording medium, and processing similar to that of the above embodiment may be realized by having a computer read and execute the program recorded on this recording medium.
11 Input unit
12 Output unit
13 Communication unit
14 Storage unit
15 Control unit
15a Acquisition unit
15b First extraction unit
15c Second extraction unit
15d Calculation unit
15e Identification unit
20 Time-series data
Claims (6)
- an acquisition unit that acquires time-series data of a detection target in which an anomaly at a predetermined point in time is detected;
a first extraction unit that extracts, from the time-series data, features in the feature direction in a time interval before the predetermined point in time;
a second extraction unit that extracts, from the features in the feature direction, features in the time direction in the time interval;
a calculation unit that calculates an anomaly score at a predetermined point in time based on the features in the feature direction and the features in the time direction, and calculates the contribution in the feature direction and the contribution in the time direction before the predetermined point in time to the anomaly score;
an anomaly detection device characterized by comprising the above. - The calculation unit calculates the anomaly score at a predetermined point in time by an unsupervised learning model, and calculates the contribution in the feature direction and the contribution in the time direction, and
an identification unit that, when the anomaly is detected based on the anomaly score, identifies the cause of the anomaly using the contribution in the feature direction or the contribution in the time direction
is further provided; the anomaly detection device according to claim 1. - The first extraction unit performs two-dimensional convolution on each feature of the time-series data to extract the features in the feature direction, and
the second extraction unit performs one-dimensional convolution on each of the features in the feature direction to extract the features in the time direction;
the anomaly detection device according to claim 1 or 2. - The calculation unit calculates the anomaly score and the contribution in the feature direction and the contribution in the time direction by the unsupervised learning model trained using a loss function composed of a penalty for at least one of the prediction error related to the anomaly score, the contribution in the feature direction, and the contribution in the time direction;
the anomaly detection device according to any one of claims 1 to 3. - An anomaly detection method executed by an anomaly detection device, comprising:
an acquisition step of acquiring time-series data of a detection target in which an anomaly at a predetermined point in time is detected;
a first extraction step of extracting, from the time-series data, features in the feature direction in a time interval before the predetermined point in time;
a second extraction step of extracting, from the features in the feature direction, features in the time direction in the time interval;
a calculation step of calculating an anomaly score at a predetermined point in time based on the features in the feature direction and the features in the time direction, and calculating the contribution in the feature direction and the contribution in the time direction before the predetermined point in time to the anomaly score;
an anomaly detection method characterized by including the above steps. - An anomaly detection program for causing a computer to function as the anomaly detection device according to any one of claims 1 to 4.
Priority Applications (3)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
PCT/JP2021/023416 WO2022269690A1 (ja) | 2021-06-21 | 2021-06-21 | 異常検知装置、異常検知方法および異常検知プログラム |
JP2023529217A JPWO2022269690A1 (ja) | 2021-06-21 | 2021-06-21 | |
US18/570,076 US20240272976A1 (en) | 2021-06-21 | 2021-06-21 | Abnormality detection device, abnormality detection method, and abnormality detection program |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
PCT/JP2021/023416 WO2022269690A1 (ja) | 2021-06-21 | 2021-06-21 | 異常検知装置、異常検知方法および異常検知プログラム |
Publications (1)
Publication Number | Publication Date |
---|---|
WO2022269690A1 true WO2022269690A1 (ja) | 2022-12-29 |
Family
ID=84545562
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
PCT/JP2021/023416 WO2022269690A1 (ja) | 2021-06-21 | 2021-06-21 | 異常検知装置、異常検知方法および異常検知プログラム |
Country Status (3)
Country | Link |
---|---|
US (1) | US20240272976A1 (ja) |
JP (1) | JPWO2022269690A1 (ja) |
WO (1) | WO2022269690A1 (ja) |
Citations (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
JP2020107248A (ja) * | 2018-12-28 | 2020-07-09 | トヨタ自動車株式会社 | 異常判定装置および異常判定方法 |
JP2020149601A (ja) * | 2019-03-15 | 2020-09-17 | エヌ・ティ・ティ・コミュニケーションズ株式会社 | データ処理装置、データ処理方法及びデータ処理プログラム |
US20210056430A1 (en) * | 2019-08-23 | 2021-02-25 | Accenture Global Solutions Limited | Intelligent time-series analytic engine |
Also Published As
Publication number | Publication date |
---|---|
US20240272976A1 (en) | 2024-08-15 |
JPWO2022269690A1 (ja) | 2022-12-29 |
Legal Events
Code | Title | Description
---|---|---
121 | Ep: the epo has been informed by wipo that ep was designated in this application | Ref document number: 21946980; Country of ref document: EP; Kind code of ref document: A1
WWE | Wipo information: entry into national phase | Ref document number: 2023529217; Country of ref document: JP
WWE | Wipo information: entry into national phase | Ref document number: 18570076; Country of ref document: US
NENP | Non-entry into the national phase | Ref country code: DE
122 | Ep: pct application non-entry in european phase | Ref document number: 21946980; Country of ref document: EP; Kind code of ref document: A1