WO2011036809A1

WO2011036809A1 - Abnormality identification system and method thereof

Info

Publication number: WO2011036809A1
Application number: PCT/JP2009/066806
Authority: WO
Inventors: 研植野; クマルトポンポール
Original assignee: 株式会社東芝
Priority date: 2009-09-28
Filing date: 2009-09-28
Publication date: 2011-03-31

Abstract

Using data obtained by a plurality of sensors and accumulated in the past, an object being monitored is subjected to precise identification of abnormality. An abnormality identification system is provided with: a waveform segmentation unit which designates a plurality of segments to each of a plurality of variates and extracts a plurality of segment data being data relating to the plurality of segments; an evaluation unit which identifies, on the basis of each of the variates, each of the segments by the nearest method using the plurality of segment data extracted by the waveform segmentation unit, thereby selecting the best segment being one of the segments; and a calculation unit which calculates, on the basis of each of the variates, the conditional probabilities of normality and abnormality of the best segment according to the frequency of being identified to be normal and the frequency of being identified to be abnormal relating to each of the segments, and calculates the previous probabilities of normality and abnormality from the total of normal classes and the total of abnormal classes included in a plurality of pieces of exercise data.

Description

Abnormality judgment system and method

The present invention relates to an abnormality determination system and method.

With the recent spread of sensor networks, sensor data analysis technology is required in various scenes such as sensor fusion. However, it has been difficult to improve the determination performance in a determination apparatus using multivariate time series data.

Generally, the discrimination accuracy is good, and there are SVMs and neural networks, etc. that are used relatively frequently (see Patent No. 3624546). However, because they are difficult to construct and the grounds for judgment are difficult to understand, There was a problem that it might be unacceptable.

In the sensor fusion technology (Japanese Patent No. 3931879, Japanese Patent Laid-Open No. 2005-165421), it is possible to determine the abnormality of information of multiple sensor data using a probabilistic model. In this case, sensor data is discrete from continuous time series values. It is the mainstream to convert it into a value and handle it as categorical data.

Patent No. 3624546 Patent 3931879 JP 2005-165421 A JP 2007-64307 A

The present invention provides an abnormality determination system and method capable of performing abnormality determination on a monitoring target (or target object) with high accuracy using sensing data of a plurality of sensors (or sensor nodes) accumulated in the past. provide.

The abnormality determination system of the present invention represents a plurality of time-series data related to a plurality of variables obtained by observing a monitoring target with a plurality of sensors, and a state of the monitoring target when the plurality of time-series data are acquired. A data storage unit that stores a plurality of training data in which a normal class or an abnormal class is set as a set, a plurality of sections are specified for each of the plurality of variables, and each of the plurality of variables includes the plurality of A waveform dividing unit that extracts a plurality of segment data that is data of the plurality of sections from the plurality of time-series data included in the training data, and the waveform dividing unit extracts each of the plurality of variables. In addition, by performing determination by the nearest neighbor method for each of the plurality of sections using the plurality of segment data, the best which is one of the plurality of sections Based on the evaluation unit for selecting a section, the number of times each of the plurality of variables is determined to be normal for each of the plurality of sections, and the number of times determined to be abnormal, the normal and abnormal of the best section Calculating a conditional probability, and calculating a normal and abnormal prior probability from the total number of normal classes and the total number of abnormal classes included in the plurality of training data; and a normal probability and an abnormal prior probability Storing, for each of the plurality of variables, the identification information of the best interval, the segment data of the best interval, the class associated with the segment data, and the normal and abnormal conditional probabilities of the best interval; , A sensing unit for observing the monitoring target with a plurality of sensors and acquiring a plurality of time-series data regarding a plurality of variables, and the plurality of variables. A selection unit that selects segment data from the plurality of time-series data acquired by the sensing unit according to the best interval, and a selection unit that selects each of the plurality of variables. For the segment data, the uppermost predetermined number of segment data is detected by the nearest neighbor method using the segment data in the storage unit, and the normal class in the predetermined number of segment data is detected for each of the plurality of variables. And the respective ratios of the abnormal classes and the conditional probabilities of normality and abnormality in the storage unit, respectively, multiplying the multiplied values among the plurality of variables and multiplying the prior probabilities of normality and abnormality Calculate the likelihood of normal and abnormal, and the likelihood of normal and abnormal And a determination unit that determines a state of the monitored object on the greater.

The abnormality determination system of the present invention includes a plurality of first labels respectively indicating whether sensor data observed by a plurality of sensor nodes monitoring a target object is abnormal or normal, and whether the state of the target object is normal or normal. A first database that stores a plurality of training data including a second label to indicate, and (A-1) mapping using a coding method that specifies mapping of the presence or absence of each of the plurality of sensor nodes to a bit string A plurality of candidate solutions are generated by performing a plurality of times at random, and (A-2) a fitness evaluation of each of the plurality of candidate solutions with respect to the first database and a candidate solution selected based on the fitness The generation of new candidate solutions by crossover and mutation operations is repeated according to the genetic algorithm A decision fusion rule learning unit that determines an optimal candidate solution having optimal fitness and identifies a sensor node on which a bit is set in the optimal candidate solution, and (B-1) an observation by the specified sensor node. Whether or not the sensor data is abnormal or normal is determined using a classifier prepared in advance for the specified sensor node and determining whether the given sensor data is abnormal or normal. 2) When all the determination results for the specified sensor node indicate abnormality, the target object is determined to be abnormal, and when at least one of the determination results indicates normal, the target object is A general determination unit that determines normality, and the determination fusion rule learning unit includes the plurality of candidate solutions. For each of the plurality of training data, the first label of the sensor node having a bit set in the candidate solution is detected for each of the plurality of training data, and normality and abnormality indicated by the detected first label Selecting a larger state of the plurality of training data, and calculating a ratio at which the state selected for each of the plurality of training data matches the state indicated by the second label of each of the plurality of training data. Features.

According to the present invention, it is possible to perform an abnormality determination on a monitoring target (or target object) with high accuracy using data of a plurality of sensors (or sensor nodes) accumulated in the past.

1 shows a configuration of an abnormality determination system according to a first embodiment of the present invention. The flow of the training learning process by the server is shown. The detailed processing flow of a pseudo judgment evaluation process is shown. Details of step S205 of FIG. 3 are shown. The process added in various modifications is shown. The example of the training data set in a training data storage part is shown. An example of waveform amplitude is shown. The example of the feature vector after converting into a power spectrum is shown. An example of waveform division is shown. Another example of waveform division is shown. Another example of waveform division will be described. The example of a division | segmentation in the case of a power spectrum is shown. An example of division of training data is shown. The other division example of training data is shown. An example of finding nearest neighbor segment data is shown. The state of performing the nearest neighbor calculation is shown. An example of a score table is shown. The example of data (judgment model) in the best model storage part is shown. The process of a 1st modification is shown. The process of the 2nd modification is shown. The process of the 3rd modification is shown. The process of the 4th modification is shown. An example of the determination model according to the fifth modification will be shown. 14 shows a concept of a model formula according to a fifth modification. An example of conditional probability calculation is shown. The state of performing the nearest neighbor calculation is shown. The format example of a frequency distribution table is shown. A state in which the determination target data is scanned is shown. The state of cutting out data (waveform) is shown. The example of a display screen of the part of abnormality determination among the determination object data is shown. The operation | movement flow in a client is shown. 2 shows an example of a hardware configuration for realizing a server and a client. An example of applying a segment template is shown below. An example of operation | movement of the client which concerns on a modification is shown. 2 shows an overall configuration of a remote monitoring system according to a second embodiment. An example of a format of sensor data is shown. An example of the sensor data extracted in order to build a single channel abnormality judgment model is shown. We show how an optimal threshold is learned using a method based on C4.5. A typical genetic algorithm processing flow for solving the problem is shown. An example of optimal segmentation of feature regions in a training waveform is shown using a genetic algorithm. Fig. 5 shows an example of a format of extracted sensor data used to construct a decision fusion rule. The example of the database of a decision fusion rule in case one line is equivalent to one decision fusion rule is shown. The example of the database of a decision fusion rule in case one rule consists of many decision fusion rules is shown. An example of converting a classification rule into a number of decision fusion rules is shown. An example of coding in a genetic algorithm for constructing a classification rule is shown. An example of the processing flow of a genetic algorithm for constructing a classification rule from a sensor data set is shown. An example of generating offspring using crossover and mutation in a genetic algorithm is shown. An example of evaluating candidate classification rules in a genetic algorithm is shown. An example of coding based on S expression in genetic programming is shown. An example of tree-based coding in genetic programming is shown. An example of a process flow of genetic programming for constructing a classification rule from sensor data is shown. An example of classification rule evaluation in genetic programming is shown. An example of generating offspring using crossover and mutation in genetic programming is shown. The flow of processing inside the gateway is shown. An example of determination of abnormality in data from a sensor node (test waveform) is shown. Fig. 5 shows an example of the format of data sent to a server at a remote monitoring site when the sensor node status matches at least one of the decision fusion rules. An example of the format of data sent to the server at the remote monitoring site when the state of the sensor node does not match any of the decision fusion rules is shown. An example of the operation in the gateway is shown.

First embodiment

FIG. 1 shows a configuration of an abnormality determination system according to the first embodiment of the present invention.
This abnormality determination system includes a server (monitoring center device) and a client (remote monitoring terminal). The server performs training learning using past sensor data (time-series data) obtained by observation of the monitoring target and a class (abnormal or normal) that identifies the status of the monitoring target at the time of acquisition of the sensor data. As a result, a determination model for determining new sensor data is generated. The client observes the monitoring target, acquires sensor data, and determines whether the monitoring target is normal or abnormal using the acquired sensor data and the determination model.

(server)
FIG. 2 is a flowchart showing the flow of the training learning process by the server.

First, the server reads various parameters set by the user (S101). For example, parameters such as the maximum waveform division number Z_max (used in step S106) are read. Reading is performed from a recording medium such as a memory or a hard disk.

Next, by performing initial setting, the parameter z of the number of waveform divisions is set to 0 (S102).

Next, the training data input unit 12 reads the training data set from the training data storage unit 11 and inputs it to the waveform preprocessing unit 13 at the next stage (S103).

Fig. 6 shows an example of a training data set in the training data storage unit 11.

Each training data is composed of at least one type of time series data (sensor data) and a class. Here, the class is a determination result obtained when a maintenance person or the like determines the state of the target device (monitoring target) when the corresponding time-series data is acquired in the past. Classes are, for example, abnormal and normal. However, there may be multiple types of abnormal states such as abnormal type A and abnormal type B. Here, in order to make the explanation easy to understand, the case where there are two classes of normal and abnormal will be described. In the illustrated example, the training data includes time-series data of four variables (channels). The class of training data d ₁ to d _N is normal, and the class of training data d _{N + 1} to d _M is abnormal. The time series data of the four variables are obtained from the corresponding four sensors. Here, for simplification of explanation, it is assumed that the time series data has the same size (length in the time axis direction), but the size may be different for each variable (channel).

Next, the waveform preprocessing unit 13 preprocesses each time series data included in the training data set (S104). As preprocessing, a feature vector such as an amplitude spectrum may be acquired by performing signal processing such as power spectrum conversion by FFT, short-time Fourier transform, and wavelet transform. Alternatively, waveform amplitude values at a plurality of predetermined times may be acquired. FIG. 7 shows examples of waveform amplitudes acquired at a plurality of predetermined times. FIG. 8 shows an example of the feature vector after conversion to the power spectrum. As preprocessing, the waveform may be further processed using a low-frequency pass filter (smoothing filter). This is effective when noise is added to the waveform amplitude or when it is desired to grasp the general characteristics of the waveform. Alternatively, only a waveform in a limited band may be extracted by a filter. Alternatively, various waveform approximation calculations such as line segment approximation, Chebyshev approximation, and APCA approximation may be performed as described in Non-Patent Document 1, for example. It is also possible to proceed to the next process without performing any pre-processing. In the following description, for ease of understanding, description will be made using the time-series data of FIG. 6 that has not undergone preprocessing.

Next, in steps S105 to S110, the time-series data is divided into a plurality of sections by the waveform division number z while sequentially increasing the waveform division number z from 1 to the maximum waveform division number z_max, and the waveform division number z (1 ~ Z_max), an important interval for each variable is determined. Then, the important section of each variable at the time of the waveform division number at which the highest evaluation value is obtained is determined as the optimum section. In addition, the data portion (segment data) of each optimum section in each time series data of the training data set is stored in association with the corresponding class. Details of steps S105 to S110 will be described below.

In step S105, the server increments the waveform division number z by one.

In step S106, the server determines whether or not the waveform division number z exceeds the maximum division number z_max. If it exceeds, the process proceeds to step S111. If not, the process proceeds to step S107.

In step S107, the waveform division unit 14 divides each time series data of the training data set on the time axis by the number of waveform divisions z and cuts out segment data. Here, the division method is simple and is divided so that the division width is equal. However, another division method may be used. The extracted segment data is stored in the segment storage unit 15.

An example of waveform division is shown in FIG. 9 (when z = 1), FIG. 10 (when z = 2), and FIG. 11 (when z = 4). Note that no division is actually performed when z = 1. Each segment data (partial time series data) cut out in this way is stored in the segment storage unit 15 in association with the value of z, the training data ID, and the variable ID. When the pre-processed time-series data (feature vector) is a power spectrum, the data may be divided along the frequency axis direction as shown in FIG. Here, the case of z = 3 is shown, and the frequency band is divided into three. In the present invention, dividing time series data means dividing the time series data along the frequency axis direction when the time series data is converted into a power spectrum.

Next, in step S108, a pseudo determination evaluation process is performed by the pseudo determination evaluation unit 17, the probability likelihood calculation unit 16, and the best model selection unit 19.

FIG. 3 is a flowchart showing a detailed process flow of the pseudo judgment evaluation process (S108).

First, the pseudo judgment evaluation unit 17 divides the training data set (segmented) into a plurality of divided sets, and labels the plurality of divided sets with 1 to Vmax (S201). One divided set may consist of a single piece of training data or a plurality of pieces of training data. When one divided set consists of one training data, the training data set is divided into the total number of training data, and therefore Vmax matches the total number of training data. In the following, for simplicity of explanation, it is assumed that one divided set consists of one training data unless otherwise specified.

Next, by performing initialization, the divided set identifier v = 0 and the evaluation value q = 0.0 are set (S202).

Next, v is incremented by 1 (S203).

Next, the pseudo-judgment evaluation unit 17 selects, as a pseudo-judgment target data set Tv, the one indicated by the identifier v among the plurality of split sets divided in step S201. That is, a plurality of divided sets are divided into a pseudo determination target data set Tv and other divided sets. FIG. 13 shows an example when v = 1, and FIG. 14 shows an example when v = 2. As described above, since each of the plurality of divided sets includes one piece of training data, the pseudo determination target data set Tv includes one piece of training data. Therefore, hereinafter, the pseudo-determination target data Tv refers to a case where the pseudo-determination target data set Tv includes one piece of training data.

Next, the pseudo judgment evaluation unit 17 performs a modeling process using Leave Cross Validation by training learning, and obtains an evaluation value r (S205). In this process, the class of the pseudo determination target data Tv (determination result) is simulated and the remaining training data is used to estimate the class of the pseudo determination target data Tv. The evaluation value r is obtained by calculating whether the estimated result matches the actual class of the pseudo determination target data Tv. In particular, Leave-One-out Cross Validation (only one training data is included in one divided set) is effective when the training data is decimal. However, when there is only one training data to be included in the divided set, there is a problem that the pseudo judgment process takes too much time.If you want to avoid this problem, include multiple training data in one divided set and add one of the divided sets. It is only necessary to select the pseudo judgment target set and repeat the evaluation so that all the subsets become the pseudo judgment target set once. This is an evaluation method generally called Cross Validation. In this example, it is assumed that the training set includes one piece of training data as described above.

In step S207, the pseudo determination evaluation unit 17 adds the evaluation value r to the evaluation value q.

In step S208, the pseudo determination evaluation unit 17 determines whether or not the divided set identifier v exceeds Vmax. That is, it is determined whether each training data of the training data set is selected as the pseudo determination target data Tv (whether each of the plurality of divided sets is selected as the pseudo determination target data set Tv). When it does not exceed Vmax, the process returns to step S203, and when it exceeds, the process proceeds to step S209.

In step S209, the pseudo determination evaluation unit 17 calculates the pseudo correct answer rate Gz (average evaluation value) by dividing the evaluation value q by v_max which is the number of evaluations (number of divided sets). Accordingly, one pseudo correct answer rate Gz (average evaluation value) is obtained corresponding to one waveform division number z.

Step S210 (conditional probability calculation) and step S211 (important segment determination) will be described later.

Returning to FIG. 2, in step S109, the pseudo judgment evaluation unit 17 determines whether the pseudo correct answer rate Gz calculated in step S209 is smaller than the pseudo correct answer rate Gz-1 when the previous waveform division number z-1 or not. Determine whether or not. When Gz is equal to or greater than Gz−1, it is determined that there is a possibility that a higher pseudo correct answer rate may be obtained, and the process returns to step S105, the waveform division number z is incremented by 1, and the same procedure is repeated. On the other hand, when the pseudo correct answer rate Gz is smaller than Gz-1, it is determined that a pseudo correct answer rate larger than this cannot be obtained, and the process proceeds to step S110.

Here, prior to description of steps S110 and S111, details of step S205 (modeling processing by training learning) in FIG. 3 will be described.

FIG. 4 is a flowchart showing details of step S205 in FIG. Here, a case where the number of waveform divisions z = 4 will be described as an example.

First, in step S301, the probability / likelihood calculation unit 16 calculates the normal and abnormal occurrence probabilities based on the classes of training data in the training data set (segmented by z = 4) as prior probabilities p. Calculate as (Ci). For example, if the size of the training data set is 200, there are 140 normal classes and 60 abnormal classes, normal prior probability p (C ₁ = normal) = 0.7, abnormal prior probability p (C ₂ = Abnormal) = 0.3 (see the upper left of FIG. 25). This step S301 may be performed only once, and the processing of this step may be skipped from the next time.

Next, in step S302, the pseudo determination evaluation unit 17 performs initialization and sets i indicating variable ID (channel ID) to 0 and j indicating section ID to 0.

Next, in step S303a, the pseudo judgment evaluation unit 17 increments channel i by 1, and in step S303b increments variable (channel) j by 1.

Next, in step S304, the pseudo determination evaluation unit 17 uses the k-nearest neighbor method proven in the time series data classification problem for the pseudo determination target data Tv to estimate the class of the pseudo determination target data Tv ( (Pseudo-judgment). In the k-nearest neighbor method, k cases closest to the pseudo judgment target are extracted in the feature space, and the class occupying the largest number among the classes of the k cases is selected as the pseudo judgment target. It is the determination method determined as an estimation class. This will be described in detail below.

For each segment data of each variable (each channel) in the pseudo judgment target data Tv, the k nearest segment data is the same variable from the remaining training data (remaining divided sets) other than the pseudo judgment target data Tv Find in. Focusing on variable (channel) 1 and segment s1 of pseudo judgment target data d _{N + 1 in} FIG. 15, the top k segments having the highest similarity (closest distance) to segment data s1 of pseudo judgment target data d ₁ An example is shown in which data s1 is found from time series data of variable (channel) 1 of the remaining training data. However, k = 5. In the illustrated example, training data _{_{_{d 13, d 14, d 15}}} , d 17, segment data s1 of variable 1 in d ₁₆ have been identified. Here, for the calculation of the distance between the segment data, a scale such as Dynamic Time Warping (DTW) distance or Euclidean distance may be used. Here, the distances from the training data d ₁₃ , d ₁₄ , d ₁₅ , d ₁₇ , d _{16 to} the variable 1 segment data s 1 are calculated as 3.5, 9.3, 12.9, 13.2, and 14.1, respectively. In the figure, dist (x, y) indicates the distance between the segment data x and the segment data y.

When the top k (= 5) segment data are specified in this way, the class with the largest number among the classes related to these segment data is specified. This is formulated as shown in Equation 1-1.

In the example of FIG. 15, all classes of training data d ₁₃ , d ₁₄ , d ₁₅ , d ₁₇ , and d ₁₆ are normal. That is, the frequency of abnormality and the frequency of normality are freq (abnormality, normal) = (0, 5). Therefore, the estimation result is determined to be normal according to the above equation 1-1. Here, the actual class pseudo determination target data d ₁ is abnormal. Therefore, this estimation result is an incorrect answer (error).

Next, in step S305, the pseudo judgment evaluation unit 17 updates the normal and abnormal frequency distribution tables for each variable and for each section based on the normal frequency and the abnormal frequency obtained in step S304. A format example of the frequency distribution table is shown in FIG. For example, the frequency distribution table is prepared for each waveform division number z. Initially, all items in the frequency distribution table are set to zero. In the above calculation example, 5 is added to the normal item of section s1 in the distribution table of channel 1 (upper left of FIG. 27), and nothing is added to the abnormal item.

Next, in step S306, the pseudo evaluation determination unit 17 updates the score table according to whether the estimation in step S304 is correct or incorrect. The score table stores a score for each combination of all channels and segments (sections) selected in the process of proceeding with pseudo judgment evaluation (see the upper diagram of FIG. 17 described later). The initial value of each square in the score table is 0. If the answer is correct, a predetermined score (1 in this case) is added to the corresponding cell. For example, as shown in FIG. 16, regarding the segment s2 of the variable (channel) ₁ in the pseudo determination target data d1, if the estimation result in step S304 is correct, it corresponds to the variable (channel) 1 and the segment 2 Add 1 to the score score (ch1, s2) of the cell. That is, score (ch1, s2) = 0 + 1. A score table exists for each waveform division number z.

In step S309, it is determined whether or not the segment j has reached jmax. If not, the process returns to step S303b to increment j and select the next segment. If reached, the process proceeds to the next step S310. FIG. 16 shows how the nearest neighbor calculation is performed in step S304 for the section s2 of the variable 1 (channel 1). Here, the estimation result is abnormal, and the pseudo-determination target data d _{N + 1} is also abnormal, so it is a correct answer.

In step S310, it is determined whether or not the variable (channel) i has reached imax. If not, the process returns to step S303a to select the next variable (channel). If reached, the next step S311 is performed. Proceed to FIG. 26 shows how the nearest neighbor calculation is performed in step S304 for section s2 of variable 3 (channel 3). Here, the estimation result is normal, and the pseudo determination target data d _{N + 1} is abnormal, so it is incorrect.

In step S311, the evaluation value of the pseudo judgment target data Tv is calculated. When the judgment result is abnormal and correct for at least one of the total 16 judgments (S304) performed for the segments s1 to s4 for each variable 1 to 4, the evaluation value r is 1.0, otherwise When it is 0.0. Alternatively, 1.0 may be used when the number of correct answers is greater than the number of incorrect answers, and 0.0 may be set when the number of correct answers is less than or equal to the number of incorrect answers. Alternatively, the ratio of the number of correct answers to the number of determinations may be set as the evaluation value r. When step S311 is completed, this flow is ended, and the process returns to step S207 in FIG.

Note that when a plurality of training data are included in one divided set, the processing of steps S302 to S310 may be performed for each training data, and thereafter, an evaluation value may be calculated according to the same criteria.

In step S207 of FIG. 3, the evaluation value r is added to q, and q is updated. Then, the process proceeds to selection of the next pseudo determination target data Tv, and the flow of FIG. 4 is performed in the same manner.

When all the training data (all divided sets) are selected once as the pseudo judgment target data (pseudo judgment target data set) Tv, the process proceeds to step S209, and the pseudo correct answer rate (average evaluation value) Gz = q / Calculate v_max. That is, the average of evaluation values among all training data (all divided sets) is calculated.

In the next step S210, the probability / likelihood calculation unit 16 performs normal and abnormal conditional probabilities p (X | C for each variable and each segment combination based on the frequency distribution table updated in step S305 of FIG. ). For example, for a set of variable 2 and segment s2, p (X2 = s2 | C) = (abnormal = 0.8, normal = 0.067) is calculated.

Here, p (X | C) is the probability of normalization when the variable X takes each segment under the condition of C, that is, C = abnormal and C = normal. This shows the probability of an abnormality. That is, the probability of normalizing the normal / abnormal distribution in each segment with C = normal and C = abnormal is p (X | C). If only P (X) is normal, the probability of normality exceeds the probability of abnormality, but if the probability of normal occurrence in P (C) is five times higher than the probability of occurrence of abnormality, this difference in prior probabilities Must be corrected to 1/5 to consider the conditional probability. Therefore, here, the probability in each segment normalized by p (C) is calculated as p (X | C). For each C (ie normal and abnormal) value, the probability is calculated for each C so that the sum of the probabilities in all the segment types to be taken is 1.

As an example, calculation examples of conditional probabilities for variable 2 and variable 3 are shown in the upper right corner of the center of FIG. If the illustrated frequency distribution f (X2 | C) is obtained for the variable 2, the conditional probability p (X2 | C) is calculated as shown in the table based on this frequency distribution. That is, for each normal and abnormal condition, the conditional probability is calculated by dividing the frequency of each variable by the total frequency. The same method is used for variable 3. Although not shown, conditional probabilities are similarly calculated for

variables

1 and 4. If the abnormal conditional probability is 0.1, the normal conditional probability is 0.9, and so on, this is normal compared to the abnormal conditional probability of 0.55 and normal conditional probability of 0.45. The probability is very high, and such a probability is close to certainty.

Next, proceed to step S211 and determine an important segment (important section) for each variable. That is, all the training data is selected as the pseudo determination target data Tv, and the flow of FIG. 4 is performed for each, so that a score table is finally obtained as shown in the upper part of FIG. The pseudo judgment evaluation unit 17 selects each important segment (important section) of each variable (channel) based on this score table, and records the information in the segment storage unit 15. Specifically, in the score table, the segment (section) with the highest score is selected as an important segment for each variable. For example, in the variable (channel) 1, since the segment s1 has the highest score, the segment s1 is selected as the important segment. Similarly, for the variables 2 to 4, segments s2, s2, and s4 are selected as important segments. A selection method when there are a plurality of segments having the same score and other selection methods will be described later.

If the important segment is determined in step S211, this flow is ended, and the process proceeds to step S109 in FIG.

In step S109, as described a little earlier, it is checked whether the pseudo correct answer rate Gz calculated in step S209 is smaller than the pseudo correct answer rate Gz-1 when the waveform division number is z-1. If it is smaller, the best model selection unit 19 stores the important segment for each variable selected when Gz-1 is obtained in the best model storage unit 18 as the best segment (best section) in the next step S110. .

Also, when it is determined in step S106 that the waveform division number z is larger than the maximum waveform division number z_max, the best segment (best section) in Gz-1 is specified and stored in the same manner. However, in this case, since z is incremented in step S105, it should be noted that in this case, Gz-1 matches Gz_max.

In addition, once the best segment (best section) of each channel (variable) is obtained, if there is room to improve the pseudo judgment performance by merging other segments (other sections) in each channel, the merge approach It is good also considering the segment which introduce | transduced and merged as the best segment.

Next, in step S111, the best model selection unit 19 obtains normal and abnormal prior probability information and conditional probabilities for each variable obtained in step S210 (corresponding to z from which the best model was obtained) as the best model storage unit. Store in 18. The conditional probability stored may be only the probability of the segment specified as the best segment in each variable.

Also, the model formula (described later) used for determination by the client is stored in the best model storage unit 18. The model formula can be automatically generated once the best segment for each variable is determined.

Also, the best model selection unit 19 reads the segment data of the best section of each variable and the corresponding class from the segment storage unit 15 and stores them in the best model storage unit 18. The segment to be read may be the entire training data, a predetermined number of training data for each of normal and abnormal, or may be determined by other criteria.

The best model selection unit 19 also stores in the best model storage unit 18 detailed information (time length) of each section (segment) when divided by the selected waveform division number z. At least the detailed information of the section corresponding to the best segment is stored.

These pieces of information data stored in the best model storage unit 18 form a judgment model. An example of data stored in the best model storage unit 18 is shown in FIG.

In the example of FIG. 18, the segment s2 is specified as the best segment in all the variables. In FIG. 18, the detailed notation of the table of p (X1 | C), p (X3 | C), p (X4 | C) is omitted for simplification, and simply p (X1 | C), p Only (X3 | C) and p (X4 | C) are shown. How to read the model formula in (2) will be described later.

Next, the transmission unit 20 transmits the determination model stored in the best model storage unit 18 to the client. The model formula type data may be given to the client in advance, and the client may generate the model formula from the type data based on the best segment (best section) of each variable. In this case, the server may not include the model formula in the determination model sent to the client. When a plurality of clients are connected to one server, the determination model is transmitted to the plurality of clients. By receiving this determination model, the client is ready for abnormality determination.

(Supplement) Here is a supplementary explanation of conditional probabilities. Usually, in order to estimate a determination result by a Bayes classifier, it is necessary to obtain a conditional probability p (Xi | C) in each variable (attribute) Xi. Here, the type of attribute value of attribute Xi becomes a problem. Normally, when calculating the conditional probability, it is calculated based on how often the discrete attribute value ai of the attribute Xi occurs, but in this embodiment, segment data obtained by cutting out the time series waveform Since there is no attribute value in itself, the probability cannot be calculated. Usually, time-series waveforms are clustered and divided into several category categories, and each category type is calculated as an attribute value in the domain of time-series clustering and time-series classification. It's not easy to decide how many types to choose. Moreover, when the number of divisions increases, the number of clusters to be handled for each variable must be obtained when the number of types of variables handled at the same time increases, which is unrealistic. Although it is conceivable to handle the segments divided by each variable as all the attributes Xi, there is a problem that the probability calculation cost increases. Therefore, in this embodiment, in the attribute Xi of each variable, by setting the segment type with the highest pseudo judgment performance as the attribute value, the time series data is discretized within the framework of the Bayesian classification problem and the information is not dropped. Made it possible to handle. If it does in this way, it becomes possible to express the segment selected by each variable with certainty. For example, as described above, the conditional probability can be expressed as p (X2 = s2 | C) = (abnormal = 0.8, normal = 0.067).

(First modification of server)
As shown in the score table shown in FIG. 17, when there is only one segment (section) with the highest score in each variable, the important segment is uniquely determined. However, when there are a plurality of segments having the same point, this method cannot be uniquely determined. In such a case, an important segment (important section) is determined by the following method.

FIG. 19 is a diagram for explaining an important segment determination method according to the first modification.

In the example of FIG. 19, the scores of the segments (s1 and s4) of the variable (channel) 4 are the same (each is 9 points). For

variables

1, 2, and 3, there is only one segment with the highest score, and segments s1, s2, and s2 are selected, respectively.

In this case, multiple candidates are generated by taking all combinations of the highest score segments between the variables 1 to 4. Then, as shown in the flowchart of FIG. 5, the likelihood of abnormality is calculated for each candidate (S222). Likelihood is a measure representing the plausibility of a decision and is defined as the total product of probabilities. The calculated likelihoods are compared, and the candidate with the highest likelihood is selected (S223).

In the illustrated example, as candidate c1, (variable 1, variable 2, variable 3, variable 4) = (s1, s2, s2, s1), as candidate c2 (variable 1, variable 2, variable 3, variable 4) = (S1, s2, s2, s4) are generated, and the likelihood of abnormality is calculated for each candidate. Then, the candidate with the highest likelihood is selected (pseudo-judgment evaluation by likelihood calculation).

Here, the likelihood is calculated according to the following equation. Since a vector composed of normal likelihood and abnormal likelihood is obtained by the calculation of the following equation, the likelihood of the abnormal is selected, and the likelihood of abnormality is compared between the candidates. p (C) is a prior probability, and p (X _j = s _i | C) is a conditional probability. As the conditional probability, the value calculated in step S210 in FIG. 3 can be used.

In the example of FIG. 19, when the likelihood of abnormality of each candidate is actually calculated, it becomes 0.087 for candidate c1 and 0.084 for candidate c2, and candidate c1 has a larger value, so candidate c1 is selected. That is, the important segment of the variable (channel) 4 is determined as s1. As shown in the example in Fig. 19, if there is only one variable with a tie segment, compare the conditional probabilities of abnormalities between these tie segments, and select the segment with the largest value as the important segment. The same result can be obtained even if it is determined.

(Second modification of server)
(No. 1) In the first embodiment, the important segment is specified by the maximum value of the score, but as another method, it is possible to select a combination of segments between the variables with the highest likelihood calculation. This is because the best segment is selected for each variable, but it is not always the best determination accuracy when the pseudo-judgment is made for the entire variable. In the process of this modification, for example, according to the above Equation 1-2, the likelihood of abnormality is calculated for all the combinations of segments between variables (S222 in FIG. 5), and the combination with the highest likelihood may be selected. (S223 in FIG. 5).

(Part 2) The following method is also possible as the second modification.

That is, the lower limit threshold θ of the evaluation score is determined in advance, and all segments having a score larger than the lower limit threshold are selected as candidates, and the final important segment is determined from the candidates.

FIG. 20 is a diagram for explaining an important segment determination method according to the second modification.

The lower threshold θ is set to 5. Segments with scores greater than 5 in each of the variables (channels) 1 to 4 are selected as candidates. For variable 1, segments s1, s4, for variable 2, s2, for variable 3, s2, for variable 4, s1, s3, s4 are selected. When the selected segments are combined between the variables, the following six candidates c1 to c6 are obtained.

c1 = (s1, s2, s2, s1), c2 = (s4, s2, s2, s1), c3 = (s1, s2, s2, s3), c4 = (s4, s2, s2, s3), c5 = (S1, s2, s2, s4), c6 = (s4, s2, s2, s4)
The likelihood of abnormality is calculated for each candidate in the same manner as “No. 1” (S222 in FIG. 5). Then, the candidate with the highest likelihood is selected (S223 in FIG. 5). In this example, the likelihood L1 of the candidate c1 is 0.013, the likelihood L2 of the candidate c2 is 0.031, the likelihood L3 of the candidate c3 is 0.024, the likelihood L4 of the candidate c4 is 0.062, and the candidate c5 The likelihood L5 is calculated as 0.033, and the likelihood L6 of the candidate c6 is calculated as 0.093. Since the likelihood of the candidate c6 is the largest, the segment included in the candidate c6 is determined as the important segment. That is, segment s4 is determined as the important segment for variable 1, segment s2 is determined as variable 2, segment s2 is determined as variable 3, and segment s4 is determined as important segment in variable 4.

According to such an important segment determination method, an important segment is not selected in a variable (channel) in which there is no segment with a score higher than the threshold θ. It can be said that such a variable is less necessary for abnormality detection, and there is an advantage that the data of the variable (channel) is not used for abnormality determination. When there are a plurality of candidates with the largest likelihood, a plurality of candidates may be selected.

(Modification 3 of the server)
Depending on the sensing target (monitoring target), it may not be necessary to consider the time difference of each sensor. In that case, a segment at the same position in each variable (channel) may be selected, and estimation may be performed by the k-nearest neighbor method in units of the same segment sequence.

In this case, the conditional probability is calculated in the flow unit of FIG. 4 corresponding to step S205 of FIG. 3 (the conditional probability is calculated in the divided set identifier v unit). As shown in FIG. 21, normal and abnormal likelihoods are calculated by multiplying the normal and abnormal conditional probabilities of the same segment (for example, s2) between the variables, and further multiplying the prior probabilities, and the values are large. The state of the direction is adopted. This is shown by Equation 1-3 below.

・ If the adopted status matches the status of the pseudo-judgment target data, 1 is added as a correct answer, and if it does not match, the score is not added. The same calculation is performed for other same segments (s1, s3, s4) between each variable, and normal or abnormal is selected, and if correct, the score is added. In this way, the score table is updated (the size of the score table is 1 × 4 compared to 4 × 4 in the first embodiment).

In the example of FIG.

(Note that the values in the table of FIG. 25 are used here for ease of understanding). Therefore, the likelihood of abnormality is higher than normal. Therefore, in this case, normal is the estimation result. Because pseudo-class determination target data d ₁ is normal, the estimation result is a correct answer, 1 is added to the score score (S2) for the segment s2.

In some cases, it is possible that the bias of the class prior distribution has already been corrected by the k-nearest neighbor method. Therefore, Formula 1-4, which is obtained by removing the prior probability distribution p (C) from the estimation formula shown in Formula 1-3, may be used instead of Formula 1-3.

(Fourth modification of server)
This modification shows another method for determining the best segment of each variable (channel).

FIG. 22 is a diagram for explaining a method for determining the best segment according to the fifth modification.

First, the segment length and provisional position of the best segment are determined in advance for each variable. Each segment (section) with a predetermined segment length is placed at a provisional position, and each segment is shifted back and forth along the time axis from the provisional position by the minimum movement interval Δ unit, and the position where the pseudo judgment evaluation value Gz is the best is obtained. The combination is determined, thereby determining the best segment. In the example of FIG. 22, the maximum width of the shift is the absolute value of the segment length. A combination with the highest likelihood of abnormality is selected for all the combinations of segments between the variables when moving in units of Δ. In this way, the combination of the position of each possible segment is searched, and the best segment for each variable is determined by performing pseudo judgment evaluation comparison.

(Fifth modification of server)
In this modification, an example will be described in which a stochastic dependency is known from the first variable to the second variable. Here, the first variable is described as variable (channel) X3, and the second variable is described as variable (channel) X2. The variable dependency relationship, that is, the dependency relationship between sensors, is specified in advance by the user to the server.

When there is a stochastic dependency from variable X3 to variable X2, the likelihood is calculated according to Equation 1-5, and the decision estimation equation (the equation that estimates the larger of normal and abnormal likelihoods) is It becomes like -6.

When the prior probability related to the class distribution is not used as in Modification 4, the determination may be made using Formula 1-7 instead of Formula 1-6.

As a difference from the first embodiment, in this modification, the likelihood is calculated by Expression 1-5. In addition, as a model formula stored in the best model storage unit 18, a formula shown in Formula 1-8 is used. The meaning of P (Xnew = s2 | C) will be described later. In this modification, the normal and abnormal conditional probabilities for the variable X2 to the variable X3 are also stored in the best model storage unit 18 and included in the determination model.

An example of the determination model according to this modification is shown in FIG. FIG. 24 schematically shows the concept of the model expression of Expression 1-6.

Here, p (X2 = s2 | X3 = s2, C) indicates that C2 and S3 = s2 have a probability dependency relationship with X2 = s2 depending on the combination of possible values. In other words, this means that there is an influence that should be a synergistic effect (positive or negative) between C and the combination of values that X3 = s2.

Fig. 25 shows a calculation example of the conditional probability p (X2 = s2 | X3 = s2, C). p (X2 = s2 | X3 = s2, C) is calculated from the frequency f (X2 = s2 | X3 = s2, C). Here, to obtain the frequency f (X2 = s2 | X3 = s2, C), the sum of the frequencies f (X2 = s2 | C) and p (X3 = s2 | C) is used as an approximation. This is one way to prevent the frequency of each cell in the conditional probability table from becoming too small when k is small. In the example shown in the figure, f (X2 = s2 | C = abnormal) = 4 and f (X3 = s2 | C = abnormal) = 2, so that f (X2 = s2 | X3 = s2, C = abnormal) = 4 + 2 = 6. Also, f (X2 = s2 | C = normal) = 1 and f (X3 = s2 | C = normal) = 2, so f (X2 = s2 | X3 = s2, C = normal) = 1 + 3 = 4 It becomes.

As a calculation method other than this, if infrequent frequency is not an issue, 3 + 2 = 5 satisfying f (X3 = s2 | C = abnormal) and f (X3 = s2 | C = normal) Of these data, the frequency table of f (X2 = s2 | X3 = s2, C) may be created by counting the frequency of abnormalities and normality of f (X2 = s2).

It should be noted that the method for calculating the conditional probability from the frequency table is the same as p (X2 | C) and p (X3 | C) described above, so duplicate explanation is omitted.

(client)
The client in FIG. 1 will be described.

The client includes a sensing unit 30 that newly senses data using a plurality of sensors, and stores data sensed by the sensing unit 30 (this data will be referred to as determination target data) in the sensing data storage unit 31.

The determination target data input unit 32 monitors whether or not the determination target data is stored in the sensing data storage unit 31. If new determination target data is input, the determination target data is read and input to the waveform preprocessing unit 33.

The waveform preprocessing unit 33 performs waveform preprocessing in the same manner as described in the waveform preprocessing unit 13 of the server.

The model receiving unit 34 receives the determination model sent from the server.

The determination model storage unit 35 stores the determination model received by the model reception unit 34.

As shown in FIG. 28, the segment selection unit 36 scans the determination target data at regular time intervals based on the segment template including the best segment of each variable included in the determination model. The data is read and output to the abnormality determination unit 39.

The abnormality determination unit 39 calculates the likelihood of abnormality and normality based on the cut-out data of each variable input from the segment selection unit 36 and the determination model. The abnormality determination unit 39 determines that the abnormality is abnormal if the abnormality likelihood is higher than the normal likelihood, and otherwise determines that the abnormality is normal.

For example, if the data (waveform) is cut out as shown on the left in Fig. 29 for each variable, the distances to the corresponding segment data (Fig. 18 (4) or Fig. 23 (4)) are compared in the judgment model. Then, the determination result (normal or abnormal) of the top k ′ segments closest to each other is specified. However, 1 <k ′ <k, and k and k ′ are set in advance as system parameters. Then, likelihood calculation is performed using the model formula (2) in FIG. 18 or (2) in FIG. 23, and the likelihood that becomes normal and the likelihood that becomes abnormal are calculated. The way

Output as. When both likelihood values are the same, one of the predetermined values is taken as the determination result.

In more detail, when following the model formula in Fig. 18 (2), the ratio of normal and abnormal (conditional probability) p (Xnew = s2 | C) is first calculated for each variable. For example, for a variable, when k '= 5, normal count is 1, and abnormal count is 4, p (Xnew = s2 | C = normal) = 0.2, p (Xnew = s2 | C = abnormal) = 0.8. Then, p (Xnew = s2 | C) for each variable is multiplied by the normal and abnormal probabilities p (Xj = s2 | C) of the relevant section (section) of the conditional probability table included in the judgment model (this example) So in all variables the section is s2.) That is, p (Xnew = s2 | C), (p (X2 = s2 | C), p (X3 = s2 | C), p (X4 = s2 | C) for each variable Multiply Further, normal and abnormal prior probabilities p (C) are multiplied, thereby obtaining normal and abnormal likelihoods. In the case of following the model formula of FIG. 23 (2), normal and abnormal probabilities of p (X2 | X3, C) are further read and multiplied to obtain normal and abnormal likelihoods. Finally, the state of the higher likelihood of normal and abnormal is determined.

In this way, the segment selection unit 36 and the abnormality determination unit 39 cut out the segment data by sliding the segment template from the determination target data in the same manner as the likelihood calculation by assigning the sliding window, and the determination model and the segment The likelihood is calculated using the data.

By sliding the segment template in this way, the abnormality determination unit 39 outputs the determination results for the number of times scanned. For example, FIG. 29 shows a case where it is normal in a certain part of the first half but is determined to be abnormal in the second half.

The notification display unit 38 notifies or displays the determination result by the abnormality determination unit 39. As an example, FIG. 30 shows a screen display in the case of highlighting only the abnormality determination portion of the determination target data. The determination result is notified, for example, to a remote monitoring terminal or a maintenance staff or a staff of the server using a display or a speaker.

The device control unit 37 controls the operation to be monitored according to the determination result by the abnormality determination unit 39. For example, when it is determined that there is an abnormality, the monitoring target is urgently stopped.

The determination result storage unit 40 has a predetermined time length (for example, the same time length as already stored training data) including the determination result in the abnormality determination unit 39 and the data of each variable extracted for determination from the determination target data. Accumulate time series data.

The determination result transmission unit 41 transmits the time series data of each variable and the corresponding determination result to the server.

The server determination result receiving unit 22 receives the time-series data of each variable transmitted from the client and the corresponding determination result, and the notification unit 21 displays or notifies the monitoring member of these. After the monitoring person confirms that the determination result is correct, the notification unit 21 adds the time series data and the determination result to the training data storage unit 11 of the server in response to an instruction input from the monitoring person.

If there is an error in the determination, the determination result is corrected according to the instruction input from the observer, and the time series data and the corrected determination result are stored in the training data storage unit 11.

The training data input unit 12 of the server may detect that the data in the training data storage unit 11 has been updated and recalculate the determination model.

この By preparing a mechanism that can always give new training data to the device in this way, the judgment model can be refined, that is, accuracy can be improved continuously. This means that there is a possibility that the abnormality determination accuracy can be improved in the daily monitoring operation, and it is considered to be particularly effective in an area where high abnormality determination performance is required.

FIG. 31 is a flowchart showing an operation flow from the input of the determination target data in the client until the determination by the abnormality determination unit 39 is performed.

The determination target data input unit 32 reads the determination target data in the sensing data storage unit 31 and inputs it to the waveform preprocessing unit 33 (S401).

The waveform preprocessing unit 33 performs preprocessing on the determination target data (S402), and the segment selection unit 36 extracts data based on the segment template including the best segment of each variable included in the determination model (S403). Then, the cut out data is input to the abnormality determination unit 39 (S404).

For each variable (S406), the abnormality determination unit 39 calculates k'-nearest neighbor (S407), and calculates a conditional probability (the ratio of normal and abnormal based on k'-nearest neighbor) (S408). When the processing of S407 and S408 is completed for each variable (YES in S409), the normal likelihood and the abnormal likelihood are calculated according to the above-described model formula (S410). The abnormality determination unit 39 compares the likelihood that the likelihood of abnormality is normal and gives the state having the larger value as the determination result (S411).

FIG. 32 shows an example of a hardware configuration for realizing a server and a client.

The server includes a CPU 51, RAM 52, ROM 53, HDD 54, I / O 55, display 56, speaker 57, I / O controller 58, and network interface 59. The training data storage unit 11 and the segment storage unit 15 of the server are configured by the HDD 54, for example. The model transmitting unit 20 and the determination result receiving unit 22 can be configured by a network interface 59. The

other elements

12, 13, 14, 16, 17, 19, and 21 can be configured by logic circuits as program modules that are executed by the CPU 51, for example. The program modules are stored in the ROM 53 or the HDD 54, read out by the CPU 51, developed in the RAM 52, and executed, whereby the operations of the corresponding logic circuits are realized.

The client has a CPU 61, RAM 62, ROM 63, HDD 64, I / O controller 65, display 66, speaker 67, I / O 68, and network interface 69. The client sensing data storage unit 31, the determination model storage unit 35, and the determination result storage unit 40, for example, can be configured by the RAM 62 or the HDD 64. The model receiving unit 34 and the determination result transmitting unit 41 can be configured by a network interface 69.

Other elements

32, 33, 36, 37, and 38 can be configured by a logic circuit as a program module to be executed by the CPU 61, for example. The program modules are stored in the ROM 63 or the HDD 64, and the CPU 61 reads the program modules, develops them in the RAM 62, and executes them, thereby realizing the operations of the corresponding logic circuits.

In this embodiment, the server and the client are separated. However, some or all of the server functions may be performed by the client, and some or all of the client functions may be performed by the server. In the present invention, execution of a process by a computer includes a case where a single computer executes the process, and a case where the process is distributed and executed by a plurality of computers.

(Client variants)
In the explanation of the client so far, the likelihood is calculated by applying the segment template at regular intervals on the determination target data. However, this method sometimes exceeds the upper limit of the time required for the determination. In particular, information processing equipment such as a remote monitoring terminal installed in the field has severe restrictions on computing resources (memory amount, CPU, etc.) that can be used for judgment processing. If it is necessary to stop the monitoring target device (equipment) immediately in case of an abnormality, or if there is a communication performance limit on the communication path from the remote monitoring terminal in the field to the monitoring center server, scan at regular intervals. Sending all the results to the server is unrealistic.

In such a case, an upper limit threshold value is determined in advance for each variable, and the upper limit threshold value for each variable is stored in the determination model storage unit 35. Then, when the value of the variable for any one variable or all of the variables exceeds the upper threshold, the device control unit 37 performs a method such as an emergency stop of the device. In the example of FIG. 33, an example is shown in which a segment template is applied so as to select a portion that exceeds the upper threshold in each variable.

Thereafter, or in parallel with this, the segment template is applied so as to include a portion exceeding the upper threshold, and the data is cut out as shown in FIG. 33, and the determination by the abnormality determination unit 39 is performed based on the cut out data. When it is determined that the abnormality is normal, the emergency stop (control operation) is automatically canceled by the device control unit 37. The upper threshold may be given in advance by the user. Alternatively, some threshold candidates may be selected and compared using the above-described method of dividing and determining training data, and a threshold having the best pseudo-determination performance may be adopted from the compared candidates.

FIG. 34 is a flowchart showing an example of the operation of the client according to this modification.

The determination target data input unit 32 monitors whether sensor data is input to the sensing data storage unit 31 (S501). When sensor data is not input, it is confirmed whether a stop instruction is input from a supervisor or the like. If it is input, this flow is terminated, and if it is not input, the process returns to S501.

When sensor data is input in S502, the determination target data input unit 32 reads the sensor data from the sensing data storage unit 31 as determination target data, and outputs it to the abnormality determination unit 39 via the waveform preprocessing unit 33. The abnormality determination unit 39 determines whether all or any one or more of the variables exceed the respective upper thresholds while moving the segment template (S505). When it does not exceed, the process returns to step S501, and when it exceeds, emergency stop of the monitoring target device is performed via the device control unit 37 (S506).

On the other hand, the segment selection unit 36 cuts out the data of each variable at the position of the segment template when it is determined that the upper limit threshold has been exceeded and sends it to the abnormality determination unit 39. The abnormality determination unit 39 The determination is made based on the determination model (S507).

When it is determined to be normal, the abnormality determination unit 39 cancels the emergency stop or the like via the device control unit 37 (S509), and the determination result and data of a certain time length including the extracted data are determined result storage unit Store in 40 (S511). On the other hand, when it is determined that there is an abnormality, the abnormality determination unit 39 notifies or displays that fact via the notification display unit 38 (S510).

As described above, according to the first embodiment of the present invention, it is possible to specify the dependency relationship between sensors while utilizing the set of accumulated multi-channel sensor data and the abnormality determination result (class) corresponding to the data, and determining the accuracy of determination. Can be improved. The determination can be performed in a form (positional relationship between sections for each variable) that can indicate the basis of which part of the waveform data contributes to the determination.

Second embodiment

In many applications, a large number of sensor nodes are widely used to monitor various features of the object in order to improve the performance of detecting the state (normal or abnormal) of the target object. These applications include object tracking, image recognition, collision avoidance in vehicles, remote monitoring of areas, and remote monitoring of plant operations. The definition of the target object varies depending on the problem. For example, in remote monitoring of a train station area, the target object is a person, and in the collision avoidance in the vehicle, the target object is a vehicle. The target object can be a component in the device. Many target objects may also exist in a remote monitoring system. Hereinafter, the term “sensor node” is used to mean a sensor setup that monitors the state of a feature in a target object.

In most of these applications, the data from the sensor node is sent to a server at the remote monitoring center because computing resources in the remote monitoring device (eg, a remote listening device) are limited. The remote monitoring center server analyzes the data and takes appropriate action. However, this approach is not feasible if the communication bandwidth is constrained by the sensor node generating a large amount of data.

Therefore, when the sensor node is in an abnormal state, it is possible to reduce communication overhead by using a gateway (client) that transmits data from the sensor node. However, in this case as well, due to noise, a false alarm (false alarm (FA)) may occur many times when determining the abnormal state of the sensor node.

An abnormal event in the sensor node may be triggered by an abnormality in some other sensor nodes. In this case, the abnormality of the other sensor node is the cause of the abnormality of the target object. The determination of abnormal events and their causes is very important, especially when equipment or plant operations are linked with human safety and security.

Bayesian network is widely used to show the causal relationship in sensor nodes, and conditional probability table (CPT) is used to infer the cause of abnormal events in sensor nodes. If the number of sensor nodes is small and the causal relationships at the sensor nodes are known, the Bayesian network can be created manually. If the number of sensor nodes is very large and the relationships at the sensor nodes are hidden, it is virtually impossible to manually construct a Bayesian network. On the other hand, a Bayesian network may be automatically constructed from data, but creating a Bayesian network from data is a difficult problem. Therefore, for a remote monitoring system including a large number of sensor nodes, it is not feasible to construct an optimal Bayesian network and to infer the cause of abnormality of sensor nodes used in the Bayesian network.

The sensor network consists of many sensor nodes, and the target object may be monitored by many sensors. In such a large network, not all sensor nodes may be necessary to find anomalous events in the target object. The removal of unnecessary sensor nodes leads to a reduction in the cost of the monitoring system.

In some applications, more expensive sensors may be used to monitor the state of the object. The determination of a set of sensors from a sensor network that can perform expensive sensor replacement is very helpful in reducing the cost of the sensor network. In such an application, the target object is an expensive sensor node.

In this embodiment, an abnormal event of a target object is detected efficiently and reliably, the cause of the abnormality (causing sensor node) is identified from many sensor nodes, and from the gateway to the server in the remote monitoring center. The communication overhead of data transmission is reduced, and unnecessary sensor nodes can be identified for the target object.

Hereinafter, the present embodiment will be described with reference to the drawings.

FIG. 35 shows the configuration of the abnormality determination system according to this embodiment. This system detects the abnormal state (or abnormal event) of the target object in the monitoring site or plant, and identifies the cause of the abnormal state (hereinafter, the cause may also be referred to as a judgment basis).

This system comprises a gateway (client) 100 at the monitoring site and a server 200 at the remote monitoring center.

The gateway 100 includes a single-channel abnormality determination unit 102, a comprehensive determination unit 103 that performs comprehensive abnormality determination and ground identification using a determined fusion rule, a data filtering unit 104, an abnormality determination model database 105, a determined fusion rule database 106, and a reception unit 107. .

The remote monitoring center server 200 includes a receiving unit 205 that receives data from the gateway 100, a sensor data database 201 that holds sensor data, a determination result and determination basis database 204, a single channel abnormality determination model learning unit 203, a determination fusion A rule learning unit 202 and a transmission unit 206 are provided.

FIG. 36 shows an example of the sensor data database 201.

The database 201 includes sensor data observed from each sensor node, a status label indicating whether each sensor data is normal or abnormal (first label), and a determination label indicating whether the target object is normal or abnormal (second label). Are stored with various time stamps. The determination label may be referred to as a class label.

The judgment label of the target object is given by the maintenance staff or the staff at the monitoring site after confirming the actual state of the target object at the time indicated in the time stamp. The state label of each sensor node is determined according to the criteria (model, classifier) prepared for each sensor node. For example, when the reference is a threshold value, the status label is determined to be abnormal if the sensor data value exceeds the threshold value, and normal if it does not exceed the threshold value. The determination of the status label may be automatically determined and given by the apparatus, or may be given by a maintenance person or an attendant.

As the sensor node, various types of sensors such as a sound sensor, a vibration sensor, and a temperature sensor can be used. Outputs from the sound sensor and the vibration sensor are waveform data. The output from the temperature sensor is a value integrated along the time axis.

The single-channel abnormality determination model learning unit 203 in the server 200 learns a single-channel abnormality determination model (classifier) that classifies each sensing data for each sensor node in the sensor data database 201 as abnormal or normal. The single channel abnormality determination model learning unit 203 transmits the single channel abnormality determination model generated for each sensor node to the gateway 100 via the transmission unit 206. The receiving unit 107 of the gateway 100 receives a single channel abnormality determination model (classifier) for each sensor node and stores it in the abnormality determination model database 105.

In order to learn a single channel abnormality determination model (classifier), the single channel abnormality determination model learning unit 203 first collects data and state labels recorded at various time stamps from the sensor data database 201 for each sensor node. Extract. An example of data extracted for one channel (sensor node) is shown in FIG. The single channel abnormality determination model learning unit 203 learns different types of classifiers according to the type of data (for example, whether the data is waveform data).

That is, when the output from the sensor node is a single value such as a value collected along the time axis, a predetermined threshold is used as a classifier for classifying data as abnormal or normal Used. It is very difficult to determine an appropriate threshold. If the threshold is set to a very high value, many false negatives are predicted, and if it is set to a low value, many false positives are predicted. Here, a method for determining the optimum threshold value from the extracted data and the state label will be described.

This method is based on the method of selecting features for optimal segmentation from training learning data in C4.5. C4.5 is disclosed in "C4.5:" Programs for Machine Learning "[Morgan Kaufman Publishers, 1993] by Quinlan.

In this method, first, the values of training learning data (extracted data) are sorted. For example, when the left data in FIG. 38 is sorted, it becomes like the middle of FIG. Next, the median (breakpoint) of the value intervals with different state labels is calculated. For example, since the states of ID9 and ID8 are different in the figure, the median value of the data of ID9 and ID8 is “2.1”. The median value thus calculated is a candidate threshold value. Each candidate threshold value is evaluated by an index such as accuracy, F-score, geometric mean, or AUCB, and a median value (candidate threshold value) that returns the best score (fitness) is selected as the optimum threshold value. AUCB is based on Paul et al. “Genetic algorithm based methods for identification of health risk factors aimed at preventing metabolic syndrome” [SEAL '08: Proceedings of the 7th International Conference on Simulated Evolutionerand Berlin 2008 Springer-Verlag].

The definition of accuracy, F-score, geometric mean sensitivity (sensitivity), specificity (Precision) or recall (Recall) is shown below.

Here, _NTP is the number of true positives, N _TN is the number of true negatives, N _FP is the number of false positives, and N _FN is the number of false negatives.

In the example shown, four candidate thresholds are calculated, and candidate thresholds 3.0 and 3.6 have the highest score, so one of these is selected as the optimal threshold. The selection may be random or user specified.

On the other hand, when the output from the sensor node is waveform data, special consideration is necessary for the construction of the classifier (abnormality judgment model). Normally, waveform data is first processed by a signal processing technique such as moving average, discrete wavelet transform (DWT), or short-time Fourier transform (STFT), and the classifier (anomaly judgment model) learns in the next step. Is done.

The simplest method for learning the classifier (abnormality determination model) in the case of waveform data is the threshold method. In this threshold method, the highest peaks of waveform data measured at various time stamps are acquired, and the optimum threshold value is learned using the method described above. Another possible technique is to extract many feature values such as maximum and minimum amplitude, average and standard deviation, and area under the waveform from the waveform data, and based on the extracted feature values, classifier (anomaly determination model ).

One example of a classifier that can be used to classify waveform data is the k nearest neighbor (kNN) classifier. The k nearest neighbor (kNN) classifier uses dynamic time warping (DTW) that can handle variable length partial waveforms as a distance measurement. The k nearest neighbor (kNN) classifier is disclosed by Dasarathy in "NearestearNeighbor (NN) Norms: NN Pattern Classification Techniques" [IEEE Computer Society Press, 1991]. DTW is also disclosed in “A comparative study of several dynamic time-warping algorithms for connected word recognition" [The Bell System Technical Journal, 60 (7): 1389-1409, September 1981] by Myers and Rabiner.

However, DTW operations are very slow when the number of databases and / or observation points is very large. So instead of DTW, faster distance calculation methods like cross-correlation and Euclidean distance, t-statistic or signal-to-noise ratio (SNR) based functions are used to calculate the distance between two waveforms. May be.

Here, not all the regions in the training waveform are important in detecting an abnormality in the test waveform, and only the feature region in the training waveform is necessary for detecting the abnormality. In addition, if the waveform contains many data points, it takes a very long execution time to determine whether the test waveform is abnormal, but using the waveform feature waveform makes the abnormality more accurate and faster. Given a set of training waveforms that may be detected, feature regions can be extracted by using an optimization algorithm such as Genetic Algorithm (GA). Genetic algorithms include “Adaptation in Natural and Artificial Systems” by Holland [University of Michigan Press, AnnArbor, Michigan, 1975] and “Genetic Algorithms in Search, Optimization, and Machine Learning” by Goldberg [Addison Wesley (Reading, MA), 1989].

FIG. 39 is a flowchart showing a general processing flow of the genetic algorithm.

First, an encoding method for mapping between the solution space and the search space is determined (S1001).

Once the encoding method is determined, next, various control parameter values such as candidate solution size (population size), offspring size, maximum number of generations, crossover probability and mutation probability are initialized (S1002). .

Next, an initial candidate solution is randomly generated (S1003). The group of generated initial candidate solutions corresponds to the initial population.

Next, each candidate solution is evaluated and each fitness is calculated (S1004).

Next, it is checked whether the termination criteria such as the maximum number of generations or the best candidate solution in the population reaches the optimal fitness are satisfied (S1005).

If the termination criterion is not satisfied (NO in S1005), several candidate solutions are selected from the previous generation population to generate new candidate solutions (descendants) (S1006). The selection is made according to a predetermined criterion based on the score (fitness) of each candidate solution. For example, a predetermined number of candidate solutions or a candidate solution having a fitness equal to or greater than a predetermined value may be selected as a predetermined criterion.

A new candidate solution (descendant) is generated by applying the crossover and mutation operator to the selected candidate solution (S1007). Then, a new candidate solution (descendant) is evaluated in the same manner as in step S1004, and each fitness is calculated (S1008).

Next, a new set of candidate solutions (new population) is generated by combining the candidate solution selected from the previous generation population and the newly generated candidate solution (S1009). The candidate solution to be selected is selected according to a predetermined criterion. For example, a predetermined number of candidate solutions having high fitness or candidate solutions having fitness equal to or higher than a predetermined value are selected. The same candidate solution selected in step S1006 may be selected.

When the termination criterion is satisfied (YES in S1005), the best candidate solution (for example, the candidate solution having the highest fitness) from the population at that time is obtained as the best solution (S1010).

FIG. 40 specifically shows an example of optimal segmentation of a waveform using a genetic algorithm (GA). In the following, description will be given focusing only on the waveform data of one sensor node.

First, many candidate solutions (here, 50) are created by extracting partial waveforms arbitrarily (randomly) from a plurality of waveforms with different time stamps (S1011). For each candidate solution, only one partial waveform is taken from each waveform. Here, there are four waveforms 1 to 4 corresponding to four time stamps, and each candidate solution includes partial waveforms 1 to 4 cut out from the waveforms 1 to 4, respectively. The width of the waveform to be cut out may be constant or not constant.

Next, each candidate solution is evaluated by the k-nearest neighbor method (S1012). In the k-nearest neighbor method, for example, partial waveforms included in the candidate solution are classified, that is, the number of true positives (N _TP ), the number of true negatives (N _TN ), the number of false positives (N _FP ), and false negatives. Classification statistics such as the number of (N _FN ) are calculated, and the fitness (fitness) of the candidate solution is calculated based on the calculated classification statistics. For fitness, for example, various indexes such as the accuracy described above can be used. An example of the process performed in step S1012 is shown below. However, the example described here is merely an example, and the present invention is not limited to this.

For example, an example of calculating the fitness of the candidate solution with ID 1 in FIG. 40 is shown. First, one of the partial waveforms 1 to 4 included in candidate 1 (here, partial waveform 4) is removed. The top k partial waveforms closest to the partial waveform 4 are selected from the remaining partial waveforms. Here, k = 3, so all the rest are selected. Each state (normal or abnormal) of the selected partial waveform is specified, the total number of normals and the total number of abnormalities are calculated, and the larger state is selected. Then, the actual state (determination label) of the partial waveform 4 is compared with the selected state. If they match, the answer is correct, and if they do not match, the answer is incorrect. The partial waveforms 1 to 3 are also selected in order and compared to identify the correct or incorrect answer. The ratio of the number of correct answers to the number of comparisons is calculated, and this ratio is set as the fitness of candidate 1.

It is checked whether or not the termination criterion described in FIG. 39 is satisfied (S1013). If it is not satisfied (NO), a candidate solution satisfying the predetermined criterion is selected based on fitness (S1014), and the selected candidate solution is selected. Based on this, a new set of candidate solutions is generated by performing crossover (S1015) and mutation (S1016) operations. In crossover, partial waveforms of one candidate solution are exchanged with corresponding partial waveforms in another candidate solution. In mutation, one partial waveform is replaced with another from the same waveform.

Each new candidate solution (offspring) is evaluated to calculate fitness (S1017), and the candidate solution selected from the old set of candidate solutions and the new candidate solution (descendants) are combined to create a new set of candidate solutions (new (S1018). Candidate solutions to be selected from the old set are selected according to a predetermined criterion based on fitness (for example, selecting a predetermined number of candidate solutions having high fitness or a candidate solution having a fitness equal to or higher than a predetermined value). The candidate solution selected in S1014 may be selected.

When the termination criterion is satisfied (YES in S1013), a candidate solution (partial waveform set) having the best fitness is obtained as an optimized set of training partial waveforms (S1019). The obtained set corresponds to a single channel abnormality determination model. The obtained set and the k nearest neighbor algorithm may be combined and handled as a single channel abnormality determination model.

The decision fusion rule learning unit 202 learns a decision fusion rule (or classification rule) for detecting an abnormality of the target object. The decision fusion rule learning unit 202 transmits the generated decision fusion rule to the gateway 100 via the transmission unit 206. The receiving unit 107 of the gateway 100 receives this determined fusion rule and stores it in the determined fusion rule database 106.

The decision fusion rule is to detect an abnormality of the target object and identify the basis of the abnormality (sensor node) in the target object. Also, sensor nodes that are not included in the decision fusion rule can be identified as sensor nodes that are unnecessary for detecting an abnormality of the target object. The decision fusion rule learning unit 202 stores the learned decision fusion rule in an internal database, and transmits it to the gateway 100 and stores it in the decision fusion rule database 106 as described above.

The decision fusion rule learning unit 202 extracts data as shown in FIG. 41 from the sensor database as shown in FIG. 36 in order to learn the decision fusion rule. The data to be extracted includes each time stamp (ID), a state label (normal or abnormal) of each sensor node, and a determination label (normal or abnormal) of the target object. A classification rule learning approach is used from these extracted data to generate a decision fusion rule.
That is, a classification rule that can accurately predict the state (normal or abnormal) of the target object is learned using the determination label of the target object as the class label and the state label of the sensor node as the feature value. That is, a combination of feature selection and classification is used to learn the decision fusion rules. The resulting classification rules are composed of single or multiple decision fusion rules. For example, consider the following classification rule:
IF (N4 = abnormal AND N8 = abnormal AND N19 = abnormal) THEN (target object = abnormal) ELSE (target object = normal).
This classification rule consists of one decision fusion rule and is interpreted as follows. That is, when the data of the sensor nodes N4, N8, and N19 is abnormal, the target object is interpreted as being in an abnormal state at that time. In the cause-related context, the cause of the abnormality of the target object is that the data of the sensor nodes N4, N8, and N19 is abnormal. Another interpretation is that only three sensor nodes, N4, N8, and N19, are needed to discover anomalies in the target object.
Hereinafter, it may mean that the sensing data of the sensor node is in an abnormal state only with the name of the sensor node (that is, a variable representing the sensor node).

The following notation is used for notation of the decision fusion rule.

The above notation means that when all sensor nodes (Na, Nb,..., Nk) are abnormal, the target object is in an abnormal state.

An example of another classification rule is shown.
IF (((N4 AND N8) OR N10) AND (N19 OR N25)) THEN target object = abnormal) ELSE (target object = normal)
The classification rules in this case include the following four decision fusion rules.
(a) (N4, N8, N19) ⇒ target object;
(b) (N4, N8, N25) ⇒ target object;
(c) (N10, N19) ⇒ target object;
(d) (N10, N25) ⇒ Target object.
Classification rules (decision fusion rules) can have various formats. FIG. 42 shows an example of the AND format, and FIG. 43 shows an example of the rule format.

42. The AND format in the first line in FIG. 42 means that when the data of the sensor nodes N4, N8, and N19 are all abnormal, the target object A is in an abnormal state at that time. The AND format in the second row means that when all the data of the sensor nodes N4, N8 and N25 are abnormal, the target object A is in an abnormal state at that time. The AND format in the third row means that when all the data of the sensor nodes N10 and N19 are abnormal, the target object A is in an abnormal state at that time. The AND format on the fourth line means that when all the data of the sensor nodes N10 and N25 are abnormal, the target object A is in an abnormal state at that time.

The rule format in the first line of FIG. 43 is that all the rules included in at least one of (N4, N8, N19), (N4, N8, N25), (N10, N19), (N10, N25) When all the sensor node data is abnormal, it means that the target object A is abnormal.

The evaluation using the AND format rule of FIG. 42 is easier and faster than the evaluation using the classification rule including a large number of decision fusion rules as shown in FIG. Furthermore, in the AND format rule of FIG. 42, all the sensor nodes in the row are obtained as the determination basis, so that the basis identification becomes easier.
On the other hand, the rule format of FIG. 43 is a compact expression of a large number of decision fusion rules, but extra syntax analysis is required to identify the judgment basis. Using an algebraic variant, the classification rule can be transformed into a sum of product (SOP) format of multiple decision fusion rules, as shown in FIG. Note that if the sensor node is included in a number of decision fusion rules, indexing the sensor node can reduce the confirmation cost.

The decision fusion rule learning unit 202 can identify sensor nodes that are not included in the decision fusion rule as unnecessary sensor nodes for detecting an abnormality of the target object by scanning all learned decision fusion rules.

For example, in the case of a classification rule consisting of four decision fusion rules (a), (b), (c), (d) as described above, sensor nodes N4, N8, N10, It can be identified that N19 and N25 are necessary and the remaining sensor nodes are unnecessary.

One of the purposes of the decision fusion rule learning unit 202 is to find a combination of sensor nodes necessary for predicting an abnormal state in the target object.

For N sensor nodes, there are 2 ⁿ combinations of sensor nodes. When n is small, all combinations are thoroughly searched, and the combination with the maximum support (evaluation value) in the training data can be found as the best one. Instead of the best combination, multiple combinations may be found that have support greater than the threshold. On the other hand, if n is very large, it is not feasible to exhaustively search all combinations. In that situation, various heuristic search algorithms such as genetic algorithm (GA) or genetic programming (GP) can be used.

When a genetic algorithm is used, one decision fusion rule can be constructed for each execution of the algorithm. In order to obtain a large number of decision fusion rules, a large number of GA runs are required.

The method for constructing a classification rule (decision fusion rule) by applying GA is shown below using FIGS.

FIG. 46 shows an example of a processing flow for constructing a classification rule by a genetic algorithm. First, an encoding method for mapping between a solution space and a search space is determined (S1101). In the GA context, the term “classification rule” is used to mean a decision fusion rule.

Fig. 45 shows an example of encoding for a solution problem by GA. It is a binary string consisting of 0s and 1s. When there are n sensor nodes, the length of each string (each candidate solution) is n. That is, this encoding method indicates that the selection of each of the plurality of sensor nodes is mapped to binary values. 0 means that the corresponding sensor node does not affect the target object. 1 means that if the target object is in an abnormal state, the sensor node corresponding to that 1 must be in an abnormal state. When GA is used with this encoding, only one decision fusion rule is derived for each GA run. Each execution is an AND operation of all sensor node states (0 or 1). That is, the problem is directed to the feature selection problem.

Next, the values of various control parameters are determined in the same manner as S1002 in FIG. 39 (S1102). Next, an initial candidate solution (initial candidate classification rule) is randomly generated (S1103). In the illustrated example, since the number n of sensor nodes is 5, the string length is 5. In this example, 10 candidates are generated.

Next, each candidate solution is evaluated, and each fitness is calculated (S1104). FIG. 48 shows an example of evaluation of candidate solutions. That is, based on the candidate solution and the extracted data as shown in FIG. 41, each data (each row) is determined based on the candidate solution, and normal or abnormal determination is obtained for each data. Then, the ratio at which the determined result matches the actual state of the target object is calculated as fitness. In the example shown in the figure, for 10 data, the judgment result and the actual state match for 8 data, and the other 2 data do not match, so the fitness (accuracy) is 8/10 = 0.8.

Next, it is checked whether or not the termination condition is satisfied (S1105). If it is not satisfied (NO), a candidate solution is selected from the current population according to a predetermined criterion based on fitness (S1106). As the predetermined reference, for example, a predetermined number of candidate solutions are selected from those having high fitness, or candidate solutions having fitness equal to or higher than a predetermined value are selected.

Next, based on the selected candidate solution, crossover and mutation operations are applied to generate descendants (new candidate classification rules). FIG. 47 shows an example of generating offspring using crossover and mutation in the genetic algorithm (GA). In FIG. 47, for example, a candidate solution (candidate classification rule 1) of “10011” is interpreted as follows.
IF (N1 = abnormal AND N4 = abnormal AND N5 = abnormal) THEN target object = abnormal ELSE target object = normal.
Next, the generated offspring are evaluated in the same manner as in step S1104, and the fitness of each is calculated (S1107).

Next, a candidate solution is selected according to a predetermined standard based on fitness in the previous generation population, and a new population is generated by combining the selected candidate solution and the generated descendant (S1108). As the predetermined criterion, for example, a predetermined number of candidate solutions are selected from those having high fitness, or a candidate solution having a fitness equal to or higher than a predetermined value is selected.

When the termination condition is satisfied (YES in S1105), the candidate solution (candidate classification rule) having the highest fitness in the population at that time is acquired as the best classification rule (S1109).

Thus, the GA generates one decision fusion rule from the best classification rule for each execution.

On the other hand, in order to construct a classification rule including a large number of decision fusion rules as shown in FIG. 43, various machine learning classification methods and feature selection methods can be used. One such method of constructing such classification rules is genetic programming (GP). This GP is disclosed by Koza in “Genetic Programming:“ On ”the“ Programming ”of“ Computers ”by“ means ”of“ Natural ”Selection” [MIT Press, 1992]. Another example of classification rule construction classification is C4.5, which is disclosed by Quinlan in "C4.5: Programs for Machine Learning" [Morgan Kaufman Publishers, 1993].

By using genetic programming (GP), various tree structures can be derived, and many trees are searched in various generations during execution, so GP has a better classification rule than C4.5. To derive.

GP (genetic programming) uses tree-based coding. The tree can be either an expression by an S expression as shown in FIG. 49 (Symbolic Expression, S expression) or a decision tree as shown in FIG. For compactness, encoding based on the S representation is preferred.

The tree structure of the S expression in FIG. 49 corresponds to the above-mentioned (((N4 AND N8) OR N10) AND (N19 OR N25)) and is interpreted as follows.
IF (((N4 AND N8) OR N10) AND (N19 OR N25)) THEN target object = abnormal) ELSE (target object = normal)
As another example, the S expression of (N1 XOR N3) AND N5 is interpreted as follows.
IF (N1 = abnormal XOR N3 = abnormal) AND (N5 = abnormal) THEN target object = abnormal ELSE target object = normal
In the decision tree of FIG. 50, for example, “N4->(False)->N10->(True)->N19->(True)-> False answer” indicates that sensor node N4 is false (normal) and sensor node N10 is If true (abnormal) and the sensor node N19 is true (abnormal), it means that the target object is true (abnormal).

FIG. 51 shows an example of a processing flow for constructing a classification rule by genetic programming (GP). Here, the case where S expression is used as a tree structure is shown. In the figure, AND, OR, NOT, and XOR are logical operators.

First, an encoding method for mapping between the solution space and the search space is determined (S1201). In the genetic algorithm (GA), the gene type is expressed by a sequence (see FIG. 45), but in the genetic programming (GP), it is expressed by a tree structure.

In the case of S expression, the name (variable) of the sensor node is assigned to the terminal node of the tree structure, and the logical operator is assigned to the non-terminal node. That is, an encoding method is used that defines a variable representing a sensor node selected from a plurality of sensor nodes at the end node of the tree structure and a mapping of a logic operation symbol selected from the plurality of logic operation symbols at a non-terminal node of the tree structure.

On the other hand, in the case of a decision tree, the name of the sensor node (variable) is set to the non-terminal node, the value indicating true (target object is abnormal) or false (target object is normal) to the end node, and each branch is directly above it. A value indicating that the variable is true (the variable value is abnormal) or false (the variable value is normal) is assigned.

Next, various control parameter values are determined in the same manner as in step S1002 of FIG. 39 (S1202).

Next, an initial candidate solution (initial candidate classification rule) is randomly generated (S1203). The size of the tree structure is also determined randomly within the constraints for each candidate solution, and variables and logical operators are randomly assigned to each node of the tree structure.

Next, each candidate solution is evaluated and each fitness is calculated (S1204). FIG. 52 shows an example of fitness calculation for a candidate solution (candidate classification rule). Based on the candidate solution and the extracted data as shown in FIG. 41, each data (each row) is determined based on the candidate solution to obtain normal or abnormal determination. Of the determination results obtained for each data, the ratio that matches the actual state (determination label) of the target object is calculated as fitness. In the example shown in the figure, out of 10 data, the judgment result and the actual state (judgment label) match for 8 data, and the other 2 data do not match, so fitness (accuracy) is 8/10 = 0.8. In addition, what is necessary is just to calculate fitness by the same method also when a decision tree is used as a tree structure.

Next, a candidate solution is selected according to a predetermined criterion based on fitness in the current population (S1205). As the predetermined reference, for example, a predetermined number is selected from those having high fitness, or candidate solutions having fitness equal to or higher than a predetermined value are selected. Further, based on the selected candidate solution, the operation of crossover and mutation is applied to generate a descendant (new candidate classification rule) (S1205). FIG. 53 shows an example of generating offspring using crossover and mutation in genetic programming (GP). In the example mutation shown in the figure, one node is replaced with another node, but this is an example. In the mutation, subtrees are also replaced between different sizes. For example, one node is replaced with a subtree composed of a plurality of hierarchies.

Next, the generated offspring are evaluated in the same manner as in step S1204, and the respective fitness is calculated (S1206).

Next, candidate solutions are selected according to a predetermined criterion in the previous generation population, and a new population is generated by combining the selected candidate solutions and the generated descendants (S1207). As a predetermined reference, a predetermined number of candidate solutions are selected from those having high fitness, or candidate solutions having a fitness equal to or higher than a predetermined value are selected.

Next, it is checked whether an end condition (for example, a candidate solution having a desired fitness is obtained, the maximum generation to be executed has been reached, etc.) is satisfied (S1208). If not satisfied (NO), the process returns to step S1205. If the end condition is satisfied (YES), the candidate solution (candidate classification rule) having the highest fitness in the population at that time is acquired as the best classification rule (S1209). ). More specifically, the logical operation expression (or decision tree) specified by the candidate solution is acquired as the best classification rule. FIG. 54 shows a processing flow inside the gateway.

The single channel abnormality determination unit 102 of the gateway 100 collects data from a plurality of sensor nodes 1 to n (S2001). The data from the sensor node may be waveform data or a set of values observed at regular time intervals. The time interval for observation may be different for each sensor node.

The single channel abnormality determination unit 102 uses the abnormality determination model for each sensor node in the abnormality determination model database 105 to classify the data from each sensor node as abnormal or normal (S2002). The sensor data classification for each sensor node is shown on the left of FIG.

Here, an example of a classification method when the data is a waveform will be described with reference to FIG. FIG. 55 shows an example of test waveform classification using an abnormality determination model (optimized training partial waveform group). As shown on the right side of FIG. 55, the test waveform is first divided into a plurality of sections. The division method is specified in advance. As a division method, for example, a predetermined number of pieces are divided with a constant width. The optimum partial waveform data with the closest distance for each section is identified from the abnormality determination model. Then, the state (normal or abnormal) of the optimum partial waveform data specified for each section is confirmed, and the larger one is adopted. In the example of FIG. 55, abnormality is adopted. Such classification is performed for at least sensor nodes included in the decision fusion rule. The sensor node included in the determined fusion rule is designated in advance in the single channel abnormality determination unit 102. The designation may be performed by a notification from the comprehensive determination unit 103, or may be performed by a maintenance staff or a staff.

Next, the comprehensive judgment unit 103 extracts the decision fusion rule (classification rule) of the target object from the decision fusion rule database 106 (S2003). In the upper right of FIG. 58, a classification rule composed of a plurality of decision fusion rules is extracted.

The overall determination unit 103 checks whether the extracted decision fusion rule matches the state (normal or abnormal) of the sensing data of the sensor node included in the rule (S2004), and matches at least one decision fusion rule. If so (YES in S2004), the target object is determined to be in an abnormal state. In the example of FIG. 58, two decision fusion rules are satisfied.

In this case, the data filtering unit 104 sends the determination result (the target object is abnormal) and the state (determination basis) of the sensor node included in the satisfied decision fusion rule to the server 200 of the remote monitoring center (S2005). . An example of a message format transmitted to server 200 in step S2005 is shown in FIG. In the server 200, these data (determination result and determination basis) transmitted from the gateway 100 are received by the receiving unit 205 and stored in the database 204.

On the other hand, if any decision fusion rule included in the classification rule is not satisfied, the data filtering unit 104 sends a list of all sensor nodes indicating an abnormal state to the server 200 of the monitoring center (S2006). An example of the message format transmitted to the server 200 in this step S2006 is shown in FIG. Further, the data filtering unit 104 sends the data of these sensor nodes to the server 200 of the remote monitoring center (S2007). The time stamp of the sensor node data may also be transmitted simultaneously. At this time, the gateway 100 may discard the sensor node data in a normal state.

The server 200 may store these data (a list of sensor nodes in an abnormal state, data of these sensor nodes, and a time stamp) in the sensor database 201. At this time, the determination label of the target object is set normally. Also, sensor node status labels not included in the list are set normally. Thereafter, for example, the single channel abnormality determination model learning unit 203 or the decision fusion rule learning unit 202 may perform the above-described processing based on, for example, the updated sensor database 201.

In the above-described embodiment, the abnormality determination model based on each sensor and the decision fusion rule for comprehensively determining the determination of each abnormality determination model are all learned in the server. However, the present invention is not limited to this setup, and if the gateway has sufficient computing resources, the abnormality determination model and the decision fusion rule may be learned at the gateway.

As described above, according to the present embodiment, since the feature selection technique is used to learn the decision fusion rule, unnecessary sensor nodes for the target object can be identified and removed from the remote monitoring system, thereby reducing the cost of the system. it can. Unlike the reasoning in the Bayesian network, the state matching of only the sensor node specified in the decision fusion rule is performed, so that the cause of the abnormal event from many sensor nodes can be identified efficiently. The detection of the abnormal event becomes reliable by using the state of a large number of sensors to confirm the abnormality in the target object. Furthermore, no prior knowledge about the causal relationship at the sensor node is necessary to learn (construct) the decision fusion rule.
Communication overhead for transmitting data to the remote monitoring center is reduced by using the data filter unit. Only when the abnormality of the target object cannot be confirmed by the decision fusion rule, the sensor data may be sent to the server of the remote monitoring center by the data filtering unit.

As described above, according to the first and second embodiments, it is possible to extract only the portion that contributes to the determination in the accumulated multi-channel sensor data and generate the determination model in consideration of the probability dependency between the channels. It is possible to perform determination using the generated determination model, and to accurately indicate the determination basis without excess or deficiency. In addition, when there is a new input, training data with high accuracy can be added, so that the performance of the determination model can be continuously improved.

The present invention includes a manufacturing system monitoring system, an elevator monitoring system, an air conditioning system monitoring system, a power system monitoring system, a vital sensing system monitoring system and a health equipment monitoring system in medical and nursing care, etc. It can be used as various remote monitoring systems such as quality control, maintenance and condition monitoring.

Claims

A plurality of time-series data relating to a plurality of variables obtained by observing a monitoring target with a plurality of sensors, and a normal class or an abnormal class representing the state of the monitoring target when the plurality of time-series data are acquired. A data storage unit for storing a plurality of training data sets;
A plurality of sections are specified for each of the plurality of variables, and each of the plurality of variables is a plurality of sections of data from the plurality of time-series data included in the plurality of training data. A waveform dividing unit for extracting segment data of
For each of the plurality of variables, by performing determination by the nearest neighbor method for each of the plurality of sections using the plurality of segment data extracted by the waveform dividing unit, one of the plurality of sections is obtained. An evaluation unit for selecting the best interval,
For each of the plurality of variables, based on the number of times determined to be normal for each of the plurality of sections and the number of times determined to be abnormal, calculate the normal and abnormal conditional probabilities of the best section,
Calculating normal and abnormal prior probabilities from the total number of normal classes and the total number of abnormal classes included in the plurality of training data;
A calculation unit;
Storing normal and abnormal prior probabilities, and for each of the plurality of variables, identification information of the best section, segment data of the best section, a class related to the segment data, and the class of the best section A storage unit for storing normal and abnormal conditional probabilities;
A sensing unit for observing the monitoring target with a plurality of sensors and acquiring a plurality of time-series data regarding a plurality of variables;
For each of the plurality of variables, a selection unit that selects segment data from the plurality of time-series data acquired by the sensing unit according to the best interval,
For each of the plurality of variables, for the segment data selected by the selection unit, the uppermost predetermined number of segment data is detected by the nearest neighbor method using the segment data in the storage unit,
For each of the plurality of variables, the respective ratios of the normal class and the abnormal class in the predetermined number of segment data are multiplied by the normal and abnormal conditional probabilities in the storage unit, respectively, and the multiplication value is the plurality And the likelihood of normal and abnormal is calculated by multiplying between the variables and multiplying by the prior probability of normal and abnormal,
A determination unit that determines the state of the monitoring target to be one of the normality and abnormality having the highest likelihood; and
An abnormality judgment system with
The calculation unit, with respect to the first variable and the second variable specified in advance among the plurality of variables, the first variable when the nearest neighbor determination by the best interval of the second variable becomes normal and abnormal Calculate the first conditional probability that the nearest neighbor judgment by the best interval of is normal and abnormal,
The storage unit stores the first conditional probability of the normal and abnormal,
2. The system according to claim 1, wherein the determination unit calculates the normal and abnormal likelihoods by further multiplying the normal conditional and abnormal first conditional probabilities.
3. The system according to claim 2, wherein the evaluation unit calculates the number of correct answers for each of the plurality of sections, and selects the section with the highest number of correct answers as the best section.
The calculation unit, regarding at least one variable among the plurality of variables, when there are two or more sections in which the number of correct answers is the same or greater than or equal to a threshold value, the abnormal condition for each of the two or more sections Calculate the probability of
The evaluation unit, for each of the two or more sections, the conditional probability of the abnormality calculated for the best section of the plurality of variables other than the at least one variable other than the at least one variable, and the two The likelihood of the abnormality is calculated by multiplying the conditional probability of each abnormality in each of the above-described sections, and the section having the higher likelihood of the abnormality among the two or more sections is defined as the at least 1 3. The system according to claim 2, wherein the system is selected as the best interval of two variables.
An abnormality determination method executed in a computer,
The computer is
A plurality of time-series data relating to a plurality of variables obtained by observing a monitoring target with a plurality of sensors, and a normal class or an abnormal class representing the state of the monitoring target when the plurality of time-series data are acquired. Reading data from a data storage unit that stores a plurality of training data in pairs;
A plurality of sections are specified for each of the plurality of variables, and each of the plurality of variables is a plurality of sections of data from the plurality of time-series data included in the plurality of training data. Waveform segmentation step for extracting segment data of
For each of the plurality of variables, by performing determination by the nearest neighbor method for each of the plurality of sections using the plurality of segment data extracted by the waveform dividing unit, one of the plurality of sections is obtained. An evaluation step for selecting the best interval,
For each of the plurality of variables, based on the number of times determined to be normal for each of the plurality of sections and the number of times determined to be abnormal, calculate the normal and abnormal conditional probabilities of the best section,
A calculation step of calculating normal and abnormal prior probabilities from the total number of normal classes and the total number of abnormal classes included in the plurality of training data;
Storing normal and abnormal prior probabilities, and for each of the plurality of variables, identification information of the best section, segment data of the best section, a class related to the segment data, and the class of the best section A storage step for storing normal and abnormal conditional probabilities;
A sensing step of acquiring a plurality of time-series data relating to a plurality of variables by observing the monitoring target with a plurality of sensors;
For each of the plurality of variables, a selection step of selecting segment data from the plurality of time-series data acquired by the sensing unit according to the best interval,
For each of the plurality of variables, for the selected segment data, the nearest neighbor method using the segment data in the storage unit to detect a predetermined predetermined number of segment data,
For each of the plurality of variables, the respective ratios of the normal class and the abnormal class in the predetermined number of segment data are multiplied by the normal and abnormal conditional probabilities in the storage unit, respectively, and the multiplication value is the plurality of the plurality of variables. And the likelihood of normal and abnormal is calculated by multiplying between the variables and multiplying by the prior probability of normal and abnormal,
A determination step of determining the state of the monitoring target to be one of the normality and abnormality having a higher likelihood; and
An abnormality determination method characterized by executing
A plurality of first labels each indicating whether sensor data observed by a plurality of sensor nodes monitoring the target object is abnormal or normal, and a second label indicating whether the state of the target object is normal or normal A first database for storing training data of
(A-1) A plurality of candidate solutions are generated by performing the mapping a plurality of times at random using an encoding method that prescribes mapping of the presence or absence of each of the plurality of sensor nodes to a bit string,
(A-2) Genetic evaluation of fitness of each of the plurality of candidate solutions with respect to the first database, and generation of new candidate solutions by crossover of candidate solutions selected based on the fitness and mutation operations A determination fusion rule learning unit that determines an optimal candidate solution having optimal fitness by repeatedly performing according to an algorithm, and identifies a sensor node in which a bit is set in the optimal candidate solution;
(B-1) Whether the sensor data observed by the specified sensor node is abnormal or normal, whether the given sensor data prepared in advance for the specified sensor node is abnormal or normal Using the classifier to determine,
(B-2) When all the determination results for the specified sensor node indicate abnormality, the target object is determined to be abnormal, and when at least one of the determination results indicates normal, An overall determination unit that determines that the target object is normal;
With
The decision fusion rule learning unit is configured to evaluate fitness of each of the plurality of candidate solutions.
For each of the plurality of training data, the first label of the sensor node on which a bit is set in the candidate solution is detected, and the state of the normal or abnormal state indicated by the detected first label is selected. And calculating a rate at which the state selected for each of the plurality of training data matches the state indicated by the second label of each of the plurality of training data.
A plurality of first labels each indicating whether sensor data observed by a plurality of sensor nodes monitoring the target object is abnormal or normal, and a second label indicating whether the state of the target object is normal or normal A first database for storing training data of
(A-1) It is defined that a variable representing a sensor node selected from the plurality of sensor nodes is mapped to a terminal node of the tree structure, and a logical operation symbol selected from a plurality of logical operation symbols is mapped to a non-terminal node of the tree structure. Using the encoding method to S representation, a plurality of candidate solutions are generated by randomly performing the mapping a plurality of times,
(A-2) Genetic evaluation of the fitness of each of the plurality of candidate solutions with respect to the first database, and generation of a new candidate solution by crossover of candidate solutions selected based on the fitness and mutation operation A decision fusion rule learning unit that obtains an optimal candidate solution having optimal fitness by performing iteratively and obtaining a logical operation expression specified by the optimal candidate solution;
(B-1) Whether the sensor data observed by the sensor node of the variable included in the logical operation expression is abnormal or normal, the given sensor data prepared in advance for the sensor node is abnormal and normal Judgment is made using a classifier that decides either
(B-2) When the sensor data is abnormal, the sensor node variable is true, and when it is normal, it is determined whether the logical operation expression is true or false. When the determination is true, When the target object is abnormal or false, a general determination unit that determines that the target object is normal,
With
The decision fusion rule learning unit is configured to evaluate fitness of each of the plurality of candidate solutions.
For each of the plurality of training data, when the first label of the sensor node included in the candidate solution indicates abnormality, the variable of the sensor node is specified as true, and when indicated as normal, the variable is specified by the candidate solution. Determine whether the logical expression is true or false. If true, determine that the target object is abnormal, if false, normal.
Calculating a rate at which the state determined for each of the plurality of training data matches the state indicated on the second label of each of the plurality of training data;
An abnormality determination system characterized by that.
A plurality of first labels each indicating whether sensor data observed by a plurality of sensor nodes monitoring the target object is abnormal or normal, and a second label indicating whether the state of the target object is normal or normal A first database for storing training data of
(A-1) a variable representing a sensor node selected from the plurality of sensor nodes as a non-terminal node of the tree structure, a value indicating whether a variable of the node immediately above is true or false, and a tree structure branch A plurality of candidate solutions are generated by performing the mapping multiple times at random using a coding method to a decision tree, which specifies that a value indicating true or false is mapped to a non-terminal node of
(A-2) Genetic evaluation of the fitness of each of the plurality of candidate solutions with respect to the first database, and generation of a new candidate solution by crossover of candidate solutions selected based on the fitness and mutation operation A determination fusion rule learning unit that obtains an optimal candidate solution having optimal fitness by performing iterative programming and obtaining a decision tree specified by the optimal candidate solution;
(B-1) Whether the sensor data observed by the sensor node of the variable included in the decision tree is abnormal or normal, whether the given sensor data prepared in advance for the sensor node is abnormal or normal Using a classifier that determines
(B-2) The variable corresponding to the sensor node is true when it is abnormal, false when it is normal, the true / false of the decision tree is determined, and the target object is abnormal when the determination result is true , A general determination unit that determines that it is normal when false,
With
The decision fusion rule learning unit is configured to evaluate fitness of each of the plurality of candidate solutions.
For each of the plurality of training data, the decision tree specified by the candidate solution is defined as true when the first label of the sensor node of the variable included in the candidate solution indicates abnormality, and false when normal. If the target object is true, the target object is abnormal, and if it is false, the target object is determined to be normal.
Calculating, as the fitness, the proportion of the state determined for each of the plurality of training data and the state indicated by the second label of each of the plurality of training data,
An abnormality determination system characterized by that.