JP2011192097A - Failure detection method and information processing system using the same - Google Patents


Info

Publication number
JP2011192097A
JP2011192097A (application number JP2010058618A)
Authority
JP
Japan
Prior art keywords
learning
abnormality
device
operation
operation information
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
JP2010058618A
Other languages
Japanese (ja)
Inventor
Mitsuhiro Imai
Tatsuya Kameyama
Junichi Kimura
Original Assignee
Hitachi Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Hitachi Ltd
Priority to JP2010058618A
Publication of JP2011192097A


Abstract

PROBLEM TO BE SOLVED: To provide a failure detection method that reduces false failure determinations in failure determination using a data mining method.

SOLUTION: The system includes a first learning means and a second learning means. The first learning means receives operation information indicating features of the operating state from a plurality of devices having different operation configurations, performs learning on the operation information using a statistical method, and updates and stores learning data. The second learning means updates and stores a threshold for each operation configuration, based on the operation configurations that identify devices of different configurations and on the operation information obtained in the first learning means from devices of the same operation configuration. An abnormality level analysis means calculates a degree of abnormality from the received operation information and the learning data stored by the first learning means, and an abnormality determination means compares the degree of abnormality with the threshold, stored by the second learning means, that corresponds to the operation configuration of the device that sent the operation information, and determines whether the degree of abnormality indicates an abnormal value.

COPYRIGHT: (C)2011,JPO&INPIT

Description

  In particular, the present invention relates to an abnormality detection method capable of detecting abnormalities with respect to changes in the operating configuration of a device, such as changes caused by addition, deletion, or update of the device's hardware configuration, changes caused by addition, deletion, or update of the software executed on the device, and the addition of devices having different operation configurations, and to an information processing system using the abnormality detection method.

  Conventionally, there has been a method in which a statistical model is created by learning with statistical methods, a statistical distance is calculated from the operation information and the statistical model to obtain a degree of abnormality, and the degree of abnormality is judged to be abnormal or not using a threshold value specified in a configuration file (see, for example, Patent Document 1).

  Conventionally, there has been a method of setting a threshold value for determining an abnormality for each type of device in advance and determining whether or not the operation information is abnormal (see, for example, Patent Document 2).

  Furthermore, conventionally, there has been a dynamic learning method of a normal operation model for detecting an abnormality (see, for example, Patent Document 3).

Patent Document 1: JP 2005-182647 A
Patent Document 2: JP 2000-181761 A
Patent Document 3: JP 2008-129714 A

  In recent years, with the advent of microprocessors that realize various functions in software, microprocessors have come to be installed in almost all devices, industrial and consumer alike. As microprocessor performance has improved, the scale of executable software has grown, making more multifunctional devices possible.

  Furthermore, by installing an OS (Operating System), it is possible to realize multitasking and multithreaded applications, and various applications can be executed simultaneously under OS management.

  For example, an HGW (Home Gateway) installed at a subscriber's home on an NGN (Next Generation Network), which realizes integrated multimedia services of telephone, data communication, and streaming broadcasting, can use OSGi (Open Services Gateway initiative) framework technology to remotely install, start, stop, and uninstall applications, thereby realizing various services.

  In such devices, while software functions are enhanced, abnormalities caused by the interaction of multiple pieces of software become a problem. As software grows more sophisticated and larger, abnormalities that are difficult to predict are increasing: diversified user environments, compatibility issues with other companies' software, and malicious virus software that crashes new software appearing year after year. Against this background, it is important to detect abnormalities (failures) during operation of the device.

  For example, in Patent Document 1, a learning unit creates a new statistical model by performing probability distribution calculation processing, on the basis of an existing statistical model, on the device operation information received from an electronic device; a statistical distance is calculated from the model, and the score of each data item is output as the analysis result. A detection/notification unit determines whether any score exceeds the threshold preset in a setting file; if the threshold is exceeded, the administrator is notified by e-mail that an abnormality exists.

  For example, in Patent Document 2, a terminal monitoring apparatus reads failure monitoring information for each terminal and compares it with a threshold value chosen according to the type of the terminal, thereby predicting and detecting a failure.

  For example, in Patent Document 3, an abnormality detection device prepares in advance an analytic normal operation model obtained from static analysis, and performs abnormality detection using that model. Using the results of that detection, a normal operation model based on learning is trained while its abnormality determinations are collated with those of the analytic model.

  In an environment such as an HGW system in which the OSGi framework is introduced, the operation configuration changes while the system is running: new devices are added, the hardware configuration of a device changes, or the applications being executed change. To predict in advance an abnormality arising from the interplay of multiple pieces of software in a specific operation configuration, it would be necessary to check beforehand every combination of software in that configuration for abnormalities. It is, however, difficult to examine all abnormalities in combination with software created by, for example, other companies. For this reason, it is necessary to create a statistical model by statistical learning from the operation information that represents the characteristics of the device's operating state as software executes, to calculate a statistical distance from the operation information and the statistical model to obtain a degree of abnormality, and to determine whether that degree of abnormality is abnormal. However, because the operation information of a device with a completely new operation configuration has not yet been learned, the degree of abnormality is obtained using an inappropriate statistical model, and the judgment of whether it is abnormal is therefore mistaken.

  On the other hand, Patent Document 1 discloses a method of creating a statistical model by statistical learning, calculating a statistical distance from the operation information and the statistical model to obtain a degree of abnormality, and determining whether the degree of abnormality is abnormal using a threshold value specified in a setting file; however, it does not describe a configuration for determining an abnormality from the operation information of a newly configured device.

  Although Patent Document 2 discloses a method of setting, in advance, a threshold value for determining an abnormality for each type of device and determining whether the operation information is abnormal, no statistical model is created by statistical learning, so the cost and labor of setting an appropriate threshold value for each device type in advance is imposed.

  Although Patent Document 3 discloses a dynamic learning method for a normal operation model used in abnormality detection, it requires that an analytic normal operation model obtained from static analysis be prepared in advance.

  FIG. 1 is a block diagram illustrating the configuration of the main part of Patent Document 1, in order to illustrate the difference from the present invention.

  The operation information acquisition unit 15 collects operation information during operation of the device and stores it in the operation information data 20. The learning unit 30 performs learning using a statistical method on the operation information data 20 and stores the learned result in the learning data 40. The analysis unit 51 outputs, from the learning data, a statistical-distance score based on the operation information. The abnormality determination means 61 compares the score with the threshold value stored in advance in the threshold information 71 to determine whether there is an abnormality.

  FIG. 2 is a block diagram illustrating the configuration of the main part of Patent Document 2, in order to illustrate the difference from the present invention. The operation information acquisition unit 15 collects operation information during operation of the device, stores it in the operation information data 20, and outputs it.

  The operation configuration acquisition unit 17 identifies the type of device and, from the threshold information 71 that stores a threshold value for each device in advance, outputs the threshold value for the device whose operation information was collected. The abnormality determination unit 61 compares the operation information with the threshold value and determines whether there is an abnormality.

  An object of the present invention is to provide, for abnormality detection that uses operation information representing characteristic changes in the operating state of a device whose operating configuration changes as needed (changes, updates, additions, and deletions of hardware and software), an abnormality detection method and an information processing system that learn the statistical model from the device's operation information while detecting abnormalities with that statistical model, and that determine, according to the learning amount of the statistical model, whether the degree of abnormality obtained by calculating the statistical distance between the operation information and the statistical model is abnormal.

  The above objects and novel features of the present invention will become apparent from the description of the present specification and the accompanying drawings.

  A typical one of the inventions disclosed in the present application will be briefly described as follows.

  That is, the abnormality detection method of the present invention is an abnormality detection method for monitoring the operating state of a plurality of devices and detecting an abnormality of a device, comprising: a first learning step of collecting, from the devices, operation information indicating the operating state, learning the operation information, and storing a learning result; a second learning step of collecting, from the devices, an operation configuration representing the configuration of the device during operation, learning a threshold value for each operation configuration according to the amount of operation information learned in the first learning step from devices corresponding to that operation configuration, and storing the threshold value corresponding to the operation configuration; an analysis step of comparing and analyzing the operation information collected from a device against the learning result, and outputting the result as a degree of abnormality; and an abnormality determination step of comparing the degree of abnormality with the threshold value corresponding to the same operation configuration as the device for which the degree of abnormality was obtained in the analysis step, and determining whether the degree of abnormality indicates an abnormal value.

  An information processing system according to the present invention is an information processing system comprising a plurality of devices and an abnormality monitoring device that monitors the operating state of the devices and detects abnormalities of the devices. The abnormality monitoring device includes: a first learning processing unit that collects, from the devices, operation information indicating the operating state, learns the operation information, and stores a learning result; a second learning processing unit that collects an operation configuration representing the configuration of a device during operation, learns a threshold value for each operation configuration according to the amount of operation information collected from devices corresponding to that operation configuration, and stores the threshold value corresponding to the operation configuration; an analysis processing unit that compares and analyzes the operation information collected from a device against the learning result and outputs the result as a degree of abnormality; and an abnormality determination processing unit that compares the degree of abnormality with the threshold value corresponding to the same operation configuration as the device for which the degree of abnormality was obtained, and determines whether the degree of abnormality indicates an abnormal value.

  In the information processing system of the present invention, the analysis processing unit and the abnormality determination processing unit may be provided in the device instead of the abnormality monitoring device.

  According to the present invention, the learning model can be trained using operation information gathered while devices operate, so abnormalities that cannot be detected in advance in a test environment can be detected with the learned model. Moreover, even when the configuration of a device changes through addition, deletion, update, and the like, the probability of an erroneous abnormality determination made with the learned model can be reduced.

FIG. 1 is a block diagram illustrating the configuration of the main part of Patent Document 1.
FIG. 2 is a block diagram illustrating the configuration of the main part of Patent Document 2.
A block diagram illustrating the outline of one embodiment of the present invention.
A diagram illustrating the outline of a method of obtaining the degree of abnormality by data mining.
A diagram illustrating the outline of a method of obtaining a cluster range.
A diagram illustrating the outline of the transition of the feature vectors of operation information in a device with a learned operation configuration.
A diagram illustrating the outline of the transition of the feature vectors of operation information in a device with an unlearned operation configuration.
An explanatory diagram illustrating an example of the change in the normalized degree of abnormality in a device with a learned operation configuration.
An explanatory diagram illustrating the outline of the change in the number of operating devices with an unlearned operation configuration.
An explanatory diagram illustrating the outline of the change in the learning amount learned using the operation information of devices with an unlearned operation configuration.
An explanatory diagram illustrating a threshold setting for judging an abnormality from the operation information of a device with an unlearned operation configuration.
An explanatory diagram illustrating another threshold setting for judging an abnormality from the operation information of a device with an unlearned operation configuration.
A block diagram illustrating the configuration of an information processing system according to one embodiment of the present invention.
A block diagram illustrating the configuration of an abnormality monitoring device according to one embodiment of the present invention.
A block diagram illustrating the configuration of an HGW according to one embodiment of the present invention.
A block diagram illustrating the configuration of an abnormality monitoring device and an HGW according to one embodiment of the present invention.
A flowchart illustrating the operation of the learning process (610) according to one embodiment of the present invention.
A flowchart illustrating the operation of the operation configuration registration process (630) according to one embodiment of the present invention.
A flowchart illustrating the operation of the threshold learning process (640) according to one embodiment of the present invention.
A flowchart illustrating the operation of the threshold update process (step S230) according to one embodiment of the present invention.
A flowchart illustrating the operation of another threshold update process (step S230) according to one embodiment of the present invention.
A flowchart illustrating the operation of the abnormality determination process (620) according to one embodiment of the present invention.
A diagram illustrating the hardware configuration information 183 according to one embodiment of the present invention.
A diagram illustrating the application information 182 according to one embodiment of the present invention.
A diagram illustrating the customer information 181 according to one embodiment of the present invention.
A diagram illustrating an example of the operation configuration data 85 according to one embodiment of the present invention.
A diagram illustrating the threshold data 70 according to one embodiment of the present invention.
A diagram illustrating the learning data 40 according to one embodiment of the present invention.
A diagram illustrating the device operation information 155 according to one embodiment of the present invention.
A diagram illustrating the operation information data 20 according to one embodiment of the present invention.
FIG. 17 is a sequence diagram illustrating the operation flow of FIG. 16, as an embodiment of the present invention.
A sequence diagram illustrating the flow of operation in another configuration.

  The present invention includes: a first learning means that receives, during operation, operation information representing characteristics of the operating state from a plurality of devices having different operation configurations, performs learning on the operation information using a statistical method, and updates and stores learning data; and a second learning means that updates and stores a threshold value for each operation configuration, from the operation configurations identifying devices of different configurations and from the learning amount of operation information obtained in the first learning means from devices with the same operation configuration.

  Further provided are an abnormality degree analysis means that calculates a degree of abnormality from the received operation information and the learning data stored by the first learning means, and an abnormality determination means that compares the degree of abnormality with the threshold value, stored by the second learning means, corresponding to the operation configuration of the device that sent the operation information, and determines whether the degree of abnormality indicates an abnormal value.

  DESCRIPTION OF EXEMPLARY EMBODIMENTS Hereinafter, preferred embodiments of the invention will be described in detail with reference to the drawings. Note that in the drawings illustrating the embodiment, the same components are denoted by the same reference symbols in principle, and the repetitive description thereof is omitted.

  Hereinafter, an embodiment will be described in detail.

  First, an outline of a typical embodiment of the present invention will be described as a first embodiment.

  The outline of the present invention is illustrated in FIG. 3, in contrast to FIGS. 1 and 2. FIG. 3 is a block diagram illustrating an outline of an embodiment of the present invention.

In FIG. 3, the operation information acquisition unit 15 operates in an arbitrary number of devices, collecting the various operation information that can be acquired during operation and storing it in the operation information data.
The operation configuration acquisition unit 16 operates together with the operation information acquisition unit 15 to collect operation configurations indicating the configurations of the arbitrary number of devices.

  The learning unit 30 operates to create a statistical model from the operation information of the operation information data 20 by using a statistical method and store it in the learning data 40.

  The management unit 80 operates to register, in the operation configuration data 85, any operation configuration information collected by the operation configuration acquisition unit 16 that is not yet registered in the operation configuration data 85.

  The threshold learning unit 90 measures, for each operation configuration in the operation configuration data 85, the learning amount, that is, the number of operation information items used for learning by the learning unit 30, calculates a threshold value corresponding to that learning amount, and updates the threshold data 70 with the obtained threshold value.

  The abnormality degree analysis unit 50 operates to output the statistical distance between the operation information collected by the operation information acquisition unit 15 and the statistical model of the learning data 40 as the degree of abnormality.

  The abnormality determination means 60 compares the degree of abnormality output by the abnormality degree analysis means 50 with the threshold value, held in the threshold data 70, for the operation configuration of the device whose operation information yielded that degree of abnormality, and operates to determine whether the degree indicates an abnormality.

  The learning process (600) includes operation information data 20, learning means 30, and learning data 40. The operation configuration registration process (630) includes management means 80 and operation configuration data 85. The threshold learning process (640) includes threshold learning means 90 and threshold data 70. The abnormality determination process (620) includes an abnormality degree analysis unit 50 and an abnormality determination unit 60.

Next, an example of creating the learning data 40 using a statistical method in the learning means 30 will be described with reference to FIG. 4. FIG. 4 is a diagram illustrating the outline of a method of obtaining the degree of abnormality by data mining. The learning data 40 consists of the information of j clusters ω_j (mean vector m, standard deviation σ) and a cluster threshold Th. The j clusters ω_j illustrated in FIG. 4 are obtained, for example, by extracting a p-dimensional feature vector x from the operation information collected from an arbitrary number of devices, and applying cluster analysis, a data mining method for grouping data, to a large number of such feature vectors.
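As a concrete illustration of this step, the clustering and the per-cluster statistics could be computed as in the following sketch. The clustering algorithm (a naive k-means), the function names, and the use of Euclidean distance are illustrative assumptions; the patent does not prescribe a specific cluster analysis method.

```python
# Sketch of building the learning data 40: cluster the feature vectors
# extracted from operation information, then store each cluster's mean
# vector m_j and the standard deviation sigma_j of the distances of its
# members from that mean. Names and algorithm choice are illustrative.
import math

def euclidean(a, b):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def kmeans(vectors, k, iters=20):
    centroids = vectors[:k]  # naive initialisation: first k vectors
    for _ in range(iters):
        groups = [[] for _ in range(k)]
        for v in vectors:
            j = min(range(k), key=lambda i: euclidean(v, centroids[i]))
            groups[j].append(v)
        centroids = [
            tuple(sum(c) / len(g) for c in zip(*g)) if g else centroids[i]
            for i, g in enumerate(groups)
        ]
    return centroids, groups

def learn(vectors, k):
    centroids, groups = kmeans(vectors, k)
    model = []
    for m, g in zip(centroids, groups):
        dists = [euclidean(v, m) for v in g]
        mean_d = sum(dists) / len(dists)
        sigma = math.sqrt(sum((d - mean_d) ** 2 for d in dists) / len(dists))
        model.append({"mean": m, "sigma": sigma})
    return model
```

In a real deployment the feature vectors would be re-clustered (or the clusters updated incrementally) as new operation information arrives, so that the learning data 40 tracks the devices' current behaviour.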

An example of a method of obtaining the cluster threshold of a cluster ω_i will be described with reference to FIGS. 4 and 5. FIG. 5 is a diagram illustrating the outline of a method of obtaining a cluster range; the cluster threshold is the cluster range. For example, if the distribution of feature vectors obtained from the features of the operation information used for learning follows, in cluster ω_i, a normal distribution N(m_i, σ_i²) with standard deviation σ_i, then, as illustrated in FIG. 5, the probability point of a preset rejection rate α in the normal distribution N(m_i, σ_i²) can be set as the cluster threshold of the cluster ω_i.

Further, for example, among the feature vectors used for learning that belong to the cluster ω_i, the statistical distance of the feature vector farthest from the mean vector m_i can be set as the cluster threshold.
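The two threshold choices just described can be sketched as follows; the use of `statistics.NormalDist` for the probability point, and the parameter names, are assumptions for illustration.

```python
# Two illustrative ways to set the cluster threshold Th_i for a cluster
# whose per-member distances follow N(m_i, sigma_i^2).
from statistics import NormalDist

def threshold_by_rejection_rate(m_i, sigma_i, alpha):
    # FIG. 5: the probability point of N(m_i, sigma_i^2) that leaves a
    # tail of probability alpha (the preset rejection rate) beyond it.
    return NormalDist(m_i, sigma_i).inv_cdf(1.0 - alpha)

def threshold_by_max_distance(distances):
    # Alternative: the longest statistical distance from the mean
    # among the feature vectors used for learning.
    return max(distances)
```

With a rejection rate of α = 0.05 and a standard normal distance distribution, the first method gives the familiar one-sided 95% point of about 1.645.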

Next, an example of a method of obtaining the degree of abnormality in the abnormality degree analysis means 50 will be described with reference to FIG. 4. In FIG. 4, let ω_i be the i-th cluster and m_i the mean vector of the cluster ω_i; the statistical distance between the mean vector m_i and the feature vector x is then obtained. The smallest such statistical distance, that to the nearest mean vector m_i, can be taken as the degree of abnormality with respect to the cluster ω_i to which the vector belongs.
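A minimal sketch of this computation follows. Euclidean distance stands in for the statistical distance here; in practice a distance scaled by the cluster's σ_i (a Mahalanobis-type distance) could be substituted.

```python
# Degree of abnormality: the statistical distance from the feature
# vector x to the nearest cluster mean vector m_i.
import math

def abnormality_degree(x, mean_vectors):
    def dist(a, b):
        return math.sqrt(sum((p - q) ** 2 for p, q in zip(a, b)))
    return min(dist(x, m) for m in mean_vectors)
```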

  Next, a method for determining whether there is an abnormality in the abnormality determination means 60 will be described with reference to FIGS.

  First, an abnormality determination method using the operation information of the learned operation configuration device will be described with reference to FIGS. 6 and 8. FIG. 6 is a diagram illustrating an outline of the transition of feature vectors of operation information in a learned operation configuration device. FIG. 8 is an explanatory diagram illustrating an example of a change in the normalized degree of abnormality in a device having a learned operation configuration.

  FIG. 6 illustrates a two-dimensional feature vector for ease of explanation. The feature vector transitions between the clusters as illustrated in FIG. 6. In a normal operating state, the feature vectors of the operation information are distributed within each cluster range; when a feature vector moves outside every cluster range, it can be determined, using the cluster range as the cluster threshold, that the operation information indicates an abnormal state.

  Since the cluster thresholds in FIG. 6 differ from one another, the degree of abnormality can be normalized by the cluster threshold of the cluster to which it belongs, so that the decision threshold becomes 1; this yields the normalized degree of abnormality. In FIG. 8, therefore, when the normalized degree of abnormality exceeds the threshold 1, an abnormality can be determined.
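The normalization could be sketched as follows; the cluster record structure and the names are illustrative assumptions.

```python
# Dividing the degree of abnormality by the threshold of the nearest
# cluster puts every cluster on the common decision threshold 1.
import math

def normalized_abnormality(x, clusters):
    # clusters: list of {"mean": mean vector, "th": cluster threshold}
    def dist(a, b):
        return math.sqrt(sum((p - q) ** 2 for p, q in zip(a, b)))
    nearest = min(clusters, key=lambda c: dist(x, c["mean"]))
    return dist(x, nearest["mean"]) / nearest["th"]

def is_abnormal(x, clusters):
    return normalized_abnormality(x, clusters) > 1.0
```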

  Next, an abnormality determination method using operation information of an unlearned operation configuration device will be described with reference to FIGS. 7 and 9 to 12. FIG. 7 is a diagram illustrating an outline of the transition of the feature vector of the operation information in an unlearned operation configuration device. FIG. 9 is an explanatory diagram illustrating an overview of changes in the number of operating devices having an unlearned operating configuration. FIG. 10 is an explanatory diagram illustrating an overview of changes in the learning amount that is learned using the operation information of an unlearned operation configuration device. FIG. 11 is an explanatory diagram illustrating threshold setting for determining an abnormality from the operation information of an unlearned operation configuration device. FIG. 12 is an explanatory diagram illustrating another threshold setting for determining an abnormality from the operation information of an unlearned operation configuration device. FIG. 7 illustrates a two-dimensional feature vector for ease of explanation.

  When learning is performed using unlearned operation information, the addition of a new cluster can be expected compared with the state before learning. For example, a feature vector obtained from unlearned operation information may fall outside the circles of the clusters in FIG. 7. At the stage where the operation information of a specific configuration is still unlearned, such operation information may be judged abnormal even though it merely belongs to a new cluster.

  The relationship between the number of operating devices with an unlearned operation configuration and time can be illustrated as in FIG. 9. The learning amount of the operation information of devices with the unlearned operation configuration increases as operation accumulates, as illustrated in FIG. 10. One threshold setting method for each operation configuration is therefore, as illustrated in FIG. 11, to use a preset maximum threshold while the learning amount is small, and to set the threshold to 1 once the learning amount reaches a predetermined value. As another threshold setting method for each operation configuration, as shown in FIG. 12, the threshold can be obtained as a function of the learning amount, becoming 1 once the learning amount reaches the predetermined value.
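The two schedules could look like the following sketch; `TH_MAX` and `N_LEARNED` are assumed parameters for illustration, not values from the patent.

```python
# Per-configuration threshold as a function of learning amount n:
# a step form (FIG. 11) and a gradual form (FIG. 12).
TH_MAX = 4.0       # preset maximum threshold while little is learned
N_LEARNED = 1000   # learning amount regarded as sufficiently learned

def threshold_step(n):
    # FIG. 11: maximum threshold until enough samples, then 1.
    return TH_MAX if n < N_LEARNED else 1.0

def threshold_gradual(n):
    # FIG. 12: threshold decreases linearly with learning amount,
    # reaching 1 once n >= N_LEARNED.
    if n >= N_LEARNED:
        return 1.0
    return TH_MAX - (TH_MAX - 1.0) * (n / N_LEARNED)
```

A high threshold while the learning amount is small suppresses the false abnormality determinations that would otherwise be made against a barely trained model for a new operation configuration.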

  Next, the configuration of an information processing system according to Example 2, which is another embodiment of the present invention, will be described with reference to FIG. FIG. 13 is a block diagram illustrating a configuration of an information processing system according to an embodiment of this invention.

  The information processing system comprises a subscriber line base station 100 connected to both the Internet 120 and a subscriber line network 110 such as an NGN (Next Generation Network), and a plurality of subscriber homes connected to the subscriber line network.

  The subscriber line base station 100 includes a program distribution device 101, a gateway device 102, and an abnormality monitoring device 103.

  The subscriber home 130 has a configuration including an HGW 131, a sensor device 134, and an information home appliance 135.

  The program distribution apparatus 101 and the abnormality monitoring apparatus 103 are connected to the subscriber line network 110 and can send and receive information according to a predetermined procedure.

  The gateway device 102 is connected to both the Internet 120 and the subscriber line network 110, and functions as a gateway for transmitting and receiving information between devices connected to the Internet 120 and devices connected to the subscriber line network 110.

  The HGW 131 is connected to both the subscriber line network 110 and the home network 132, and functions as a gateway for transmitting and receiving information between devices connected to the subscriber line network 110 and devices connected to the home network 132.

  The sensor device 134 and the information home appliance 135 are connected to a home network 132 configured by wire or wireless, and can transmit and receive information according to a predetermined procedure.

  Next, an example of the hardware configuration of the abnormality monitoring apparatus 103 and the HGW 131 in the present embodiment will be described with reference to FIGS. FIG. 14 is a block diagram illustrating the configuration of the abnormality monitoring apparatus 103 according to the embodiment of this invention.

  In FIG. 14, the abnormality monitoring device 103 comprises a CPU 300, a communication IF 301, a non-volatile storage device 302, a main memory 303, and a non-volatile memory 304, each connected to a bus 305, over which information can be transmitted and received according to a predetermined procedure.

  The non-volatile memory 304 stores a boot program, and the non-volatile storage device 302 stores various programs. When the abnormality monitoring device 103 is activated, the boot program stored in the nonvolatile memory 304 reads the various programs from the nonvolatile storage device 302 into the main memory 303. The CPU 300 processes information by executing the various programs read into the main memory 303, and transmits and receives information through the communication IF 301 and the like.

  As described above, the nonvolatile storage device 302 stores the various programs that the CPU 300 reads into the main memory 303 and executes; it can be realized by, for example, an HDD (Hard Disk Drive), an SSD (Solid State Drive), an optical disk drive, or the like. The communication IF 301 can be realized by a network card or the like. The communication IF 301 is connected to the subscriber line network 110, and can transmit and receive information to and from devices connected to the subscriber line network 110.

  FIG. 15 is a block diagram illustrating the configuration of the HGW 131 according to an embodiment of the present invention. The HGW 131 comprises a CPU 310, a main memory 313, a nonvolatile memory 314, a sensor device 315, an abnormality countermeasure device 316, a first communication IF 311, and a second communication IF 312, each connected to a bus 317, over which information can be transmitted and received according to a predetermined procedure.

  The nonvolatile memory 314 stores a boot program and various programs. When the HGW 131 is activated, the various programs are read from the nonvolatile memory 314 to the main memory 313 by the boot program stored in the nonvolatile memory 314. The CPU 310 processes information by executing various programs read to the main memory 313, and performs transmission / reception of information through the first communication IF 311, the second communication IF 312 and the like.

  The sensor device 315 can acquire various state fluctuations that occur as the CPU 310 executes various programs. The abnormality countermeasure device 316 takes measures to recover from an abnormality or prevent an abnormality from occurring when an abnormality is detected or a sign of an abnormality is detected.

  The first communication IF 311 can be realized by a network card or the like. The second communication IF 312 can be realized by a network card or the like. The first communication IF 311 is connected to the subscriber line network 110 and can transmit and receive information to and from devices connected to the subscriber line network 110. The second communication IF 312 is connected to the home network 132 and can send and receive information to and from devices connected to the home network 132.

  Naturally, the configurations of the abnormality monitoring apparatus 103 and the HGW 131 described above are not limited to the configurations illustrated in FIGS. 14 and 15. For example, the sensor device 315 and the abnormality countermeasure device 316 of the HGW 131 may be realized entirely as software programs executed by the CPU 310, in which case they are not included as hardware components. In this case, the software programs are stored in the nonvolatile memory 314, read into the main memory 313, and executed by the CPU 310.

  Although the hardware configuration of the program distribution apparatus 101 is not illustrated, it comprises at least one computer (including a CPU, main memory, nonvolatile storage device, input device, output device, communication IF, and the like). The various programs read from the nonvolatile storage device into the main memory and executed by the CPU implement, for example, a program that distributes the various programs stored in the nonvolatile storage device in response to requests from the HGW 131, for example using OSGi framework technology. The nonvolatile storage device also stores programs to be executed by the HGW 131.

  Although the hardware configuration of the gateway device 102 is not illustrated, it comprises at least one computer (including a CPU, main memory, nonvolatile storage device, input device, output device, communication IF, and the like). The various programs read from the nonvolatile storage device into the main memory and executed by the CPU implement, for example, programs that mediate the transmission and reception of data according to various Internet protocols between the HGW 131 and the Internet 120.

  Although the hardware configuration of the information home appliance 135 is not illustrated, it comprises at least one computer (including a CPU, main memory, nonvolatile storage device, input device, output device, communication IF, and the like). The various programs read from the nonvolatile storage device into the main memory and executed by the CPU implement, for example, programs that connect through the home network 132 and the HGW 131 to various server devices on the Internet 120 and realize the services those server devices provide.

  Although the hardware configuration of the sensor device 134 is not illustrated, it comprises at least one computer (including a CPU, main memory, nonvolatile memory, input device, output device, communication IF, and the like). The various programs read from the nonvolatile memory into the main memory and executed by the CPU implement, for example, programs that connect through the home network 132 and the HGW 131 to various server devices on the Internet 120 and that acquire and transmit information, such as temperature and position information, needed to realize the various services those server devices provide.

  Next, the operation of each part of the abnormality monitoring apparatus 103 and the HGW 131 according to an embodiment of the present invention will be described with reference to FIG. 16, the flowcharts of FIGS. 17 to 22, and the sequence diagram of FIG. 31. FIG. 16 illustrates the main functions and is referred to, with reference numerals in parentheses, as necessary.

  FIGS. 17 to 22 are flowcharts of the main processes shown in FIG. 16. FIG. 17 is a flowchart illustrating the operation of the learning process (610). FIG. 18 is a flowchart illustrating the operation of the operation configuration registration process (630). FIG. 19 is a flowchart illustrating the operation of the threshold learning process (640). FIG. 20 is a flowchart illustrating the operation of the threshold update process (step S230). FIG. 21 is a flowchart illustrating the operation of another threshold update process (step S230). FIG. 22 is a flowchart illustrating the operation of the abnormality determination process (620). FIG. 31 is a sequence diagram illustrating the flow of these operations.

  FIGS. 23 to 30 illustrate examples of the main information stored in each unit of FIG. 16. FIG. 23 is a diagram illustrating the hardware configuration information 183. FIG. 24 is a diagram illustrating the application information 182. FIG. 25 is a diagram illustrating the customer information 181. FIG. 26 is a diagram illustrating an example of the operation configuration data 85. FIG. 27 is a diagram illustrating the threshold data 70. FIG. 28 is a diagram illustrating the learning data 40. FIG. 29 is a diagram illustrating the device operation information 155. FIG. 30 is a diagram illustrating the operation information data 20.

  In the HGW 131, the operation application information 140 stores applications registered in advance; the execution unit 145 performs application execution processing (600), executing the applications registered in the operation application information 140; and the operation information collection unit 150 performs operation information collection processing (605), collecting dynamic operation information that changes as the applications executed by the execution unit 145 run, for example the amount of change in physical memory usage, the number of execution threads, and the frequency of interrupts, and storing it in the device operation information 155.
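The dynamic quantities named above (memory usage change, thread count) can be sampled from a running process roughly as follows; the dictionary fields and the use of Python's standard `resource` and `threading` modules are assumptions for illustration, not the HGW's actual collection mechanism.

```python
import resource
import threading
import time

def collect_operation_info(prev_rss=0):
    """Sample one record of dynamic operation information (sketch of
    the operation information collection processing (605))."""
    # Peak resident set size of this process (units are platform-dependent).
    rss = resource.getrusage(resource.RUSAGE_SELF).ru_maxrss
    return {
        "timestamp": time.time(),
        "memory_delta": rss - prev_rss,            # change in physical memory usage
        "thread_count": threading.active_count(),  # number of execution threads
    }
```

Interrupt frequency has no portable user-space counterpart, so it is omitted from this sketch.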

  The device configuration management unit 160 performs device configuration management processing (607), collecting static operating-configuration information, such as the device's user information, hardware configuration, software configuration, and the applications registered in the operation application information 140, and storing it in the device configuration information 165.

  When an abnormality is detected during execution of an application, the abnormality countermeasure execution means 170 receives the abnormality information 175 and performs abnormality countermeasure processing (650), for example restarting the device or stopping execution of the application suspected to be the source of the abnormality.

  In the abnormality monitoring apparatus 103, the operation information acquisition unit 15 receives the device operation information 155 from the HGW 131 as necessary and accumulates it in the operation information data 20.

  In the learning process (610), the learning means 30 waits until the operation information accumulated in the operation information data 20 reaches the amount necessary for statistical learning (step S100). Once the operation information is sufficient, it classifies the feature vectors of the accumulated operation information data 20 into a plurality of clusters using a statistical method such as data mining, creates new statistical data comprising the mean, standard deviation, and cluster threshold of each cluster (step S110), and updates the learning data 40 with the new statistical data (step S120).
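Step S110 could be realized, for example, with a minimal k-means pass followed by per-cluster statistics; the patent only calls for some statistical method such as data mining, so the algorithm below, and the use of the standardized cluster radius as the cluster threshold, are assumptions.

```python
import numpy as np

def learn_statistics(feature_vectors, n_clusters=2, iters=20, seed=0):
    """Cluster accumulated feature vectors and derive, for each cluster,
    the mean, standard deviation, and a cluster threshold (sketch of
    step S110)."""
    rng = np.random.default_rng(seed)
    X = np.asarray(feature_vectors, dtype=float)
    centers = X[rng.choice(len(X), n_clusters, replace=False)]
    for _ in range(iters):
        # Assign each vector to its nearest center, then move the centers.
        labels = np.argmin(((X[:, None] - centers[None]) ** 2).sum(-1), axis=1)
        for k in range(n_clusters):
            if np.any(labels == k):
                centers[k] = X[labels == k].mean(axis=0)
    stats = []
    for k in range(n_clusters):
        members = X[labels == k]
        std = members.std(axis=0) + 1e-9                    # avoid division by zero
        dists = np.linalg.norm((members - centers[k]) / std, axis=1)
        stats.append({
            "mean": centers[k],
            "std": std,
            "cluster_threshold": dists.max() + 1e-9,        # standardized cluster radius
        })
    return stats
```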

  The operation configuration acquisition unit 16 acquires the device operation configuration 165 of the HGW 131 as necessary.

  In the operation configuration registration process (630), the management means 180 collates the device operation configuration 165 against the previously registered customer information 181, application information 182, and hardware configuration information 183. It waits until a device operation configuration 165 not yet registered in the operation configuration data 85 is acquired (step S130), and then additionally registers the unregistered device operation configuration 165 in the operation configuration data 85 (step S140).
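Steps S130 and S140 amount to registering a configuration only when no identical one is already known; a minimal sketch, in which the key fields (a hardware identifier plus the set of registered applications) are assumed for illustration.

```python
def register_operating_config(device_config, operation_config_data):
    """Register an unregistered device operating configuration (sketch of
    steps S130-S140). Returns True only when a new entry was added."""
    key = (device_config["hardware"], tuple(sorted(device_config["apps"])))
    if key not in operation_config_data:
        # Track the per-configuration learning amount from zero.
        operation_config_data[key] = {"learning_amount": 0}
        return True
    return False
```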

  In the threshold learning process (640), the threshold learning means 90 waits until learning by the learning means 30 has been executed (step S200). When learning has been executed, the learning amount of the operation information is measured for each operating configuration (step S220), and a threshold update process corresponding to the learning amount is executed (step S230). Steps S220 and S230 are repeated until all thresholds in the threshold data 70 that require updating have been updated (step S210).

An example of the threshold update process (step S230) will be described with reference to the flowchart of FIG. 20. In the threshold update process (step S230), the learning amount is compared with a preset value (step S300); when the learning amount is larger than the preset value, the threshold is set to the reference value 1, and when the learning amount is small, the threshold is set to Th_max, a preset value larger than 1. The threshold data 70 is then updated with the set threshold (step S330). With this operation, the threshold is set as illustrated in FIG. 11.

  An example of another threshold update process (step S230) will be described with reference to the flowchart of FIG. 21. In this threshold update process (step S230), a temporary threshold is first calculated by a function that takes the learning amount as an argument (step S340). When the temporary threshold is equal to or less than the reference value 1, the threshold is set to the reference value 1 (step S360); otherwise, the temporary threshold is adopted as the threshold (step S370). The threshold data 70 is then updated with the set threshold (step S380). With this operation, the threshold is set as illustrated in FIG. 12.
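The function-based variant might be sketched as follows; the exponential form and its constants are assumptions, since the patent does not fix the particular function of the learning amount.

```python
import math

def functional_threshold(learning_amount, th_max=3.0, decay=0.002):
    """Threshold as a function of the learning amount, floored at the
    reference value 1 (sketch of FIG. 21 / FIG. 12)."""
    tentative = th_max * math.exp(-decay * learning_amount)  # step S340
    return max(1.0, tentative)                               # steps S360/S370
```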

  An example of the abnormality determination process (620) will be described with reference to the flowchart of FIG. 22. In the abnormality determination process (620), the abnormality degree analysis means 50 waits until the operation information acquisition means 15 acquires device operation information (step S400), then takes operation information from the device operation information one item at a time (step S405); if there is no operation information left to take, the process returns to step S400. A feature vector is extracted from the acquired operation information, the statistical distance between the feature vector and the mean of each of the clusters registered in the learning data 40 is computed as a degree of abnormality (step S410), and the minimum degree of abnormality and the cluster to which it belongs are determined (step S420).

  The abnormality determination means 60 normalizes the degree of abnormality by the cluster threshold of the cluster it belongs to, as registered in the learning data 40, to obtain a normalized degree of abnormality (step S430). It then acquires the threshold for the same operating configuration as the HGW 131 that produced the operation information (step S440) and compares the normalized degree of abnormality with that threshold (step S450). If the normalized degree of abnormality is larger than the threshold, it determines that there is an abnormality (step S460).
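Steps S410 through S460 can be sketched together as follows; the standardized Euclidean distance standing in for the "statistical distance", and the dictionary layout of the per-cluster statistics, are assumptions for illustration.

```python
import numpy as np

def judge_abnormality(feature, cluster_stats, config_threshold):
    """Compute a degree of abnormality against learned clusters and
    compare its normalized value with the operating-configuration
    threshold (sketch of steps S410-S460)."""
    x = np.asarray(feature, dtype=float)
    # Step S410: statistical distance to each cluster mean.
    degrees = [float(np.linalg.norm((x - c["mean"]) / c["std"])) for c in cluster_stats]
    nearest = int(np.argmin(degrees))  # step S420: cluster with the minimum degree
    # Step S430: normalize by that cluster's own threshold.
    normalized = degrees[nearest] / cluster_stats[nearest]["cluster_threshold"]
    # Steps S450-S460: abnormal if the normalized degree exceeds the threshold.
    return normalized > config_threshold
```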

  In one embodiment, when the abnormality determination means 60 determines that there is an abnormality, the abnormality countermeasure management means 65 transmits a countermeasure method 175 corresponding to the type of abnormality to the HGW 131 that transmitted the device operation information 155 containing the operation information determined to be abnormal.

  FIG. 16 exemplifies a configuration with one HGW 131 and one abnormality monitoring apparatus 103, but the number of HGWs is not limited. In addition, although the abnormality monitoring apparatus 103 performs the learning process (610), the operation configuration registration process (630), the threshold learning process (640), and the abnormality determination process (620) in this embodiment, the present invention does not preclude executing these processes in a distributed manner.

  FIG. 16 illustrates the abnormality determination process (620) being performed by the abnormality monitoring apparatus 103. However, as illustrated in the sequence diagram of FIG. showing the operation flow of another configuration, the abnormality determination process (620) may be executed by the HGW 131 by transmitting the threshold data 70 to the HGW 131. The present invention does not limit the arrangement of each process.

  Further, in the embodiment, the method of creating the learning data 40 in the learning unit 30 is exemplified, but the method of creating the learning data 40 is not limited.

  In the embodiment, a method in which the learning means 30 updates the learning data 40 with new learning data is exemplified, but the learning means 30 can also learn by referring to the existing learning data 40. For a new operating configuration, it is of course also possible to create learning data in advance and have the learning data 40 updated beforehand by another means.

  Further, in the embodiment, the method for obtaining the degree of abnormality in the abnormality degree analyzing means 50 is exemplified, but the method for obtaining the degree of abnormality is not limited.

  In the embodiment, the threshold learning means 90 has exemplified the method for obtaining the threshold value, but the method for obtaining the threshold value is not limited.

  Further, in the embodiment, the method for obtaining the cluster range is illustrated, but the method for obtaining the cluster range is not limited.

  Further, in the embodiment, the method of normalizing the degree of abnormality and comparing it with the threshold is exemplified, but the present invention is not limited to this; a threshold for the unnormalized degree of abnormality may instead be set from the operating configuration's threshold and the cluster threshold of the cluster to which the obtained degree of abnormality belongs.

  In the embodiment, the classification of operating configurations illustrated in FIG. 26 uses the combination of hardware configuration and registered applications, but the present invention is not limited to this; for example, the threshold may be set for operating configurations defined by the combination of hardware configuration and currently active applications.

  As described above, the invention made by the present inventors has been specifically described based on the embodiments. However, it goes without saying that the present invention is not limited to these embodiments, and various modifications can be made without departing from the scope of the invention.

  As is clear from the above description, the learning means 30 can train the learning model using operation information collected from devices while they are in operation, so abnormalities that cannot be detected in a test environment can be detected, unlike when only a learning model trained in advance is used. Furthermore, by providing the threshold learning means 90 and setting a threshold for each operating configuration according to that configuration's learning amount, the threshold against which the abnormality determination means 60 judges the degree of abnormality output by the abnormality degree analysis means 50 is set according to the learning amount of operation information acquired from devices having the same operating configuration as the device that output the operation information in question. Therefore, even when a device's configuration changes due to new installation, function addition, deletion, update, and the like, the likelihood of false detections in abnormality determination using an already-trained learning model can be reduced.

15 operation information acquisition means 16 operation configuration acquisition means 20 operation information data 30 learning means 40 learning data 50 abnormality degree analysis means 60 abnormality determination means 70 threshold data 80 management means 85 operation configuration data 90 threshold learning means 610 learning processing 620 Abnormality determination processing 630 Operation configuration registration processing 640 Threshold value learning processing

Claims (9)

  1. An abnormality detection method for monitoring the operating states of a plurality of devices and detecting an abnormality of a device, comprising:
    a first learning step of collecting operation information indicating the operating state from the device, learning from the operation information, and storing a learning result;
    a second learning step of collecting an operating configuration representing the configuration of the device, learning a threshold value for each operating configuration according to the learning amount of the operation information collected in the first learning step from devices corresponding to that operating configuration, and storing the threshold value corresponding to the operating configuration;
    an analysis step of analyzing the operation information collected from the device against the learning result and outputting the result as a degree of abnormality; and
    a determination step of comparing the degree of abnormality with the threshold value corresponding to the same operating configuration as the device for which the degree of abnormality was obtained in the analysis step, and determining whether the degree of abnormality indicates an abnormal value.
  2. The abnormality detection method according to claim 1, wherein the second learning step learns the threshold value in synchronization with the first learning step.
  3. The abnormality detection method according to claim 1, wherein the analysis step outputs the degree of abnormality asynchronously with the first learning step and the second learning step.
  4. An information processing system comprising a plurality of devices and an abnormality monitoring device that monitors the operating states of the devices and detects abnormalities of the devices, the system comprising:
    a first learning processing unit that collects operation information indicating the operating state from the device, learns from the operation information, and stores a learning result;
    a second learning processing unit that collects an operating configuration representing the configuration of the device, learns a threshold value for each operating configuration according to the learning amount of the operation information collected by the first learning processing unit from devices corresponding to that operating configuration, and stores the threshold value corresponding to the operating configuration;
    an analysis processing unit that analyzes the operation information collected from the device against the learning result and outputs the result as a degree of abnormality; and
    a determination processing unit, provided in the abnormality monitoring device, that compares the degree of abnormality with the threshold value corresponding to the same operating configuration as the device for which the degree of abnormality was obtained and determines whether the degree of abnormality indicates an abnormal value.
  5. The information processing system according to claim 4, wherein the second learning processing unit learns the threshold value in synchronization with the first learning processing unit.
  6. The information processing system according to claim 4, wherein the analysis processing unit outputs the degree of abnormality asynchronously with the first learning processing unit and the second learning processing unit.
  7. An information processing system comprising a plurality of devices and an abnormality monitoring device that monitors the operating states of the devices and detects abnormalities of the devices, the system comprising:
    a first learning processing unit that collects operation information indicating the operating state from the device, learns from the operation information, and stores a learning result;
    a second learning processing unit that collects an operating configuration representing the configuration of the device, learns a threshold value for each operating configuration according to the learning amount of the operation information collected by the first learning processing unit from devices corresponding to that operating configuration, and stores the threshold value corresponding to the operating configuration;
    an analysis processing unit that analyzes the operation information collected from the device against the learning result and outputs the result as a degree of abnormality; and
    a determination processing unit, provided in the device, that compares the degree of abnormality with the threshold value corresponding to the same operating configuration as the device for which the degree of abnormality was obtained and determines whether the degree of abnormality indicates an abnormal value.
  8. The information processing system according to claim 7, wherein the second learning processing unit learns the threshold value in synchronization with the first learning processing unit.
  9. The information processing system according to claim 7, wherein the analysis processing unit outputs the degree of abnormality asynchronously with the first learning processing unit and the second learning processing unit.
JP2010058618A 2010-03-16 2010-03-16 Failure detection method and information processing system using the same Pending JP2011192097A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
JP2010058618A JP2011192097A (en) 2010-03-16 2010-03-16 Failure detection method and information processing system using the same

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
JP2010058618A JP2011192097A (en) 2010-03-16 2010-03-16 Failure detection method and information processing system using the same

Publications (1)

Publication Number Publication Date
JP2011192097A true JP2011192097A (en) 2011-09-29

Family

ID=44796916

Family Applications (1)

Application Number Title Priority Date Filing Date
JP2010058618A Pending JP2011192097A (en) 2010-03-16 2010-03-16 Failure detection method and information processing system using the same

Country Status (1)

Country Link
JP (1) JP2011192097A (en)


Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JP2005182647A (en) * 2003-12-22 2005-07-07 Nec Corp Abnormality detector for apparatus
JP2005250802A (en) * 2004-03-03 2005-09-15 Toshiba Solutions Corp Device and program for detecting improper access
JP2009053862A (en) * 2007-08-24 2009-03-12 Hitachi Ltd Information processing system, data format conversion method, and program


Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
CSNG200801000001; Shinji Nakadai, "Case-base fault detection using support vector machines", IEICE Technical Report, Vol. 108, No. 288, 2008-11-06, pp. 1-6, The Institute of Electronics, Information and Communication Engineers *
JPN6013049851; Shinji Nakadai, "Case-base fault detection using support vector machines", IEICE Technical Report, Vol. 108, No. 288, 2008-11-06, pp. 1-6, The Institute of Electronics, Information and Communication Engineers *

Cited By (20)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2014050985A1 (en) 2012-09-27 2014-04-03 日東電工株式会社 System for remotely monitoring household appliance
JP2016517984A (en) * 2013-04-11 2016-06-20 オラクル・インターナショナル・コーポレイション Grasping seasonal trends in Java heap usage, forecasting, anomaly detection, endpoint forecasting
US10333798B2 (en) 2013-04-11 2019-06-25 Oracle International Corporation Seasonal trending, forecasting, anomaly detection, and endpoint prediction of thread intensity statistics
US10205640B2 (en) 2013-04-11 2019-02-12 Oracle International Corporation Seasonal trending, forecasting, anomaly detection, and endpoint prediction of java heap usage
WO2014196129A1 (en) * 2013-06-03 2014-12-11 日本電気株式会社 Fault analysis device, fault analysis method, and recording medium
JPWO2014196129A1 (en) * 2013-06-03 2017-02-23 日本電気株式会社 Fault analysis apparatus, fault analysis method, and computer program
US9612898B2 (en) 2013-06-03 2017-04-04 Nec Corporation Fault analysis apparatus, fault analysis method, and recording medium
JPWO2014208002A1 (en) * 2013-06-25 2017-02-23 日本電気株式会社 System analysis apparatus, system analysis method, and system analysis program
US9658916B2 (en) 2013-06-25 2017-05-23 Nec Corporation System analysis device, system analysis method and system analysis program
WO2014208002A1 (en) * 2013-06-25 2014-12-31 日本電気株式会社 System analysis device, system analysis method and system analysis program
JP2015046133A (en) * 2013-08-29 2015-03-12 日本電信電話株式会社 Controller, computation resources management method, and computation resources management program
JPWO2015072085A1 (en) * 2013-11-12 2017-03-16 日本電気株式会社 Log analysis system, log analysis method, and program
WO2015072085A1 (en) * 2013-11-12 2015-05-21 日本電気株式会社 Log analysis system, log analysis method, and storage medium
JP2016024790A (en) * 2014-07-24 2016-02-08 富士通フロンテック株式会社 Operation management server, operation program, and server operation method
JP2018530803A (en) * 2015-07-14 2018-10-18 サイオス テクノロジー コーポレーションSios Technology Corporation Apparatus and method for utilizing machine learning principles for root cause analysis and repair in a computer environment
US10417111B2 (en) 2016-05-09 2019-09-17 Oracle International Corporation Correlation of stack segment intensity in emergent relationships
US10467123B2 (en) 2016-05-09 2019-11-05 Oracle International Corporation Compression techniques for encoding stack trace information
US10534643B2 (en) 2016-05-09 2020-01-14 Oracle International Corporation Correlation of thread intensity and heap usage to identify heap-hoarding stack traces
WO2018150550A1 (en) * 2017-02-17 2018-08-23 Hitachi, Ltd. Learning data management device and learning data management method
US10613960B2 (en) 2017-07-31 2020-04-07 Mitsubishi Electric Corporation Information processing apparatus and information processing method

Similar Documents

Publication Publication Date Title
US10033748B1 (en) System and method employing structured intelligence to verify and contain threats at endpoints
US10176321B2 (en) Leveraging behavior-based rules for malware family classification
US20160170818A1 (en) Adaptive fault diagnosis
JP2018142372A (en) System and method for automated memory and thread execution anomaly detection in computer network
EP2956858B1 (en) Periodicity optimization in an automated tracing system
US20180004960A1 (en) Systems and Methods for Security and Risk Assessment and Testing of Applications
Zheng et al. Distributed QoS evaluation for real-world web services
US9032254B2 (en) Real time monitoring of computer for determining speed and energy consumption of various processes
US9323599B1 (en) Time series metric data modeling and prediction
US10321342B2 (en) Methods and systems for performance monitoring for mobile applications
US9525706B2 (en) Apparatus and method for diagnosing malicious applications
JP5831558B2 (en) Operation management apparatus, operation management method, and program
US20170161478A1 (en) Active Authentication of Users
US20150074812A1 (en) Detecting Malicious Use of Computer Resources by Tasks Running on a Computer System
US8635498B2 (en) Performance analysis of applications
US8352790B2 (en) Abnormality detection method, device and program
KR100938672B1 (en) The method and apparatus for detecting dll inserted by malicious code
Sharma et al. CloudPD: Problem determination and diagnosis in shared dynamic clouds
DE102016102381A1 (en) Security event detection through virtual machine introspection
US8621284B2 (en) Operations management apparatus, operations management system, data processing method, and operations management program
US10243982B2 (en) Log analyzing device, attack detecting device, attack detection method, and program
US20130124923A1 (en) Device and Method for Detecting and Diagnosing Correlated Network Anomalies
US7783744B2 (en) Facilitating root cause analysis for abnormal behavior of systems in a networked environment
US8260622B2 (en) Compliant-based service level objectives
US8601319B2 (en) Method and apparatus for cause analysis involving configuration changes

Legal Events

Date Code Title Description
A621 Written request for application examination

Free format text: JAPANESE INTERMEDIATE CODE: A621

Effective date: 20120830

A977 Report on retrieval

Free format text: JAPANESE INTERMEDIATE CODE: A971007

Effective date: 20131002

A131 Notification of reasons for refusal

Free format text: JAPANESE INTERMEDIATE CODE: A131

Effective date: 20131008

A521 Written amendment

Free format text: JAPANESE INTERMEDIATE CODE: A523

Effective date: 20131206

A02 Decision of refusal

Free format text: JAPANESE INTERMEDIATE CODE: A02

Effective date: 20140318