WO2016075915A1 - Log analyzing system, log analyzing method, and program recording medium

Info

Publication number: WO2016075915A1
Application number: PCT/JP2015/005570
Authority: WO
Inventors: 遼介外川
Original assignee: 日本電気株式会社
Priority date: 2014-11-10
Filing date: 2015-11-06
Publication date: 2016-05-19
Also published as: JPWO2016075915A1; JP6665784B2

Abstract

In order to shorten the time required to extract a combination of log messages that appear synchronously, when analyzing log messages which have been output from an information processing system, a log analyzing system is provided with: a reference pattern generating means which, on the basis of log message appearance information, generates a reference pattern for each combination of log messages that appear synchronously; and a reference pattern linking means which compares, between the reference patterns, the appearance information of the log messages contained in the reference patterns, and links together pairs of reference patterns on the basis of the comparison results.

Description

Log analysis system, log analysis method, and program recording medium

The present invention relates to a log analysis system, a log analysis method, and a log analysis program for analyzing a log output from an information processing system.

The operation manager of an information processing system such as a computer system monitors the log output by the computer system, checks the normality of the system, and analyzes abnormalities such as failures. It is important to monitor and analyze the log based on the relevance of a plurality of messages included in the log.

As computer systems have grown in size and complexity, the volume of logs they output has become enormous. For this reason, the operation manager cannot know every part of the computer system, and it is difficult to grasp the relationships between messages included in the log.

Based on such a background, the following technologies are disclosed in order to analyze an enormous log output from a computer system.

Patent Document 1 discloses a message analysis system that detects the occurrence of a failure based on messages collected from a plurality of computer systems and analyzes the detected failure. The message analysis system disclosed in Patent Document 1 accumulates time elements of messages generated corresponding to cases, and aggregates cases using received message times and accumulated time elements. The message analysis system aggregates and analyzes a plurality of messages for each case.

Patent Document 2 discloses a log monitoring system that detects a specific event by analyzing log information output from application software executed by a computer based on a predefined condition. The log monitoring system of Patent Document 2 classifies each log information in a preset time zone unit based on the log onset time included in the accumulated log information. The log monitoring system compares the messages included in each log information within the same time zone, and measures the number of log information including the same message as an expression frequency condition. Then, when the number of occurrences of log information per unit time matches the expression number condition, the log monitoring system generates notification condition information candidates that are referred to by the event detection device that performs the notification process of the log information.

Furthermore, the following documents disclose technologies for automatically generating analysis rules for comprehensive analysis of a huge log.

Patent Document 3 discloses an information processing apparatus that supports increasing the filter accuracy of a failure message. The information processing apparatus of Patent Document 3 extracts only relevant messages from a plurality of messages transmitted from a device at the time of failure, and groups the plurality of extracted messages. The information processing apparatus determines a relationship between messages by paying attention to a co-occurrence relationship between an arbitrary message and the messages output before and after that message is transmitted. The information processing apparatus groups messages when the value of the index indicating the strength of the co-occurrence relationship is equal to or greater than a certain value.

Patent Document 4 discloses a notification device that detects an abnormality of a message that occurs in a distributed system composed of a plurality of information processing devices. The notification device of Patent Document 4 records a message that occurred on an arbitrary day of the week and time zone, together with the number of occurrences of the message, and groups the messages as a series of messages up to a separately defined maximum length. The notification device groups a plurality of normal messages transmitted from the analysis target device as a series of related messages.

Patent Document 1: JP 2006-331026 A
Patent Document 2: JP 2008-41041 A
Patent Document 3: JP 2014-104851 A
Patent Document 4: Japanese Patent No. 4944391

The message analysis system of Patent Document 1 can analyze a message received in real time by matching processing with examples defined in various formats. However, the message analysis system has a problem that undefined cases cannot be analyzed in real time. This is because the message analysis system performs message analysis based on predefined cases.

According to the log monitoring system of Patent Document 2, conditions necessary for updating a known event and detecting a new event can be generated by analyzing log information including the same message in the same time zone. However, the log monitoring system has a problem in that conditions necessary for updating a known event and detecting a new event cannot be generated unless the same message is detected.

The information processing apparatus of Patent Document 3 defines the relationship between messages using the co-occurrence probability of consecutive messages and the score calculated using the co-occurrence probability. Therefore, for example, when the types of messages for which the relationship must be defined increases as the number of logs increases, the combinations of messages to be considered also increase. As a result, the information processing apparatus has a problem that it takes time to obtain an appropriate solution because the amount of calculation increases when the number of message combinations themselves increases.

The notification device of Patent Document 4 defines the number of types of a series of consecutive messages with a maximum length, but does not specifically disclose a standard for defining the maximum length. In addition, the notification device has a problem that even unrelated messages are grouped because all messages appearing in an arbitrary time zone are targeted for grouping.

An object of the present invention is to provide a log analysis system capable of reducing the time required to extract a combination of log messages output continuously within a predetermined time when analyzing log messages output from an information processing system.

The log analysis system according to the present invention includes: reference pattern generation means for generating, based on appearance information of log messages, a reference pattern for each combination of log messages that appear synchronously; and reference pattern combining means for comparing, between the reference patterns, the appearance information of the log messages included in the reference patterns and combining the reference patterns based on the comparison result.

In the log analysis method of the present invention, a reference pattern is generated for each combination of log messages that appear synchronously based on the appearance information of the log messages, the appearance information of the log messages included in the reference patterns is compared between the reference patterns, and the reference patterns are combined based on the comparison result.

The log analysis program of the present invention causes a computer to execute: a process of generating a reference pattern for each combination of log messages that appear synchronously based on the appearance information of the log messages; and a process of comparing, between the reference patterns, the appearance information of the log messages included in the reference patterns and combining the reference patterns based on the comparison result.

According to the present invention, when analyzing a log message output from an information processing system, it is possible to shorten the time for extracting a combination of messages that appear synchronously.

A block diagram showing the configuration of the log analysis system according to the first embodiment of the present invention.
A diagram showing an example of the log files used in the log analysis system according to the first embodiment of the present invention.
A diagram showing an example of the integrated log used in the log analysis system according to the first embodiment of the present invention.
A diagram showing an example of the appearance information used in the log analysis system according to the first embodiment of the present invention.
A diagram showing an example of the patterns used in the log analysis system according to the first embodiment of the present invention.
A flowchart of the operation of the log analysis system according to the first embodiment of the present invention.
A flowchart of the appearance information aggregation process of the log analysis system according to the first embodiment of the present invention.
A flowchart of the reference pattern generation process of the log analysis system according to the first embodiment of the present invention.
A flowchart of the reference pattern combination process of the log analysis system according to the first embodiment of the present invention.
A flowchart of the reference pattern combination process of the log analysis system according to the first embodiment of the present invention.
A block diagram showing the configuration of the characteristic part of the log analysis system according to the first embodiment of the present invention.
A block diagram showing the configuration of the log analysis system according to the second embodiment of the present invention.
A diagram showing an example of the patterns used in the log analysis system according to the second embodiment of the present invention.
A flowchart of the order information generation process of the log analysis system according to the second embodiment of the present invention.
A block diagram showing the configuration of the log analysis system according to the third embodiment of the present invention.
A diagram showing an example of the log identification information used in the log analysis system according to the third embodiment of the present invention.
A flowchart of the log identification process of the log analysis system according to the third embodiment of the present invention.
A block diagram showing the configuration of the log analysis system according to the fourth embodiment of the present invention.
A flowchart of the log classification process of the log analysis system according to the fourth embodiment of the present invention.
A block diagram showing the configuration of the log analysis system according to the fifth embodiment of the present invention.
A diagram showing an example of the patterns used in the log analysis system according to the fifth embodiment of the present invention.
A flowchart of the transition time learning process of the log analysis system according to the fifth embodiment of the present invention.
A block diagram showing a hardware configuration for implementing the log analysis system according to the embodiments of the present invention.

Hereinafter, embodiments for carrying out the present invention will be described with reference to the drawings. The embodiments described below include limitations that are technically preferable for carrying out the present invention, but the scope of the invention is not limited to the following.

(First embodiment)
First, a log analysis system 1 according to a first embodiment of the present invention will be described with reference to the drawings.

[Configuration]
FIG. 1 is a block diagram showing a configuration of a log analysis system 1 according to the first embodiment of the present invention. The directions of the arrows in the block diagrams from FIG. 1 onward are examples and do not limit the direction of signals between blocks.

As shown in FIG. 1, the log analysis system 1 according to the present embodiment includes a log collection unit 11, a log aggregation unit 12, a reference pattern generation unit 13, a reference pattern combination unit 14, and a pattern storage unit 15.

[Log collection means]
The log collection unit 11 collects log files of the analysis target system 10. The log collection unit 11 may receive a log file from the analysis target system 10 or may read the log file from a storage unit (not shown). In addition, the log collection unit 11 may accept an input of a log file from the operation manager.

FIG. 2 shows an example of log files (log files 101 to 103) collected by the log analysis system 1. A log file is a set of log messages (also called log records) and is composed of at least one log message, as shown in FIG. 2. A log message includes a plurality of log elements, such as a log ID (identifier) for identifying each log message, the time at which the log message was output, the message body, and the log level. Note that the log ID is also referred to as a log identifier and may be simply referred to as an ID below.

The log collection unit 11 generates an integrated log in which the log messages stored in all the log files are rearranged in time series based on the collected at least one log file. The log collecting unit 11 transmits the generated integrated log to the log totaling unit 12.

FIG. 3 shows an example (integrated log 104) of the integrated log generated by the log collection unit 11. The integrated log is a set of log messages and is composed of at least one log message, as shown in FIG. 3. The integrated log combines log messages that originally belonged to different log files. The integrated log may instead be a set of entries, each combining an identifier of a log file with the line number of the log message in that file.
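
As a minimal sketch of how such an integrated log could be built, the following Python code concatenates the messages of several log files and sorts them by output time. The `LogMessage` type, its field names, and the function `build_integrated_log` are assumptions introduced for illustration; the specification does not prescribe a data layout.

```python
from datetime import datetime
from typing import Iterable, List, NamedTuple


class LogMessage(NamedTuple):
    """Hypothetical log record: output time, log ID, and source file name."""
    time: datetime
    log_id: str
    source: str


def build_integrated_log(log_files: Iterable[List[LogMessage]]) -> List[LogMessage]:
    """Combine all log files and rearrange the messages in time series."""
    combined = [message for log_file in log_files for message in log_file]
    # sorted() is stable, so messages with equal times keep their input order.
    return sorted(combined, key=lambda message: message.time)


file_a = [
    LogMessage(datetime(2014, 7, 1, 12, 0, 5), "1001", "a.log"),
    LogMessage(datetime(2014, 7, 1, 12, 0, 9), "2034", "a.log"),
]
file_b = [LogMessage(datetime(2014, 7, 1, 12, 0, 1), "3018", "b.log")]

integrated = build_integrated_log([file_a, file_b])
print([message.log_id for message in integrated])  # ['3018', '1001', '2034']
```

Keeping the source file name in each record is one way to retain the link between an integrated entry and its original log file.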

The log collection unit 11 may receive, from the operation manager, specification of the range of log messages to be collected, such as specification of the log file itself to be collected and specification of the date and time of the log messages recorded in the log file. In addition, the log collection unit 11 may read a file (not shown) in which information necessary for analyzing log messages is defined and, according to the information defined in that file, convert the format of the acquired log file into a format that the log analysis system 1 can easily analyze.

[Log aggregation means]
The log totaling unit 12 calculates the appearance information of each log message based on the information received from the log collecting unit 11 and a separately defined time width. The time width indicates the range of the appearance time of the log message to be counted by the log counting means 12. The time width may be defined by the user, or may be recorded in advance in a file (not shown).

FIG. 4 is a diagram showing an example of appearance information (appearance information 105). As shown in FIG. 4, the appearance information is composed of a pair of at least one appearance time and the number of appearances corresponding to the log ID of the log message. Note that the appearance information may include the total number of appearances. In the appearance information 105 of FIG. 4, a plurality of appearance times are recorded for each log ID, and the number of appearances corresponding to each of the plurality of appearance times is recorded.

The log totaling unit 12 reads the integrated log one time width at a time and totals, as the number of appearances, the types and numbers of IDs included in the portion of the integrated log that falls within the read time width. The log totaling unit 12 selects one arbitrary time from the section delimited by the time width and registers it as the appearance time of the ID. For example, the log totaling unit 12 may register the median, the minimum, or the maximum of the times in the section as the appearance time. The log totaling unit 12 transmits the calculated appearance information to the reference pattern generating unit 13.
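
The aggregation described above can be sketched as follows. This is only an illustration under stated assumptions: the integrated log is given as a time-sorted list of `(time, log_id)` pairs, the start of each section is used as the appearance time (one of the choices the text allows), and the function name `aggregate_appearances` is hypothetical.

```python
from collections import defaultdict
from datetime import datetime, timedelta


def aggregate_appearances(integrated_log, width):
    """Return {log_id: {appearance_time: count}} for a time-sorted log.

    The appearance time of a section is assumed to be its start time.
    """
    appearance = defaultdict(dict)
    if not integrated_log:
        return {}
    start = integrated_log[0][0]
    for time, log_id in integrated_log:
        # timedelta // timedelta yields the integer index of the section.
        index = (time - start) // width
        section_start = start + index * width
        counts = appearance[log_id]
        counts[section_start] = counts.get(section_start, 0) + 1
    return dict(appearance)


t0 = datetime(2014, 7, 1, 12, 0, 1)
log = (
    [(t0 + timedelta(seconds=i), "1001") for i in range(10)]
    + [(t0 + timedelta(seconds=20 + i), "2034") for i in range(3)]
    + [(t0 + timedelta(seconds=61), "1001")]
)
info = aggregate_appearances(log, timedelta(minutes=1))
print(info["1001"][t0], info["2034"][t0])  # 10 3
```

The last message falls into the second section, so ID "1001" ends up with two appearance times.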

[Reference pattern generation means]
The reference pattern generation unit 13 compares at least one piece of appearance information received from the log totaling unit 12 and combines the pieces of appearance information having the same ID. Then, the reference pattern generation unit 13 transmits the combined ID combination and its appearance information to the reference pattern combination unit 14. That is, the reference pattern generation unit 13 generates a reference pattern for each combination of log messages that appear synchronously based on the log message appearance information.

The reference pattern generation unit 13 may receive, for example, designation of a determination criterion related to the identity of appearance information from the operation manager. Further, the reference pattern generation unit 13 may read a file (not shown) in which information necessary for determining the identity of appearance information is defined, and compare the appearance information of the input ID based on the file.

[Reference pattern combining means]
The reference pattern combining unit 14 compares the appearance information regarding the ID received from the reference pattern generating unit 13 or a combination of a plurality of IDs. The reference pattern combining unit 14 combines a single ID or a combination of a plurality of IDs that satisfy a separately defined condition. That is, the reference pattern combining unit 14 compares the appearance information of the log message included in the reference pattern between the reference patterns, and combines the reference patterns based on the comparison result. The reference pattern combining unit 14 outputs the set of combined results to the pattern storage unit 15 as a “pattern (combination)”. This set of patterns (combinations) is also called a “pattern set”.

FIG. 5 shows a combination information table 106 in which patterns (combinations) are summarized in a table format. The pattern (combination) includes a single ID or a combination of a plurality of IDs and appearance information corresponding to them. In the combination information table 106 of FIG. 5, the appearance information is composed of the appearance time and the number of appearances.

The pattern storage unit 15 stores the pattern (combination) output from the reference pattern combination unit 14.

The above is the description of the configuration of the log analysis system 1 according to the present embodiment.

[Operation]
Next, the operation of the log analysis system 1 according to this embodiment will be described.

FIG. 6 is a flowchart regarding an outline of the operation of the log analysis system 1 according to the present embodiment. The log analysis system 1 according to the present embodiment performs three processes: an appearance information aggregation process, a reference pattern generation process, and a reference pattern combination process.

In FIG. 6, the appearance information totaling process in step S1 is a process in which the log collecting unit 11 reads the log file and the log totaling unit 12 totals the appearance information for each ID.

The reference pattern generation process in step S2 is a process in which the reference pattern generation unit 13 combines at least one log message that appears synchronously as a reference pattern based on the appearance information for each ID. Note that “at least one log message appearing synchronously” means “at least one log message output continuously within a certain period of time”.

The reference pattern combining process in step S3 is a process in which the reference pattern combining unit 14 combines a combination of IDs based on the reference pattern set to generate a pattern (combination).

Hereinafter, the operation of the log analysis system 1 according to the first exemplary embodiment will be described in detail by dividing it into three parts, that is, an appearance information aggregation process, a reference pattern generation process, and a reference pattern combination process.

[Appearance information aggregation process]
First, the appearance information aggregation process will be described. The appearance information totaling process is a process in which the log collecting unit 11 reads a log file and the log totaling unit 12 totals appearance information for each ID. FIG. 7 is a flowchart regarding the appearance information tabulation process.

In FIG. 7, first, the log collection unit 11 reads the log file output from the analysis target system 10 (step S101).

The log collecting unit 11 generates an integrated log by combining all acquired log files (step S102).

The log collection unit 11 rearranges the log messages of the integrated log in chronological order based on the time information of each log message (step S103).

Next, the log totaling means 12 reads the log message of the integrated log based on the defined time width (step S104).

For example, assume that the time of the first log message to be read is “2014/07/01_12:00:01” and the defined time width is “1 minute”. In this case, the log totaling unit 12 reads the log messages in the section from “2014/07/01_12:00:01” to “2014/07/01_12:01:00”.

Next, the log totaling unit 12 totals the number of appearances of the same ID from the set of read log messages, and records a set of time information and the number of appearances as appearance information for each ID (step S105).

For example, assume that in the section from “2014/07/01_12:00:01” to “2014/07/01_12:01:00”, the log message with ID “1001” appears “10 times” and the log message with ID “2034” appears “3 times”. In this case, the log totaling unit 12 adds the appearance time “2014/07/01_12:00:01” and the number of appearances “10” to the appearance information of the ID “1001”. Similarly, the log totaling unit 12 adds the appearance time “2014/07/01_12:00:01” and the number of appearances “3” to the appearance information of the ID “2034”.

Here, the log totaling unit 12 determines whether or not the last log message of the integrated log has been reached (step S106).

When the last log message of the integrated log is reached (Yes in step S106), the log totaling unit 12 outputs appearance information for each ID to the reference pattern generating unit 13 (step S107).

On the other hand, when the last log message of the integrated log has not been reached (No in step S106), the process returns to step S104.

That is, the log totaling unit 12 repeats the processes in steps S104 and S105 until the last log message of the integrated log is reached. The log totaling unit 12 may also be configured so that the user can specify an arbitrary time at which reading of log messages ends, or so that the end time is obtained from definition information (not shown).

This completes the description of the appearance information aggregation process.

[Reference pattern generation processing]
Next, the reference pattern generation process will be described. The reference pattern generation process is a process in which the reference pattern generation unit 13 combines log messages that appear synchronously as reference patterns based on the appearance information for each ID. FIG. 8 is a flowchart regarding reference pattern generation processing. Note that the operation related to log message combination described with reference to FIG. 8 is an example, and any method may be used as long as IDs generated at the same time can be compared and linked.

In FIG. 8, first, the reference pattern generation unit 13 reads the appearance information for each ID output by the log aggregation unit 12 (step S201).

Based on the received set of appearance information for each ID (hereinafter referred to as a combination candidate set), the reference pattern generation unit 13 calculates, for each ID, the total number of appearances over its appearance times (hereinafter referred to as the appearance frequency) (step S202).

The reference pattern generation unit 13 rearranges the appearance information constituting the combination candidate set in ascending order of appearance frequency (step S203).

The reference pattern generation unit 13 selects an ID as a comparison source (hereinafter referred to as a comparison source ID) from the combination candidate set (step S204). Here, the reference pattern generation unit 13 is described as selecting the ID of the appearance information with the lowest appearance frequency in the combination candidate set as the comparison source ID and comparing it with the appearance information of other IDs (comparison target IDs), but the selection may be based on another criterion.

Here, the reference pattern generation unit 13 determines whether or not the appearance frequency of the selected comparison source ID is the maximum among the appearance information constituting the combination candidate set (step S205).

When the appearance frequency of the selected comparison source ID is not the maximum (No in step S205), the reference pattern generation unit 13 verifies whether there is an ID having the same appearance information as the selected comparison source ID (step S206). On the other hand, if the appearance frequency of the selected comparison source ID is the maximum (Yes in step S205), the process proceeds to step S209.

If there is an ID having the same appearance information as the selected comparison source ID (hereinafter referred to as a comparison target ID) (Yes in step S206), the reference pattern generation unit 13 combines the comparison source ID and the comparison target ID to generate an ID combination (step S207). On the other hand, if there is no ID having the same appearance information as the comparison source ID (No in step S206), the process returns to step S204 to acquire another ID as the comparison source ID.

In this way, the processing from step S204 to step S206 is repeated until there is no comparison target ID having the same appearance information as the selected comparison source ID.

Here, a supplementary explanation will be given regarding step S207.

In step S207, for example, it is assumed that the appearance times of a certain comparison source ID “2048” are as follows.
“2014/07/01_9:00:01, 2014/07/01_10:00:01, 2014/07/01_11:00:01, 2014/07/01_12:00:01, 2014/07/01_13:00:01, 2014/07/01_14:00:01, 2014/07/01_15:00:01, 2014/07/01_16:00:01, 2014/07/01_17:00:01, 2014/07/01_18:00:01”
It is assumed that the number of appearances corresponding to each appearance time of the comparison source ID “2048” is “2, 2, 2, 2, 2, 2, 2, 2, 2, 2”.

At this time, it is assumed that the appearance times of the comparison target ID “2049” are the following ten times.
“2014/07/01_9:00:01, 2014/07/01_10:00:01, 2014/07/01_11:00:01, 2014/07/01_12:00:01, 2014/07/01_13:00:01, 2014/07/01_14:00:01, 2014/07/01_15:00:01, 2014/07/01_16:00:01, 2014/07/01_17:00:01, 2014/07/01_18:00:01”
The number of appearances corresponding to each appearance time of the comparison target ID “2049” is assumed to be “2, 2, 2, 2, 2, 2, 2, 2, 2, 2”.

In this case, the total number of appearances (appearance frequency) of the comparison source ID “2048” and the comparison target ID “2049” is both “20”, and the appearance time is also the same. Therefore, the comparison source ID “2048” and the comparison target ID “2049” are to be combined.
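
The comparison in this example can be reproduced with a short sketch. It assumes that appearance information is held as a mapping from appearance time to number of appearances; the helper names `hourly_appearances`, `appearance_frequency`, and `has_same_appearance` are hypothetical and are not taken from the specification.

```python
from datetime import datetime


def hourly_appearances(first_hour, last_hour, count):
    """Build {appearance_time: count} with one entry per hour at hh:00:01."""
    return {
        datetime(2014, 7, 1, hour, 0, 1): count
        for hour in range(first_hour, last_hour + 1)
    }


def appearance_frequency(info):
    """Total number of appearances over all appearance times."""
    return sum(info.values())


def has_same_appearance(source, target):
    """True when both the appearance times and the counts are identical."""
    return source == target


id_2048 = hourly_appearances(9, 18, 2)
id_2049 = hourly_appearances(9, 18, 2)

print(appearance_frequency(id_2048))           # 20
print(has_same_appearance(id_2048, id_2049))   # True
```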

In addition, in the comparison of the appearance frequency and the appearance time, when the appearance frequencies are the same and the appearance times differ by only one unit (for example, 1 minute), the IDs may be regarded as having the same appearance information.

For example, it is assumed that the appearance times of a certain comparison source ID “3018” are as follows.
“2014/07/01_9:00:01, 2014/07/01_12:00:01, 2014/07/01_15:00:01, 2014/07/01_18:00:01”
The number of appearances corresponding to each appearance time of the comparison source ID “3018” is assumed to be “3, 3, 3, 3”.

Similarly, it is assumed that the appearance times of the comparison target ID “4024” are as follows.
“2014/07/01_9:00:01, 2014/07/01_12:00:01, 2014/07/01_12:01:01, 2014/07/01_15:00:01, 2014/07/01_18:00:01, 2014/07/01_18:01:01”
The number of appearances corresponding to each appearance time of the comparison target ID “4024” is assumed to be “3, 2, 1, 3, 1, 2”.

At this time, the total number of appearances (appearance frequency) of the comparison source ID “3018” and of the comparison target ID “4024” are both “12”. However, the comparison target ID “4024” has the appearance times “2014/07/01_12:01:01” and “2014/07/01_18:01:01”, which the comparison source ID “3018” does not have. If the time width is “1 minute”, these differing appearance times are adjacent to the appearance times “2014/07/01_12:00:01” and “2014/07/01_18:00:01” of the comparison source ID “3018”. Because the differing times belong to adjacent sections, the comparison source ID “3018” and the comparison target ID “4024” are to be combined.

It should be noted that the criteria for determining the same range of time difference may be separately defined by the user. Further, a threshold may be set for the appearance frequency and the degree of coincidence of the appearance information, and IDs that satisfy the set threshold condition may be combined.
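
One possible reading of this relaxed criterion is sketched below: two IDs are treated as having the same appearance information when their appearance frequencies are equal and every appearance time found in only one of them lies within one time width of an appearance time of the other. The function name `similar_appearance` and this exact rule are assumptions made for illustration; the text leaves the criterion to a separate definition.

```python
from datetime import datetime, timedelta


def at(hour, minute):
    """Appearance time on 2014/07/01 at hour:minute:01."""
    return datetime(2014, 7, 1, hour, minute, 1)


def similar_appearance(source, target, width):
    """Assumed rule: equal frequency, and unmatched times are adjacent."""
    if sum(source.values()) != sum(target.values()):
        return False

    def near(time, others):
        return any(abs(time - other) <= width for other in others)

    only_in_target = target.keys() - source.keys()
    only_in_source = source.keys() - target.keys()
    return (all(near(t, source) for t in only_in_target)
            and all(near(t, target) for t in only_in_source))


id_3018 = {at(9, 0): 3, at(12, 0): 3, at(15, 0): 3, at(18, 0): 3}
id_4024 = {at(9, 0): 3, at(12, 0): 2, at(12, 1): 1,
           at(15, 0): 3, at(18, 0): 1, at(18, 1): 2}

print(similar_appearance(id_3018, id_4024, timedelta(minutes=1)))   # True
print(similar_appearance(id_3018, id_4024, timedelta(seconds=30)))  # False
```

With a time width of 1 minute the two IDs from the example are combined, while a narrower width of 30 seconds rejects them because the times at 12:01:01 and 18:01:01 are no longer adjacent.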

Returning to the flowchart of FIG. 8, after combining the comparison source ID and the comparison target ID in step S207 (Yes in step S206), the reference pattern generation unit 13 updates the appearance information of the combination candidate set (step S208). In updating the appearance information, the reference pattern generation unit 13 first adds the generated ID combination together with its appearance information to the combination candidate set. It then deletes the comparison source ID and the comparison target ID from the combination candidate set. When the combination candidate set has been updated, the process returns to step S203.

As described above, the processing of step S203 to step S208 is repeated until the combination candidate (appearance information) having the maximum appearance frequency is reached in the combination candidate set.

Finally, when the appearance frequency of the selected comparison source ID is the maximum (Yes in step S205), the reference pattern generation unit 13 rearranges the appearance information constituting the combination candidate set in ascending order of appearance frequency and outputs the resulting set to the reference pattern combining unit 14 as a reference pattern set (step S209). Note that the appearance frequency of the selected comparison source ID being the maximum means that the combination candidate (appearance information) with the maximum appearance frequency in the combination candidate set has been reached.
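
For the case where "same appearance information" means exact identity, the net effect of the loop in steps S203 to S209 can be sketched by grouping IDs whose appearance information is identical and sorting the groups in ascending order of appearance frequency. This grouping is a simplification swapped in for the iterative comparison of the flowchart, not the procedure of FIG. 8 itself, and the name `generate_reference_patterns` is hypothetical.

```python
from collections import defaultdict


def generate_reference_patterns(appearance):
    """Group IDs with identical {time: count} maps into reference patterns.

    Returns a list of (id_combination, appearance_info) tuples sorted in
    ascending order of appearance frequency.
    """
    groups = defaultdict(list)
    for log_id, info in appearance.items():
        # A frozenset of (time, count) pairs is hashable and order-free.
        groups[frozenset(info.items())].append(log_id)
    patterns = [(tuple(sorted(ids)), dict(key)) for key, ids in groups.items()]
    patterns.sort(key=lambda pattern: sum(pattern[1].values()))
    return patterns


appearance = {
    "2048": {"09:00:01": 2, "10:00:01": 2},
    "2049": {"09:00:01": 2, "10:00:01": 2},
    "9999": {"09:00:01": 1},
}
for ids, info in generate_reference_patterns(appearance):
    print(ids, sum(info.values()))
# ('9999',) 1
# ('2048', '2049') 4
```

A tolerance-based identity criterion is not an equivalence relation in general, so it would need the pairwise comparison of the flowchart rather than this key-based grouping.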

The above is the description of the reference pattern generation process.

[Reference pattern combination processing]
Next, the reference pattern combination process will be described. The reference pattern combining process is a process in which the reference pattern combining unit 14 combines ID combinations based on the reference pattern set to generate patterns (combinations). FIGS. 9 and 10 are flowcharts relating to the reference pattern combining process. The reference pattern set is a set of reference patterns; like a pattern (combination) set, each of its elements pairs an ID combination with the appearance information of that combination.

In FIG. 9, first, the reference pattern combining unit 14 reads the reference pattern set generated by the reference pattern generation unit 13 in the reference pattern generation process (step S301).

The reference pattern combining unit 14 selects a reference pattern with the lowest appearance frequency from the read reference pattern set (step S302). The reference pattern selected here is called a comparison source pattern. The reference pattern combining unit 14 selects a reference pattern from the reference pattern set in ascending order of appearance frequency.
Here, the reference pattern combining unit 14 determines whether or not there is a comparison source pattern in the reference pattern set read in step S301 (step S303).

If there is a comparison source pattern (Yes in step S303), the reference pattern combining unit 14 selects, from the reference pattern set, a pattern whose frequency is equal to or lower than the appearance frequency of the comparison source pattern as a pattern to be compared (hereinafter referred to as a comparison target pattern) (step S304). The set of these comparison target patterns is called a comparison target pattern set. On the other hand, if there is no comparison source pattern (No in step S303), the process proceeds to step S312 in FIG. 10.

Here, the reference pattern combining unit 14 determines whether or not there is a comparison target pattern in the reference pattern set read in step S301 (step S305).

When there is a comparison target pattern (Yes in step S305), the reference pattern combining unit 14 compares the appearance information of the comparison source pattern with the appearance information of the comparison target pattern included in the comparison target pattern set, and calculates the similarity of the appearance information (step S306). On the other hand, when there is no comparison target pattern (No in step S305), the process proceeds to step S308.

Here, a supplementary explanation will be given for step S306.

For example, it is assumed that the appearance times of the comparison source patterns “5025, 6036” are as follows.
“2014/7/1_12:00:01, 2014/7/2_12:00:01, 2014/7/3_12:00:01, 2014/7/4_12:00:01, 2014/7/5_12:00:01”
The number of appearances corresponding to each appearance time of the comparison source pattern “5025, 6036” is “2, 2, 2, 2, 2”.

On the other hand, it is assumed that the appearance times of the comparison target patterns “1001, 3009, 7049” are as follows.
“2014/7/1_12:00:01, 2014/7/2_12:00:01, 2014/7/3_12:00:01, 2014/7/4_12:00:01, 2014/7/5_12:00:01”
Then, it is assumed that the number of appearances corresponding to each appearance time of the comparison target pattern “1001, 3009, 7049” is “2, 1, 1, 2, 2”.

That is, the appearance information common to the two reference patterns has the appearance times “2014/7/1_12:00:01, 2014/7/2_12:00:01, 2014/7/3_12:00:01, 2014/7/4_12:00:01, 2014/7/5_12:00:01”, and the corresponding numbers of appearances are “2, 1, 1, 2, 2”.

At this time, the similarity between the comparison source pattern “5025, 6036” and the comparison target pattern “1001, 3009, 7049” is calculated from the ratio of the number of appearances to be “8/8”, that is, “1.0”. Here, the ratio of the number of appearances is the ratio between the “appearance frequency of the common appearance information” and the “appearance frequency of the appearance information of the comparison target”, and is calculated by the following Formula 1.
(Appearance ratio) = (Appearance frequency of common appearance information) / (Appearance frequency of comparison target appearance information) (1)
In Formula 1, the ratio of the appearance frequency of the common part to the appearance frequency of the comparison target is used as the similarity index, but the ratio of the appearance frequency of the common part to the appearance frequency of the comparison source may be used instead. In addition, although the appearance frequency of the appearance information is used in calculating the ratio of the number of appearances, the number of appearance times may be used instead.
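The worked example for step S306 can be reproduced with a short sketch. The text does not state how the common appearance information is derived when the counts differ at a shared time; taking the smaller count at each shared time is an assumption inferred from the example values “2, 1, 1, 2, 2”, and the function names are hypothetical.

```python
def common_appearance(source, target):
    """Appearance information shared by both patterns.

    Assumption: at each shared appearance time the smaller of the two
    counts is taken, which reproduces the values in the worked example.
    """
    return {t: min(c, target[t]) for t, c in source.items() if t in target}


def appearance_ratio(source, target):
    """Formula 1: common appearance frequency / comparison target frequency."""
    common = common_appearance(source, target)
    return sum(common.values()) / sum(target.values())


times = ["2014/7/1_12:00:01", "2014/7/2_12:00:01", "2014/7/3_12:00:01",
         "2014/7/4_12:00:01", "2014/7/5_12:00:01"]
source = dict(zip(times, [2, 2, 2, 2, 2]))  # comparison source pattern "5025, 6036"
target = dict(zip(times, [2, 1, 1, 2, 2]))  # comparison target pattern "1001, 3009, 7049"

ratio = appearance_ratio(source, target)  # 8 / 8
```

Using the comparison source frequency as the denominator instead, as the text permits, would only require replacing `target` with `source` in the final division.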

The reference pattern combining unit 14 selects, as a combination candidate pattern, a comparison target pattern in which the similarity calculated in the process of step S306 satisfies a threshold value defined separately (step S307). Then, the process returns to step S304. The threshold condition may be satisfied when, for example, the above-described similarity exceeds a predetermined threshold or is equal to or higher than a predetermined threshold.

The reference pattern combining unit 14 repeats the processing of steps S304 to S307 until there is no comparison target pattern (No in step S305), and generates a set of combination candidate patterns.

Here, a supplementary explanation will be given of step S307.

For example, assume that the similarity between the comparison source pattern “5025, 6036” and the comparison target pattern “1001, 3009, 7049” is “1.0”. At this time, if the predetermined threshold is “0.9”, the similarity is equal to or higher than the threshold, and the comparison target pattern “1001, 3009, 7049” becomes a combination candidate pattern. Here, only one similarity index is used, but when there are multiple similarity indices, a single value may be applied to all of them as the threshold, or a threshold may be prepared individually for each index.

When there is no comparison target pattern (No in step S305), the reference pattern combining unit 14 extracts all appearance information from the set of combination candidate patterns and generates candidate appearance information by combining all the extracted appearance information (step S308).

Here, a supplementary explanation will be given regarding step S308. In the following, a case where there are two types of combination candidate patterns “1001, 3009, 7049” and “2004, 4016” will be described as an example.

Assume that the combination candidate patterns “1001, 3009, 7049” have the following five appearance times.
“2014/7/1_12:00:01, 2014/7/2_12:00:01, 2014/7/3_12:00:01, 2014/7/4_12:00:01, 2014/7/5_12:00:01”
Assume that the number of appearances corresponding to each appearance time of the combination candidate patterns “1001, 3009, 7049” is “2, 1, 1, 2, 2”.

Furthermore, it is assumed that the appearance times of the combination candidate pattern “2004, 4016” are “2014/7/2_12:00:01, 2014/7/3_12:00:01”, and that the corresponding numbers of appearances are “1, 1”.

At this time, the candidate appearance information has the appearance times “2014/7/1_12:00:01, 2014/7/2_12:00:01, 2014/7/3_12:00:01, 2014/7/4_12:00:01, 2014/7/5_12:00:01”, and the corresponding numbers of appearances are “2, 2, 2, 2, 2”.
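The combination performed in step S308 can be sketched as follows. Adding up the numbers of appearances at each appearance time is an assumption inferred from the example values (2, 1, 1, 2, 2 combined with 1, 1 on the second and third days gives 2, 2, 2, 2, 2); the function name `combine_appearance` is hypothetical.

```python
from collections import Counter


def combine_appearance(infos):
    """Merge appearance information by summing counts per appearance time
    (assumed combination rule, inferred from the worked example)."""
    total = Counter()
    for info in infos:
        total.update(info)  # Counter.update adds counts instead of replacing them
    return dict(total)


candidate_a = {  # combination candidate pattern "1001, 3009, 7049"
    "2014/7/1_12:00:01": 2, "2014/7/2_12:00:01": 1, "2014/7/3_12:00:01": 1,
    "2014/7/4_12:00:01": 2, "2014/7/5_12:00:01": 2,
}
candidate_b = {  # combination candidate pattern "2004, 4016"
    "2014/7/2_12:00:01": 1, "2014/7/3_12:00:01": 1,
}

candidate_info = combine_appearance([candidate_a, candidate_b])
```

The resulting candidate appearance information is then compared with the appearance information of the comparison source pattern in step S309.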

Then, the reference pattern combination unit 14 compares the appearance information of the comparison source pattern with the candidate appearance information of the combination candidate pattern, and calculates the similarity between the two in the same manner as the process of step S306 (step S309).

Here, the reference pattern combining unit 14 determines whether or not the similarity calculated in step S309 is equal to or greater than a separately defined threshold (step S310). Note that the similarity between the appearance information of the comparison source pattern and the candidate appearance information of the combination candidate pattern may be determined based on whether the similarity satisfies a predetermined threshold condition.

If the similarity is smaller than the threshold (No in step S310), the reference pattern combining unit 14 returns to the process of step S302 to acquire the next reference pattern as a new comparison source pattern.

On the other hand, when the similarity is equal to or higher than the threshold (Yes in Step S310), the reference pattern combining unit 14 updates the reference pattern (Step S311). In updating the reference pattern, first, the reference pattern combining unit 14 generates a combined pattern obtained by combining the comparison source pattern and the combination candidate pattern, and adds the generated combined pattern to the reference pattern set. Second, the reference pattern combining unit 14 deletes the comparison source pattern and the combination candidate pattern from the reference pattern set. When the reference pattern is updated, the process returns to step S302.

As described above, the reference pattern combining unit 14 repeats the processing corresponding to Step S302 to Step S309 until the similarity between the appearance information of the comparison source pattern and the candidate appearance information of the combination candidate pattern is equal to or greater than the threshold value.

Next, the case where there is no comparison source pattern in the reference pattern set in step S303 of FIG. 9 (No in step S303) will be described with reference to FIG. 10.

In FIG. 10, the reference pattern combining unit 14 rearranges the patterns of the reference pattern set in ascending order of appearance frequency (step S312).

Next, the reference pattern combining unit 14 acquires reference patterns from the reference pattern set in ascending order of appearance frequency (step S313). The reference pattern selected here serves as the comparison source (hereinafter referred to as the comparison source pattern).

Here, the reference pattern combining unit 14 determines whether or not there is a comparison source pattern in the reference pattern set (step S314).

If there is a comparison source pattern (Yes in step S314), the reference pattern combining unit 14 selects, from the reference pattern set, a pattern whose frequency is equal to or lower than the appearance frequency of the comparison source pattern as a pattern to be compared (hereinafter referred to as a comparison target pattern) (step S315). The set of these comparison target patterns is called a comparison target pattern set. On the other hand, if there is no comparison source pattern (No in step S314), the process proceeds to step S320.

Here, the reference pattern combining unit 14 determines whether or not there is a comparison target pattern in the reference pattern set (step S316).

Here, when there is a comparison target pattern (Yes in step S316), the reference pattern combining unit 14 compares the appearance information of the comparison source pattern with the appearance information of the comparison target pattern, and calculates the similarity A and the similarity B of the appearance information (step S317).

The similarity A (first similarity) is a ratio between the appearance frequency (also referred to as the first frequency) of the comparison source pattern (also referred to as the first pattern) and the common appearance frequency. The similarity A is calculated by the following formula 2.
(Similarity A) = (Appearance frequency of common appearance information) / (Appearance frequency of appearance information of comparison source pattern) (2)
The similarity B (second similarity) is a ratio between the appearance frequency (also referred to as the second frequency) of the comparison target pattern (also referred to as the second pattern) that is an appearance candidate and the common appearance frequency. The similarity B is calculated by the following formula 3.
(Similarity B) = (Appearance frequency of common appearance information) / (Appearance frequency of appearance information of comparison target pattern) (3)
The common appearance frequency is the sum of the numbers of appearances over the entries whose appearance time and number of appearances both match between the appearance information of the comparison source pattern and that of the comparison target pattern. That is, when the first pattern and the second pattern are compared, the total number of appearances of the entries with matching appearance information corresponds to the common appearance frequency.

For example, assume that the appearance times of the comparison source pattern are “2014/7/1_12:00:01, 2014/7/2_12:00:01, 2014/7/3_12:00:01, 2014/7/4_12:00:01, 2014/7/5_12:00:01” and that the corresponding numbers of appearances are “2, 1, 1, 2, 2”. Further, assume that the appearance times of the comparison target pattern are “2014/7/1_12:00:01, 2014/7/4_12:00:01, 2014/7/5_12:00:01” and that the corresponding numbers of appearances are “2, 2, 2”. At this time, the appearance times common to both are “2014/7/1_12:00:01, 2014/7/4_12:00:01, 2014/7/5_12:00:01”. Since the corresponding numbers of appearances are “2, 2, 2”, their total “6” is the common appearance frequency. As a result, since the appearance frequency of the comparison source pattern is “8”, the similarity A is 6/8 based on Formula 2, and since the appearance frequency of the comparison target pattern is “6”, the similarity B is 6/6 based on Formula 3. For example, when the predetermined threshold value for the similarity A is 1 and the predetermined threshold value for the similarity B is 0.8, both the similarity A and the similarity B satisfy the predetermined threshold. In addition, in the comparison of appearance information, it may be determined that the appearance information matches even when there is a difference in the appearance time, as in the process of step S207.
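Formulas 2 and 3 can be checked against the worked example with a small sketch. The function names are hypothetical, and the numbers of appearances of the comparison source pattern are taken as 2, 1, 1, 2, 2 for the five appearance times, which is an assumption chosen so that they total the stated appearance frequency of 8.

```python
def common_frequency(source, target):
    """Sum of the counts at entries where both the appearance time and
    the number of appearances match between the two patterns."""
    return sum(c for t, c in source.items() if target.get(t) == c)


def similarities(source, target):
    """Return (similarity A, similarity B) as defined by Formulas 2 and 3."""
    common = common_frequency(source, target)
    sim_a = common / sum(source.values())  # Formula 2: relative to the comparison source
    sim_b = common / sum(target.values())  # Formula 3: relative to the comparison target
    return sim_a, sim_b


source = {  # comparison source pattern, appearance frequency 8 (assumed counts)
    "2014/7/1_12:00:01": 2, "2014/7/2_12:00:01": 1, "2014/7/3_12:00:01": 1,
    "2014/7/4_12:00:01": 2, "2014/7/5_12:00:01": 2,
}
target = {  # comparison target pattern, appearance frequency 6
    "2014/7/1_12:00:01": 2, "2014/7/4_12:00:01": 2, "2014/7/5_12:00:01": 2,
}

sim_a, sim_b = similarities(source, target)  # 6/8 and 6/6
```

Keeping the two ratios separate makes it possible to apply a different threshold to each in step S318.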

On the other hand, when there is no comparison target pattern (No in step S316), the process returns to step S313.

Here, the reference pattern combining unit 14 determines whether or not each of the similarity A and the similarity B calculated in step S317 is equal to or greater than a predetermined threshold defined separately (step S318). In addition, regarding the similarity A and the similarity B, it may be determined whether other predetermined threshold conditions are satisfied.

If at least one of the similarity A and the similarity B is less than its separately defined threshold value (No in step S318), the process returns to step S313, where the reference pattern combining unit 14 acquires the next reference pattern as a new comparison source pattern.

On the other hand, if each of the similarity A and the similarity B is equal to or greater than its separately defined threshold (Yes in step S318), the reference pattern combining unit 14 updates the reference pattern (step S319). In updating the reference pattern, first, the reference pattern combining unit 14 generates a new reference pattern that combines the combination candidate pattern and the comparison source pattern, and adds the generated new reference pattern to the reference pattern set. Note that the appearance information of the new reference pattern consists of the elements common to the combination candidate pattern and the comparison source pattern. Second, the reference pattern combining unit 14 deletes the comparison source pattern and the combination candidate pattern from the reference pattern set. When the reference pattern is updated, the process returns to step S313 to select the next reference pattern as a new comparison source pattern.
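The update in step S319 can be illustrated with a sketch in which the reference pattern set is held as a dictionary from ID tuples to appearance information. This data layout, the function name `merge_patterns`, and the ID values and counts in the usage lines are all hypothetical and chosen only for illustration.

```python
def merge_patterns(reference_set, source_ids, candidate_ids):
    """Replace two reference patterns by one combined pattern.

    The new pattern carries the union of the IDs, and its appearance
    information keeps only the entries common to both patterns
    (same appearance time and same number of appearances).
    """
    src = reference_set.pop(source_ids)       # delete the comparison source pattern
    cand = reference_set.pop(candidate_ids)   # delete the combination candidate pattern
    merged_ids = tuple(sorted(set(source_ids) | set(candidate_ids)))
    reference_set[merged_ids] = {t: c for t, c in src.items() if cand.get(t) == c}
    return merged_ids


# Hypothetical reference pattern set with two patterns.
reference_set = {
    (5025, 6036): {"2014/7/1_12:00:01": 2, "2014/7/2_12:00:01": 1, "2014/7/4_12:00:01": 2},
    (1001, 3009): {"2014/7/1_12:00:01": 2, "2014/7/4_12:00:01": 2},
}
new_ids = merge_patterns(reference_set, (5025, 6036), (1001, 3009))
```

After the call, the set contains a single pattern whose appearance information is limited to the two appearance times shared by the original patterns.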

Finally, when the reference pattern having the maximum appearance frequency is reached (No in step S314), the reference pattern combining unit 14 leaves the repetition process of steps S313 to S319. Then, the reference pattern combining unit 14 outputs the updated reference pattern set to the pattern storage unit 15 as a pattern set (step S320).

The above is the description of the reference pattern combining process.

Here, the configuration of the characteristic part of the log analysis system according to the present embodiment is shown in FIG. 11. In FIG. 11, the reference pattern generation means 13 generates a reference pattern that combines log messages that appear synchronously, based on the appearance information of the log messages. The reference pattern combining unit 14 compares the appearance information between the reference patterns and combines at least one reference pattern based on the comparison result. The concept of combining at least one reference pattern includes carrying over, unchanged, a reference pattern that has no other reference pattern to be combined with.

[Effect]
According to the log analysis system according to the first embodiment described above, the time required to extract a combination of messages that are output in succession within a predetermined time can be shortened when log messages are analyzed. This is because, in pattern generation, the reference pattern generation unit 13 of the log analysis system according to the present embodiment does not compare the appearances of synchronously output log messages individually, but compares them as a single message.

Further, according to the log analysis system according to the first embodiment, it is possible to group only log messages having a high co-occurrence probability by satisfying a certain threshold condition by defining a threshold value at the time of log message analysis.

Furthermore, according to the log analysis system according to the first embodiment, a plurality of messages that appear together within a time width, and that might otherwise be split apart by a constraint on the number of messages, can be correctly extracted as a pattern. This is because the log analysis system according to the present embodiment reads the integrated log file according to the time width and calculates the relationship between the individual IDs according to the threshold value.

(Second Embodiment)
Next, a log analysis system 2 according to the second embodiment of the present invention will be described.

[Configuration]
FIG. 12 is a block diagram showing a functional configuration of the log analysis system 2 according to the present embodiment. The log analysis system 2 according to the present embodiment has a configuration in which an order learning unit 21 is added to the log analysis system 1 according to the first embodiment. Note that in the log analysis system 2 according to the present embodiment, components that are substantially the same as those of the log analysis system 1 according to the first embodiment (FIG. 1) are given the same reference numerals, and their description is omitted.

The order learning means 21 refers to the integrated log based on the pattern set output by the reference pattern combining means 14 and extracts the order information 22 for each pattern. The order information 22 is used, when a log is analyzed using a pattern (combination), to determine whether the log IDs included in the pattern (combination) appear in the order given by the “pattern (order)”. The pattern (order) is also called an “order pattern” and is a pattern in which log IDs are arranged in order of appearance.

The order learning means 21 outputs the generated order information 22 to the pattern storage means 15 and records it. FIG. 12 illustrates a state in which the pattern storage unit 15 stores the order information 22 and the pattern set 150.

The order information 22 includes a pattern (combination) obtained by combining at least one ID, a pattern (order) that reflects the arrangement order of the IDs included in the pattern (combination), and the occurrence probability of each pattern (order). Further, the order information 22 may include a set of patterns (combinations) in another format so that the patterns (combinations) can be managed while maintaining uniqueness with a common ID. The order information 22 may also include pattern appearance information.

FIG. 13 is an order information table 220 as an example of the order information 22. The order information table 220 of FIG. 13 indicates that there are patterns (orders) having two kinds of arrangement orders with respect to the pattern (combination) “1001, 2004, 3009, 5025”. One is a combination of a pattern (combination) “1001, 2004, 3009, 5025”, a pattern (order) “1001, 2004, 3009, 5025”, and an occurrence count “90”. The other is a combination of a pattern (combination) “1001, 2004, 3009, 5025”, a pattern (order) “1001, 3009, 2004, 5025”, and the number of occurrences “10”.

Note that the notation shown in the order information table 220 in FIG. 13 is an example, and a pattern (order) may be stored using another general notation, such as a tree diagram, as long as the notation carries a similar meaning. Further, instead of the number of occurrences, the ratio of each number of occurrences to the total number of occurrences may be output as an occurrence probability.
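The conversion from numbers of occurrences to occurrence probabilities can be sketched as follows. The row layout (pattern (combination), pattern (order), number of occurrences) mirrors the two rows described for FIG. 13, while the variable and function names are hypothetical.

```python
from collections import defaultdict

# Rows described for the order information table 220:
# (pattern (combination), pattern (order), number of occurrences)
order_table = [
    ((1001, 2004, 3009, 5025), (1001, 2004, 3009, 5025), 90),
    ((1001, 2004, 3009, 5025), (1001, 3009, 2004, 5025), 10),
]


def occurrence_probabilities(rows):
    """Divide each order's count by the total count of its pattern (combination)."""
    totals = defaultdict(int)
    for combo, _order, count in rows:
        totals[combo] += count
    return {(combo, order): count / totals[combo] for combo, order, count in rows}


probabilities = occurrence_probabilities(order_table)
```

For the two rows above, the first order accounts for 90 of 100 occurrences and the second for 10 of 100.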

The above is the description of the configuration of the log analysis system 2 according to the present embodiment.
[Operation]
Next, an operation in which the log analysis system 2 according to the second embodiment analyzes a log message will be described. Note that the appearance information tabulation process, the reference pattern generation process, and the reference pattern combination process are the same as those in the log analysis system 1 according to the first embodiment, and thus description thereof is omitted.

[Order information generation processing]
FIG. 14 is a flowchart regarding order information generation processing by the order learning means 21 of the log analysis system 2 according to the present embodiment.

In FIG. 14, first, the order learning means 21 receives a pattern set from the pattern storage means 15 (step S401). Note that the order learning unit 21 may be configured to directly receive the pattern set from the reference pattern combining unit 14.

Next, the order learning means 21 reads the corresponding part of the integrated log based on the appearance information of each pattern included in the received pattern set (step S402).

The relevant part of the integrated log read by the order learning means 21 is determined by the appearance time recorded in the appearance information and a separately defined time width. For example, when the appearance time is “2014/7/7_09:01:00” and the time width is “1 minute”, the order learning means 21 reads the portion of the integrated log from “2014/7/7_09:01:00” to “2014/7/7_09:01:01”.

Next, the order learning means 21 reads the order of IDs included in each pattern among the log messages included in the corresponding portion of the read integrated log (step S403).

For example, assume that the read data is “1001, 7049, 6036, 4900, 3009, 2004, 8088, 5025” for the pattern “1001, 2004, 3009, 5025”. At this time, when the order learning means 21 refers to only the IDs included in the pattern “1001, 2004, 3009, 5025” with respect to the read data, the order of IDs “1001, 3009, 2004, 5025” is read.

Next, the order learning means 21 adds 1 to the number of occurrences regarding the order of the read ID, and extracts the order information (step S404).
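Steps S403 and S404 can be sketched with the example data above. The function name `observed_order` and the use of a counter keyed by (pattern, order) are assumptions made for illustration.

```python
from collections import Counter


def observed_order(read_ids, pattern):
    """Keep only the IDs that belong to the pattern, in the order they were read."""
    members = set(pattern)
    return tuple(i for i in read_ids if i in members)


order_counts = Counter()  # (pattern (combination), pattern (order)) -> number of occurrences

read_ids = [1001, 7049, 6036, 4900, 3009, 2004, 8088, 5025]  # read data from the example
pattern = (1001, 2004, 3009, 5025)

order = observed_order(read_ids, pattern)
order_counts[(pattern, order)] += 1  # step S404: add 1 to the count for this order
```

Repeating this for every appearance time of every pattern accumulates the numbers of occurrences that make up the order information 22.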

Here, the order learning means 21 verifies whether or not the order information 22 has been generated for all the patterns included in the received pattern set (step S405).

When the order information 22 has been generated for all the patterns included in the received pattern set (Yes in step S405), the order learning means 21 outputs the generated order information 22 to the pattern storage means 15 and records it (step S406). On the other hand, if the order information 22 has not been generated for all the patterns included in the received pattern set (No in step S405), the process returns to step S402 to generate the order information 22 for the unprocessed patterns.

The order learning unit 21 repeats the processes of steps S402 to S405 described above, and generates order information 22 for all patterns included in the pattern set received from the reference pattern combining unit 14.

The above is the description regarding the order information generation processing by the log analysis system 2.

[Effect]
The log analysis system according to the second embodiment can generate pattern order information based on the result generated by the reference pattern combining unit, and can generate a pattern and its order information with a small amount of calculation. The reason is that the log analysis system according to the present embodiment includes a reference pattern generation unit.

(Third embodiment)
Next, a log analysis system 3 according to the third embodiment of the present invention will be described.

[Configuration]
FIG. 15 is a block diagram showing a functional configuration of the log analysis system 3 according to the present embodiment. The log analysis system 3 according to the present embodiment has a configuration in which log identification means 31 and log identification information 32 are added to the log analysis system 1 according to the first embodiment. Note that in the log analysis system 3 according to the present embodiment, components that are substantially the same as those of the log analysis system 1 according to the first embodiment (FIG. 1) are given the same reference numerals, and their description is omitted.

FIG. 16 is a diagram showing an example of the log identification information 32 (log identification information 320).

The log identification information 32 is a set of pairs, each consisting of a log ID and the record expression corresponding to that log ID. The log ID is also called a log identifier and is an identifier given to a log message. The record expression is a representation of the body of the log message corresponding to the log ID.

The example of FIG. 16 indicates that the log message corresponding to the log ID “1001” includes the character string “mysql started”. Although a character string is shown in the example of FIG. 16, the record expression may be expressed using any information that can be compared with a log message, such as a regular expression or a uniquely defined template.

The log identification unit 31 assigns a log ID to the log message included in the integrated log read from the log collection unit 11 with reference to the record expression recorded in the log identification information 32. Then, the log identification unit 31 outputs an integrated log of the log message to which the log ID is assigned to the log totaling unit 12.
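The assignment performed by the log identification unit 31 can be sketched with regular expressions as record expressions. Only the pair of log ID “1001” and “mysql started” comes from the example of FIG. 16; the second entry, the sample log lines, and the function name `assign_log_id` are hypothetical.

```python
import re

# Log identification information: (log ID, record expression).
log_identification_info = [
    (1001, re.compile(r"mysql started")),       # pair shown in the example of FIG. 16
    (2004, re.compile(r"connection refused")),  # hypothetical entry for illustration
]


def assign_log_id(message):
    """Return the log ID of the first record expression found in the message,
    or None when no record expression matches."""
    for log_id, expression in log_identification_info:
        if expression.search(message):
            return log_id
    return None


integrated_log = [
    "2014/07/01 12:00:01 host1 mysql started",
    "2014/07/01 12:00:05 host2 connection refused",
    "2014/07/01 12:00:09 host3 unknown event",
]
identified = [(assign_log_id(line), line) for line in integrated_log]
```

Messages for which no record expression matches are left without an ID here; how such messages are handled is not specified in this sketch.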

The above is the description of the configuration of the log analysis system 3 according to the present embodiment.

[Operation]
Next, the operation of the log analysis system 3 according to the third embodiment will be described. Note that the appearance information tabulation process, the reference pattern generation process, and the reference pattern combination process are the same as those in the log analysis system 1 according to the first embodiment, and thus description thereof is omitted.

[Log identification processing]
FIG. 17 is a flowchart regarding log identification processing by the log identification unit 31 of the log analysis system 3 according to the present embodiment.

In FIG. 17, first, the log identification unit 31 reads the integrated log generated by the log collection unit 11 (step S501).

Next, the log identification unit 31 refers to the log identification information 32 and assigns a log ID to the log message included in the read integrated log (step S502).

Here, the log identification unit 31 determines whether or not a log ID has been assigned to all log messages included in the read integrated log (step S503).

When log IDs are assigned to all log messages (Yes in step S503), the log identification unit 31 transmits the integrated log to the log totaling unit 12 (step S504).

On the other hand, when there is a log message to which no log ID is assigned (No in step S503), the process returns to step S502 in order to assign a log ID to a log message to which no log ID is assigned.

Since the subsequent operations are the same as those of the log analysis system 1 according to the first embodiment, the description thereof is omitted.

[Effect]
The log analysis system according to the third embodiment can generate a pattern (combination) with a small amount of calculation, based on the log identification information, from a plurality of log files to which no common log ID is assigned. This is because the log analysis system according to the third embodiment includes log identification means that assigns a log ID to each log message based on the log identification information, and reference pattern generation means that generates a reference pattern by combining logs that appear synchronously.

(Fourth embodiment)
Next, a log analysis system 4 according to the fourth embodiment of the present invention will be described.

[Configuration]
FIG. 18 is a block diagram illustrating a functional configuration of the log analysis system 4 according to the fourth embodiment. The log analysis system 4 according to the fourth embodiment has a configuration in which log classification means 41 is added to the log analysis system 1 according to the first embodiment. Note that, in the log analysis system 4 according to the present embodiment, components that are substantially the same as those of the log analysis system 1 according to the first embodiment (FIG. 1) are given the same reference numerals, and their description is omitted.

The log classification unit 41 reads the integrated log from the log collecting unit 11 and calculates the feature similarity based on the characteristics of the log message included in the read integrated log. The log classification means 41 groups and classifies a plurality of log messages having a high degree of similarity, and assigns a common log ID (also referred to as a group identifier) to the log messages classified into the same group. Then, the log classification unit 41 outputs an integrated log of log messages to which a common log ID is assigned for each group to the log aggregation unit 12.

The above is the description of the configuration of the log analysis system 4 according to the present embodiment.

[Operation]
Next, the operation of the log analysis system 4 according to the fourth embodiment will be described. Note that the appearance information tabulation process, the reference pattern generation process, and the reference pattern combination process are the same as those in the log analysis system 1 according to the first embodiment, and thus description thereof is omitted.

[Log classification processing]
FIG. 19 is a flowchart regarding log classification processing by the log classification unit 41 of the log analysis system 4 according to the present embodiment.

In FIG. 19, first, the log classification unit 41 reads the integrated log generated by the log collection unit 11 (step S601).

Next, the log classification means 41 calculates feature amounts for all log messages included in the read integrated log, and performs classification based on the similarity (step S602).

It should be noted that in the calculation based on the feature amount and the classification based on the similarity, for example, an algorithm and an index such as a shortest distance method, a longest distance method, a group average method, a Ward method, and a k-Means method can be used.

Next, the log classification means 41 assigns a log ID to each classified group according to the classification result (step S603).

Then, the log classification unit 41 assigns a log ID to all log messages included in the integrated log according to the log ID assigned to each group (step S604).

Finally, the log classification unit 41 outputs an integrated log of log messages to which a common log ID is assigned for each group to the log aggregation unit 12 (step S605).
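As an illustration only, steps S602 to S604 can be sketched as follows. The sketch assumes that the feature of a message is its set of whitespace-separated tokens, that similarity is the Jaccard index, and that groups are formed by the shortest distance method cut at a fixed threshold (which is equivalent to taking connected components of the "similar enough" graph). The function names, the threshold of 0.5, and the starting ID 1001 are hypothetical and do not come from the specification.

```python
from itertools import combinations


def jaccard(a: set, b: set) -> float:
    """Similarity of two token sets: |a & b| / |a | b|."""
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)


def classify_logs(messages, threshold=0.5, first_id=1001):
    """Return one log ID per message; similar messages share an ID."""
    # Step S602 (features): hypothetical feature = set of whitespace tokens.
    features = [set(m.split()) for m in messages]

    # Step S602 (classification): single-linkage grouping via union-find.
    parent = list(range(len(messages)))

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path halving
            i = parent[i]
        return i

    for i, j in combinations(range(len(messages)), 2):
        if jaccard(features[i], features[j]) >= threshold:
            parent[find(j)] = find(i)

    # Step S603: assign one ID per group, in order of first appearance.
    # Step S604: give every message the ID of its group.
    group_ids = {}
    log_ids = []
    for i in range(len(messages)):
        root = find(i)
        if root not in group_ids:
            group_ids[root] = first_id + len(group_ids)
        log_ids.append(group_ids[root])
    return log_ids


if __name__ == "__main__":
    sample = [
        "connection from 10.0.0.1 closed",
        "connection from 10.0.0.2 closed",
        "disk usage at 91 percent",
        "disk usage at 95 percent",
        "user alice logged in",
    ]
    for log_id, message in zip(classify_logs(sample), sample):
        print(log_id, message)
```

A production system would likely use richer features and one of the other listed methods; the pairwise comparison here is quadratic in the number of messages and is meant only to make the flow of the steps concrete.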

[Effect]
According to the log analysis system according to the fourth embodiment, a pattern (combination) can be generated with a small amount of calculation even from a plurality of log files to which no common log ID is assigned. This is because the system includes the log classification means, which calculates feature amounts from the log messages, classifies them, and assigns a uniquely identifiable log ID to similar log messages, and the reference pattern generation means, which generates, as a reference pattern, the logs that appear together synchronously.

(Fifth embodiment)
Next, a log analysis system 5 according to the fifth exemplary embodiment of the present invention will be described.

[Configuration]
FIG. 20 is a block diagram illustrating a functional configuration of the log analysis system 5 according to the fifth embodiment. The log analysis system 5 according to the fifth embodiment has a configuration in which transition time learning means 51 is added to the log analysis system 2 according to the second embodiment. Note that, in the log analysis system 5 according to the present embodiment, configurations that are substantially the same as those of the log analysis system 2 according to the second embodiment (FIG. 12) are denoted by the same reference numerals, and description thereof is omitted. FIG. 20 illustrates how the pattern storage unit 15 stores the transition information 52 and the pattern set 150. Although omitted in FIG. 20, the pattern storage unit 15 also stores the order information 22, as in the second embodiment.

The transition time learning means 51 extracts the transition time required for transition between individual log IDs in the pattern based on the order information 22 of each pattern extracted by the order learning means 21.

FIG. 21 is a diagram showing an example of the transition time (transition time table 510) output by the transition time learning means 51. The transition time represents a transition between log IDs in the order information 22 and a time required for the transition.

In FIG. 21, the pattern (order) “1001, 2004, 3009, 5025” includes three transitions: “1001 → 2004”, “2004 → 3009”, and “3009 → 5025”. Their transition times are “1 second”, “2 seconds”, and “1 second”, respectively, as shown in parentheses in the transition time table 510 of FIG. 21.

FIG. 21 shows, as an example, the individual transitions written out separately. However, only the corresponding transition times may be recorded, without recording the individual transitions. Also, in FIG. 21 the transition times are recorded together with the pattern (combination) and the pattern (order). However, these may be stored separately and read out using a unique identifier.
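One possible in-memory form of a row of the transition time table 510 is sketched below, assuming each transition is stored as a (source ID, destination ID, seconds) tuple. The function name and the tuple layout are hypothetical; the specification does not prescribe a storage format.

```python
def split_transitions(order, seconds):
    """Pair each adjacent transition of an ordered pattern with its time."""
    if len(seconds) != len(order) - 1:
        raise ValueError("need exactly one transition time per adjacent pair")
    # zip(order, order[1:]) yields (1001, 2004), (2004, 3009), ...
    return [(src, dst, t) for (src, dst), t in zip(zip(order, order[1:]), seconds)]


if __name__ == "__main__":
    # Values taken from the FIG. 21 example: 1 s, 2 s, 1 s.
    for src, dst, t in split_transitions([1001, 2004, 3009, 5025], [1, 2, 1]):
        print(f"{src} -> {dst}: {t} s")
```

Storing only the list of times, as the text allows, would amount to keeping the third element of each tuple and recovering the transitions from the order information.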

The above is the description of the configuration of the log analysis system 5 according to the present embodiment.

[Operation]
Next, the operation of the log analysis system 5 according to the fifth embodiment will be described. Note that the order information generation process, the appearance information aggregation process, the reference pattern generation process, and the reference pattern combination process are the same as those in the log analysis system 2 according to the second embodiment, and thus description thereof is omitted.

[Transition time learning process]
FIG. 22 is a flowchart regarding the transition time learning process by the transition time learning means 51 of the log analysis system 5 according to the present embodiment.

First, in FIG. 22, the order learning means 21 reads a corresponding portion of the integrated log based on the appearance information of each pattern included in the pattern set (step S701).

Note that the corresponding portion read by the order learning means 21 is determined by the appearance time recorded in the appearance information and a separately defined time width. For example, if the appearance time is “2014/7/7_09:01:00” and the time width is “1 minute”, the integrated log from “2014/7/7_09:01:00” to “2014/7/7_09:02:00” is read.
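The window selection in step S701 can be sketched as a simple timestamp filter. The sketch assumes the integrated log is a list of (timestamp string, log ID) pairs, that timestamps use the format shown in the text, and that the window is half-open (start included, end excluded); all three are assumptions, as are the function and constant names.

```python
from datetime import datetime, timedelta

TIME_FORMAT = "%Y/%m/%d_%H:%M:%S"  # assumed from the timestamps in the text


def read_window(integrated_log, appearance_time, width):
    """Return the log entries whose time falls in [appearance_time, appearance_time + width)."""
    start = datetime.strptime(appearance_time, TIME_FORMAT)
    end = start + width
    return [
        (ts, log_id)
        for ts, log_id in integrated_log
        if start <= datetime.strptime(ts, TIME_FORMAT) < end
    ]


if __name__ == "__main__":
    log = [
        ("2014/7/7_09:00:59", 7049),
        ("2014/7/7_09:01:00", 1001),
        ("2014/7/7_09:01:30", 3009),
        ("2014/7/7_09:02:00", 8088),
    ]
    print(read_window(log, "2014/7/7_09:01:00", timedelta(minutes=1)))
```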

Next, the order learning means 21 reads the order of IDs included in the pattern among the log messages included in the read corresponding part (step S702).

For example, assume that the read data is “1001, 7049, 6036, 4900, 3009, 2004, 8088, 5025” for the pattern “1001, 2004, 3009, 5025”. At this time, if only the ID included in the pattern is referred to for the read data, the order is “1001, 3009, 2004, 5025”.
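The filtering in step S702 amounts to keeping only the IDs that belong to the pattern while preserving their order of appearance. A minimal sketch, with a hypothetical function name:

```python
def order_within_pattern(read_ids, pattern):
    """Keep only IDs that belong to the pattern, in the order they were read."""
    members = set(pattern)  # set lookup keeps the filter linear in len(read_ids)
    return [log_id for log_id in read_ids if log_id in members]


if __name__ == "__main__":
    read_ids = [1001, 7049, 6036, 4900, 3009, 2004, 8088, 5025]
    pattern = [1001, 2004, 3009, 5025]
    print(order_within_pattern(read_ids, pattern))  # [1001, 3009, 2004, 5025]
```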

Next, the transition time learning means 51 calculates the transition time between IDs based on the order of the IDs read by the order learning means 21 (step S703).

For example, when the time of the ID “1001” is “2014/7/7_09:01:00” and the time of the ID “3009” is “2014/7/7_09:01:11”, the transition time of the transition “1001 → 3009” is “11 seconds”.
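The calculation in step S703 is a timestamp difference. The sketch below assumes the timestamp format used in the text and, for illustration, that ID 3009 was logged 11 seconds after ID 1001; the function name is hypothetical.

```python
from datetime import datetime

TIME_FORMAT = "%Y/%m/%d_%H:%M:%S"  # assumed from the timestamps in the text


def transition_seconds(src_time, dst_time):
    """Seconds elapsed between the source ID and the destination ID."""
    delta = datetime.strptime(dst_time, TIME_FORMAT) - datetime.strptime(src_time, TIME_FORMAT)
    return delta.total_seconds()


if __name__ == "__main__":
    # Transition 1001 -> 3009 under the assumption stated above.
    print(transition_seconds("2014/7/7_09:01:00", "2014/7/7_09:01:11"))  # 11.0
```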

Here, the transition time learning means 51 determines whether or not the transition time has been calculated for all the appearance times included in the pattern appearance information (step S704).

If all transition times among the appearance times included in the pattern appearance information are calculated (Yes in step S704), the process proceeds to step S705. On the other hand, if the transition time has not been calculated for all the appearance times included in the pattern appearance information (No in step S704), the process returns to step S702.

The transition time learning means 51 repeats the processes of step S702 and step S703 described above for all the appearance times included in the pattern appearance information, and acquires the transition time of each transition.

Next, the transition time learning means 51 totals the obtained transition times for each transition, calculates values such as an average value and a median value, and records them as transition times for each transition (step S705). The transition time learning means 51 may obtain and record values such as an average value, a median value, and a variance as the transition time, or may record only a set of a maximum value and a minimum value. Alternatively, the transition time learning means 51 may be configured to record all transition times as they are.
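The loop over appearance times (steps S702 to S704) followed by the aggregation in step S705 might look like the following sketch. It assumes each occurrence of a pattern is already given as a time-ordered list of (seconds, log ID) pairs and that the mean, median, minimum, and maximum are recorded; the data layout and names are hypothetical, and other statistics such as the variance could be added in the same way.

```python
from collections import defaultdict
from statistics import mean, median


def summarize_transitions(occurrences):
    """Collect transition times over all occurrences and summarize each transition."""
    samples = defaultdict(list)
    for events in occurrences:  # one entry per appearance time of the pattern
        for (t0, src), (t1, dst) in zip(events, events[1:]):
            samples[(src, dst)].append(t1 - t0)  # step S703 for this transition
    # Step S705: aggregate the collected times per transition.
    return {
        transition: {
            "mean": mean(times),
            "median": median(times),
            "min": min(times),
            "max": max(times),
        }
        for transition, times in samples.items()
    }


if __name__ == "__main__":
    occurrences = [
        [(0, 1001), (1, 2004), (3, 3009), (4, 5025)],
        [(0, 1001), (3, 2004), (5, 3009), (6, 5025)],
    ]
    for transition, stats in summarize_transitions(occurrences).items():
        print(transition, stats)
```

Recording all transition times as they are, which the text also permits, corresponds to returning `samples` without the final aggregation.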

Here, the transition time learning means 51 determines whether or not the transition time has been calculated for all the patterns included in the pattern set and their transitions (step S706).

When transition times have been calculated for all patterns included in the pattern set and their transitions (Yes in step S706), the transition time learning means 51 stores information about the generated transition times (transition information 52) in the pattern storage unit 15 (step S707). On the other hand, if the transition time has not been calculated for all patterns included in the pattern set and their transitions (No in step S706), the process returns to step S701.

The transition time learning means 51 repeats the processing from step S701 to step S706 for each pattern, and calculates transition times for all patterns included in the pattern set and their transitions.

The above is the description regarding the transition time learning process by the transition time learning means 51.

[Effect]
According to the log analysis system according to the fifth embodiment, the transition times between the elements in a pattern are generated based on the result generated by the reference pattern combining means, so that the pattern and the transition times between the identifiers included in the pattern can be generated with a small amount of calculation. This is because the log analysis system according to the present embodiment includes the reference pattern generation means and the transition time learning means.

(Hardware configuration)
Next, a hardware configuration for enabling the log analysis system according to the embodiment of the present invention will be described using the computer 60 of FIG. 23 as an example.

In FIG. 23, the computer 60 includes a processor 61, a main storage device 62, an auxiliary storage device 63, an input/output interface 64, and a communication interface 67. The processor 61, the main storage device 62, the auxiliary storage device 63, the input/output interface 64, and the communication interface 67 are connected to each other via a bus 68 so as to be able to exchange data. The processor 61, the main storage device 62, the auxiliary storage device 63, and the input/output interface 64 are connected to a network (not shown) through the communication interface 67.

The processor 61 expands the program stored in the auxiliary storage device 63 or the like in the main storage device 62, and executes the expanded program. In the present embodiment, a software program installed in the computer 60 may be used. Alternatively, a software program stored in storage or the like that can be accessed via a network may be used.

The main storage device 62 may be a volatile memory such as a DRAM (Dynamic Random Access Memory). Further, a non-volatile memory such as an MRAM (Magnetoresistive Random Access Memory) may be added to configure the main storage device 62. A program is expanded in the main storage device 62.

The auxiliary storage device 63 is configured by a local disk such as a hard disk or a flash memory. Note that the auxiliary storage device 63 may be an external storage device connected to the computer 60 or a network storage connected via a network.

The input/output interface 64 is a device that connects the computer 60 and peripheral devices based on the connection standard between the computer 60 and peripheral devices. The communication interface 67 is a device that mediates data exchange between a network (not shown) and the processor 61. In FIG. 23, the interface is abbreviated as I/F (I/F: Interface).

In addition, the computer 60 may be provided with input devices such as a keyboard, a mouse, and a touch panel as necessary. These input devices are used to input information and settings. Note that when the touch panel is used as an input device, the display device also serves as the input device. Data exchange between the processor 61 and the input device may be mediated by the input/output interface 64.

Further, the computer 60 may be provided with a display device for displaying information. When the display device is provided, the computer 60 is provided with a display control device (not shown) for controlling the display of the display device. A display device (not shown) may be connected via the input/output interface 64.

Further, the computer 60 is provided with a reader/writer as necessary. The reader/writer is connected to the bus 68, mediates data exchange between the processor 61 and a recording medium (program recording medium) (not shown), reads data and programs from the recording medium, and writes the processing results of the computer 60 to the recording medium. The recording medium can be realized by, for example, a semiconductor recording medium such as an SD card (SD: Secure Digital). The recording medium may also be realized by a magnetic recording medium such as a flexible disk, or an optical recording medium such as a CD or a DVD (CD: Compact Disc, DVD: Digital Versatile Disc).

The above is an example of the hardware configuration for enabling the log analysis system according to the embodiment of the present invention. Note that the hardware configuration in FIG. 23 is an example of a hardware configuration to enable the log analysis system according to the present embodiment, and does not limit the scope of the present invention. A log analysis program that causes a computer to execute the processing of the log analysis system according to the present embodiment is also included in the scope of the present invention. Furthermore, a program recording medium that records a log analysis program according to an embodiment of the present invention is also included in the scope of the present invention.

Each embodiment described above can be implemented in appropriate combination. The block division shown in each block diagram is a configuration shown for convenience of explanation. The present invention described by taking each embodiment as an example is not limited, in its implementation, to the configuration shown in each block diagram. Further, in the description of the operation of the log analysis system according to the embodiment of the present invention, a plurality of operations are described in order, but the order of these operations can be changed within a range where there is no problem. Also, these operations are not always executed at different timings. For example, another operation may occur in parallel during the execution of a certain operation, or the execution timing of a certain operation and another operation may partially or entirely overlap. Furthermore, in the description of each operation, one operation is described as triggering another operation in order to facilitate understanding of the invention, but this description does not limit the relationship between the operations. Therefore, when each embodiment is implemented, the relationship between the plurality of operations can be changed within a range that does not impair the content.

The log analysis system according to the embodiment of the present invention can be applied to a technology for operating and managing an information processing system, a physical plant, and the like.

The present invention has been described above using the above-described embodiments as examples. However, the present invention is not limited to the above-described embodiments. That is, various modes that can be understood by those skilled in the art can be applied to the present invention within the scope of the present invention.

This application claims priority based on Japanese Patent Application No. 2014-227706 filed on November 10, 2014, the entire disclosure of which is incorporated herein.

1, 2, 3, 4, 5 Log analysis system
11 Log collection means
12 Log aggregation means
13 Reference pattern generation means
14 Reference pattern combination means
15 Pattern storage means
21 Order learning means
22 Order information
31 Log identification means
32 Log identification information
41 Log classification means
51 Transition time learning means
52 Transition information
60 Computer
61 Processor
62 Main storage device
63 Auxiliary storage device
64 Input/output interface
67 Communication interface
68 Bus

Claims

A log analysis system comprising:
reference pattern generating means for generating, based on appearance information of log messages, a reference pattern for each combination of the log messages that appear synchronously; and
reference pattern combining means for comparing, between the reference patterns, the appearance information of the log messages included in the reference patterns, and combining the reference patterns based on the comparison result.
The log analysis system according to claim 1, further comprising:
log collection means for acquiring at least one log file of an analysis target system and generating an integrated log in which the log messages included in the acquired log file are collected; and
log aggregation means for tabulating, as the appearance information, in association with the log identifier assigned to each of the log messages included in the integrated log generated by the log collection means, the appearance time at which the log message appears in a predetermined time period and the number of appearances of the log message at that appearance time.
The log analysis system according to claim 2, wherein
the reference pattern generation means generates the reference pattern by combining the log identifiers based on at least one of the appearance time and the number of appearances tabulated by the log aggregation means, and
the reference pattern combining means selects, from the reference patterns generated by the reference pattern generation means, at least one reference pattern for which at least one of the appearance time and the number of appearances satisfies a predetermined threshold condition, and outputs a pattern set obtained by combining the selected at least one reference pattern.
The log analysis system according to claim 3, wherein
the reference pattern combining means compares a first pattern, which is a comparison source selected from the set of reference patterns, with a second pattern, which is a comparison target included in the set of reference patterns; calculates, with respect to a common frequency corresponding to the sum of the numbers of appearances for which the appearance information of the first pattern and that of the second pattern coincide, a first similarity that is the ratio between a first appearance frequency, corresponding to the sum of the numbers of appearances of the first pattern, and the common frequency, and a second similarity that is the ratio between a second appearance frequency, corresponding to the sum of the numbers of appearances of the second pattern, and the common frequency; and combines the reference patterns for which both the first and second similarities satisfy a predetermined threshold condition.
The log analysis system according to any one of claims 2 to 4, further comprising log identification means for reading the integrated log from the log collection means and, based on log identification information that is a set of the log identifier assigned to the log message and the body of the log message corresponding to the log identifier, assigning the log identifier to each log message with reference to the body of the log message included in the integrated log.
The log analysis system according to any one of claims 2 to 5, further comprising log classification means for reading the integrated log from the log collection means, grouping and classifying the log messages whose characteristics are highly similar among the log messages included in the integrated log, and assigning a common group identifier to the log messages classified into the same group.
The log analysis system according to claim 1, further comprising order learning means for referring to the integrated log based on the pattern set output by the reference pattern combining means and extracting, for each pattern included in the pattern set, the order information of the pattern.
The log analysis system according to claim 7, further comprising transition time learning means for extracting a transition time required for a transition between the individual log identifiers in the pattern based on the order information of the pattern extracted by the order learning means.
A log analysis method comprising:
generating, based on appearance information of log messages, a reference pattern for each combination of the log messages that appear synchronously; and
comparing, between the reference patterns, the appearance information of the log messages included in the reference patterns, and combining the reference patterns based on the comparison result.
A program recording medium recording a log analysis program that causes a computer to execute:
a process of generating, based on appearance information of log messages, a reference pattern for each combination of the log messages that appear synchronously; and
a process of comparing, between the reference patterns, the appearance information of the log messages included in the reference patterns, and combining the reference patterns based on the comparison result.