WO2016075915A1 - Log analyzing system, log analyzing method, and program recording medium - Google Patents

Log analyzing system, log analyzing method, and program recording medium

Info

Publication number
WO2016075915A1
WO2016075915A1 PCT/JP2015/005570 JP2015005570W WO2016075915A1 WO 2016075915 A1 WO2016075915 A1 WO 2016075915A1 JP 2015005570 W JP2015005570 W JP 2015005570W WO 2016075915 A1 WO2016075915 A1 WO 2016075915A1
Authority
WO
WIPO (PCT)
Prior art keywords
log
pattern
appearance
reference pattern
information
Prior art date
Application number
PCT/JP2015/005570
Other languages
French (fr)
Japanese (ja)
Inventor
遼介 外川
Original Assignee
日本電気株式会社
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by 日本電気株式会社 filed Critical 日本電気株式会社
Priority to JP2016558878A priority Critical patent/JP6665784B2/en
Publication of WO2016075915A1 publication Critical patent/WO2016075915A1/en

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F11/00Error detection; Error correction; Monitoring
    • G06F11/30Monitoring
    • G06F11/34Recording or statistical evaluation of computer activity, e.g. of down time, of input/output operation ; Recording or statistical evaluation of user activity, e.g. usability assessment

Definitions

  • the present invention relates to a log analysis system, a log analysis method, and a log analysis program for analyzing a log output from an information processing system.
  • the operation manager of an information processing system such as a computer system monitors the log output by the computer system, checks the normality of the system, and analyzes abnormalities such as failures. It is important to monitor and analyze the log based on the relevance of a plurality of messages included in the log.
  • Patent Document 1 discloses a message analysis system that detects the occurrence of a failure based on messages collected from a plurality of computer systems and analyzes the detected failure.
  • the message analysis system disclosed in Patent Document 1 accumulates time elements of messages generated corresponding to cases, and aggregates cases using received message times and accumulated time elements.
  • the message analysis system aggregates and analyzes a plurality of messages for each case.
  • Patent Document 2 discloses a log monitoring system that detects a specific event by analyzing log information output from application software executed by a computer based on a predefined condition.
  • the log monitoring system of Patent Document 2 classifies each piece of log information into preset time-zone units based on the log occurrence time included in the accumulated log information.
  • the log monitoring system compares the messages included in each piece of log information within the same time zone, and counts the number of pieces of log information that include the same message as an occurrence-count condition. Then, when the number of occurrences of log information per unit time matches the occurrence-count condition, the log monitoring system generates candidates for the notification condition information referred to by the event detection device that performs the notification process for the log information.
  • the following documents disclose technologies for automatically generating analysis rules for comprehensive analysis of a huge log.
  • Patent Document 3 discloses an information processing apparatus that supports increasing the filter accuracy of a failure message.
  • the information processing apparatus of Patent Literature 3 extracts only relevant messages from a plurality of messages transmitted from a device at the time of failure, and groups the plurality of extracted messages.
  • the information processing apparatus determines a relationship between messages by paying attention to a co-occurrence relationship between an arbitrary message and a message output before and after the message is transmitted.
  • the information processing apparatus groups messages when the value of the index indicating the strength of the co-occurrence relationship is equal to or greater than a certain value.
  • Patent Document 4 discloses a notification device that detects an abnormality of a message that occurs in a distributed system composed of a plurality of information processing devices.
  • the notification device of Patent Document 4 records a message that occurred on an arbitrary day of the week and time zone, together with the number of occurrences of the message, and groups the messages as a series of messages up to a separately defined maximum length.
  • the notification device groups a plurality of normal messages transmitted from the analysis target device as a series of related messages.
  • the message analysis system of Patent Document 1 can analyze a message received in real time by matching processing with examples defined in various formats.
  • the message analysis system has a problem that undefined cases cannot be analyzed in real time. This is because the message analysis system performs message analysis based on predefined cases.
  • In the log monitoring system of Patent Document 2, conditions necessary for updating a known event and detecting a new event can be generated by analyzing log information including the same message in the same time zone.
  • the log monitoring system has a problem in that conditions necessary for updating a known event and detecting a new event cannot be generated unless the same message is detected.
  • the information processing apparatus of Patent Document 3 defines the relationship between messages using the co-occurrence probability of consecutive messages and the score calculated using the co-occurrence probability. Therefore, for example, when the types of messages for which the relationship must be defined increases as the number of logs increases, the combinations of messages to be considered also increase. As a result, the information processing apparatus has a problem that it takes time to obtain an appropriate solution because the amount of calculation increases when the number of message combinations themselves increases.
  • the notification device of Patent Document 4 defines the number of types of a series of consecutive messages with a maximum length, but does not specifically disclose a standard for defining the maximum length.
  • the notification device has a problem that even unrelated messages are grouped because all messages appearing in an arbitrary time zone are targeted for grouping.
  • An object of the present invention is to provide a log analysis system capable of reducing the time required to extract a combination of log messages output continuously within a predetermined time when analyzing log messages output from an information processing system.
  • the log analysis system includes reference pattern generation means for generating a reference pattern for each combination of log messages that appear synchronously based on appearance information of the log messages, and reference pattern combining means for comparing the appearance information of the log messages included in the reference patterns between the reference patterns and combining the reference patterns based on the comparison result.
  • In the log analysis method, a reference pattern is generated for each combination of log messages that appear synchronously based on appearance information of the log messages, the appearance information of the log messages included in the reference patterns is compared between the reference patterns, and the reference patterns are combined based on the comparison result.
  • the log analysis program causes a computer to execute a process of generating a reference pattern for each combination of log messages that appear synchronously based on the appearance information of the log messages, and a process of comparing the appearance information of the log messages included in the reference patterns between the reference patterns and combining the reference patterns based on the comparison result.
  • FIG. 1 is a block diagram showing a configuration of a log analysis system 1 according to the first embodiment of the present invention.
  • the direction of the arrow shown in all the block diagrams after FIG. 1 shows an example, and does not limit the direction of the signal between blocks.
  • the log analysis system 1 includes a log collection unit 11, a log aggregation unit 12, a reference pattern generation unit 13, a reference pattern combination unit 14, and a pattern storage unit 15.
  • the log collection unit 11 collects log files of the analysis target system 10.
  • the log collection unit 11 may receive a log file from the analysis target system 10 or may read the log file from a storage unit (not shown).
  • the log collection unit 11 may accept an input of a log file from the operation manager.
  • FIG. 2 shows an example of log files (log files 101 to 103) collected by the log analysis system 1.
  • a log file is a set of log messages (also called log records), and is composed of at least one log message as shown in FIG.
  • the log message includes a plurality of log elements such as a log ID (Identifier) that is an identifier for identifying each log message, the time when the log message is output, the message body, the log level, and the like.
  • the log ID is also referred to as a log identifier, and may be simply referred to as ID below.
  • the log collection unit 11 generates an integrated log in which the log messages stored in all the log files are rearranged in time series based on the collected at least one log file.
  • the log collecting unit 11 transmits the generated integrated log to the log totaling unit 12.
  • FIG. 3 shows an example (integrated log 104) of the integrated log generated by the log collecting means 11.
  • the unified log is a set of log messages and is composed of at least one log message as shown in FIG.
  • the integrated log is a combination of log messages that originally constituted different log files.
  • the integrated log may be a set of information obtained by combining an identifier for identifying a log file and a line number of the log message in the log file.
  • the log collection unit 11 may receive, from the operation manager, specification of the range of log messages to be collected, such as specification of the log file itself to be collected and specification of the date and time of the log message recorded in the log file.
  • the log collection unit 11 may read a file (not shown) in which information necessary for analyzing a log message is defined, and may convert the acquired log file, according to the information defined in that file, into a format that the log analysis system 1 can easily analyze.
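  • The generation of an integrated log can be sketched as follows. This is a minimal illustration, not the implementation described in the source: the names `LogMessage` and `build_integrated_log`, the record layout (time string, log ID, body), and the timestamp format `%Y/%m/%d_%H:%M:%S` are all assumptions made for the example.

```python
from datetime import datetime
from typing import NamedTuple


class LogMessage(NamedTuple):
    time: datetime   # time at which the message was output
    log_id: str      # log ID identifying the message type
    body: str        # message body
    source: str      # identifier of the originating log file
    line_no: int     # line number inside that log file


def build_integrated_log(log_files):
    """Merge several log files into one list sorted in time series.

    log_files maps a (hypothetical) file name to a list of
    (time string, log ID, body) records.
    """
    merged = []
    for name, records in log_files.items():
        for line_no, (stamp, log_id, body) in enumerate(records, start=1):
            when = datetime.strptime(stamp, "%Y/%m/%d_%H:%M:%S")
            merged.append(LogMessage(when, log_id, body, name, line_no))
    # list.sort is stable, so messages with equal times keep file order.
    merged.sort(key=lambda message: message.time)
    return merged
```

Keeping the file identifier and line number on each entry mirrors the option, mentioned above, of representing the integrated log as references back to the original files.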
  • the log totaling unit 12 calculates the appearance information of each log message based on the information received from the log collecting unit 11 and a separately defined time width.
  • the time width indicates the range of the appearance time of the log message to be counted by the log counting means 12.
  • the time width may be defined by the user, or may be recorded in advance in a file (not shown).
  • FIG. 4 is a diagram showing an example of appearance information (appearance information 105).
  • the appearance information is composed of a pair of at least one appearance time and the number of appearances corresponding to the log ID of the log message. Note that the appearance information may include the total number of appearances.
  • In the appearance information 105 of FIG. 4, a plurality of appearance times are recorded for each log ID, and the number of appearances corresponding to each of the plurality of appearance times is recorded.
  • the log totaling means 12 reads the integrated log for each time width, and totals the type and number of IDs included in the corresponding portion of the integrated log within the read time width as the number of appearances.
  • the log totaling means 12 selects one arbitrary time from the time divided by the time width and registers it as the appearance time of the ID.
  • For example, the log totaling unit 12 may register the median, the minimum, or the maximum of the times in the divided section as the appearance time.
  • the log totaling unit 12 transmits the calculated appearance information to the reference pattern generating unit 13.
  • the reference pattern generation unit 13 compares at least one piece of appearance information received from the log totaling unit 12 and combines the pieces of appearance information having the same ID. Then, the reference pattern generation unit 13 transmits the combined ID combination and its appearance information to the reference pattern combination unit 14. That is, the reference pattern generation unit 13 generates a reference pattern for each combination of log messages that appear synchronously based on the log message appearance information.
  • the reference pattern generation unit 13 may receive, for example, designation of a determination criterion related to the identity of appearance information from the operation manager. Further, the reference pattern generation unit 13 may read a file (not shown) in which information necessary for determining the identity of appearance information is defined, and compare the appearance information of the input ID based on the file.
  • the reference pattern combining unit 14 compares the appearance information regarding the ID received from the reference pattern generating unit 13 or a combination of a plurality of IDs.
  • the reference pattern combining unit 14 combines a single ID or a combination of a plurality of IDs that satisfy a separately defined condition. That is, the reference pattern combining unit 14 compares the appearance information of the log message included in the reference pattern between the reference patterns, and combines the reference patterns based on the comparison result.
  • the reference pattern combining unit 14 outputs the set of combined results to the pattern storage unit 15 as a “pattern (combination)”. This set of patterns (combinations) is also called a “pattern set”.
  • FIG. 5 shows a combination information table 106 in which patterns (combinations) are summarized in a table format.
  • the pattern (combination) includes a single ID or a combination of a plurality of IDs and appearance information corresponding to them.
  • the appearance information is composed of the appearance time and the number of appearances.
  • the pattern storage unit 15 stores the pattern (combination) output from the reference pattern combination unit 14.
  • FIG. 6 is a flowchart regarding an outline of the operation of the log analysis system 1 according to the present embodiment.
  • the log analysis system 1 according to the present embodiment performs three processes: an appearance information aggregation process, a reference pattern generation process, and a reference pattern combination process.
  • the appearance information totaling process in step S1 is a process in which the log collecting unit 11 reads the log file and the log totaling unit 12 totals the appearance information for each ID.
  • the reference pattern generation process in step S2 is a process in which the reference pattern generation unit 13 combines at least one log message that appears synchronously as a reference pattern based on the appearance information for each ID. Note that “at least one log message appearing synchronously” means “at least one log message output continuously within a certain period of time”.
  • the reference pattern combining process in step S3 is a process in which the reference pattern combining unit 14 combines a combination of IDs based on the reference pattern set to generate a pattern (combination).
  • the operation of the log analysis system 1 according to the first exemplary embodiment will be described in detail by dividing it into three parts, that is, an appearance information aggregation process, a reference pattern generation process, and a reference pattern combination process.
  • the appearance information totaling process is a process in which the log collecting unit 11 reads a log file and the log totaling unit 12 totals appearance information for each ID.
  • FIG. 7 is a flowchart regarding the appearance information tabulation process.
  • the log collection unit 11 reads the log file output from the analysis target system 10 (step S101).
  • the log collecting unit 11 generates an integrated log by combining all acquired log files (step S102).
  • the log collection unit 11 rearranges the log messages of the integrated log in chronological order based on the time information of each log message (step S103).
  • the log totaling means 12 reads the log message of the integrated log based on the defined time width (step S104).
  • the log totaling unit 12 reads a log message in a section from “2014/07/01_12:00:01” to “2014/07/01_12:00:10:00”.
  • the log totaling unit 12 totals the number of appearances of the same ID from the set of read log messages, and records a set of time information and the number of appearances as appearance information for each ID (step S105).
  • Suppose that, in this section, the log message with ID “1001” appears “10 times” and the log message with ID “2034” appears “3 times”.
  • the log totaling unit 12 adds the appearance time “2014/07/01_12:00:01” and the number of appearances “10” to the appearance information of the ID “1001”.
  • the log totaling unit 12 adds the appearance time “2014/07/01_12:00:01” and the number of appearances “3” to the appearance information of the ID “2034”.
  • the log totaling unit 12 determines whether or not the last log message of the integrated log has been reached (step S106).
  • When the last log message of the integrated log has been reached (Yes in step S106), the log totaling unit 12 outputs the appearance information for each ID to the reference pattern generation unit 13 (step S107).
  • When the last log message of the integrated log has not been reached (No in step S106), the process returns to step S104.
  • the log totaling unit 12 repeats the processes in steps S104 and S105 until the last log message of the integrated log is reached.
  • the log totaling unit 12 may be configured so that the user can specify an arbitrary time at which reading of the log messages is completed, or so that the time at which reading is completed is obtained from definition information (not shown).
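  • The loop of steps S104 and S105 can be sketched as follows. This is a minimal sketch under stated assumptions: the function name `aggregate_appearances` is hypothetical, the input is assumed to be a time-sorted list of (time, log ID) pairs, and the start of each time-width section is registered as the appearance time (the source allows any time within the section).

```python
from collections import defaultdict
from datetime import datetime, timedelta


def aggregate_appearances(messages, width):
    """Count appearances of each log ID per time-width section.

    messages: list of (datetime, log_id) pairs sorted by time.
    width:    timedelta giving the length of one section.
    Returns {log_id: {appearance_time: number_of_appearances}}.
    """
    appearance = defaultdict(dict)
    if not messages:
        return {}
    start = messages[0][0]
    for when, log_id in messages:
        index = (when - start) // width          # section number (an int)
        section_start = start + index * width    # assumed appearance time
        counts = appearance[log_id]
        counts[section_start] = counts.get(section_start, 0) + 1
    return dict(appearance)


# Usage: ID "1001" appears 10 times and ID "2034" 3 times in one section.
t0 = datetime(2014, 7, 1, 12, 0, 1)
sample = sorted(
    [(t0 + timedelta(seconds=30 * i), "1001") for i in range(10)]
    + [(t0 + timedelta(seconds=100 * i + 5), "2034") for i in range(3)]
)
result = aggregate_appearances(sample, timedelta(minutes=10))
print(result["1001"][t0], result["2034"][t0])
```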
  • the reference pattern generation process is a process in which the reference pattern generation unit 13 combines log messages that appear synchronously as reference patterns based on the appearance information for each ID.
  • FIG. 8 is a flowchart regarding reference pattern generation processing. Note that the operation related to log message combination described with reference to FIG. 8 is an example, and any method may be used as long as IDs generated at the same time can be compared and linked.
  • the reference pattern generation unit 13 reads the appearance information for each ID output by the log aggregation unit 12 (step S201).
  • the reference pattern generation unit 13 calculates the total number of appearances (hereinafter referred to as the appearance frequency) for each appearance time of each ID (step S202).
  • the reference pattern generation unit 13 rearranges the appearance information constituting the combination candidate set in ascending order of appearance frequency (step S203).
  • the reference pattern generation unit 13 selects an ID as a comparison source (hereinafter referred to as a comparison source ID) from the combination candidate set (step S204).
  • For example, the reference pattern generation unit 13 selects the ID of the appearance information having the lowest appearance frequency from the combination candidate set as the comparison source ID, and compares the appearance information of the selected comparison source ID with the appearance information of another ID (comparison target ID).
  • the selection may be based on another criterion.
  • the reference pattern generation unit 13 determines whether or not the appearance frequency of the selected comparison source ID is the maximum among the appearance information constituting the combination candidate set (step S205).
  • When the appearance frequency of the selected comparison source ID is not the maximum (No in step S205), the reference pattern generation unit 13 verifies whether there is an ID having the same appearance information as the selected comparison source ID (step S206). On the other hand, if the appearance frequency of the selected comparison source ID is the maximum (Yes in step S205), the process proceeds to step S209.
  • If there is an ID having the same appearance information as the selected comparison source ID (hereinafter referred to as a comparison target ID) (Yes in step S206), the reference pattern generation unit 13 combines the comparison source ID and the comparison target ID to generate a combination of IDs (step S207). On the other hand, if there is no ID having the same appearance information as the comparison source ID (No in step S206), the process returns to step S204 to acquire another ID as the comparison source ID.
  • the processing from step S204 to step S206 is repeated until there is no comparison target ID having the same appearance information as the selected comparison source ID.
  • A supplementary explanation of step S207 follows.
  • Suppose that the appearance times of a certain comparison source ID “2048” are as follows: “2014/07/01_9:00:01, 2014/07/01_10:00:01, 2014/07/01_11:00:01, 2014/07/01_12:00:01, 2014/07/01_13:00:01, 2014/07/01_14:00:01, 2014/07/01_15:00:01, 2014/07/01_16:00:01, 2014/07/01_17:00:01, 2014/07/01_18:00:01”. Suppose also that the number of appearances corresponding to each appearance time of the comparison source ID “2048” is “2, 2, 2, 2, 2, 2, 2, 2, 2, 2”.
  • Suppose that the comparison target ID “2049” has the same 10 appearance times as the comparison source ID “2048”.
  • the number of appearances corresponding to each appearance time of the comparison target ID “2049” is assumed to be “2, 2, 2, 2, 2, 2, 2, 2, 2, 2”.
  • the total number of appearances (appearance frequency) of the comparison source ID “2048” and of the comparison target ID “2049” is “20” in both cases, and the appearance times are also the same. Therefore, the comparison source ID “2048” and the comparison target ID “2049” are to be combined.
  • Even when the appearance information does not match exactly, the IDs may be regarded as having the same appearance information.
  • For example, suppose that the appearance times of a certain comparison source ID “3018” are as follows: “2014/07/01_9:00:01, 2014/07/01_12:00:01, 2014/07/01_15:00:01, 2014/07/01_18:00:01”.
  • the number of appearances corresponding to each appearance time of the comparison source ID “3018” is assumed to be “3, 3, 3, 3”.
  • Suppose that the appearance times of the comparison target ID “4024” are as follows: “2014/07/01_9:00:01, 2014/07/01_12:00:01, 2014/07/01_12:01:01, 2014/07/01_15:00:01, 2014/07/01_18:00:01, 2014/07/01_18:01:01”.
  • the number of appearances corresponding to each appearance time of the comparison target ID “4024” is assumed to be “3, 2, 1, 3, 1, 2”.
  • the total number of appearances (appearance frequency) of the comparison source ID “3018” and of the comparison target ID “4024” is “12” in both cases.
  • the comparison target ID “4024” has the appearance times “2014/07/01_12:01:01, 2014/07/01_18:01:01”, which the comparison source ID “3018” does not have.
  • These differing appearance times are adjacent to the appearance times “2014/07/01_12:00:01, 2014/07/01_18:00:01” of the comparison source ID “3018”. Because the differing times belong to adjacent times, the comparison source ID “3018” and the comparison target ID “4024” are to be combined.
  • a threshold may be set for the appearance frequency and the degree of coincidence of the appearance information, and IDs that satisfy the set threshold condition may be combined.
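  • The identity check with a tolerance for adjacent times can be sketched as follows. This is only one possible reading: the function name `same_appearance` is hypothetical, and "adjacent" is assumed here to mean "within a caller-supplied tolerance of some appearance time of the other ID", which the source does not define precisely.

```python
from datetime import datetime, timedelta


def same_appearance(source, target, tolerance=timedelta(0)):
    """Decide whether two IDs are treated as having the same appearance info.

    source, target: {appearance_time: number_of_appearances}.
    tolerance:      assumed maximum gap for two times to count as adjacent.
    """
    # The appearance frequencies (total counts) must agree.
    if sum(source.values()) != sum(target.values()):
        return False

    def covered(times, others):
        # Every time must lie within the tolerance of some time in `others`.
        return all(
            any(abs(t - o) <= tolerance for o in others) for t in times
        )

    return covered(target, source) and covered(source, target)


def at(hour, minute=0):
    return datetime(2014, 7, 1, hour, minute, 1)


# IDs "3018" and "4024" from the example: both have appearance frequency 12.
id_3018 = {at(9): 3, at(12): 3, at(15): 3, at(18): 3}
id_4024 = {at(9): 3, at(12): 2, at(12, 1): 1, at(15): 3, at(18): 1, at(18, 1): 2}
print(same_appearance(id_3018, id_4024, timedelta(minutes=1)))
```

With a zero tolerance the same function reduces to an exact match on appearance times and appearance frequency.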
  • the reference pattern generation unit 13 combines the comparison source ID and the comparison target ID in step S207, and then updates the appearance information of the combination candidate set (step S208).
  • the reference pattern generation unit 13 adds the combination of the generated ID combination and the ID appearance information to the combination candidate set.
  • the reference pattern generation unit 13 deletes the comparison source ID and the comparison target ID from the combination candidate set.
  • the processing from step S203 to step S208 is repeated until the combination candidate (appearance information) having the maximum appearance frequency in the combination candidate set is reached.
  • the reference pattern generation unit 13 takes the set obtained by rearranging the appearance information constituting the combination candidate set in ascending order of appearance frequency as the reference pattern set.
  • the reference pattern set is output to the reference pattern combining unit 14 (step S209). Note that the appearance frequency of the selected comparison source ID being the maximum means that the combination candidate (appearance information) having the maximum appearance frequency in the combination candidate set has been reached.
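  • The outcome of the reference pattern generation process can be approximated with the sketch below. It is a deliberate simplification, not the flowchart itself: instead of the pairwise loop of steps S204 to S208, it groups IDs whose appearance information is exactly identical, and it ignores the adjacency tolerance. The function name `generate_reference_patterns` is hypothetical.

```python
from collections import defaultdict


def generate_reference_patterns(appearance):
    """Group log IDs that share exactly the same appearance information.

    appearance: {log_id: {appearance_time: number_of_appearances}}.
    Returns a list of (tuple_of_ids, appearance_info) pairs sorted in
    ascending order of appearance frequency.
    """
    groups = defaultdict(list)
    for log_id, info in appearance.items():
        key = tuple(sorted(info.items()))   # hashable form of the info
        groups[key].append(log_id)
    patterns = [(tuple(sorted(ids)), dict(key)) for key, ids in groups.items()]
    # Ascending appearance frequency; ties are broken by the ID tuple.
    patterns.sort(key=lambda p: (sum(p[1].values()), p[0]))
    return patterns
```

Grouping by a hashable key avoids comparing every pair of IDs, which is one way to keep the work proportional to the number of IDs when only exact matches are needed.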
  • the reference pattern combination process is a process in which the reference pattern combining unit 14 combines a combination of IDs based on a reference pattern set to generate a pattern (combination).
  • FIGS. 9 and 10 are flowcharts relating to the reference pattern combining process.
  • the reference pattern set is a set of reference patterns, and is a pattern composed of a combination of an ID combination and appearance information of the combination in the same manner as a pattern (combination) set.
  • the reference pattern combining unit 14 reads the reference pattern set generated by the reference pattern generation unit 13 in the reference pattern generation process (step S301).
  • the reference pattern combining unit 14 selects a reference pattern with the lowest appearance frequency from the read reference pattern set (step S302).
  • the reference pattern selected here is called a comparison source pattern.
  • the reference pattern combining unit 14 selects a reference pattern from the reference pattern set in ascending order of appearance frequency.
  • the reference pattern combining unit 14 determines whether or not there is a comparison source pattern in the reference pattern set read in step S301 (step S303).
  • When there is a comparison source pattern (Yes in step S303), the reference pattern combining unit 14 selects, from the reference pattern set, a pattern having a frequency equal to or lower than the appearance frequency of the comparison source pattern as a pattern to be compared (hereinafter referred to as a comparison target pattern) (step S304). This set of comparison target patterns is called a comparison target pattern set. On the other hand, if there is no comparison source pattern (No in step S303), the process proceeds to step S312 in FIG.
  • the reference pattern combining unit 14 determines whether or not there is a comparison target pattern in the reference pattern set read in step S301 (step S305).
  • When there is a comparison target pattern (Yes in step S305), the reference pattern combining unit 14 compares the appearance information of the comparison source pattern with the appearance information of the comparison target pattern included in the comparison target pattern set, and calculates the similarity of the appearance information (step S306). On the other hand, when there is no comparison target pattern (No in step S305), the process proceeds to step S308.
  • A supplementary explanation of step S306 follows.
  • Suppose that the appearance times of the comparison source pattern “5025, 6036” are as follows: “2014/7/1_12:00:01, 2014/7/2_12:00:01, 2014/7/3_12:00:01, 2014/7/4_12:00:01, 2014/7/5_12:00:01”.
  • the number of appearances corresponding to each appearance time of the comparison source pattern “5025, 6036” is “2, 2, 2, 2, 2”.
  • Suppose that the appearance times of the comparison target pattern “1001, 3009, 7049” are as follows: “2014/7/1_12:00:01, 2014/7/2_12:00:01, 2014/7/3_12:00:01, 2014/7/4_12:00:01, 2014/7/5_12:00:01”. Suppose also that the number of appearances corresponding to each appearance time of the comparison target pattern “1001, 3009, 7049” is “2, 1, 1, 2, 2”.
  • the appearance information common to the two reference patterns has the appearance times “2014/7/1_12:00:01, 2014/7/2_12:00:01, 2014/7/3_12:00:01, 2014/7/4_12:00:01, 2014/7/5_12:00:01”, and the number of appearances is “2, 1, 1, 2, 2”.
  • the similarity between the comparison source pattern “5025, 6036” and the comparison target pattern “1001, 3009, 7049” is calculated to be “8/8”, that is, “1.0” from the ratio of the number of appearances.
  • the ratio of the number of appearances is a ratio between “appearance frequency of common appearance information” and “appearance frequency of appearance information to be compared”, and is calculated by the following formula 1.
  • (Appearance ratio) = (Appearance frequency of common appearance information) / (Appearance frequency of comparison target appearance information)   (1)
  • In this example, the ratio of the appearance frequency of the common part to the appearance frequency of the comparison target is used as the similarity index; in addition, the ratio of the appearance frequency of the common part to the appearance frequency of the comparison source may be used.
  • the appearance frequency of the appearance information is used, but the number of appearance times may be used instead.
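  • Formula 1 can be sketched as follows. The function name `similarity` is hypothetical, and one point is an assumption rather than something the source states: the number of appearances of the common part at each shared appearance time is taken to be the smaller of the two counts, which reproduces the common counts “2, 1, 1, 2, 2” in the example.

```python
def similarity(source, target):
    """Ratio of the common appearance frequency to the target's frequency.

    source, target: {appearance_time: number_of_appearances}.
    """
    # Assumed definition of the common part: per shared time, the smaller count.
    common = sum(
        min(count, source[time])
        for time, count in target.items()
        if time in source
    )
    total = sum(target.values())
    return common / total if total else 0.0


days = [f"2014/7/{d}_12:00:01" for d in range(1, 6)]
source_pattern = dict(zip(days, [2, 2, 2, 2, 2]))   # pattern "5025, 6036"
target_pattern = dict(zip(days, [2, 1, 1, 2, 2]))   # pattern "1001, 3009, 7049"
print(similarity(source_pattern, target_pattern))   # 8 / 8
```

Swapping the arguments gives the alternative index mentioned above, the ratio of the common part to the comparison source, which for this example is 8/10.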
  • the reference pattern combining unit 14 selects, as a combination candidate pattern, a comparison target pattern in which the similarity calculated in the process of step S306 satisfies a threshold value defined separately (step S307). Then, the process returns to step S304.
  • the threshold condition may be satisfied when, for example, the above-described similarity exceeds a predetermined threshold or is equal to or higher than a predetermined threshold.
  • the reference pattern combining unit 14 repeats the processing of steps S304 to S307 until there is no comparison target pattern (No in step S305), and generates a set of combination candidate patterns.
  • A supplementary explanation of step S307 follows.
  • Suppose that the similarity between the comparison source pattern “5025, 6036” and the comparison target pattern “1001, 3009, 7049” is “1.0”.
  • If the predetermined threshold is “0.9”, the similarity is equal to or higher than the threshold, so the comparison target pattern “1001, 3009, 7049” becomes a combination candidate pattern.
  • When several indices are used for the similarity, a single value may be applied as the threshold for all of them, or a threshold may be prepared individually for each index.
  • the reference pattern combining unit 14 extracts all appearance information from the set of combination candidate patterns and generates candidate appearance information by combining all the extracted appearance information (step S308).
  • A supplementary explanation of step S308 follows.
  • a case where there are two types of combination candidate patterns “1001, 3009, 7049” and “2004, 4016” will be described as an example.
  • the appearance times of the combination candidate pattern “2004, 4016” are “2014/7/2_12:00:01, 2014/7/3_12:00:01”, and the number of appearances is “1, 1”.
  • the candidate appearance information has the appearance times “2014/7/1_12:00:01, 2014/7/2_12:00:01, 2014/7/3_12:00:01, 2014/7/4_12:00:01, 2014/7/5_12:00:01”, and the number of appearances is “2, 2, 2, 2, 2”.
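  • The combination in step S308 can be sketched as follows, assuming that "combining" appearance information means adding the numbers of appearances at each appearance time; that reading is consistent with the figures in the example but is not stated explicitly in the source. The function name `merge_appearance` is hypothetical.

```python
from collections import Counter


def merge_appearance(*infos):
    """Sum the number of appearances per appearance time over all inputs."""
    merged = Counter()
    for info in infos:
        merged.update(info)   # Counter.update adds counts for existing keys
    return dict(sorted(merged.items()))


days = [f"2014/7/{d}_12:00:01" for d in range(1, 6)]
candidate_a = dict(zip(days, [2, 1, 1, 2, 2]))           # "1001, 3009, 7049"
candidate_b = {days[1]: 1, days[2]: 1}                   # "2004, 4016"
candidate_info = merge_appearance(candidate_a, candidate_b)
print(list(candidate_info.values()))
```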
  • the reference pattern combination unit 14 compares the appearance information of the comparison source pattern with the candidate appearance information of the combination candidate pattern, and calculates the similarity between the two in the same manner as the process of step S306 (step S309).
  • the reference pattern combining unit 14 determines whether or not the similarity calculated in step S309 is equal to or greater than a separately defined threshold (step S310). Note that the similarity between the appearance information of the comparison source pattern and the candidate appearance information of the combination candidate pattern may be determined based on whether the similarity satisfies a predetermined threshold condition.
  • if the similarity is below the threshold (No in step S310), the reference pattern combining unit 14 returns to the process of step S302 to acquire the next reference pattern as a new comparison source pattern.
  • if the similarity is equal to or greater than the threshold (Yes in step S310), the reference pattern combining unit 14 updates the reference pattern (step S311). In updating the reference pattern, first, the reference pattern combining unit 14 generates a combined pattern obtained by combining the comparison source pattern and the combination candidate pattern, and adds the generated combined pattern to the reference pattern set. Second, the reference pattern combining unit 14 deletes the comparison source pattern and the combination candidate pattern from the reference pattern set. When the reference pattern is updated, the process returns to step S302.
  • the reference pattern combining unit 14 repeats the processing corresponding to Step S302 to Step S309 until the similarity between the appearance information of the comparison source pattern and the candidate appearance information of the combination candidate pattern is equal to or greater than the threshold value.
  • the reference pattern combining unit 14 rearranges the patterns of the reference pattern set in ascending order of appearance frequency (step S312).
  • the reference pattern combining unit 14 acquires reference patterns from the reference pattern set in ascending order of appearance frequency (step S313).
  • the reference pattern selected here serves as the reference pattern from which comparison starts (hereinafter referred to as the comparison source pattern).
  • the reference pattern combining unit 14 determines whether or not there is a comparison source pattern in the reference pattern set (step S314).
  • if there is a comparison source pattern (Yes in step S314), the reference pattern combining unit 14 selects, from the reference pattern set, each pattern whose appearance frequency is equal to or lower than that of the comparison source pattern as a pattern to be compared (hereinafter referred to as a comparison target pattern) (step S315). The set of these comparison target patterns is called a comparison target pattern set. On the other hand, if there is no comparison source pattern (No in step S314), the process proceeds to step S320.
  • the reference pattern combining unit 14 determines whether or not there is a comparison target pattern in the reference pattern set (step S316).
  • the reference pattern combining unit 14 compares the appearance information of the comparison source pattern with the appearance information of the comparison target pattern, and calculates the similarity A and the similarity B of the appearance information (step S317).
  • the similarity A is a ratio between the appearance frequency (also referred to as the first frequency) of the comparison source pattern (also referred to as the first pattern) and the common appearance frequency.
  • the similarity A is calculated by the following formula 2.
  • (Similarity A) = (Appearance frequency of common appearance information) / (Appearance frequency of appearance information of comparison source pattern) ... (2)
  • the similarity B (second similarity) is a ratio between the appearance frequency (also referred to as the second frequency) of the comparison target pattern (also referred to as the second pattern) that is an appearance candidate and the common appearance frequency.
  • the similarity B is calculated by the following formula 3.
  • (Similarity B) = (Appearance frequency of common appearance information) / (Appearance frequency of appearance information of comparison target pattern) ... (3)
  • the common appearance frequency is the sum of the numbers of appearances over those entries for which both the appearance time and the number of appearances match between the appearance information of the comparison source pattern and the appearance information of the comparison target pattern. That is, when the first pattern and the second pattern are compared, the total number of appearances of the entries whose appearance information matches corresponds to the common appearance frequency.
  • the appearance time of the comparison source pattern is “2014/7/1_12:00:01, 2014/7/2_12:00:01, 2014/7/3_12:00:01, 2014/7/4_12:00:01, 2014/7/5_12:00:01”.
  • the number of appearances of the comparison source pattern is “2, 1, 1, 2, 2”, so that the appearance frequency of the comparison source pattern is “8”.
  • the appearance time of the comparison target pattern is “2014/7/1_12:00:01, 2014/7/4_12:00:01, 2014/7/5_12:00:01”.
  • the number of appearances of the comparison target pattern is “2, 2, 2”.
  • the appearance time common to both is “2014/7/1_12:00:01, 2014/7/4_12:00:01, 2014/7/5_12:00:01”.
  • the total appearance frequency “6” is a common appearance frequency.
  • the similarity A is 6/8 based on Expression 2
  • the appearance frequency of the comparison target pattern is “6”
  • the similarity B is 6/6 based on Expression 3
  • the predetermined threshold value for the similarity A is 1 and the predetermined threshold value for the similarity B is 0.8
  • both the similarity A and the similarity B satisfy the predetermined threshold.
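Expressions 2 and 3 can be checked against the example with a short sketch. The function names and the dictionary representation of appearance information are assumptions made for illustration. The per-time counts of the comparison source pattern are taken as 2, 1, 1, 2, 2 over its five appearance times, which sum to the 8 used in the text's value of 6/8.

```python
from fractions import Fraction


def common_appearance_frequency(source, target):
    """Sum the counts of entries whose appearance time and count both match."""
    return sum(count for time, count in source.items() if target.get(time) == count)


def similarities(source, target):
    """Return (similarity A, similarity B) following Expressions 2 and 3."""
    common = common_appearance_frequency(source, target)
    sim_a = Fraction(common, sum(source.values()))  # Expression 2
    sim_b = Fraction(common, sum(target.values()))  # Expression 3
    return sim_a, sim_b


source_info = {
    "2014/7/1_12:00:01": 2,
    "2014/7/2_12:00:01": 1,
    "2014/7/3_12:00:01": 1,
    "2014/7/4_12:00:01": 2,
    "2014/7/5_12:00:01": 2,
}
target_info = {
    "2014/7/1_12:00:01": 2,
    "2014/7/4_12:00:01": 2,
    "2014/7/5_12:00:01": 2,
}

sim_a, sim_b = similarities(source_info, target_info)
```

Exact fractions are used here so that the values can be compared with 6/8 and 6/6 without rounding.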
  • when there is no comparison target pattern (No in step S316), the process returns to step S313.
  • the reference pattern combining unit 14 determines whether or not each of the similarity A and the similarity B calculated in step S317 is equal to or greater than a predetermined threshold defined separately (step S318). In addition, regarding the similarity A and the similarity B, it may be determined whether other predetermined threshold conditions are satisfied.
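  • if the similarity A or the similarity B is below its threshold (No in step S318), the reference pattern combining unit 14 returns to step S313 to acquire the next reference pattern as a new comparison source pattern.
  • if both the similarity A and the similarity B are equal to or greater than their thresholds (Yes in step S318), the reference pattern combining unit 14 updates the reference pattern (step S319).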
  • the reference pattern combining unit 14 generates a new reference pattern that combines the combination candidate pattern and the reference pattern of the comparison source, and adds the generated new reference pattern to the reference pattern set.
  • the appearance information of the new reference pattern is a common element between the combination candidate pattern and the comparison source pattern.
  • the reference pattern combining unit 14 deletes the comparison source pattern and the combination candidate pattern from the reference pattern set.
  • the process returns to step S313 to select the next reference pattern as a new comparison source pattern.
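The update in step S319 can be sketched as follows. This is a minimal illustration, not the patented implementation: the reference pattern set is assumed to be a dictionary from a tuple of log IDs to appearance information, "common element" is assumed to mean entries whose appearance time and count both match, and all data and the name `combine_patterns` are hypothetical.

```python
def combine_patterns(pattern_set, source_ids, candidate_ids):
    """Return a new pattern set in which the two patterns are replaced by their combination."""
    source_info = pattern_set[source_ids]
    candidate_info = pattern_set[candidate_ids]
    # Appearance information of the new pattern: entries common to both (assumed rule).
    common_info = {
        time: count
        for time, count in source_info.items()
        if candidate_info.get(time) == count
    }
    new_ids = tuple(sorted(set(source_ids) | set(candidate_ids)))
    # Delete the comparison source pattern and the combination candidate pattern.
    updated = {
        ids: info
        for ids, info in pattern_set.items()
        if ids not in (source_ids, candidate_ids)
    }
    updated[new_ids] = common_info
    return updated


# Hypothetical reference pattern set.
pattern_set = {
    (5025, 6036): {"2014/7/1_12:00:01": 2, "2014/7/2_12:00:01": 1},
    (1001, 3009): {"2014/7/1_12:00:01": 2},
    (8088,): {"2014/7/3_12:00:01": 1},
}
updated_set = combine_patterns(pattern_set, (5025, 6036), (1001, 3009))
```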
  • the reference pattern combining unit 14 leaves the repetition process of steps S313 to S319. Then, the reference pattern combining unit 14 outputs the updated reference pattern set to the pattern storage unit 15 as a pattern set (step S320).
  • the reference pattern generation means 13 generates a reference pattern that combines log messages that appear synchronously based on the appearance information of the log message.
  • the reference pattern combining unit 14 compares the appearance information between the reference patterns and combines at least one reference pattern based on the comparison result.
  • the concept of combining at least one reference pattern includes carrying over, as it is, a reference pattern that has no other reference pattern to be combined with.
  • with the log analysis system, defining a threshold at the time of log message analysis makes it possible to group only those log messages that satisfy a certain threshold condition and therefore have a high co-occurrence probability.
  • with the log analysis system, it is also possible to correctly extract, as a pattern, a plurality of messages that appear together within a time width and that might otherwise be split apart under a constraint on the number of messages. This is because the log analysis system according to the present embodiment reads the integrated log file according to the time width and calculates the relationship between the individual IDs according to the threshold value.
  • FIG. 12 is a block diagram showing a functional configuration of the log analysis system 2 according to the present embodiment.
  • the log analysis system 2 according to the present embodiment has a configuration in which an order learning unit 21 is added to the log analysis system 1 according to the first embodiment. Note that in the log analysis system 2 according to the present embodiment, components substantially the same as those of the log analysis system 1 according to the first embodiment (FIG. 1) are given the same reference numerals, and their description is omitted.
  • the order learning means 21 refers to the integrated log based on the pattern set output by the reference pattern combining means 14 and extracts the order information 22 for each pattern.
  • the order information 22 is information used, when a log is analyzed with a pattern (combination), to determine whether the log IDs included in the pattern (combination) appear in the order given by the “pattern (order)”. The pattern (order) is also called an “order pattern” and is a pattern in which log IDs are arranged in order of appearance.
  • the order learning means 21 outputs the generated order information 22 to the pattern storage means 15 and records it.
  • FIG. 12 illustrates a state in which the pattern storage unit 15 stores the order information 22 and the pattern set 150.
  • the order information 22 includes a pattern (combination) obtained by combining at least one ID, a pattern (order) that takes into account the arrangement order of the IDs included in the pattern (combination), and the occurrence probability of each pattern (order). Further, the order information 22 may include a set of patterns (combinations) in another format so that the patterns (combinations) can be managed while maintaining uniqueness with a common ID. The order information 22 may include pattern appearance information.
  • FIG. 13 is an order information table 220 as an example of the order information 22.
  • the order information table 220 of FIG. 13 indicates that there are patterns (orders) having two kinds of arrangement orders with respect to the pattern (combination) “1001, 2004, 3009, 5025”.
  • One is a combination of a pattern (combination) “1001, 2004, 3009, 5025”, a pattern (order) “1001, 2004, 3009, 5025”, and an occurrence count “90”.
  • the other is a combination of a pattern (combination) “1001, 2004, 3009, 5025”, a pattern (order) “1001, 3009, 2004, 5025”, and the number of occurrences “10”.
  • a pattern (order) may be stored using a general notation method such as a tree diagram, as long as the notation method has a similar meaning. Further, instead of the number of occurrences, the ratio of each number of occurrences to the total number of occurrences may be output as an occurrence probability.
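As a small illustration of converting occurrence counts into occurrence probabilities, the two rows of the order information table 220 can be represented as below. The dictionary layout and the function name are assumptions made for this sketch.

```python
# Occurrence counts per pattern (order) for the pattern (combination)
# "1001, 2004, 3009, 5025", as listed in the order information table 220.
order_counts = {
    (1001, 2004, 3009, 5025): 90,
    (1001, 3009, 2004, 5025): 10,
}


def occurrence_probabilities(counts):
    """Convert occurrence counts into ratios of the total number of occurrences."""
    total = sum(counts.values())
    return {order: n / total for order, n in counts.items()}


order_probabilities = occurrence_probabilities(order_counts)
```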
  • FIG. 14 is a flowchart regarding order information generation processing by the order learning means 21 of the log analysis system 2 according to the present embodiment.
  • the order learning means 21 receives a pattern set from the pattern storage means 15 (step S401).
  • the order learning unit 21 may be configured to directly receive the pattern set from the reference pattern combining unit 14.
  • the order learning means 21 reads the corresponding part of the integrated log based on the appearance information of each pattern included in the received pattern set (step S402).
  • the relevant part of the integrated log read by the order learning means 21 is determined by the appearance time recorded in the appearance information and a separately defined time width. For example, when the appearance time is “2014/7/7_09:01:00” and the time width is “1 minute”, the order learning means 21 reads the integrated log from “2014/7/7_09:01:00” to “2014/7/7_09:02:00”.
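Reading the relevant part of the integrated log can be sketched as a time-window filter. The sketch below assumes that the integrated log is a list of (timestamp, log ID) pairs, that timestamps use the format shown in the text, and that the window is half-open (start included, end excluded); the log contents and the name `read_window` are hypothetical.

```python
from datetime import datetime, timedelta

TIME_FORMAT = "%Y/%m/%d_%H:%M:%S"


def read_window(integrated_log, appearance_time, time_width):
    """Return the log entries from the appearance time up to, but excluding, start + width."""
    start = datetime.strptime(appearance_time, TIME_FORMAT)
    end = start + time_width
    return [
        (timestamp, log_id)
        for timestamp, log_id in integrated_log
        if start <= datetime.strptime(timestamp, TIME_FORMAT) < end
    ]


# Hypothetical integrated log entries.
integrated_log = [
    ("2014/7/7_09:00:59", 4900),
    ("2014/7/7_09:01:00", 1001),
    ("2014/7/7_09:01:11", 3009),
    ("2014/7/7_09:02:00", 5025),
]
window = read_window(integrated_log, "2014/7/7_09:01:00", timedelta(minutes=1))
```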
  • the order learning means 21 reads the order of IDs included in each pattern among the log messages included in the corresponding portion of the read integrated log (step S403).
  • the read data is “1001, 7049, 6036, 4900, 3009, 2004, 8088, 5025” for the pattern “1001, 2004, 3009, 5025”.
  • when the order learning means 21 refers only to the IDs included in the pattern “1001, 2004, 3009, 5025” in the read data, the ID order “1001, 3009, 2004, 5025” is read.
  • the order learning means 21 adds 1 to the number of occurrences regarding the order of the read ID, and extracts the order information (step S404).
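Steps S403 and S404 can be sketched with the example data from the text. The function name and the use of a counter keyed by the ID order are assumptions; the sketch also assumes each ID of the pattern appears at most once in the read data, since repeated IDs are not discussed in the text.

```python
from collections import Counter


def extract_order(read_ids, pattern):
    """Keep only the IDs that belong to the pattern, preserving their order of appearance."""
    members = set(pattern)
    return tuple(log_id for log_id in read_ids if log_id in members)


read_data = [1001, 7049, 6036, 4900, 3009, 2004, 8088, 5025]
pattern = (1001, 2004, 3009, 5025)

occurrence_counts = Counter()
observed_order = extract_order(read_data, pattern)
occurrence_counts[observed_order] += 1  # step S404: add 1 to the count for this order
```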
  • the order learning means 21 verifies whether or not the order information 22 has been generated for all the patterns included in the received pattern set (step S405).
  • when the order information 22 has been generated for all the patterns included in the received pattern set (Yes in step S405), the order learning means 21 outputs the generated order information 22 to the pattern storage means 15 for recording (step S406). On the other hand, if the order information 22 has not been generated for all the patterns included in the received pattern set (No in step S405), the process returns to step S402 to generate the order information 22 for the unprocessed pattern.
  • the order learning unit 21 repeats the processes of steps S402 to S405 described above, and generates order information 22 for all patterns included in the pattern set received from the reference pattern combining unit 14.
  • the log analysis system according to the second embodiment can generate pattern order information based on the result generated by the reference pattern combining unit, and can generate a pattern and its order information with a small amount of calculation.
  • the reason is that the log analysis system according to the present embodiment includes a reference pattern generation unit.
  • FIG. 15 is a block diagram showing a functional configuration of the log analysis system 3 according to the present embodiment.
  • the log analysis system 3 according to the present embodiment has a configuration in which log identification means 31 and log identification information 32 are added to the log analysis system 1 according to the first embodiment. Note that in the log analysis system 3 according to the present embodiment, components substantially the same as those of the log analysis system 1 according to the first embodiment (FIG. 1) are given the same reference numerals, and their description is omitted.
  • FIG. 16 is a diagram showing an example of the log identification information 32 (log identification information 320).
  • the log identification information 32 is a set of a set of a log ID and a record expression corresponding to the log ID.
  • the log ID is also called a log identifier and is an identifier given to the log message.
  • the record representation is a representation of the body of the log message corresponding to the log ID.
  • the log message corresponding to the log ID “1001” includes a character string “mysql started”.
  • a character string is shown here, but the record expression may be expressed using any information, such as a regular expression or a uniquely defined template, as long as it can be compared with the log message.
  • the log identification unit 31 assigns a log ID to the log message included in the integrated log read from the log collection unit 11 with reference to the record expression recorded in the log identification information 32. Then, the log identification unit 31 outputs an integrated log of the log message to which the log ID is assigned to the log totaling unit 12.
  • FIG. 17 is a flowchart regarding log identification processing by the log identification unit 31 of the log analysis system 3 according to the present embodiment.
  • the log identification unit 31 reads the integrated log generated by the log collection unit 11 (step S501).
  • the log identification unit 31 refers to the log identification information 32 and assigns a log ID to the log message included in the read integrated log (step S502).
  • the log identification unit 31 determines whether or not a log ID has been assigned to all log messages included in the read integrated log (step S503).
  • when log IDs have been assigned to all the log messages (Yes in step S503), the log identification unit 31 transmits the integrated log to the log totaling unit 12 (step S504).
  • when there is a log message to which no log ID is assigned (No in step S503), the process returns to step S502 in order to assign a log ID to that log message.
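The log identification of steps S501 to S504 can be sketched with regular expressions as the record expressions. Only the pair of log ID "1001" and "mysql started" comes from the text; the second entry, the sample messages, the function name, and the choice to return `None` for an unmatched message are assumptions made for illustration.

```python
import re

# Log identification information: (log ID, record expression) pairs.
LOG_IDENTIFICATION_INFO = [
    (1001, re.compile(r"mysql started")),
    (2004, re.compile(r"connection from \S+ accepted")),  # hypothetical entry
]


def assign_log_id(message, identification_info):
    """Return the log ID of the first record expression found in the message, else None."""
    for log_id, expression in identification_info:
        if expression.search(message):
            return log_id
    return None


# Hypothetical log messages from an integrated log.
messages = [
    "2014/7/7 09:01:00 db01 mysql started",
    "2014/7/7 09:01:02 db01 connection from 10.0.0.5 accepted",
]
identified = [(assign_log_id(m, LOG_IDENTIFICATION_INFO), m) for m in messages]
```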
  • the log analysis system according to the third embodiment can generate a pattern (combination) with a small amount of calculation from a plurality of log files to which a common log ID is not assigned, based on the log identification information. This is because the log analysis system according to the third embodiment includes log identification means that assigns a log ID to a log message based on log identification information, and reference pattern generation means that generates a reference pattern by combining logs that appear synchronously.
  • FIG. 18 is a block diagram illustrating a functional configuration of the log analysis system 4 according to the fourth embodiment.
  • the log analysis system according to the fourth embodiment has a configuration in which log classification means 41 is added to the log analysis system 1 according to the first embodiment.
  • components substantially the same as those of the log analysis system 1 according to the first embodiment (FIG. 1) are given the same reference numerals, and their description is omitted.
  • the log classification unit 41 reads the integrated log from the log collecting unit 11 and calculates the feature similarity based on the characteristics of the log message included in the read integrated log.
  • the log classification means 41 groups and classifies a plurality of log messages having a high degree of similarity, and assigns a common log ID (also referred to as a group identifier) to the log messages classified into the same group. Then, the log classification unit 41 outputs an integrated log of log messages to which a common log ID is assigned for each group to the log aggregation unit 12.
  • FIG. 19 is a flowchart regarding log classification processing by the log classification unit 41 of the log analysis system 4 according to the present embodiment.
  • the log classification unit 41 reads the integrated log generated by the log collection unit 11 (step S601).
  • the log classification means 41 calculates feature amounts for all log messages included in the read integrated log, and performs classification based on the similarity (step S602).
  • an algorithm and an index such as a shortest distance method, a longest distance method, a group average method, a Ward method, and a k-Means method can be used.
  • the log classification means 41 assigns a log ID to each classified group according to the classification result (step S603).
  • the log classification unit 41 assigns a log ID to all log messages included in the integrated log according to the log ID assigned to each group (step S604).
  • the log classification unit 41 outputs an integrated log of log messages to which a common log ID is assigned for each group to the log aggregation unit 12 (step S605).
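The classification flow of steps S602 to S604 can be sketched as follows. The text does not specify the feature amount, so this sketch assumes the feature of a message is its set of lowercase tokens and the similarity is the Jaccard index. Grouping every pair whose similarity reaches a threshold corresponds to the shortest distance method (single linkage) cut at that threshold. The threshold value, the sample messages, and all names are hypothetical.

```python
from itertools import combinations


def jaccard(a, b):
    """Jaccard similarity of two token sets (two empty sets count as identical)."""
    union = a | b
    return len(a & b) / len(union) if union else 1.0


def classify(messages, threshold=0.5):
    """Assign a common log ID (group identifier) to messages linked by similarity."""
    features = [set(m.lower().split()) for m in messages]
    parent = list(range(len(messages)))

    def find(i):
        # Follow parent links to the group representative, compressing the path.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    for i, j in combinations(range(len(messages)), 2):
        if jaccard(features[i], features[j]) >= threshold:
            parent[find(j)] = find(i)  # merge the two groups

    group_ids = {}
    return [group_ids.setdefault(find(i), len(group_ids) + 1) for i in range(len(messages))]


messages = [
    "mysql started on host01",
    "mysql started on host02",
    "disk usage exceeded 90 percent",
]
log_ids = classify(messages)
```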
  • a pattern (combination) can be generated with a small amount of calculation even from a plurality of log files to which a common log ID is not assigned.
  • this is because the log analysis system includes log classification means that calculates feature amounts from the log messages, classifies them, and assigns a uniquely identifiable log ID to similar log messages, and reference pattern generation means that generates a reference pattern by combining logs that appear synchronously.
  • FIG. 20 is a block diagram illustrating a functional configuration of the log analysis system 5 according to the fifth embodiment.
  • the log analysis system 5 according to the fifth embodiment has a configuration in which a transition time learning unit 51 is added to the log analysis system 2 according to the second embodiment.
  • the same reference numerals are given to the substantially same configuration as the configuration of the log analysis system 2 according to the second embodiment (FIG. 12), and the description thereof is omitted.
  • FIG. 20 illustrates how the pattern storage unit 15 stores the transition information 52 and the pattern set 150. Although omitted in FIG. 20, the pattern storage unit 15 stores the order information 22 as in FIG.
  • the transition time learning means 51 extracts the transition time required for transition between individual log IDs in the pattern based on the order information 22 of each pattern extracted by the order learning means 21.
  • FIG. 21 is a diagram showing an example of the transition time (transition time table 510) output by the transition time learning means 51.
  • the transition time represents a transition between log IDs in the order information 22 and a time required for the transition.
  • the pattern (order) “1001, 2004, 3009, 5025” includes three types of transitions: “1001 → 2004”, “2004 → 3009”, and “3009 → 5025”.
  • Each transition time is “1 second”, “2 seconds”, and “1 second” as shown in parentheses in the transition time table 510 of FIG.
  • FIG. 22 is a flowchart regarding the transition time learning process by the transition time learning means 51 of the log analysis system 5 according to the present embodiment.
  • the order learning means 21 reads a corresponding portion of the integrated log based on the appearance information of each pattern included in the pattern set (step S701).
  • the corresponding portion read by the order learning means 21 is determined by the appearance time recorded in the appearance information and a separately defined time width. For example, if the appearance time is “2014/7/7_09:01:00” and the time width is “1 minute”, the integrated log from “2014/7/7_09:01:00” to “2014/7/7_09:02:00” is read.
  • the order learning means 21 reads the order of IDs included in the pattern among the log messages included in the read corresponding part (step S702).
  • the read data is “1001, 7049, 6036, 4900, 3009, 2004, 8088, 5025” for the pattern “1001, 2004, 3009, 5025”.
  • the order is “1001, 3009, 2004, 5025”.
  • the transition time learning means 51 calculates the transition time between IDs based on the order of the IDs read by the order learning means 21 (step S703).
  • the transition “1001 → 3009” is “11 seconds”.
  • the transition time learning means 51 determines whether or not the transition time has been calculated for all the appearance times included in the pattern appearance information (step S704).
  • if the transition times have been calculated for all the appearance times included in the pattern appearance information (Yes in step S704), the process proceeds to step S705. On the other hand, if the transition time has not been calculated for all the appearance times included in the pattern appearance information (No in step S704), the process returns to step S702.
  • the transition time learning means 51 repeats the processes of step S702 and step S703 described above for all the appearance times included in the pattern appearance information, and acquires the transition time of each transition.
  • the transition time learning means 51 totals the obtained transition times for each transition, calculates values such as an average value and a median value, and records them as transition times for each transition (step S705).
  • the transition time learning means 51 may obtain and record values such as an average value, a median value, and a variance as the transition time, or may record only a set of a maximum value and a minimum value. Alternatively, the transition time learning means 51 may be configured to record all transition times as they are.
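Steps S703 and S705 can be sketched as follows. The sketch assumes each read portion of the integrated log is a list of (timestamp, log ID) pairs in appearance order. Only the 11-second transition "1001 → 3009" comes from the text; the other timestamps, the second window, and all names are hypothetical.

```python
from collections import defaultdict
from datetime import datetime
from statistics import mean, median

TIME_FORMAT = "%Y/%m/%d_%H:%M:%S"


def transition_times(window, pattern):
    """Return {(from_id, to_id): seconds} for consecutive pattern IDs in one window."""
    members = set(pattern)
    hits = [
        (datetime.strptime(timestamp, TIME_FORMAT), log_id)
        for timestamp, log_id in window
        if log_id in members
    ]
    return {
        (from_id, to_id): (to_time - from_time).total_seconds()
        for (from_time, from_id), (to_time, to_id) in zip(hits, hits[1:])
    }


def summarize(per_window_results):
    """Aggregate transition times per transition over all appearance times."""
    collected = defaultdict(list)
    for result in per_window_results:
        for transition, seconds in result.items():
            collected[transition].append(seconds)
    return {
        transition: {
            "mean": mean(values),
            "median": median(values),
            "min": min(values),
            "max": max(values),
        }
        for transition, values in collected.items()
    }


pattern = (1001, 2004, 3009, 5025)
window_1 = [
    ("2014/7/7_09:01:00", 1001),
    ("2014/7/7_09:01:05", 7049),
    ("2014/7/7_09:01:11", 3009),
    ("2014/7/7_09:01:13", 2004),
    ("2014/7/7_09:01:14", 5025),
]
window_2 = [
    ("2014/7/8_09:01:00", 1001),
    ("2014/7/8_09:01:13", 3009),
]
first = transition_times(window_1, pattern)
summary = summarize([first, transition_times(window_2, pattern)])
```

Recording the minimum and maximum alongside the mean and median reflects the alternatives the text mentions; which statistics to keep is a design choice left open there.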
  • the transition time learning means 51 determines whether or not the transition time has been calculated for all the patterns included in the pattern set and their transitions (step S706).
  • if the transition times have been calculated for all patterns included in the pattern set and their transitions (Yes in step S706), the transition time learning means 51 outputs the information about the generated transition times (transition information 52) to the pattern storage means 15 (step S707). On the other hand, if the transition time has not been calculated for all patterns included in the pattern set and their transitions (No in step S706), the process returns to step S701.
  • the transition time learning means 51 repeats the processing from step S701 to step S706 for each pattern, and calculates transition times for all patterns included in the pattern set and their transitions.
  • the log analysis system according to the present embodiment generates the transition times between the elements of a pattern based on the result generated by the reference pattern combining unit, and can therefore generate a pattern and the transition times between the identifiers included in the pattern with a small amount of calculation. This is because the log analysis system according to the present embodiment includes a reference pattern generation unit and a transition time learning unit.
  • the computer 60 includes a processor 61, a main storage device 62, an auxiliary storage device 63, an input/output interface 64, and a communication interface 67.
  • the processor 61, the main storage device 62, the auxiliary storage device 63, the input/output interface 64, and the communication interface 67 are connected to each other via a bus 68 so as to be able to exchange data.
  • the processor 61, the main storage device 62, the auxiliary storage device 63, and the input/output interface 64 are connected to a network (not shown) through a communication interface 67.
  • the processor 61 expands the program stored in the auxiliary storage device 63 or the like in the main storage device 62, and executes the expanded program.
  • a configuration using a software program installed in the computer 60 may be used.
  • the main storage device 62 may be a volatile memory such as a DRAM (Dynamic Random Access Memory). A non-volatile memory such as an MRAM (Magnetoresistive Random Access Memory) may also be configured and added as the main storage device 62. A program is expanded in the main storage device 62.
  • the auxiliary storage device 63 is configured by a local disk such as a hard disk or a flash memory. Note that the auxiliary storage device 63 may be an external storage device connected to the computer 60 or a network storage connected via a network.
  • the input/output interface 64 is a device that connects the computer 60 and peripheral devices based on the connection standard between the computer 60 and peripheral devices.
  • the communication interface 67 is a device that mediates data exchange between a network (not shown) and the processor 61. In FIG. 23, the interface is abbreviated as I/F (I/F: Interface).
  • the computer 60 may be provided with input devices such as a keyboard, a mouse, and a touch panel as necessary. These input devices are used to input information and settings. Note that when the touch panel is used as an input device, the display device also serves as the input device. Data exchange between the processor 61 and the input device may be mediated by the input/output interface 64.
  • the computer 60 may be provided with a display device for displaying information.
  • the computer 60 is provided with a display control device (not shown) for controlling the display of the display device.
  • a display device (not shown) may be connected via the input/output interface 64.
  • the computer 60 is provided with a reader/writer as necessary.
  • the reader/writer is connected to the bus 68, mediates data exchange between the processor 61 and a recording medium (program recording medium) (not shown), reads data and programs from the recording medium, and writes the processing results of the computer 60 to the recording medium.
  • the recording medium can be realized by, for example, a semiconductor recording medium such as an SD card (SD: Secure Digital).
  • the recording medium may be realized by a magnetic recording medium such as a flexible disk, or an optical recording medium such as a CD or a DVD (CD: Compact Disc, DVD: Digital Versatile Disc).
  • the above is an example of the hardware configuration for enabling the log analysis system according to the embodiment of the present invention.
  • the hardware configuration in FIG. 23 is an example of a hardware configuration to enable the log analysis system according to the present embodiment, and does not limit the scope of the present invention.
  • a log analysis program that causes a computer to execute the processing of the log analysis system according to the present embodiment is also included in the scope of the present invention.
  • a program recording medium that records a log analysis program according to an embodiment of the present invention is also included in the scope of the present invention.
  • each embodiment described above can be implemented in appropriate combination.
  • the block division shown in each block diagram is a configuration shown for convenience of explanation.
  • the present invention described by taking each embodiment as an example is not limited to the configuration shown in each block diagram in the implementation.
  • a plurality of operations are described in order, but the order of these operations can be changed within a range where there is no problem.
  • these operations are not always executed at different timings. For example, another operation may occur in parallel during the execution of a certain operation, or the execution timing of a certain operation and another operation may partially or entirely overlap.
  • the description does not limit the relationship between operations to one in which an operation triggers another operation. Therefore, when each embodiment is implemented, the relationship between the plurality of operations can be changed within a range that does not hinder the contents.
  • the log analysis system according to the embodiment of the present invention can be applied to a technology for operating and managing an information processing system, a physical plant, and the like.

Landscapes

  • Engineering & Computer Science (AREA)
  • General Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Computer Hardware Design (AREA)
  • Quality & Reliability (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Debugging And Monitoring (AREA)

Abstract

In order to shorten the time required to extract a combination of log messages that appear synchronously, when analyzing log messages which have been output from an information processing system, a log analyzing system is provided with: a reference pattern generating means which, on the basis of log message appearance information, generates a reference pattern for each combination of log messages that appear synchronously; and a reference pattern linking means which compares, between the reference patterns, the appearance information of the log messages contained in the reference patterns, and links together pairs of reference patterns on the basis of the comparison results.

Description

Log analysis system, log analysis method, and program recording medium
The present invention relates to a log analysis system, a log analysis method, and a log analysis program for analyzing logs output by an information processing system.
An operation manager of an information processing system such as a computer system monitors the logs output by the system in order to confirm that the system is operating normally and to analyze abnormalities such as failures. It is important that logs be monitored and analyzed on the basis of the relationships among the multiple messages they contain.
As computer systems have grown larger and more complex, the volume of logs they output has become enormous. As a result, an operation manager cannot be fully familiar with every part of a computer system, and it is difficult to grasp all the relationships among the messages contained in the logs.
Against this background, the following techniques have been disclosed for analyzing the enormous volume of logs output by computer systems.
Patent Document 1 discloses a message analysis system that detects the occurrence of a failure based on messages collected from a plurality of computer systems and analyzes the detected failure. The message analysis system of Patent Document 1 accumulates the temporal elements of messages generated in association with cases, and aggregates cases using the times of received messages and the accumulated temporal elements. The message analysis system then aggregates and analyzes the messages case by case.
Patent Document 2 discloses a log monitoring system that detects a specific event by analyzing, under predefined conditions, log information output from application software or the like executed by a computer. The log monitoring system of Patent Document 2 classifies each item of log information into preset time slots based on the log occurrence time included in the accumulated log information. The log monitoring system compares the messages included in the items of log information within the same time slot, and measures the number of items of log information containing the same message as an occurrence count condition. Then, when the number of occurrences of log information per unit time matches the occurrence count condition, the log monitoring system generates candidates for the notification condition information referred to by an event detection device that performs notification processing for that log information.
Furthermore, the following documents disclose techniques for automatically generating analysis rules for comprehensively analyzing enormous logs.
Patent Document 3 discloses an information processing apparatus that helps improve the accuracy of failure message filtering. The information processing apparatus of Patent Document 3 extracts only the related messages from the messages transmitted by a device at the time of a failure, and groups the extracted messages. The information processing apparatus determines the relationship between messages by focusing on the co-occurrence relationship between an arbitrary message and the messages output shortly before and after that message was transmitted. When the value of an index indicating the strength of the co-occurrence relationship is equal to or greater than a certain value, the information processing apparatus groups the messages together.
Patent Document 4 discloses a notification device that detects abnormalities in messages generated in a distributed system composed of a plurality of information processing devices. The notification device of Patent Document 4 records the messages that occurred on an arbitrary day of the week and in an arbitrary time slot together with the number of occurrences of each message, and groups those messages as a series of messages up to a separately defined maximum length. The notification device groups the normal-state messages transmitted from the device under analysis as a series of related messages.
Patent Document 1: JP 2006-331026 A
Patent Document 2: JP 2008-41041 A
Patent Document 3: JP 2014-106851 A
Patent Document 4: Japanese Patent No. 4944391
The message analysis system of Patent Document 1 can analyze messages received in real time by matching them against cases defined in various formats. However, the message analysis system cannot analyze undefined cases in real time, because it analyzes messages on the basis of predefined cases.
According to the log monitoring system of Patent Document 2, the conditions needed to update known events and detect new events can be generated by analyzing log information that contains the same message within the same time slot. However, the log monitoring system cannot generate those conditions unless the same message is detected.
The information processing apparatus of Patent Document 3 defines relationships between messages using the co-occurrence probability of consecutive messages and a score calculated from that probability. Consequently, if, for example, the number of message types whose relationships must be defined grows as the volume of logs grows, the number of message combinations to be considered grows as well. The information processing apparatus therefore has the problem that, as the number of message combinations increases, the amount of computation also increases and it takes a long time to obtain an appropriate solution.
The notification device of Patent Document 4 defines the number of message types in a series of consecutive messages by a maximum length, but does not specifically disclose the criterion for defining that maximum length. In addition, because the notification device treats all messages that appear in an arbitrary time slot as grouping targets, even unrelated messages end up grouped together.
An object of the present invention is to provide a log analysis system that can shorten the time required to extract combinations of log messages output consecutively within a fixed period when analyzing log messages output from an information processing system.
The log analysis system of the present invention includes reference pattern generation means for generating, based on appearance information of log messages, a reference pattern for each combination of log messages that appear synchronously, and reference pattern combining means for comparing, between reference patterns, the appearance information of the log messages included in the reference patterns and combining reference patterns with each other based on the comparison result.
In the log analysis method of the present invention, a reference pattern is generated for each combination of log messages that appear synchronously based on appearance information of the log messages, the appearance information of the log messages included in the reference patterns is compared between reference patterns, and reference patterns are combined with each other based on the comparison result.
The log analysis program of the present invention causes a computer to execute a process of generating, based on appearance information of log messages, a reference pattern for each combination of log messages that appear synchronously, and a process of comparing, between reference patterns, the appearance information of the log messages included in the reference patterns and combining reference patterns with each other based on the comparison result.
According to the present invention, when analyzing log messages output from an information processing system, the time required to extract combinations of messages that appear synchronously can be shortened.
A block diagram showing the configuration of the log analysis system according to the first embodiment of the present invention.
A diagram showing an example of a log file used in the log analysis system according to the first embodiment of the present invention.
A diagram showing an example of an integrated log used in the log analysis system according to the first embodiment of the present invention.
A diagram showing an example of appearance information used in the log analysis system according to the first embodiment of the present invention.
A diagram showing an example of a pattern used in the log analysis system according to the first embodiment of the present invention.
A flowchart of the operation of the log analysis system according to the first embodiment of the present invention.
A flowchart of the appearance information aggregation process of the log analysis system according to the first embodiment of the present invention.
A flowchart of the reference pattern generation process of the log analysis system according to the first embodiment of the present invention.
A flowchart of the reference pattern combining process of the log analysis system according to the first embodiment of the present invention.
A flowchart of the reference pattern combining process of the log analysis system according to the first embodiment of the present invention.
A block diagram showing the configuration of the characteristic part of the log analysis system according to the first embodiment of the present invention.
A block diagram showing the configuration of the log analysis system according to the second embodiment of the present invention.
A diagram showing an example of a pattern used in the log analysis system according to the second embodiment of the present invention.
A flowchart of the order information generation process of the log analysis system according to the second embodiment of the present invention.
A block diagram showing the configuration of the log analysis system according to the third embodiment of the present invention.
A diagram showing an example of log identification information used in the log analysis system according to the third embodiment of the present invention.
A flowchart of the log identification process of the log analysis system according to the third embodiment of the present invention.
A block diagram showing the configuration of the log analysis system according to the fourth embodiment of the present invention.
A flowchart of the log classification process of the log analysis system according to the fourth embodiment of the present invention.
A block diagram showing the configuration of the log analysis system according to the fifth embodiment of the present invention.
A diagram showing an example of a pattern used in the log analysis system according to the fifth embodiment of the present invention.
A flowchart of the transition time learning process of the log analysis system according to the fifth embodiment of the present invention.
A block diagram showing a hardware configuration for implementing the log analysis system according to the embodiments of the present invention.
Embodiments for carrying out the present invention are described below with reference to the drawings. Although the embodiments described below include limitations that are technically preferable for carrying out the present invention, they do not limit the scope of the invention.
(First embodiment)
First, a log analysis system 1 according to a first embodiment of the present invention will be described with reference to the drawings.
〔Configuration〕
FIG. 1 is a block diagram showing the configuration of the log analysis system 1 according to the first embodiment of the present invention. Note that the directions of the arrows in FIG. 1 and in all subsequent block diagrams are examples and do not limit the directions of signals between blocks.
As shown in FIG. 1, the log analysis system 1 according to the present embodiment includes a log collection unit 11, a log aggregation unit 12, a reference pattern generation unit 13, a reference pattern combining unit 14, and a pattern storage unit 15.
[Log collection unit]
The log collection unit 11 collects the log files of the analysis target system 10. The log collection unit 11 may receive log files from the analysis target system 10, or may read log files from a storage unit (not shown). The log collection unit 11 may also accept log files input by the operation manager.
FIG. 2 shows examples of log files (log files 101 to 103) collected by the log analysis system 1. A log file is a set of log messages (also called log records) and, as shown in FIG. 2, consists of at least one log message. A log message consists of a plurality of log elements, such as a log ID (identifier) for identifying the log message, the time at which the log message was output, the message body, and the log level. The log ID is also called a log identifier, and is sometimes written simply as ID below.
Based on the at least one collected log file, the log collection unit 11 generates an integrated log in which the log messages stored in all the log files are sorted in chronological order. The log collection unit 11 transmits the generated integrated log to the log aggregation unit 12.
FIG. 3 shows an example of the integrated log (integrated log 104) generated by the log collection unit 11. The integrated log is a set of log messages and, as shown in FIG. 3, consists of at least one log message. The integrated log combines log messages that originally belonged to different log files. Alternatively, the integrated log may be a set of entries, each combining an identifier for identifying a log file with the line number of the log message within that log file.
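As an illustration only, the generation of the integrated log can be sketched in Python as follows. The `LogMessage` type, its field names, and the function `build_integrated_log` are hypothetical names introduced for this sketch and are not part of the embodiment; the sketch assumes that every log message carries an output time that has already been parsed.

```python
from datetime import datetime
from typing import Iterable, List, NamedTuple


class LogMessage(NamedTuple):
    """Hypothetical record holding a few of the log elements shown in FIG. 2."""
    time: datetime
    log_id: str
    body: str


def build_integrated_log(log_files: Iterable[List[LogMessage]]) -> List[LogMessage]:
    """Concatenate the messages of all log files and sort them by output time."""
    merged = [msg for log_file in log_files for msg in log_file]
    merged.sort(key=lambda msg: msg.time)  # stable: ties keep their file order
    return merged


# Two small log files whose messages interleave in time.
file_a = [
    LogMessage(datetime(2014, 7, 1, 12, 0, 1), "1001", "first message"),
    LogMessage(datetime(2014, 7, 1, 12, 0, 30), "1001", "third message"),
]
file_b = [
    LogMessage(datetime(2014, 7, 1, 12, 0, 10), "2034", "second message"),
]
integrated = build_integrated_log([file_a, file_b])
```

Sorting after concatenation is the simplest choice for a sketch; an implementation that streams very large files might instead merge already sorted files incrementally.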
The log collection unit 11 may accept, from the operation manager, a specification of the range of log messages to collect, for example a specification of the log files to be collected or of the dates and times of the log messages recorded in them. The log collection unit 11 may also read a file (not shown) that defines the information needed to analyze log messages and, according to the information defined in that file, convert the acquired log files into a format that the log analysis system 1 can analyze easily.
[Log aggregation unit]
The log aggregation unit 12 calculates the appearance information of each log message based on the information received from the log collection unit 11 and a separately defined time width. The time width indicates the range of appearance times of the log messages that the log aggregation unit 12 aggregates at one time. The time width may be defined by the user, or may be recorded in advance in a file (not shown).
FIG. 4 shows an example of appearance information (appearance information 105). As shown in FIG. 4, appearance information consists of at least one pair of an appearance time and a number of appearances associated with the log ID of a log message. The appearance information may also include the total number of appearances. In the appearance information 105 of FIG. 4, a plurality of appearance times are recorded for each log ID, together with the number of appearances corresponding to each of those appearance times.
The log aggregation unit 12 reads the integrated log one time width at a time, and counts, as numbers of appearances, the kinds of IDs contained in the portion of the integrated log within that time width and how many times each occurs. The log aggregation unit 12 selects one arbitrary time within each interval delimited by the time width and registers it as the appearance time of the ID. For example, the log aggregation unit 12 may register the median, minimum, or maximum time of the interval as the appearance time. The log aggregation unit 12 transmits the calculated appearance information to the reference pattern generation unit 13.
[Reference pattern generation unit]
The reference pattern generation unit 13 compares the at least one piece of appearance information received from the log aggregation unit 12 and combines pieces of appearance information having the same ID. The reference pattern generation unit 13 then transmits the resulting combinations of IDs and their appearance information to the reference pattern combining unit 14. That is, the reference pattern generation unit 13 generates, based on the appearance information of log messages, a reference pattern for each combination of log messages that appear synchronously.
The reference pattern generation unit 13 may accept from the operation manager a specification of, for example, the criterion for judging whether pieces of appearance information are identical. The reference pattern generation unit 13 may also read a file (not shown) that defines the information needed to judge whether appearance information is identical, and compare the appearance information of the input IDs on the basis of that file.
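The following Python sketch illustrates one possible way to form reference patterns from appearance information. It assumes the strictest identity criterion, namely that two IDs belong to the same reference pattern only when their lists of (appearance time, number of appearances) pairs are exactly equal; as noted above, the actual criterion may be specified separately. The function name `generate_reference_patterns` and the dictionary layout are assumptions made for this sketch.

```python
from collections import defaultdict


def generate_reference_patterns(appearance):
    """Group log IDs whose appearance information is identical.

    appearance: {log_id: [(appearance_time, count), ...]}
    Returns a list of (sorted list of IDs, shared appearance information).
    """
    groups = defaultdict(list)
    for log_id, records in appearance.items():
        key = tuple(sorted(records))  # hashable, order-independent signature
        groups[key].append(log_id)
    return [(sorted(ids), list(key)) for key, ids in groups.items()]


# IDs 1001 and 1002 always appear together; ID 2034 behaves differently.
appearance = {
    "1001": [("12:00", 10), ("12:05", 10)],
    "1002": [("12:00", 10), ("12:05", 10)],
    "2034": [("12:00", 3)],
}
patterns = generate_reference_patterns(appearance)
```

Using the appearance information itself as a dictionary key avoids comparing every pair of IDs, which is only possible under the exact-equality assumption stated above.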
[Reference pattern combining unit]
The reference pattern combining unit 14 compares the appearance information of the single IDs or combinations of IDs received from the reference pattern generation unit 13. The reference pattern combining unit 14 combines single IDs or combinations of IDs that satisfy a separately defined condition. That is, the reference pattern combining unit 14 compares, between reference patterns, the appearance information of the log messages included in the reference patterns, and combines reference patterns with each other based on the comparison result. The reference pattern combining unit 14 outputs the set of combined results to the pattern storage unit 15 as "patterns (combinations)". A set of these patterns (combinations) is also called a "pattern set".
FIG. 5 shows a combination information table 106 that summarizes patterns (combinations) in tabular form. A pattern (combination) includes a single ID or a combination of IDs and the corresponding appearance information. In the combination information table 106 of FIG. 5, the appearance information consists of appearance times and numbers of appearances.
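The combining condition is described here only as "separately defined", so the following Python sketch substitutes a purely hypothetical condition for illustration: two reference patterns are combined when the Jaccard similarity of their sets of appearance times is at least a threshold. The similarity measure, the threshold value, the greedy merge order, and the names `jaccard` and `combine_reference_patterns` are all assumptions of this sketch, not the condition used by the embodiment.

```python
def jaccard(a, b):
    """Jaccard similarity of two sets (0.0 when both are empty)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def combine_reference_patterns(patterns, threshold=0.75):
    """Greedily merge reference patterns whose appearance times overlap enough.

    patterns: list of (set of log IDs, set of appearance times).
    Returns a list of (sorted IDs, sorted appearance times).
    """
    combined = []
    for ids, times in patterns:
        for entry in combined:
            if jaccard(entry[1], times) >= threshold:  # hypothetical condition
                entry[0] |= ids
                entry[1] |= times
                break
        else:  # no existing pattern was similar enough
            combined.append([set(ids), set(times)])
    return [(sorted(ids), sorted(times)) for ids, times in combined]


reference_patterns = [
    ({"1001", "1002"}, {"12:00", "12:01", "12:02", "12:03", "12:04"}),
    ({"2034"}, {"12:00", "12:01", "12:02", "12:03"}),  # overlap 4/5 = 0.8
    ({"3000"}, {"13:00"}),                             # no overlap
]
pattern_set = combine_reference_patterns(reference_patterns)
```

In this example the first two reference patterns are merged into one pattern (combination) and the third remains on its own.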
The pattern storage unit 15 stores the patterns (combinations) output by the reference pattern combining unit 14.
This concludes the description of the configuration of the log analysis system 1 according to the present embodiment.
〔Operation〕
Next, the operation of the log analysis system 1 according to the present embodiment will be described.
FIG. 6 is a flowchart outlining the operation of the log analysis system 1 according to the present embodiment. The log analysis system 1 according to the present embodiment performs three processes: an appearance information aggregation process, a reference pattern generation process, and a reference pattern combining process.
In FIG. 6, the appearance information aggregation process of step S1 is a process in which the log collection unit 11 reads log files and the log aggregation unit 12 aggregates the appearance information for each ID.
The reference pattern generation process of step S2 is a process in which the reference pattern generation unit 13 links, based on the appearance information for each ID, log messages that appear synchronously into a reference pattern. Here, "at least one log message that appears synchronously" means "at least one log message output consecutively within a fixed period".
The reference pattern combining process of step S3 is a process in which the reference pattern combining unit 14 combines combinations of IDs based on the set of reference patterns and generates patterns (combinations).
In the following, the operation of the log analysis system 1 according to the first embodiment is described in detail in three parts: the appearance information aggregation process, the reference pattern generation process, and the reference pattern combining process.
[Appearance information aggregation process]
First, the appearance information aggregation process will be described. The appearance information aggregation process is a process in which the log collection unit 11 reads log files and the log aggregation unit 12 aggregates the appearance information for each ID. FIG. 7 is a flowchart of the appearance information aggregation process.
In FIG. 7, the log collection unit 11 first reads the log files output from the analysis target system 10 (step S101).
The log collection unit 11 generates an integrated log by combining all the acquired log files (step S102).
The log collection unit 11 sorts the log messages of the integrated log in chronological order based on the time information of each log message (step S103).
Next, the log aggregation unit 12 reads log messages from the integrated log based on the defined time width (step S104).
For example, suppose that the time of the log message at which reading starts is "2014/07/01_12:00:01" and that the defined time width is "1 minute". In this case, the log aggregation unit 12 reads the log messages in the interval from "2014/07/01_12:00:01" to "2014/07/01_12:01:00".
Next, the log aggregation unit 12 counts the number of appearances of each ID in the set of log messages it has read, and records the pair of time information and number of appearances as appearance information for each ID (step S105).
For example, suppose that in the interval from "2014/07/01_12:00:01" to "2014/07/01_12:01:00" the log message with ID "1001" appeared "10 times" and the log message with ID "2034" appeared "3 times". In this case, the log aggregation unit 12 adds the appearance time "2014/07/01_12:00:01" and the number of appearances "10" to the appearance information of ID "1001". Similarly, the log aggregation unit 12 adds the appearance time "2014/07/01_12:00:01" and the number of appearances "3" to the appearance information of ID "2034".
The log aggregation unit 12 then determines whether the last log message of the integrated log has been reached (step S106).
When the last log message of the integrated log has been reached (Yes in step S106), the log aggregation unit 12 outputs the appearance information for each ID to the reference pattern generation unit 13 (step S107).
On the other hand, when the last log message of the integrated log has not been reached (No in step S106), the process returns to step S104.
That is, the log aggregation unit 12 repeats the processing of steps S104 and S105 until the last log message of the integrated log is reached. Note that the log aggregation unit 12 may be configured so that reading of log messages can be ended at an arbitrary time, for example by accepting a time entered by the user or by obtaining the end time from definition information (not shown).
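The loop of steps S104 to S106 can be sketched in Python as follows, using the example values given above. The function name `aggregate_appearances` and the tuple layout of the integrated log are assumptions of this sketch. The sketch registers the start of each window as the appearance time, which is one of the permitted choices, and treats each window as the half-open interval from its start up to but not including the start plus the time width; at one-second resolution this matches the interval from 12:00:01 to 12:01:00 in the example.

```python
from collections import Counter, defaultdict
from datetime import datetime, timedelta


def aggregate_appearances(integrated_log, width):
    """Count log IDs per time window (steps S104 to S106).

    integrated_log: list of (time, log_id) tuples sorted by time.
    Returns {log_id: [(window_start, count), ...]}.
    """
    appearance = defaultdict(list)
    if not integrated_log:
        return {}
    window_start = integrated_log[0][0]
    last_time = integrated_log[-1][0]
    i = 0
    while window_start <= last_time:  # S106: stop once the last message is read
        window_end = window_start + width
        counts = Counter()
        # S104: read the messages that fall inside the current window
        while i < len(integrated_log) and integrated_log[i][0] < window_end:
            counts[integrated_log[i][1]] += 1
            i += 1
        # S105: record (appearance time, count) for each ID seen in the window
        for log_id, n in counts.items():
            appearance[log_id].append((window_start, n))
        window_start = window_end
    return dict(appearance)


# Ten messages with ID 1001 and three with ID 2034 inside the first minute.
start = datetime(2014, 7, 1, 12, 0, 1)
log = sorted(
    [(start + timedelta(seconds=5 * k), "1001") for k in range(10)]
    + [(start + timedelta(seconds=20 * k + 2), "2034") for k in range(3)]
)
info = aggregate_appearances(log, timedelta(minutes=1))
```

With these inputs, `info` records the appearance time 2014/07/01 12:00:01 with 10 appearances for ID "1001" and with 3 appearances for ID "2034".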
This concludes the description of the appearance information aggregation process.
[Reference pattern generation process]
Next, the reference pattern generation process will be described. The reference pattern generation process is a process in which the reference pattern generation unit 13 links, based on the appearance information for each ID, log messages that appear synchronously into a reference pattern. FIG. 8 is a flowchart of the reference pattern generation process. Note that the operation for linking log messages described with reference to FIG. 8 is only an example; any method may be used as long as it can compare and link IDs that occur at the same time.
In FIG. 8, first, the reference pattern generation unit 13 reads the appearance information for each ID output by the log aggregation unit 12 (step S201).
Based on the received set of appearance information for each ID (hereinafter, the combination candidate set), the reference pattern generation unit 13 calculates, for each ID, the sum of the appearance counts over its appearance times (hereinafter, the appearance frequency) (step S202).
The reference pattern generation unit 13 sorts the appearance information constituting the combination candidate set in ascending order of appearance frequency (step S203).
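Steps S202 and S203 can be sketched as follows. This is a minimal illustration, not the implementation described in the patent: the dictionary layout (ID mapped to a table of appearance time and appearance count), the function names, and the sample values are all assumptions made for the example.

```python
# Hypothetical appearance information: ID -> {appearance time: appearance count}.
appearance_info = {
    "2048": {"2014/07/01_09:00": 2, "2014/07/01_10:00": 2},
    "3018": {"2014/07/01_09:00": 3},
    "4024": {"2014/07/01_09:00": 1, "2014/07/01_12:00": 1, "2014/07/01_18:00": 3},
}


def appearance_frequency(info):
    """Step S202: sum of the appearance counts over all appearance times."""
    return sum(info.values())


def sort_by_frequency(candidates):
    """Step S203: order the combination candidate set by ascending frequency."""
    return sorted(candidates.items(), key=lambda item: appearance_frequency(item[1]))


ordered = sort_by_frequency(appearance_info)
print([(id_, appearance_frequency(info)) for id_, info in ordered])
```

Sorting in ascending order means that the first element is the natural choice for the comparison source ID in step S204.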
The reference pattern generation unit 13 selects an ID to serve as the comparison source (hereinafter, the comparison source ID) from the combination candidate set (step S204). Here, the reference pattern generation unit 13 is described as selecting, as the comparison source ID, the ID whose appearance information has the lowest appearance frequency in the combination candidate set, and comparing the appearance information of the selected comparison source ID with that of the other IDs (comparison target IDs); however, the comparison source ID may be selected by a different criterion.
The reference pattern generation unit 13 then determines whether the appearance frequency of the selected comparison source ID is the maximum among the appearance information constituting the combination candidate set (step S205).
When the appearance frequency of the selected comparison source ID is not the maximum (No in step S205), the reference pattern generation unit 13 checks whether there is an ID having the same appearance information as the selected comparison source ID (step S206). On the other hand, when the appearance frequency of the selected comparison source ID is the maximum (Yes in step S205), the process proceeds to step S209.
When there is an ID having the same appearance information as the selected comparison source ID (hereinafter, a comparison target ID) (Yes in step S206), the reference pattern generation unit 13 combines the comparison source ID and the comparison target ID to generate a combination of IDs (step S207). On the other hand, when there is no ID having the same appearance information as the comparison source ID (No in step S206), the process returns to step S204 to acquire another ID as the comparison source ID.
In this way, steps S204 to S206 are repeated until no comparison target ID having the same appearance information as the selected comparison source ID remains.
A supplementary explanation of step S207 follows.
In step S207, assume for example that a certain comparison source ID "2048" appears at the following 10 appearance times.
"2014/07/01_9:00:01, 2014/07/01_10:00:01, 2014/07/01_11:00:01, 2014/07/01_12:00:01, 2014/07/01_13:00:01, 2014/07/01_14:00:01, 2014/07/01_15:00:01, 2014/07/01_16:00:01, 2014/07/01_17:00:01, 2014/07/01_18:00:01"
Assume also that the appearance counts corresponding to the appearance times of comparison source ID "2048" are "2, 2, 2, 2, 2, 2, 2, 2, 2, 2".
Assume further that comparison target ID "2049" appears at the following 10 appearance times.
"2014/07/01_9:00:01, 2014/07/01_10:00:01, 2014/07/01_11:00:01, 2014/07/01_12:00:01, 2014/07/01_13:00:01, 2014/07/01_14:00:01, 2014/07/01_15:00:01, 2014/07/01_16:00:01, 2014/07/01_17:00:01, 2014/07/01_18:00:01"
Assume also that the appearance counts corresponding to the appearance times of comparison target ID "2049" are "2, 2, 2, 2, 2, 2, 2, 2, 2, 2".
In this case, the total appearance count (appearance frequency) is "20" for both comparison source ID "2048" and comparison target ID "2049", and their appearance times are also identical. Comparison source ID "2048" and comparison target ID "2049" are therefore targets of combination.
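The check in this example can be sketched as below, assuming that appearance information is held as a mapping from appearance time to appearance count. The helper names `hourly_info` and `has_same_appearance_info` are hypothetical and introduced only for illustration.

```python
HOURS = range(9, 19)  # 9:00:01 through 18:00:01, ten appearance times


def hourly_info(count):
    """Build appearance information with the same count at every hour."""
    return {f"2014/07/01_{hour}:00:01": count for hour in HOURS}


id_2048 = hourly_info(2)  # comparison source ID "2048"
id_2049 = hourly_info(2)  # comparison target ID "2049"


def has_same_appearance_info(a, b):
    """True when both the appearance times and the appearance counts match."""
    return a == b


print(sum(id_2048.values()), sum(id_2049.values()))
print(has_same_appearance_info(id_2048, id_2049))
```

Because identical appearance information implies identical appearance frequency, comparing the two mappings directly covers both conditions mentioned in the text.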
In the above comparison of appearance frequency and appearance time, if the appearance frequencies are the same and only the appearance times differ by one unit (for example, one minute), the IDs may be regarded as having the same appearance information.
For example, assume that a certain comparison source ID "3018" appears at the following four appearance times.
"2014/07/01_9:00:01, 2014/07/01_12:00:01, 2014/07/01_15:00:01, 2014/07/01_18:00:01"
Assume also that the appearance counts corresponding to the appearance times of comparison source ID "3018" are "3, 3, 3, 3".
Similarly, assume that comparison target ID "4024" appears at the following six appearance times.
"2014/07/01_9:00:01, 2014/07/01_12:00:01, 2014/07/01_12:01:01, 2014/07/01_15:00:01, 2014/07/01_18:00:01, 2014/07/01_18:01:01"
Assume also that the appearance counts corresponding to the appearance times of comparison target ID "4024" are "3, 2, 1, 3, 1, 2".
In this case, the total appearance count (appearance frequency) is "12" for both comparison source ID "3018" and comparison target ID "4024". However, comparison target ID "4024" has appearance times "2014/07/01_12:01:01, 2014/07/01_18:01:01" that comparison source ID "3018" does not have. If the time width is "1 minute", these differing appearance times are adjacent to the appearance times "2014/07/01_12:00:01, 2014/07/01_18:00:01" of comparison source ID "3018". The differing times are therefore treated as belonging to the adjacent times, and comparison source ID "3018" and comparison target ID "4024" become targets of combination.
The range within which a time difference is regarded as identical may be defined separately by the user. A threshold may also be set for the degree of coincidence of the appearance frequency and the appearance information, and IDs that satisfy the set threshold condition may be combined.
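One way to realize the adjacent-time rule is sketched below. It is an assumption-laden illustration rather than the patented procedure: the function `fold_adjacent`, the parsing format, and the choice to fold a missing time into the first neighbouring time found within one time width are all hypothetical.

```python
from datetime import datetime, timedelta

FMT = "%Y/%m/%d_%H:%M:%S"


def parse(info):
    """Convert time strings to datetime keys so that adjacency can be computed."""
    return {datetime.strptime(t, FMT): count for t, count in info.items()}


def fold_adjacent(target, source, width=timedelta(minutes=1)):
    """Reassign each target time missing from source to a source time one width away."""
    folded = {}
    for t, count in target.items():
        key = t
        if t not in source:
            for neighbour in (t - width, t + width):
                if neighbour in source:
                    key = neighbour
                    break
        folded[key] = folded.get(key, 0) + count
    return folded


id_3018 = parse({
    "2014/07/01_9:00:01": 3, "2014/07/01_12:00:01": 3,
    "2014/07/01_15:00:01": 3, "2014/07/01_18:00:01": 3,
})
id_4024 = parse({
    "2014/07/01_9:00:01": 3, "2014/07/01_12:00:01": 2, "2014/07/01_12:01:01": 1,
    "2014/07/01_15:00:01": 3, "2014/07/01_18:00:01": 1, "2014/07/01_18:01:01": 2,
})

# After folding, ID "4024" has the same appearance information as ID "3018".
print(fold_adjacent(id_4024, id_3018) == id_3018)
```

The `width` parameter plays the role of the user-defined range within which a time difference is regarded as identical.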
Returning to the flowchart of FIG. 8 (Yes in step S206), after combining the comparison source ID and the comparison target ID in step S207, the reference pattern generation unit 13 updates the appearance information of the combination candidate set (step S208). In updating the appearance information, the reference pattern generation unit 13 first adds the pair consisting of the generated ID combination and its appearance information to the combination candidate set. Second, it deletes the comparison source ID and the comparison target ID from the combination candidate set. When the combination candidate set has been updated, the process returns to step S203.
In this way, steps S203 to S208 are repeated until the combination candidate (appearance information) with the maximum appearance frequency in the combination candidate set is reached.
Finally, when the appearance frequency of the selected comparison source ID is the maximum (Yes in step S205), the reference pattern generation unit 13 outputs the set obtained by sorting the appearance information of the combination candidate set in ascending order of appearance frequency to the reference pattern combining unit 14 as the reference pattern set (step S209). Here, the appearance frequency of the selected comparison source ID being the maximum means that the combination candidate (appearance information) with the maximum appearance frequency in the combination candidate set has been reached.
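The overall flow of steps S203 to S209 can be sketched as follows. This is a simplified sketch under several assumptions: only exact matches of appearance information are combined (no time tolerance), candidates are keyed by tuples of IDs, and the handling of ties at the maximum appearance frequency is a choice made for the example. The function names are hypothetical.

```python
def frequency(info):
    """Appearance frequency: sum of the appearance counts."""
    return sum(info.values())


def generate_reference_patterns(appearance_by_id):
    # Key each candidate by a tuple of IDs so combined IDs fit the same structure.
    candidates = {(id_,): dict(info) for id_, info in appearance_by_id.items()}
    merged = True
    while merged:
        merged = False
        # S203: sort candidates by ascending appearance frequency.
        ordered = sorted(candidates, key=lambda k: frequency(candidates[k]))
        # S204/S205: take sources in order, stopping before the last (maximum) one.
        for source in ordered[:-1]:
            # S206: look for a candidate with identical appearance information.
            targets = [k for k in ordered
                       if k != source and candidates[k] == candidates[source]]
            if targets:
                target = targets[0]
                info = candidates[source]
                combined = tuple(sorted(source + target))   # S207
                del candidates[source], candidates[target]  # S208
                candidates[combined] = info
                merged = True
                break  # return to S203 with the updated candidate set
    # S209: output the candidates in ascending order of appearance frequency.
    return sorted(candidates.items(), key=lambda kv: frequency(kv[1]))


sample = {
    "2048": {"09:00": 2, "10:00": 2},
    "2049": {"09:00": 2, "10:00": 2},
    "9000": {"09:00": 5},
}
result = generate_reference_patterns(sample)
print([ids for ids, _ in result])
```

Each merge removes two candidates and adds one, so the loop always terminates.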
This completes the description of the reference pattern generation process.
[Reference pattern combination processing]
Next, the reference pattern combination process will be described. The reference pattern combination process is a process in which the reference pattern combining unit 14 combines combinations of IDs based on the reference pattern set to generate patterns (combinations). FIGS. 9 and 10 are flowcharts of the reference pattern combination process. The reference pattern set is a set of reference patterns; like the set of patterns (combinations), each of its elements is a pair consisting of a combination of IDs and the appearance information of that combination.
In FIG. 9, first, the reference pattern combining unit 14 reads the reference pattern set generated by the reference pattern generation unit 13 in the reference pattern generation process (step S301).
The reference pattern combining unit 14 selects the reference pattern with the lowest appearance frequency from the read reference pattern set (step S302). The reference pattern selected here is called the comparison source pattern. The reference pattern combining unit 14 selects reference patterns from the reference pattern set in ascending order of appearance frequency.
The reference pattern combining unit 14 then determines whether there is a comparison source pattern in the reference pattern set read in step S301 (step S303).
When there is a comparison source pattern (Yes in step S303), the reference pattern combining unit 14 selects, from the reference pattern set, patterns whose appearance frequency is equal to or lower than that of the comparison source pattern as patterns to be compared (hereinafter, comparison target patterns) (step S304). The set of these comparison target patterns is called the comparison target pattern set. On the other hand, when there is no comparison source pattern (No in step S303), the process proceeds to step S312 in FIG. 10.
The reference pattern combining unit 14 then determines whether there is a comparison target pattern in the reference pattern set read in step S301 (step S305).
When there is a comparison target pattern (Yes in step S305), the reference pattern combining unit 14 compares the appearance information of the comparison source pattern with the appearance information of each comparison target pattern in the comparison target pattern set, and calculates the similarity of the appearance information (step S306). On the other hand, when there is no comparison target pattern (No in step S305), the process proceeds to step S308.
A supplementary explanation of step S306 follows.
For example, assume that the comparison source pattern "5025, 6036" appears at the following five appearance times.
"2014/7/1_12:00:01, 2014/7/2_12:00:01, 2014/7/3_12:00:01, 2014/7/4_12:00:01, 2014/7/5_12:00:01"
Assume also that the appearance counts corresponding to the appearance times of the comparison source pattern "5025, 6036" are "2, 2, 2, 2, 2".
Meanwhile, assume that the comparison target pattern "1001, 3009, 7049" appears at the following five appearance times.
"2014/7/1_12:00:01, 2014/7/2_12:00:01, 2014/7/3_12:00:01, 2014/7/4_12:00:01, 2014/7/5_12:00:01"
Assume also that the appearance counts corresponding to the appearance times of the comparison target pattern "1001, 3009, 7049" are "2, 1, 1, 2, 2".
That is, the appearance information common to the two reference patterns has the appearance times "2014/7/1_12:00:01, 2014/7/2_12:00:01, 2014/7/3_12:00:01, 2014/7/4_12:00:01, 2014/7/5_12:00:01" and the appearance counts "2, 1, 1, 2, 2".
In this case, the similarity between the comparison source pattern "5025, 6036" and the comparison target pattern "1001, 3009, 7049" is calculated from the ratio of appearance counts as "8/8", that is, "1.0". Here, the ratio of appearance counts is the ratio of the appearance frequency of the common appearance information to the appearance frequency of the appearance information of the comparison target, and is calculated by the following Equation 1.
(ratio of appearance counts) = (appearance frequency of the common appearance information) / (appearance frequency of the appearance information of the comparison target) ... (1)
In Equation 1, the ratio of the appearance frequency of the common part to the appearance frequency of the comparison target is used as the similarity index; in addition to this, the ratio of the appearance frequency of the common part to the appearance frequency of the comparison source may also be used. Furthermore, although the appearance frequency of the appearance information is used in calculating the ratio of appearance counts, the number of appearance times may be used instead.
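Equation 1 can be checked against the worked example with the sketch below. One point is an assumption: the common appearance information is taken here as the smaller of the two counts at each shared appearance time, which reproduces the common counts "2, 1, 1, 2, 2" given in the example. The function names are hypothetical.

```python
def common_info(a, b):
    """Assumed reading: at each shared time, keep the smaller of the two counts."""
    return {t: min(a[t], b[t]) for t in a.keys() & b.keys()}


def similarity(source, target):
    """Equation (1): common appearance frequency / frequency of the comparison target."""
    return sum(common_info(source, target).values()) / sum(target.values())


times = [f"2014/7/{day}_12:00:01" for day in range(1, 6)]
source = dict(zip(times, [2, 2, 2, 2, 2]))  # comparison source pattern "5025, 6036"
target = dict(zip(times, [2, 1, 1, 2, 2]))  # comparison target pattern "1001, 3009, 7049"

print(sum(common_info(source, target).values()))  # common appearance frequency
print(similarity(source, target))                 # 8 / 8
```

Swapping the denominator for `sum(source.values())` gives the alternative index relative to the comparison source that the text mentions.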
The reference pattern combining unit 14 selects, as combination candidate patterns, the comparison target patterns whose similarity calculated in step S306 satisfies a separately defined threshold condition (step S307). The process then returns to step S304. The threshold condition may be regarded as satisfied, for example, when the similarity exceeds a predetermined threshold or when it is equal to or greater than a predetermined threshold.
The reference pattern combining unit 14 repeats steps S304 to S307 until no comparison target pattern remains (No in step S305), and generates a set of combination candidate patterns.
A supplementary explanation of step S307 follows.
For example, assume that the similarity between the comparison source pattern "5025, 6036" and the comparison target pattern "1001, 3009, 7049" is "1.0". If the predetermined threshold is "0.9", the similarity is equal to or greater than the threshold, so the comparison target pattern "1001, 3009, 7049" becomes a combination candidate pattern. Only one similarity index is used here; when there are multiple similarity indices, a single value may be applied as the threshold for all of them, or an individual threshold may be prepared for each index.
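The selection in step S307 can be sketched as below. The function name and the similarity table are hypothetical; the entry for pattern "8001" is invented purely to show a pattern being rejected. The `condition` argument switches between the two threshold conditions mentioned in the text ("equal to or greater than" and "exceeds").

```python
import operator


def select_combination_candidates(similarities, threshold, condition=operator.ge):
    """Step S307: keep comparison target patterns whose similarity meets the condition."""
    return [pattern for pattern, s in similarities.items() if condition(s, threshold)]


# The similarity 1.0 comes from the worked example; the second entry is hypothetical.
similarities = {
    ("1001", "3009", "7049"): 1.0,
    ("8001",): 0.4,
}

# "Equal to or greater than 0.9" keeps the pattern from the worked example.
print(select_combination_candidates(similarities, 0.9))
# "Exceeds 1.0" keeps nothing, because 1.0 is not strictly greater than 1.0.
print(select_combination_candidates(similarities, 1.0, operator.gt))
```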
When there is no comparison target pattern (No in step S305), the reference pattern combining unit 14 extracts all the appearance information from the set of combination candidate patterns and generates candidate appearance information by combining all the extracted appearance information (step S308).
A supplementary explanation of step S308 follows. The following example assumes two combination candidate patterns, "1001, 3009, 7049" and "2004, 4016".
Assume that the combination candidate pattern "1001, 3009, 7049" appears at the following five appearance times.
"2014/7/1_12:00:01, 2014/7/2_12:00:01, 2014/7/3_12:00:01, 2014/7/4_12:00:01, 2014/7/5_12:00:01"
Assume also that the appearance counts corresponding to the appearance times of the combination candidate pattern "1001, 3009, 7049" are "2, 1, 1, 2, 2".
Assume further that the combination candidate pattern "2004, 4016" has the appearance times "2014/7/2_12:00:01, 2014/7/3_12:00:01" and the appearance counts "1, 1".
In this case, the candidate appearance information has the appearance times "2014/7/1_12:00:01, 2014/7/2_12:00:01, 2014/7/3_12:00:01, 2014/7/4_12:00:01, 2014/7/5_12:00:01" and the appearance counts "2, 2, 2, 2, 2".
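The combination in step S308 can be illustrated as follows, assuming that "combining" means adding up the appearance counts of all combination candidate patterns at each appearance time. That reading matches the numbers in the example, but it is an assumption, as is the function name.

```python
from collections import Counter


def candidate_appearance_info(candidate_patterns):
    """Step S308 (assumed reading): add the counts of all candidates per appearance time."""
    total = Counter()
    for info in candidate_patterns.values():
        total.update(info)  # Counter.update adds counts rather than replacing them
    return dict(total)


times = [f"2014/7/{day}_12:00:01" for day in range(1, 6)]
candidates = {
    ("1001", "3009", "7049"): dict(zip(times, [2, 1, 1, 2, 2])),
    ("2004", "4016"): {times[1]: 1, times[2]: 1},
}

combined = candidate_appearance_info(candidates)
print([combined[t] for t in times])
```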
The reference pattern combining unit 14 then compares the appearance information of the comparison source pattern with the candidate appearance information of the combination candidate patterns, and calculates the similarity between the two in the same manner as in step S306 (step S309).
The reference pattern combining unit 14 then determines whether the similarity calculated in step S309 is equal to or greater than a separately defined threshold (step S310). The similarity between the appearance information of the comparison source pattern and the candidate appearance information of the combination candidate patterns may instead be judged by whether it satisfies some other predetermined threshold condition.
When the similarity is less than the threshold (No in step S310), the reference pattern combining unit 14 returns to step S302 to acquire the next reference pattern as a new comparison source pattern.
On the other hand, when the similarity is equal to or greater than the threshold (Yes in step S310), the reference pattern combining unit 14 updates the reference patterns (step S311). In updating the reference patterns, the reference pattern combining unit 14 first generates combined patterns by combining the comparison source pattern with each of the combination candidate patterns, and adds the generated combined patterns to the reference pattern set. Second, it deletes the comparison source pattern and the combination candidate patterns from the reference pattern set. When the reference patterns have been updated, the process returns to step S302.
In this way, the reference pattern combining unit 14 repeats the processing corresponding to steps S302 to S309 until the similarity between the appearance information of the comparison source pattern and the candidate appearance information of the combination candidate patterns becomes equal to or greater than the threshold.
Next, the case where there is no comparison source pattern in the reference pattern set in step S303 of FIG. 9 (No in step S303) will be described with reference to FIG. 10.
In FIG. 10, the reference pattern combining unit 14 sorts the patterns of the reference pattern set in ascending order of appearance frequency (step S312).
Next, the reference pattern combining unit 14 acquires reference patterns from the reference pattern set in ascending order of appearance frequency (step S313). The reference pattern selected here serves as the comparison source reference pattern (hereinafter, the comparison source pattern).
The reference pattern combining unit 14 then determines whether there is a comparison source pattern in the reference pattern set (step S314).
When there is a comparison source pattern (Yes in step S314), the reference pattern combining unit 14 selects, from the reference pattern set, patterns whose appearance frequency is equal to or lower than that of the comparison source pattern as patterns to be compared (hereinafter, comparison target patterns) (step S315). The set of these comparison target patterns is called the comparison target pattern set. On the other hand, when there is no comparison source pattern (No in step S314), the process proceeds to step S320.
The reference pattern combining unit 14 then determines whether there is a comparison target pattern in the reference pattern set (step S316).
When there is a comparison target pattern (Yes in step S316), the reference pattern combining unit 14 compares the appearance information of the comparison source pattern with the appearance information of the comparison target pattern, and calculates similarity A and similarity B of the appearance information (step S317).
Similarity A (the first similarity) is the ratio of the common appearance frequency to the appearance frequency (also called the first frequency) of the comparison source pattern (also called the first pattern). Similarity A is calculated by the following Equation 2.
(similarity A) = (appearance frequency of the common appearance information) / (appearance frequency of the appearance information of the comparison source pattern) ... (2)
Similarity B (the second similarity) is the ratio of the common appearance frequency to the appearance frequency (also called the second frequency) of the candidate comparison target pattern (also called the second pattern). Similarity B is calculated by the following Equation 3.
(similarity B) = (appearance frequency of the common appearance information) / (appearance frequency of the appearance information of the comparison target pattern) ... (3)
The common appearance frequency is obtained by comparing the appearance times and appearance counts in the appearance information of the comparison source pattern with those in the appearance information of the comparison target pattern, and summing the appearance counts for which both the appearance time and the appearance count match. In other words, when the first pattern and the second pattern are compared, the sum of the appearance counts for which the appearance information matches corresponds to the common frequency.
For example, assume that the comparison source pattern has the appearance times "2014/7/1_12:00:01, 2014/7/2_12:00:01, 2014/7/3_12:00:01, 2014/7/4_12:00:01, 2014/7/5_12:00:01" and the appearance counts "2, 1, 1, 2, 2". Assume also that the comparison target pattern has the appearance times "2014/7/1_12:00:01, 2014/7/4_12:00:01, 2014/7/5_12:00:01" and the appearance counts "2, 2, 2". In this case, the appearance times common to both are "2014/7/1_12:00:01, 2014/7/4_12:00:01, 2014/7/5_12:00:01". The corresponding appearance counts are "2, 2, 2", so their sum, "6", is the common appearance frequency. As a result, since the appearance frequency of the comparison source pattern is "8", similarity A is 6/8 according to Equation 2, and since the appearance frequency of the comparison target pattern is "6", similarity B is 6/6 according to Equation 3. For example, when the predetermined threshold for similarity A is 1 and the predetermined threshold for similarity B is 0.8, the predetermined thresholds are satisfied for both similarity A and similarity B. In comparing appearance information, as in step S207, the appearance information may be judged to match even when there is a difference in appearance time.
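Equations 2 and 3 can be evaluated for this example with the sketch below. It follows the definition of the common appearance frequency given in the text (appearance time and appearance count must both match, with no time tolerance); the function names and the dictionary representation are assumptions made for illustration.

```python
def common_frequency(source, target):
    """Sum of the counts at times where both the time and the count match."""
    return sum(count for t, count in source.items() if target.get(t) == count)


def similarity_a(source, target):
    """Equation (2): common frequency / frequency of the comparison source pattern."""
    return common_frequency(source, target) / sum(source.values())


def similarity_b(source, target):
    """Equation (3): common frequency / frequency of the comparison target pattern."""
    return common_frequency(source, target) / sum(target.values())


times = [f"2014/7/{day}_12:00:01" for day in range(1, 6)]
source = dict(zip(times, [2, 1, 1, 2, 2]))              # comparison source pattern
target = {times[0]: 2, times[3]: 2, times[4]: 2}        # comparison target pattern

print(common_frequency(source, target))  # 2 + 2 + 2
print(similarity_a(source, target))      # 6 / 8
print(similarity_b(source, target))      # 6 / 6
```

Similarity A measures how much of the comparison source is covered by the common part, while similarity B measures the same for the comparison target, so the two values generally differ.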
 一方、比較対象パターンがない場合(ステップS316でNo)、ステップS313に戻る。 On the other hand, when there is no comparison target pattern (No in step S316), the process returns to step S313.
Here, the reference pattern combining unit 14 determines whether each of the similarity A and the similarity B calculated in step S317 is equal to or greater than a separately defined predetermined threshold (step S318). The similarity A and the similarity B may instead be checked against another predetermined threshold condition.
If each of the similarity A and the similarity B is less than the separately defined threshold (No in step S318), the reference pattern combining unit 14 returns to step S313 to acquire the next reference pattern as a new comparison source pattern.
On the other hand, if each of the similarity A and the similarity B is equal to or greater than the separately defined threshold (Yes in step S318), the reference pattern combining unit 14 updates the reference patterns (step S319). In this update, the reference pattern combining unit 14 first generates a new reference pattern by combining the combination candidate pattern with the comparison source reference pattern, and adds the generated new reference pattern to the reference pattern set. The appearance information of the new reference pattern consists of the elements common to the combination candidate pattern and the comparison source pattern. Second, the reference pattern combining unit 14 deletes the comparison source pattern and the combination candidate pattern from the reference pattern set. After the update, the process returns to step S313 to select the next reference pattern as a new comparison source pattern.
Finally, when the reference pattern having the maximum appearance frequency has been reached (No in step S314), the reference pattern combining unit 14 exits the loop of steps S313 to S319. The reference pattern combining unit 14 then outputs the updated reference pattern set to the pattern storage unit 15 as a pattern set (step S320).
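The threshold check and update of steps S318 and S319 can be sketched as follows. This is only an illustrative sketch: the definitions of similarity A and similarity B used here (the share of each pattern's appearances that is common to both), the dictionary layout, the function names, and the threshold value 0.8 are all assumptions made for the example and are not taken from this description.

```python
def merge_if_similar(source, candidate, threshold=0.8):
    """Return a merged pattern, or None when the pair should not be combined.

    Each pattern is a dict with 'ids' (tuple of log IDs) and 'appearances'
    (non-empty set of time slots in which the pattern was observed).
    """
    common = source["appearances"] & candidate["appearances"]
    # Assumed definition: share of each pattern's appearances that is common.
    similarity_a = len(common) / len(source["appearances"])
    similarity_b = len(common) / len(candidate["appearances"])
    if similarity_a >= threshold and similarity_b >= threshold:
        merged_ids = tuple(sorted(set(source["ids"]) | set(candidate["ids"])))
        # The new pattern keeps only the common appearance information.
        return {"ids": merged_ids, "appearances": common}
    return None


def update_pattern_set(patterns, source, candidate, threshold=0.8):
    """Add the merged pattern and drop the two originals (step S319)."""
    merged = merge_if_similar(source, candidate, threshold)
    if merged is None:
        return patterns
    kept = [p for p in patterns if p is not source and p is not candidate]
    return kept + [merged]
```

In this sketch a pair is combined only when both similarities reach the threshold; how a mixed result (one above, one below) is handled is a design choice of the example.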
The above is the description of the reference pattern combining process.
FIG. 11 shows the configuration of the characteristic part of the log analysis system according to the present embodiment. In FIG. 11, the reference pattern generation unit 13 generates, based on the appearance information of log messages, reference patterns that combine log messages appearing synchronously. The reference pattern combining unit 14 compares the appearance information of the reference patterns and combines at least one reference pattern based on the comparison result. The concept of combining at least one reference pattern includes carrying over, unchanged, a reference pattern that has no other reference pattern to be combined with.
[Effect]
According to the log analysis system of the first embodiment described above, the time required to extract combinations of messages output in succession within a fixed time can be shortened when log messages are analyzed. This is because the reference pattern generation unit 13 of the log analysis system according to the present embodiment, when generating patterns, does not compare the appearances of synchronously output log messages individually but compares them as a single group of messages.
Further, according to the log analysis system of the first embodiment, defining a threshold at the time of log message analysis makes it possible to group only log messages that satisfy a certain threshold condition and have a high co-occurrence probability.
Furthermore, according to the log analysis system of the first embodiment, a plurality of messages that appear together within a time width, and that might be split apart under a constraint on the number of messages, can be correctly extracted as a pattern. This is because the log analysis system according to the present embodiment reads the integrated log file according to the time width and calculates the relationships between individual IDs according to the threshold.
(Second Embodiment)
Next, a log analysis system 2 according to the second embodiment of the present invention will be described.
[Configuration]
FIG. 12 is a block diagram showing the functional configuration of the log analysis system 2 according to the present embodiment. The log analysis system 2 has a configuration in which an order learning unit 21 is added to the log analysis system 1 according to the first embodiment. In the log analysis system 2, components that are substantially the same as those of the log analysis system 1 according to the first embodiment (FIG. 1) are given the same reference numerals, and their description is omitted.
The order learning unit 21 refers to the integrated log based on the pattern set output by the reference pattern combining unit 14 and extracts order information 22 for each pattern. The order information 22 is used, when a log is analyzed with a pattern (combination), to check whether the log IDs included in that pattern (combination) appear in the order given by a "pattern (order)". A pattern (order), also called an "order pattern", is a pattern in which log IDs are arranged in order of appearance.
The order learning unit 21 outputs the generated order information 22 to the pattern storage unit 15, where it is recorded. FIG. 12 shows the pattern storage unit 15 storing the order information 22 and the pattern set 150.
The order information 22 includes a pattern (combination) obtained by combining at least one ID, patterns (order) that take into account the arrangement order of the IDs included in that pattern (combination), and the occurrence probability of each pattern (order). The order information 22 may also hold the set of patterns (combinations) in a separate form so that the patterns (combinations) can be managed through common IDs while maintaining uniqueness. The order information 22 may also include pattern appearance information.
FIG. 13 shows an order information table 220 as an example of the order information 22. The order information table 220 of FIG. 13 indicates that the pattern (combination) "1001, 2004, 3009, 5025" has patterns (order) with two different arrangement orders. One entry consists of the pattern (combination) "1001, 2004, 3009, 5025", the pattern (order) "1001, 2004, 3009, 5025", and the occurrence count "90". The other consists of the pattern (combination) "1001, 2004, 3009, 5025", the pattern (order) "1001, 3009, 2004, 5025", and the occurrence count "10".
The notation of the order information table 220 in FIG. 13 is only an example; the patterns (order) may be stored in any general notation with an equivalent meaning, such as a tree diagram. Instead of the occurrence count, the ratio of each occurrence count to the total of the occurrence counts may be output as the occurrence probability.
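As an illustration, the two entries of FIG. 13 could be held in a nested mapping and converted from occurrence counts to occurrence probabilities as sketched below. The variable and function names and the use of Python's `collections.Counter` are assumptions made for the example, not part of the described system.

```python
from collections import Counter

# Pattern (combination) -> Counter of pattern (order) -> occurrence count.
# The values are the two entries described for FIG. 13.
order_counts = {
    (1001, 2004, 3009, 5025): Counter({
        (1001, 2004, 3009, 5025): 90,
        (1001, 3009, 2004, 5025): 10,
    }),
}


def occurrence_probabilities(counts):
    """Convert occurrence counts into ratios of the total count."""
    total = sum(counts.values())
    return {order: n / total for order, n in counts.items()}


probabilities = occurrence_probabilities(order_counts[(1001, 2004, 3009, 5025)])
```

With the counts 90 and 10, the two orders receive the ratios 90/100 and 10/100.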
The above is the description of the configuration of the log analysis system 2 according to the present embodiment.
[Operation]
Next, the operation in which the log analysis system 2 according to the second embodiment analyzes log messages will be described. The appearance information aggregation processing, the reference pattern generation processing, and the reference pattern combining processing are the same as in the log analysis system 1 according to the first embodiment, so their description is omitted.
[Order information generation processing]
FIG. 14 is a flowchart of the order information generation processing by the log analysis system 2 according to the present embodiment.
In FIG. 14, the order learning unit 21 first receives the pattern set from the pattern storage unit 15 (step S401). The order learning unit 21 may instead be configured to receive the pattern set directly from the reference pattern combining unit 14.
Next, the order learning unit 21 reads the relevant portion of the integrated log based on the appearance information of each pattern included in the received pattern set (step S402).
The portion of the integrated log read by the order learning unit 21 is determined by the appearance time recorded in the appearance information and a separately defined time width. For example, when the appearance time is "2014/7/7_09:00:01" and the time width is "1 minute", the order learning unit 21 reads the integrated log for the period from "2014/7/7_09:00:01" to "2014/7/7_09:01:01".
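The window selection of step S402 can be sketched as below, using the timestamp notation of the example above. The function names, the representation of the integrated log as a list of (timestamp, log ID) pairs, and the choice to include both window boundaries are assumptions made for illustration.

```python
from datetime import datetime, timedelta

# Timestamp notation used in the example, e.g. "2014/7/7_09:00:01".
TIME_FORMAT = "%Y/%m/%d_%H:%M:%S"


def window_for(appearance_time, width=timedelta(minutes=1)):
    """Return the (start, end) of the window that begins at the appearance time."""
    start = datetime.strptime(appearance_time, TIME_FORMAT)
    return start, start + width


def records_in_window(log, appearance_time, width=timedelta(minutes=1)):
    """Select the (timestamp, log_id) records that fall inside the window.

    `log` is assumed to be a list of (datetime, log_id) pairs.
    Both boundaries are treated as inclusive in this sketch.
    """
    start, end = window_for(appearance_time, width)
    return [(ts, log_id) for ts, log_id in log if start <= ts <= end]
```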
Next, from the log messages included in the portion of the integrated log that was read, the order learning unit 21 reads the order of the IDs included in each pattern (step S403).
For example, suppose that for the pattern "1001, 2004, 3009, 5025" the data read is "1001, 7049, 6036, 4900, 3009, 2004, 8088, 5025". When the order learning unit 21 looks only at the IDs included in the pattern "1001, 2004, 3009, 5025" in this data, it reads the ID order "1001, 3009, 2004, 5025".
Next, the order learning unit 21 adds 1 to the occurrence count of the ID order that was read, and extracts the order information (step S404).
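Steps S403 and S404 amount to filtering the read data down to the IDs that belong to the pattern and then counting the resulting order. A minimal sketch, using the example values from the text and assuming hypothetical names and a nested counter as the storage layout:

```python
from collections import Counter, defaultdict


def observed_order(pattern, read_ids):
    """Keep only the IDs that belong to the pattern, in the order they were read."""
    members = set(pattern)
    return tuple(log_id for log_id in read_ids if log_id in members)


# Pattern (combination) -> Counter of observed ID orders.
order_counts = defaultdict(Counter)

pattern = (1001, 2004, 3009, 5025)
read_ids = [1001, 7049, 6036, 4900, 3009, 2004, 8088, 5025]

order = observed_order(pattern, read_ids)
order_counts[pattern][order] += 1  # add 1 to the count for this order
print(order)  # (1001, 3009, 2004, 5025)
```

The filter ignores unrelated IDs such as 7049 and 8088, so interleaved messages from other components do not affect the learned order.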
Here, the order learning unit 21 checks whether the order information 22 has been generated for all patterns included in the received pattern set (step S405).
If the order information 22 has been generated for all patterns included in the received pattern set (Yes in step S405), the order learning unit 21 outputs the generated order information 22 to the pattern storage unit 15, where it is recorded (step S406). If the order information 22 has not yet been generated for all patterns included in the received pattern set (No in step S405), the process returns to step S402 to generate the order information 22 for the unprocessed patterns.
The order learning unit 21 repeats steps S402 to S405 and generates the order information 22 for all patterns included in the pattern set received from the reference pattern combining unit 14.
The above is the description of the order information generation processing by the log analysis system 2.
[Effect]
The log analysis system according to the second embodiment generates pattern order information based on the result generated by the reference pattern combining unit, and can therefore generate patterns and their order information with a small amount of calculation. This is because the log analysis system according to the present embodiment includes the reference pattern generation unit.
(Third Embodiment)
Next, a log analysis system 3 according to the third embodiment of the present invention will be described.
[Configuration]
FIG. 15 is a block diagram showing the functional configuration of the log analysis system 3 according to the present embodiment. The log analysis system 3 has a configuration in which a log identification unit 31 and log identification information 32 are added to the log analysis system 1 according to the first embodiment. In the log analysis system 3, components that are substantially the same as those of the log analysis system 1 according to the first embodiment (FIG. 1) are given the same reference numerals, and their description is omitted.
FIG. 16 is a diagram showing an example of the log identification information 32 (log identification information 320).
The log identification information 32 is a set of pairs, each consisting of a log ID and the record expression corresponding to that log ID. A log ID, also called a log identifier, is an identifier assigned to a log message. A record expression represents the body of the log message corresponding to the log ID.
In the example of FIG. 16, a log message corresponding to the log ID "1001" contains the character string "mysqld started". Although FIG. 16 shows character strings, a record expression may be expressed with any information, such as a regular expression or a custom-defined template, as long as it can be compared with log messages.
The log identification unit 31 assigns a log ID to each log message included in the integrated log read from the log collection unit 11 by referring to the record expressions recorded in the log identification information 32. The log identification unit 31 then outputs the integrated log, with log IDs assigned to its log messages, to the log aggregation unit 12.
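A minimal sketch of this assignment, assuming that the record expressions are regular expressions and that the first matching expression determines the log ID. Only the entry for log ID 1001 comes from FIG. 16; the second entry, the function name, and the first-match rule are hypothetical.

```python
import re

# (log ID, record expression) pairs. The first entry follows FIG. 16;
# the second is a made-up entry added only for illustration.
LOG_IDENTIFICATION = [
    (1001, re.compile(r"mysqld started")),
    (2004, re.compile(r"connection accepted")),  # hypothetical entry
]


def assign_log_id(message):
    """Return the log ID of the first record expression found in the message.

    Returns None when no record expression matches.
    """
    for log_id, expression in LOG_IDENTIFICATION:
        if expression.search(message):
            return log_id
    return None


print(assign_log_id("2014/7/7_09:00:01 db01 mysqld started"))  # 1001
```

Returning None for unmatched messages leaves the handling of unknown messages to the caller.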
The above is the description of the configuration of the log analysis system 3 according to the present embodiment.
[Operation]
Next, the operation of the log analysis system 3 according to the third embodiment will be described. The appearance information aggregation processing, the reference pattern generation processing, and the reference pattern combining processing are the same as in the log analysis system 1 according to the first embodiment, so their description is omitted.
[Log identification processing]
FIG. 17 is a flowchart of the log identification processing by the log identification unit 31 of the log analysis system 3 according to the present embodiment.
In FIG. 17, the log identification unit 31 first reads the integrated log generated by the log collection unit 11 (step S501).
Next, the log identification unit 31 refers to the log identification information 32 and assigns log IDs to the log messages included in the integrated log that was read (step S502).
Here, the log identification unit 31 determines whether a log ID has been assigned to every log message included in the integrated log that was read (step S503).
If log IDs have been assigned to all log messages (Yes in step S503), the log identification unit 31 sends the integrated log to the log aggregation unit 12 (step S504).
If there is a log message without a log ID (No in step S503), the process returns to step S502 to assign a log ID to that log message.
The subsequent operations are the same as in the log analysis system 1 according to the first embodiment, so their description is omitted.
[Effect]
The log analysis system according to the third embodiment can, based on the log identification information, generate patterns (combinations) with a small amount of calculation even from a plurality of log files that have not been given common log IDs. This is because the log analysis system according to the third embodiment includes the log identification unit, which assigns log IDs to log messages based on the log identification information, and the reference pattern generation unit, which generates reference patterns by grouping logs that appear synchronously.
(Fourth Embodiment)
Next, a log analysis system 4 according to the fourth embodiment of the present invention will be described.
[Configuration]
FIG. 18 is a block diagram showing the functional configuration of the log analysis system 4 according to the fourth embodiment. The log analysis system 4 has a configuration in which a log classification unit 41 is added to the log analysis system 1 according to the first embodiment. In the log analysis system 4, components that are substantially the same as those of the log analysis system 1 according to the first embodiment (FIG. 1) are given the same reference numerals, and their description is omitted.
The log classification unit 41 reads the integrated log from the log collection unit 11 and calculates feature similarities based on the features of the log messages included in the integrated log. The log classification unit 41 groups log messages whose calculated similarity is high, and assigns a common log ID (also called a group identifier) to the log messages classified into the same group. The log classification unit 41 then outputs the integrated log, whose log messages carry a log ID common to each group, to the log aggregation unit 12.
The above is the description of the configuration of the log analysis system 4 according to the present embodiment.
[Operation]
Next, the operation of the log analysis system 4 according to the fourth embodiment will be described. The appearance information aggregation processing, the reference pattern generation processing, and the reference pattern combining processing are the same as in the log analysis system 1 according to the first embodiment, so their description is omitted.
[Log classification processing]
FIG. 19 is a flowchart of the log classification processing by the log classification unit 41 of the log analysis system 4 according to the present embodiment.
In FIG. 19, the log classification unit 41 first reads the integrated log generated by the log collection unit 11 (step S601).
Next, the log classification unit 41 calculates feature amounts for all log messages included in the integrated log that was read, and classifies them based on similarity (step S602).
For the calculation of feature amounts and the similarity-based classification, algorithms and indices such as the shortest distance method, the longest distance method, the group average method, Ward's method, and the k-means method can be used, for example.
Next, the log classification unit 41 assigns a log ID to each group obtained by the classification (step S603).
The log classification unit 41 then assigns log IDs to all log messages included in the integrated log according to the log ID assigned to each group (step S604).
Finally, the log classification unit 41 outputs the integrated log, whose log messages carry a log ID common to each group, to the log aggregation unit 12 (step S605).
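Steps S602 to S604 can be illustrated with the shortest distance method (single linkage) cut at a fixed distance, which is equivalent to grouping all messages that are connected by pairwise distances at or below that cut. The sketch below is only one possible instance: the token-set Jaccard distance used as the feature, the cut value 0.6, the sequential group IDs, and all names are assumptions made for the example.

```python
def jaccard_distance(a, b):
    """Distance between two messages based on their whitespace-separated tokens."""
    tokens_a, tokens_b = set(a.split()), set(b.split())
    union = tokens_a | tokens_b
    if not union:
        return 0.0
    return 1.0 - len(tokens_a & tokens_b) / len(union)


def single_linkage_groups(messages, max_distance=0.6):
    """Assign a group ID to each message by single linkage cut at max_distance."""
    parent = list(range(len(messages)))

    def find(i):
        # Union-find lookup with path halving.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    for i in range(len(messages)):
        for j in range(i + 1, len(messages)):
            if jaccard_distance(messages[i], messages[j]) <= max_distance:
                parent[find(j)] = find(i)

    group_ids = {}  # union-find root -> sequential group ID (1, 2, ...)
    result = []
    for i in range(len(messages)):
        root = find(i)
        result.append(group_ids.setdefault(root, len(group_ids) + 1))
    return result
```

The pairwise comparison is quadratic in the number of messages, so a sketch like this is suited only to small inputs.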
The subsequent operations are the same as in the log analysis system 1 according to the first embodiment, so their description is omitted.
[Effect]
According to the log analysis system of the fourth embodiment, patterns (combinations) can be generated with a small amount of calculation even from a plurality of log files that have not been given common log IDs. This is because the system includes the log classification unit, which calculates feature amounts from the log messages and classifies them so that similar log messages receive a uniquely identifiable log ID, and the reference pattern generation unit, which generates reference patterns by grouping logs that appear synchronously.
(Fifth Embodiment)
Next, a log analysis system 5 according to the fifth embodiment of the present invention will be described.
[Configuration]
FIG. 20 is a block diagram showing the functional configuration of the log analysis system 5 according to the fifth embodiment. The log analysis system 5 has a configuration in which a transition time learning unit 51 is added to the log analysis system 2 according to the second embodiment. In the log analysis system 5, components that are substantially the same as those of the log analysis system 2 according to the second embodiment (FIG. 12) are given the same reference numerals, and their description is omitted. FIG. 20 shows the pattern storage unit 15 storing transition information 52 and the pattern set 150. Although omitted from FIG. 20, the pattern storage unit 15 also stores the order information 22, as in FIG. 12.
Based on the order information 22 of each pattern extracted by the order learning unit 21, the transition time learning unit 51 extracts the transition time required for each transition between individual log IDs within a pattern.
FIG. 21 is a diagram showing an example of the transition times output by the transition time learning unit 51 (transition time table 510). A transition time entry represents a transition between log IDs in the order information 22 and the time required for that transition.
In FIG. 21, the pattern (order) "1001, 2004, 3009, 5025" includes three transitions: "1001→2004", "2004→3009", and "3009→5025". Their transition times, shown in parentheses in the transition time table 510 of FIG. 21, are "1 second", "2 seconds", and "1 second", respectively.
Although FIG. 21 shows the individual transitions separately as an example, only the corresponding transition times may be recorded without recording the individual transitions. Also, although FIG. 21 records the pattern (combination) and the pattern (order) together with the transition times, they may be stored separately and read using unique identifiers.
The above is the description of the configuration of the log analysis system 5 according to the present embodiment.
[Operation]
Next, the operation of the log analysis system 5 according to the fifth embodiment will be described. The order information generation processing, the appearance information aggregation processing, the reference pattern generation processing, and the reference pattern combining processing are the same as in the log analysis system 2 according to the second embodiment, so their description is omitted.
[Transition time learning processing]
FIG. 22 is a flowchart of the transition time learning processing by the log analysis system 5 according to the present embodiment.
First, in FIG. 22, the order learning unit 21 reads the relevant portion of the integrated log based on the appearance information of each pattern included in the pattern set (step S701).
The portion read by the order learning unit 21 is determined by the appearance time recorded in the appearance information and a separately defined time width. For example, if the appearance time is "2014/7/7_09:00:01" and the time width is "1 minute", the integrated log from "2014/7/7_09:00:01" to "2014/7/7_09:01:01" is read.
Next, from the log messages included in the portion that was read, the order learning unit 21 reads the order of the IDs included in the pattern (step S702).
For example, suppose that for the pattern "1001, 2004, 3009, 5025" the data read is "1001, 7049, 6036, 4900, 3009, 2004, 8088, 5025". Looking only at the IDs included in the pattern, the order in this data is "1001, 3009, 2004, 5025".
Next, the transition time learning unit 51 calculates the transition times between IDs based on the ID order read by the order learning unit 21 (step S703).
For example, if the time of ID "1001" is "2014/7/7_09:00:01" and the time of ID "3009" is "2014/7/7_09:00:12", the transition time of the transition "1001→3009" is "11 seconds".
Here, the transition time learning unit 51 determines whether the transition times have been calculated for all appearance times included in the appearance information of the pattern (step S704).
If the transition times have been calculated for all appearance times included in the appearance information of the pattern (Yes in step S704), the process proceeds to step S705. If they have not yet been calculated for all appearance times (No in step S704), the process returns to step S702.
The transition time learning unit 51 repeats steps S702 and S703 for all appearance times included in the appearance information of the pattern, and obtains the transition time of each transition.
Next, the transition time learning unit 51 aggregates the obtained transition times for each transition, calculates values such as the mean and the median, and records them as the transition time of each transition (step S705). As the transition time, the transition time learning unit 51 may obtain and record values such as the mean, median, and variance, or may record only the pair of the maximum and minimum values. Alternatively, the transition time learning unit 51 may be configured to record all transition times as they are.
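The calculation of steps S703 and S705 can be sketched as follows: consecutive timestamps within one occurrence of a pattern are subtracted, and the samples collected over all occurrences are summarized per transition. The function names, the input layout (a list of (timestamp string, log ID) pairs per occurrence), and the particular set of statistics are assumptions made for illustration.

```python
from collections import defaultdict
from datetime import datetime
from statistics import mean, median

TIME_FORMAT = "%Y/%m/%d_%H:%M:%S"


def transition_times(occurrence):
    """Return {(from_id, to_id): seconds} for one occurrence of a pattern.

    `occurrence` lists (timestamp string, log ID) pairs in order of appearance.
    """
    parsed = [(datetime.strptime(ts, TIME_FORMAT), log_id) for ts, log_id in occurrence]
    return {
        (a_id, b_id): (b_ts - a_ts).total_seconds()
        for (a_ts, a_id), (b_ts, b_id) in zip(parsed, parsed[1:])
    }


def summarize(occurrences):
    """Aggregate transition times over all occurrences of a pattern."""
    samples = defaultdict(list)
    for occurrence in occurrences:
        for transition, seconds in transition_times(occurrence).items():
            samples[transition].append(seconds)
    return {
        transition: {"mean": mean(v), "median": median(v), "min": min(v), "max": max(v)}
        for transition, v in samples.items()
    }


one = [("2014/7/7_09:00:01", 1001), ("2014/7/7_09:00:12", 3009)]
print(transition_times(one))  # {(1001, 3009): 11.0}
```

The printed value reproduces the 11-second transition "1001→3009" from the example in the text.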
 ここで、遷移時間学習手段51は、パターン集合に含まれる全てのパターンとその遷移に対して遷移時間が計算されたか否かを判断する(ステップS706)。 Here, the transition time learning means 51 determines whether or not the transition time has been calculated for all the patterns included in the pattern set and their transitions (step S706).
 パターン集合に含まれる全てのパターンとその遷移に対して遷移時間が計算された場合(ステップS706でYes)、遷移時間学習手段51は、生成した遷移時間に関する情報(遷移情報52)をパターン記憶手段15に記録する(ステップS707)。一方、パターン集合に含まれる全てのパターンとその遷移に対して遷移時間が計算されていない場合(ステップS706でNo)、ステップS701に戻る。 When transition times are calculated for all patterns included in the pattern set and their transitions (Yes in step S706), the transition time learning unit 51 uses the pattern storage unit to store information about the generated transition times (transition information 52). 15 (step S707). On the other hand, if the transition time has not been calculated for all patterns included in the pattern set and their transitions (No in step S706), the process returns to step S701.
 遷移時間学習手段51は、上記のステップS701~ステップS706の処理をパターン毎に繰り返し、パターン集合に含まれる全てのパターンとその遷移に対して遷移時間を算出する。 The transition time learning means 51 repeats the processing from step S701 to step S706 for each pattern, and calculates transition times for all patterns included in the pattern set and their transitions.
 以上が、ログ分類手段41による遷移時間学習処理に関する説明である。 The above is the description regarding the transition time learning process by the log classification means 41.
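The aggregation in step S705 can be illustrated with a minimal Python sketch. It is not the implementation disclosed in the specification: the function names `collect_transition_times` and `summarize_transitions`, the representation of one pattern occurrence as an ordered list of (log identifier, timestamp) pairs, and the use of seconds as the time unit are all assumptions made for this example.

```python
from statistics import mean, median, pvariance


def collect_transition_times(occurrences):
    """Gather the elapsed time of every transition between consecutive identifiers.

    occurrences: list of pattern occurrences (hypothetical format); each
    occurrence is an ordered list of (log_id, timestamp_in_seconds) pairs.
    Returns a dict mapping (src_id, dst_id) to a list of elapsed times.
    """
    samples = {}
    for occurrence in occurrences:
        # Pair each entry with its successor inside the same occurrence.
        for (src, t0), (dst, t1) in zip(occurrence, occurrence[1:]):
            samples.setdefault((src, dst), []).append(t1 - t0)
    return samples


def summarize_transitions(samples):
    """Reduce the raw samples of each transition to summary statistics.

    The specification allows recording the mean, median, and variance,
    only the minimum and maximum, or all raw values; this sketch keeps
    the five summary values together.
    """
    return {
        transition: {
            "mean": mean(times),
            "median": median(times),
            "variance": pvariance(times),  # population variance (an assumption)
            "min": min(times),
            "max": max(times),
        }
        for transition, times in samples.items()
    }


# Two occurrences of a hypothetical pattern A -> B -> C.
occurrences = [
    [("A", 0.0), ("B", 2.0), ("C", 5.0)],
    [("A", 10.0), ("B", 14.0), ("C", 15.0)],
]
summary = summarize_transitions(collect_transition_times(occurrences))
print(summary[("A", "B")]["mean"])  # 3.0
```

Recording only the minimum and maximum, as the text permits, would amount to keeping just the `min` and `max` entries of each summary.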
[Effect]
According to the log analysis system of the fifth embodiment, the transition time between the elements of each pattern is generated based on the result produced by the reference pattern combining means, so that the patterns and the transition times between the identifiers included in each pattern can be generated with a small amount of computation. This is because the log analysis system according to the present embodiment includes the reference pattern generation means and the transition time learning means.
(Hardware configuration)
Next, a hardware configuration for implementing the log analysis system according to the embodiments of the present invention is described, taking the computer 60 of FIG. 23 as an example.
As shown in FIG. 23, the computer 60 includes a processor 61, a main storage device 62, an auxiliary storage device 63, an input/output interface 64, and a communication interface 67. These components are connected to one another via a bus 68 so that they can exchange data. The processor 61, the main storage device 62, the auxiliary storage device 63, and the input/output interface 64 are also connected to a network (not shown) via the communication interface 67.
The processor 61 loads a program stored in the auxiliary storage device 63 or the like into the main storage device 62 and executes the loaded program. The present embodiment may use a software program installed on the computer 60, or a software program stored in storage that is accessible via a network.
The main storage device 62 may be, for example, a volatile memory such as DRAM (Dynamic Random Access Memory). A non-volatile memory such as MRAM (Magnetoresistive Random Access Memory) may also be configured as, or added to, the main storage device 62. Programs are loaded into the main storage device 62.
The auxiliary storage device 63 is a local disk such as a hard disk or flash memory. The auxiliary storage device 63 may instead be an external storage device connected to the computer 60, or network storage connected via a network.
The input/output interface 64 is a device that connects the computer 60 to peripheral devices in accordance with the connection standard between them. The communication interface 67 is a device that mediates data exchange between a network (not shown) and the processor 61. In FIG. 23, "interface" is abbreviated as I/F.
The computer 60 may be equipped, as necessary, with input devices such as a keyboard, a mouse, and a touch panel, which are used to enter information and settings. When a touch panel is used as an input device, the display device also serves as the input device. Data exchange between the processor 61 and the input devices may be mediated by the input/output interface 64.
The computer 60 may also be equipped with a display device for displaying information. In that case, the computer 60 includes a display control device (not shown) for controlling the display of the display device. The display device (not shown) may be connected via the input/output interface 64.
The computer 60 is also equipped with a reader/writer as necessary. The reader/writer is connected to the bus 68 and mediates data exchange between the processor 61 and a recording medium (program recording medium, not shown): it reads data and programs from the recording medium and writes the processing results of the computer 60 to the recording medium. The recording medium can be realized by, for example, a semiconductor recording medium such as an SD (Secure Digital) card. It may also be realized by a magnetic recording medium such as a flexible disk, or by an optical recording medium such as a CD (Compact Disc) or DVD (Digital Versatile Disc).
The above is an example of a hardware configuration for implementing the log analysis system according to the embodiments of the present invention. The hardware configuration of FIG. 23 is only an example and does not limit the scope of the present invention. A log analysis program that causes a computer to execute the processing of the log analysis system according to the present embodiment is also included in the scope of the present invention, as is a program recording medium on which such a log analysis program is recorded.
The embodiments described above can be implemented in any appropriate combination. The division into blocks shown in each block diagram is a configuration adopted for convenience of explanation. The present invention, described using each embodiment as an example, is not limited in its implementation to the configurations shown in the block diagrams. In describing the operation of the log analysis system according to the embodiments of the present invention, multiple operations are described in sequence, but their order can be changed to the extent that this causes no problem. Nor are these operations necessarily executed at separate times. For example, another operation may occur in parallel while one operation is executing, or the execution timing of one operation may partially or entirely overlap that of another. Furthermore, to make the invention easier to understand, the description of each operation presents one operation as the trigger for another, but such description does not limit the relationship between the operations. Accordingly, when each embodiment is implemented, the relationship among the operations can be changed to the extent that this causes no problem in substance.
The log analysis system according to the embodiments of the present invention can be applied to technology for the operation and management of information processing systems, physical plants, and the like.
The present invention has been described above using the foregoing embodiments as exemplary examples. However, the present invention is not limited to those embodiments. That is, various aspects that can be understood by those skilled in the art may be applied within the scope of the present invention.
This application claims priority based on Japanese Patent Application No. 2014-227706 filed on November 10, 2014, the entire disclosure of which is incorporated herein.
1, 2, 3, 4, 5  Log analysis system
11  Log collection means
12  Log aggregation means
13  Reference pattern generation means
14  Reference pattern combining means
15  Pattern storage means
21  Order learning means
22  Order information
31  Log identification means
32  Log identification information
41  Log classification means
51  Transition time learning means
52  Transition information
60  Computer
61  Processor
62  Main storage device
63  Auxiliary storage device
64  Input/output interface
67  Communication interface
68  Bus

Claims (10)

  1.  A log analysis system comprising:
     reference pattern generation means for generating, based on appearance information of log messages, a reference pattern for each combination of the log messages that appear synchronously; and
     reference pattern combining means for comparing the appearance information of the log messages included in the reference patterns between the reference patterns, and combining the reference patterns with each other based on the result of the comparison.
  2.  The log analysis system according to claim 1, further comprising:
     log collection means for acquiring at least one log file of a system to be analyzed and generating an integrated log that brings together the log messages included in the acquired log file; and
     log aggregation means for aggregating, as the appearance information and in association with a log identifier assigned to each of the log messages included in the integrated log generated by the log collection means, the appearance times at which the log message appears within a predetermined time period and the number of appearances of the log message at each appearance time.
  3.  The log analysis system according to claim 2, wherein
     the reference pattern generation means generates the reference patterns by combining the log identifiers with each other based on at least one of the appearance time and the number of appearances aggregated by the log aggregation means, and
     the reference pattern combining means selects, from among the reference patterns produced by the reference pattern generation means, at least one reference pattern for which the similarity of at least one of the appearance time and the number of appearances satisfies a predetermined threshold condition, and outputs a pattern set in which the selected at least one reference pattern is combined.
  4.  The log analysis system according to claim 3, wherein the reference pattern combining means
     compares a first pattern, selected from the reference pattern set as a comparison source, with a second pattern, included in the reference pattern set as a comparison target,
     calculates, with respect to a common frequency corresponding to the sum of the numbers of appearances for which the appearance information matches between the first and second patterns, a first similarity that is the ratio between the common frequency and a first appearance frequency corresponding to the sum of the numbers of appearances of the first pattern, and a second similarity that is the ratio between the common frequency and a second appearance frequency corresponding to the sum of the numbers of appearances of the second pattern, and
     combines the reference patterns for which both the first similarity and the second similarity satisfy a predetermined threshold condition.
  5.  The log analysis system according to any one of claims 2 to 4, further comprising log identification means for reading the integrated log from the log collection means and, based on log identification information that is a set of pairs each consisting of a log identifier assigned to a log message and the body of the log message corresponding to that log identifier, assigning the log identifier to each log message by referring to the body of the log message included in the integrated log.
  6.  The log analysis system according to any one of claims 2 to 5, further comprising log classification means for reading the integrated log from the log collection means, grouping and classifying log messages included in the integrated log whose features are highly similar to each other, and assigning a common group identifier to the log messages classified into the same group.
  7.  The log analysis system according to any one of claims 1 to 6, further comprising order learning means for referring to the integrated log based on the pattern set output by the reference pattern combining means and extracting, for each pattern included in the pattern set, order information of the pattern.
  8.  The log analysis system according to claim 7, further comprising transition time learning means for extracting, based on the order information of the pattern extracted by the order learning means, the transition time taken for a transition between individual log identifiers within the pattern.
  9.  A log analysis method comprising:
     generating, based on appearance information of log messages, a reference pattern for each combination of the log messages that appear synchronously; and
     comparing the appearance information of the log messages included in the reference patterns between the reference patterns, and combining the reference patterns with each other based on the result of the comparison.
  10.  A program recording medium recording a log analysis program that causes a computer to execute:
     a process of generating, based on appearance information of log messages, a reference pattern for each combination of the log messages that appear synchronously; and
     a process of comparing the appearance information of the log messages included in the reference patterns between the reference patterns, and combining the reference patterns with each other based on the result of the comparison.
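
The pattern generation of claim 1 and the similarity test of claim 4 can be illustrated with a minimal Python sketch. It is one possible reading, not the claimed implementation. The following are assumptions made for this example: appearance information is a dict from a time slot to a count; "appearing synchronously" means having identical appearance information; the common frequency sums the counts of time slots whose counts are equal in both patterns; each similarity is the common frequency divided by that pattern's total count; and the threshold condition is "greater than or equal to" with a default of 0.8. All function names are hypothetical.

```python
def generate_reference_patterns(appearance):
    """Group log identifiers whose appearance information is identical.

    appearance: dict mapping log_id -> {time_slot: count} (hypothetical format).
    Returns a list of (sorted log_ids, appearance information) tuples.
    """
    groups = {}
    for log_id, slots in appearance.items():
        key = frozenset(slots.items())  # hashable form of the appearance info
        groups.setdefault(key, []).append(log_id)
    return [(sorted(ids), dict(key)) for key, ids in groups.items()]


def common_frequency(first, second):
    """Sum the counts of time slots where both patterns have the same count."""
    return sum(count for slot, count in first.items() if second.get(slot) == count)


def similarities(first, second):
    """Return (first similarity, second similarity) for two patterns.

    Assumes both patterns have a positive total count.
    """
    common = common_frequency(first, second)
    return common / sum(first.values()), common / sum(second.values())


def should_combine(first, second, threshold=0.8):
    """Combine only when both similarities reach the (assumed) threshold."""
    s1, s2 = similarities(first, second)
    return s1 >= threshold and s2 >= threshold


# Hypothetical appearance information for four log identifiers.
appearance = {
    "A": {1: 2, 2: 1},
    "B": {1: 2, 2: 1},
    "C": {1: 2, 2: 1, 3: 1},
    "D": {5: 1},
}
patterns = generate_reference_patterns(appearance)
print([ids for ids, _ in patterns])  # [['A', 'B'], ['C'], ['D']]
```

Requiring both similarities to pass, rather than only one, keeps a small pattern that is wholly contained in a much larger one from being merged on the strength of the one-sided match alone.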
PCT/JP2015/005570 2014-11-10 2015-11-06 Log analyzing system, log analyzing method, and program recording medium WO2016075915A1 (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
JP2016558878A JP6665784B2 (en) 2014-11-10 2015-11-06 Log analysis system, log analysis method and log analysis program

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
JP2014-227706 2014-11-10
JP2014227706 2014-11-10

Publications (1)

Publication Number Publication Date
WO2016075915A1 true WO2016075915A1 (en) 2016-05-19

Family

ID=55954018

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/JP2015/005570 WO2016075915A1 (en) 2014-11-10 2015-11-06 Log analyzing system, log analyzing method, and program recording medium

Country Status (2)

Country Link
JP (1) JP6665784B2 (en)
WO (1) WO2016075915A1 (en)

Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JP2005216148A (en) * 2004-01-30 2005-08-11 Yamatake Corp Alarm analyzer, analyzing method and program
JP2006004346A (en) * 2004-06-21 2006-01-05 Fujitsu Ltd Pattern detecting program
JP2014035749A (en) * 2012-08-10 2014-02-24 Nippon Telegr & Teleph Corp <Ntt> Log generation rule creation device and method

Family Cites Families (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US8738625B2 (en) * 2012-06-05 2014-05-27 Hitachi, Ltd. Log management system and program

Cited By (13)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JP2018116322A (en) * 2017-01-16 2018-07-26 株式会社日立製作所 Log message grouping apparatus, log message grouping system and log message grouping method
WO2019103180A1 (en) * 2017-11-22 2019-05-31 한화테크윈주식회사 Data visualization system and method, and computer-readable recording medium
JP7006347B2 (en) 2018-02-13 2022-01-24 日本電気株式会社 Management device, management method and its program
JP2019139565A (en) * 2018-02-13 2019-08-22 日本電気株式会社 Management device, management method, and program therefor
CN112912877B (en) * 2018-09-03 2024-06-04 松下控股株式会社 Log output device, log output method, and log output system
CN112912877A (en) * 2018-09-03 2021-06-04 松下电器产业株式会社 Log output device, log output method, and log output system
US11537491B2 (en) 2018-12-10 2022-12-27 Samsung Electronics Co., Ltd. Electronic apparatus and method of controlling the same
WO2020122522A1 (en) * 2018-12-10 2020-06-18 삼성전자(주) Electronic device and method for controlling same
JP2020149250A (en) * 2019-03-12 2020-09-17 富士通株式会社 Output program, output method and information processing device
WO2022224582A1 (en) * 2021-04-23 2022-10-27 日立Astemo株式会社 Information processing device, information processing method, program, and storage medium
CN113595787A (en) * 2021-07-27 2021-11-02 招商银行股份有限公司 Real-time log automatic alarm method, program and medium based on log template
CN113595787B (en) * 2021-07-27 2024-03-29 招商银行股份有限公司 Real-time log automatic alarm method, program and medium based on log template
WO2023162390A1 (en) * 2022-02-25 2023-08-31 三菱電機株式会社 Analysis device and analysis method

Also Published As

Publication number Publication date
JPWO2016075915A1 (en) 2017-08-17
JP6665784B2 (en) 2020-03-13

Legal Events

Date Code Title Description
121 EP: The EPO has been informed by WIPO that EP was designated in this application
    Ref document number: 15859991; Country of ref document: EP; Kind code of ref document: A1
ENP Entry into the national phase
    Ref document number: 2016558878; Country of ref document: JP; Kind code of ref document: A
NENP Non-entry into the national phase
    Ref country code: DE
122 EP: PCT application non-entry in European phase
    Ref document number: 15859991; Country of ref document: EP; Kind code of ref document: A1