US20140344622A1 - Scalable Log Analytics - Google Patents
- Publication number
- US20140344622A1 (application US 13/897,994)
- Authority
- US
- United States
- Prior art keywords
- log
- event
- message
- determining
- type
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Granted
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F11/00—Error detection; Error correction; Monitoring
- G06F11/07—Responding to the occurrence of a fault, e.g. fault tolerance
- G06F11/0703—Error or fault processing not based on redundancy, i.e. by taking additional measures to deal with the error or fault not making use of redundancy in operation, in hardware, or in data representation
- G06F11/0766—Error or fault reporting or storing
- G06F11/0775—Content or structure details of the error report, e.g. specific table structure, specific error fields
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F11/00—Error detection; Error correction; Monitoring
- G06F11/07—Responding to the occurrence of a fault, e.g. fault tolerance
- G06F11/0703—Error or fault processing not based on redundancy, i.e. by taking additional measures to deal with the error or fault not making use of redundancy in operation, in hardware, or in data representation
- G06F11/079—Root cause analysis, i.e. error or fault diagnosis
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F11/00—Error detection; Error correction; Monitoring
- G06F11/07—Responding to the occurrence of a fault, e.g. fault tolerance
- G06F11/0703—Error or fault processing not based on redundancy, i.e. by taking additional measures to deal with the error or fault not making use of redundancy in operation, in hardware, or in data representation
- G06F11/0706—Error or fault processing not based on redundancy, i.e. by taking additional measures to deal with the error or fault not making use of redundancy in operation, in hardware, or in data representation the processing taking place on a specific hardware platform or in a specific software environment
- G06F11/0709—Error or fault processing not based on redundancy, i.e. by taking additional measures to deal with the error or fault not making use of redundancy in operation, in hardware, or in data representation the processing taking place on a specific hardware platform or in a specific software environment in a distributed system consisting of a plurality of standalone computer nodes, e.g. clusters, client-server systems
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F11/00—Error detection; Error correction; Monitoring
- G06F11/07—Responding to the occurrence of a fault, e.g. fault tolerance
- G06F11/0703—Error or fault processing not based on redundancy, i.e. by taking additional measures to deal with the error or fault not making use of redundancy in operation, in hardware, or in data representation
- G06F11/0766—Error or fault reporting or storing
- G06F11/0769—Readable error formats, e.g. cross-platform generic formats, human understandable formats
Definitions
- System administrators provide virtualized computing infrastructure, which typically includes a plurality of virtual machines executing on a shared set of physical hardware components, to offer highly available, fault-tolerant distributed systems.
- A large-scale virtualized infrastructure may have many (e.g., thousands of) virtual machines running on many physical machines.
- High availability requirements provide system administrators with little time to diagnose or bring down parts of infrastructure for maintenance.
- Fault-tolerant features ensure the virtualized computing infrastructure continues to operate when problems arise, but generate many intermediate states that have to be reconciled and addressed. As such, identifying, debugging, and resolving failures and performance issues for virtualized computing environments have become increasingly challenging.
- One or more embodiments disclosed herein provide a method for providing real-time analysis of log messages for a computer infrastructure.
- The method includes receiving a plurality of log messages including a first log message, and generating a sketch associated with the first log message.
- The sketch may be generated based on words contained in the first log message.
- The method further includes determining a message type for the first log message based on a comparison of the generated sketch to a plurality of sketches stored in an index.
- Log messages of a same message type have similar sketches.
- The method includes determining a first log event associated with one or more log messages occurring within a first time interval, wherein the first log event comprises a first composition of message types corresponding to the associated log messages.
- The method further includes determining an event type for the first log event based on a comparison of the first composition of message types to a plurality of compositions of message types stored in the index, and determining an anomalous log event within the plurality of log messages based on the classification for the first log event.
- FIG. 1A depicts a block diagram that illustrates a computing system with which one or more embodiments of the present disclosure may be utilized.
- FIG. 1B is a block diagram that illustrates a virtualized computing system with which one or more embodiments of the present disclosure may be utilized.
- FIG. 2 is a block diagram that illustrates a workflow for analyzing log data of the computing system, according to one embodiment of the present disclosure.
- FIGS. 3A-3B are block diagrams that depict examples of event pattern and event volume anomalies, according to one embodiment of the present disclosure.
- FIG. 4 is a flow diagram that illustrates steps for a method for analyzing log data of a computing system, according to an embodiment of the present disclosure.
- Log data is sometimes referred to as runtime logs, error logs, or debugging logs.
- Log data is reduced in both volume and level of detail by first classifying messages into types by content similarity. The log data is then reduced further by grouping bursts of messages into log events. Patterns in log events, such as the collection and number of different message types that comprise each log event, can be used to identify anomalous events within the log data. For example, patterns in the log events may be used to detect when log events occur that differ in message type composition, or when log events occur that differ in frequency of occurrence over time.
- FIG. 1A is a block diagram that illustrates a computing system 100 with which one or more embodiments of the present invention may be utilized.
- computing system 100 includes a plurality of server systems, identified as server system 102 - 1 , 102 - 2 , 102 - 3 , and referred to collectively as servers 102 .
- Each server 102 includes CPU 104 , memory 106 , networking interface 110 , storage interface 114 , and other conventional components of a computing device.
- Each server 102 further includes an operating system 120 configured to manage execution of one or more applications 122 using the computing resources (e.g., CPU 104 , memory 106 , networking interface 110 , storage interface 114 ).
- log data may indicate the state, and state transitions, that occur during operation, and may record occurrences of failures, as well as unexpected and undesirable events.
- log data may be unstructured text comprised of a plurality of log messages, including status updates, error messages, stack traces, and debugging messages.
- Log analytics module 132 is configured to store and analyze, in real time, log data 134 from software and infrastructure components of computing system 100.
- Log analytics module 132 may include a log index 136 configured to cache (and later query) results of the analysis of log data.
- Log analytics module 132 reduces the volume and level of details of the log data to enable a user (e.g., system administrator) to diagnose and troubleshoot issues within computing system 100 .
- log analytics module 132 is configured to parse a stream of log messages within log data and identify groups of log messages as logical events, referred to interchangeably as “log events” or “events”. To do so, log analytics module 132 is configured to classify log messages within a stream of log data as log message types that cluster together similar log messages. Log analytics module 132 is further configured to perform event detection on log messages within log data to group together log messages based on their occurrence close in time in a sequence. As described later, in one embodiment, events may be defined as a collection of log message types, and an occurrence of an event corresponds to a group of log messages having the requisite log message types appearing in log data. Log analytics module 132 may further identify anomalies within log data based on the message-type classifications and detected events, such as event volume anomalies and event pattern anomalies.
- log analytics module 132 indicates to a user what one event (as reported by log messages) means in relation to other events in the log data and highlights events occurring within computing system 100 in context.
- log analytics module 132 may highlight certain events in the context of being nearby in time to other events, such that if the certain events usually occur in a sequence, then events occurring out of that sequence may be notable.
- log analytics module 132 may highlight certain events in the context of being similar to other events, such that similar events may be clustered and analyzed together rather than be considered separately.
- log analytics module 132 may highlight certain events in the context of the hierarchical infrastructure of computing system 100 , such as being from the same thread, process, application, virtual machine, host, host group, data center, etc. The operations of log analytics module 132 are illustrated in greater detail in conjunction with FIG. 2 .
- FIG. 1B is a block diagram that illustrates a computing system 150 with which one or more embodiments of the present disclosure may be utilized.
- computing system 150 includes a host group 124 of host computers, identified as hosts 108 - 1 , 108 - 2 , 108 - 3 , and 108 - 4 , and referred to collectively as hosts 108 .
- Each host 108 is configured to provide a virtualization layer that abstracts computing resources of a hardware platform 118 into multiple virtual machines (VMs) 112 that run concurrently on the same host 108 .
- Hardware platform 118 of each host 108 may include conventional components of a computing device, such as a memory, processor, local storage, disk interface, and network interface.
- the VMs 112 run on top of a software interface layer, referred to herein as a hypervisor 116 , that enables sharing of the hardware resources of host 108 by the virtual machines.
- One example of hypervisor 116 that may be used in an embodiment described herein is a VMware ESXi hypervisor provided as part of the VMware vSphere solution made commercially available from VMware, Inc.
- Hypervisor 116 may run on top of the operating system of host 108 or directly on hardware components of host 108 .
- Each VM 112 includes a guest operating system (e.g., Microsoft Windows, Linux) and one or more guest applications and processes running on top of the guest operating system.
- computing system 150 includes virtualization management software 130 that may communicate with the plurality of hosts 108 via network 140 .
- Virtualization management software 130 is configured to carry out administrative tasks for the computing system 100 , including managing hosts 108 , managing VMs running within each host 108 , provisioning VMs, migrating VMs from one host to another host, and load balancing between hosts 108 of host group 124 .
- Virtualization management software 130 is a computer program that resides and executes in a central server, which may reside in computing system 100, or alternatively, runs as a VM in one of hosts 108.
- One example of virtualization management software is the vCenter® Server product made available from VMware, Inc.
- The software and infrastructure components of computing system 100 may generate large amounts of log data in real time during operation.
- While log analytics module 132 is depicted in FIG. 1B as a separate component that resides and executes on a separate server or virtual machine, it is appreciated that log analytics module 132 may alternatively reside in any one of the computing devices of the virtualized computing system 150, such as the same central server where the virtualization management software 130 resides.
- log analytics module 132 may be embodied as a plug-in component configured to extend functionality of virtualization management software 130 .
- Access to the log analytics module 132 can be achieved via a client application (not shown). For example, each analysis task, such as searching for log messages, filtering for log messages, analyzing log messages over a period of time, can be accomplished through the client application.
- The client application may be provided as a stand-alone application.
- the client application is implemented as a web browser application that provides management access from any networked device.
- FIG. 2 is a block diagram that illustrates a workflow for analyzing log data 134 of a computing infrastructure, according to one embodiment of the present disclosure. It should be recognized that, even though the workflow is described in conjunction with the system of FIG. 1A , any system configured to perform the illustrated technique is within the scope of embodiments of the disclosure.
- log data 134 may include a plurality of individual log messages 202 - 1 to 202 - 5 (collectively referred to as log messages 202 ) generated over a period of time.
- a log message may include a time stamp (e.g., “Sep 23 13:30”) indicating a date and time corresponding to the creation of the log message and a text description (e.g., “host1 sending 5738 files”). While each log message 202 is depicted as a separate line of text for sake of illustration, it should be recognized that log messages 202 may be arranged in a variety of formats, including log messages that span several lines.
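The split of a log message into a time stamp and a text description can be sketched as follows. This is a minimal illustration, assuming the single-line format of the example above; the names `LogMessage` and `parse_log_line` are hypothetical, and the year is an assumed parameter because the example time stamp does not carry one.

```python
from datetime import datetime
from typing import NamedTuple


class LogMessage(NamedTuple):
    timestamp: datetime
    text: str


def parse_log_line(line: str, year: int = 2013) -> LogMessage:
    """Split 'Sep 23 13:30 host1 sending 5738 files' into time stamp and text."""
    month, day, clock, text = line.split(maxsplit=3)
    # The year is not part of the log line, so the caller supplies it (assumption).
    stamp = datetime.strptime(f"{year} {month} {day} {clock}", "%Y %b %d %H:%M")
    return LogMessage(stamp, text)


msg = parse_log_line("Sep 23 13:30 host1 sending 5738 files")
print(msg.timestamp.isoformat(), "|", msg.text)
```

Multi-line log messages, which the text also allows, would need a different splitting rule and are not handled here.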
- log analytics module 132 may classify each log message 202 as a message type based on content similarity of the log messages. In some embodiments, the content similarity is performed on the text description portion of the log message 202 . In the example shown in FIG. 2 , log analytics module 132 processes log message 202 - 1 (i.e., “Sep 23 13:30 host1 sending 5738 files”) and assigns log message 202 - 1 a first message type 204 - 1 .
- Log analytics module 132 then processes a second log message 202 - 2 (i.e., “Sep 23 13:31 host2 received 5700 files”) and determines the contents of second log message 202 - 2 are not sufficiently similar to first log message 202 - 1 and assigns a different, second message type 204 - 2 .
- log analytics module 132 processes a third log message 202 - 3 (i.e., “Sep 23 13:32 host1 warning: 38 files pending”) and assigns a third message type 204 - 3 upon determining no content similarity with the other already processed log messages.
- log messages having different message types are depicted in FIG. 2 as shapes having different patterns.
- log analytics module 132 may determine content similarity of log messages according to a “sketching” algorithm that determines if log messages contain a number of words in common in the same relative position. Determination of content similarity and the sketching algorithm are described in greater detail below.
- log analytics module 132 processes a fourth log message 202 - 4 (i.e., “Sep 23 14:00 host4 sending 382 files”) and determines content similarity with log message 202 - 1 . As such, log analytics module 132 assigns log message 202 - 4 the same first message type 204 - 1 as log message 202 - 1 , as depicted in FIG. 2 by identical patterned highlights or colors. Similarly, log analytics module 132 processes a fifth log message 202 - 5 (i.e., “Sep 23 14:01 host5 received 382 files”) and assigns the second message type 204 - 2 based on a determination of content similarity with log message 202 - 2 .
- Log analytics module 132 is configured to identify one or more log events 206 based on the timing of the log messages. In some embodiments, log analytics module 132 may group one or more log messages 202 into log events 206 according to a burst analysis algorithm. For example, log analytics module 132 identifies a first log event 206-1 that includes log messages 202-1, 202-2, and 202-3, which all occur at approximately the same time, around September 23, 13:30, and a second log event 206-2 that includes log messages 202-4 and 202-5, which both occur around September 23, 14:00. In one embodiment, log analytics module 132 is configured to represent each identified log event 206 as a composition of message types of log messages.
- an event type for a log event may be defined as a composition of tuples of message type and frequency.
- a first event 206 - 1 may be characterized as a composition of one occurrence of message type 204 - 1 (e.g., “Sending . . . files”), one occurrence of message type 204 - 2 (e.g., “Received . . . files”), and one occurrence of message type 204 - 3 (e.g., “Warning . . . files pending”); and second event 206 - 2 may be characterized as a composition of one occurrence of message type 204 - 1 (e.g., “Sending . . . files”) and one occurrence of message type 204 - 2 (e.g., “Received . . . files”).
- log analytics module 132 may identify anomalous events based on patterns of events from log data 134 , as shown in FIGS. 3A and 3B .
- FIG. 3A is a chart 300 depicting an example of an event volume anomaly based on frequency of occurrence of events over time.
- Log analytics module 132 may determine the number of events occurring per hour in a given time period, e.g., from 6:00 PM to 9:00 PM.
- Chart 300 further illustrates a breakdown of event types for each hour, depicting occurrences of events similar to log events 206 - 1 and 206 - 2 . As an example, it may be normal within the computing system for approximately 20 events per hour to occur.
- A sudden increase of events to 200 events per hour (e.g., at 19:00) and then to 500 events per hour (e.g., at 20:00), thereby exceeding some threshold value 302, can trigger log analytics module 132 to flag this as an anomalous occurrence of event volume.
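The event volume check can be illustrated with a short sketch that counts events per clock hour and flags the hours above a threshold. The function names and the threshold of 100 are assumptions made for illustration; the text does not state the value of threshold 302 or how it is chosen.

```python
from collections import Counter
from datetime import datetime


def events_per_hour(event_times):
    """Count events in each clock hour."""
    return Counter(t.replace(minute=0, second=0, microsecond=0) for t in event_times)


def volume_anomalies(event_times, threshold):
    """Return (hour, count) pairs whose count exceeds the threshold, in time order."""
    counts = events_per_hour(event_times)
    return [(hour, n) for hour, n in sorted(counts.items()) if n > threshold]


# Synthetic data mirroring the example: 20 events at 18:00, 200 at 19:00, 500 at 20:00.
times = (
    [datetime(2013, 9, 23, 18, i % 60) for i in range(20)]
    + [datetime(2013, 9, 23, 19, i % 60) for i in range(200)]
    + [datetime(2013, 9, 23, 20, i % 60) for i in range(500)]
)
flagged = volume_anomalies(times, threshold=100)  # threshold value is assumed
for hour, n in flagged:
    print(hour.strftime("%H:%M"), n)
```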
- FIG. 3B depicts an example of an event pattern anomaly based on events that are different in message type composition.
- Events 304 occurring at a given time are usually of an event type similar to event 206-2 (i.e., events comprised of “Sending . . . files” log messages and “Received . . . files” log messages).
- An unexpected or atypical event may occur, such as event 306, which is comprised of “Sending . . . files” log messages, “Received . . . files” log messages, and “Warning . . . files pending” log messages, and thus differs from the usual events.
- Log analytics module 132 may determine an anomalous occurrence of a log event 306 (i.e., composed of message types 204-1, 204-2, and 204-3) that is different in composition from other log events (i.e., composed of message types 204-1 and 204-2).
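One simple way to picture the event pattern check is to treat each event as a set of message types and flag events whose set differs from the most common one. Treating the most frequent composition as the "usual" one is an assumption of this sketch, and `pattern_anomalies` is a hypothetical name; the message type labels reuse the reference numerals only as identifiers.

```python
from collections import Counter


def pattern_anomalies(events):
    """Return indices of events whose message-type set differs from the most common set."""
    compositions = [frozenset(e) for e in events]
    usual, _ = Counter(compositions).most_common(1)[0]
    return [i for i, comp in enumerate(compositions) if comp != usual]


events = [
    ["204-1", "204-2"],
    ["204-1", "204-2"],
    ["204-1", "204-2", "204-3"],  # also contains the "warning" message type
    ["204-1", "204-2"],
]
anomalous = pattern_anomalies(events)
print(anomalous)
```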
- FIG. 4 is a flow diagram that illustrates steps for a method 400 for providing real-time analysis of log messages for a computer infrastructure, according to an embodiment of the present disclosure. It should be recognized that, even though the method 400 is described in conjunction with the system of FIG. 1 , any system configured to perform the method steps is within the scope of embodiments of the disclosure.
- log analytics module 132 receives a stream of log data 134 generated by software and infrastructure components of computing system 100 .
- log data 134 may include a plurality of log messages.
- Log analytics module 132 may be configured to retrieve log data (e.g., log files) from software and infrastructure components of computing system 100, including applications 122, operating systems 120, and in the case of virtualized computing system 150, components such as hypervisors 116 and guest applications and operating systems running within VMs 112.
- software and infrastructure components of computing system 100 may be configured to write log files to a common destination, such as an external storage, from which log analytics module 132 may periodically retrieve log data.
- log data 134 may be transferred over network 140 directly to log analytics module 132 .
- log analytics module 132 generates a compact integer representation, or “sketch,” of text content for a log message in the received log data.
- a sketch associated with a log message is generated based on words of the log message.
- two log messages may be considered similar if the log messages contain a number of words in common in the same relative positions.
- sketches of log messages are computed such that similar log messages should have identical or substantially similar sketches.
- a sketch of a log message may be an ordered list, or tuple, of fingerprint values corresponding to a subset of the words of the log message.
- A sketch of a log message is a tuple of fingerprints of “interesting” words of the log message.
- Each interesting word of the log message (e.g., “host1”) is converted to a fingerprint value by a fingerprint function, such as a hash function.
- a sketch generated for a log message “host1 sending 5738 files” may be a tuple of fingerprint values (753, 1034, 886) that corresponds to interesting words (host1, Sending, files).
- a sketch for the log message “host4 Sending 382 files” can be computed as the tuple (1965, 1034, 886) that corresponds to interesting features (host4, Sending, files).
- Because the sketches (753, 1034, 886) and (1965, 1034, 886) have identical values “1034” and “886” in the same relative positions, the two log messages may be deemed similar.
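The positional comparison of two sketches can be written in a few lines. The function names and the minimum of two matching positions are assumptions for illustration; the text only says that a number of values must agree in the same relative positions.

```python
def matching_columns(sketch_a, sketch_b):
    """Count positions at which two sketches hold the same fingerprint value."""
    return sum(1 for a, b in zip(sketch_a, sketch_b) if a == b)


def sketches_similar(sketch_a, sketch_b, min_matches=2):
    """Deem two equal-length sketches similar if enough positions agree (assumed rule)."""
    if len(sketch_a) != len(sketch_b):
        return False
    return matching_columns(sketch_a, sketch_b) >= min_matches


first = (753, 1034, 886)    # "host1 sending 5738 files"
second = (1965, 1034, 886)  # "host4 Sending 382 files"
print(matching_columns(first, second), sketches_similar(first, second))
```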
- sketches of log messages may be generated according to a sketching algorithm that uses N independent scoring functions to pick N “interesting” words of a log message, where “interesting” is determined according to each scoring function.
- a scoring function is a hash function that computes a 32-bit integer given a word.
- $\mathrm{Score}_1(\mathrm{Word}) = (M_1 \cdot \mathrm{Fingerprint}(\mathrm{Word}) + A_1) \bmod 2^{32}$
- $\mathrm{Score}_2(\mathrm{Word}) = (M_2 \cdot \mathrm{Fingerprint}(\mathrm{Word}) + A_2) \bmod 2^{32}$
- $\mathrm{Score}_N(\mathrm{Word}) = (M_N \cdot \mathrm{Fingerprint}(\mathrm{Word}) + A_N) \bmod 2^{32}$
- Log analytics module 132 scores each word in the log message and selects the word having the highest score (i.e., “most interesting”), according to that scoring function. As each scoring function selects one word in the log message, N scoring functions result in N words being selected. The fingerprints of these N words are then combined to form a sketch of the log message.
- the four scoring functions may score the slightly different log message similarly:
- the sketching algorithm as described herein is advantageously more robust to relative insertions or deletions of text. It has been determined that the insertion or deletion of an additional word relative to the original text is unlikely to change all or even a majority of the words selected by each scoring function.
- a linear congruential generator may be used as a scoring function, though it should be recognized that other types of scoring functions can be used, including functions that are deterministic and produce uncorrelated results.
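The sketching procedure described above can be illustrated as follows. This is a sketch under stated assumptions, not the patented implementation: CRC-32 stands in for the unspecified fingerprint function, the four multiplier and increment pairs are arbitrary illustrative constants, every whitespace-separated token is scored, and the names `fingerprint`, `score`, and `sketch` are hypothetical.

```python
import zlib

MASK32 = 0xFFFFFFFF  # arithmetic modulo 2**32

# Assumed (M_i, A_i) pairs for N = 4 linear congruential scoring functions.
LCG_PARAMS = [
    (1664525, 1013904223),
    (22695477, 1),
    (214013, 2531011),
    (134775813, 1),
]


def fingerprint(word):
    """Stand-in fingerprint: CRC-32 of the UTF-8 bytes (an assumption)."""
    return zlib.crc32(word.encode("utf-8"))


def score(word, m, a):
    """Score_i(Word) = (M_i * Fingerprint(Word) + A_i) mod 2**32."""
    return (m * fingerprint(word) + a) & MASK32


def sketch(message):
    """Pick the highest-scoring word per scoring function; return their fingerprints."""
    words = message.split()
    if not words:
        return ()
    return tuple(
        fingerprint(max(words, key=lambda w: score(w, m, a)))
        for m, a in LCG_PARAMS
    )


print(sketch("host1 sending 5738 files"))
print(sketch("host1 sending 5738 files using SFTP protocol"))
```

Because each scoring function picks its word independently, adding or removing one word can change only the columns whose winning word was affected, which is the robustness property the text describes.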
- Log analytics module 132 determines a message type classification for the log message based on the corresponding sketch for the log message.
- Log analytics module 132 classifies log messages having similar sketches to have the same message type. Such clustering helps reduce the number of log messages that need to be examined by grouping the messages into a small number of message types that can then be highlighted. Accordingly, message type classification enables log analytics module 132 to cluster together similar log messages to more effectively process and analyze a large volume of log data.
- log analytics module 132 queries log index 136 to determine whether the log message is similar to a previously processed log message based on the corresponding sketches, and if so, assigns the log message a same message type as the previously processed log message, at step 408 .
- Log analytics module 132 queries log index 136 using the sketch (1965, 1034, 886, 1034) corresponding to the log message “host4 sending 208 files using SFTP protocol” and determines the log message is similar to the previously processed log message “host1 sending 7182 files using SFTP protocol” based on the similarity with its corresponding sketch (753, 1034, 886, 1034).
- log analytics module 132 assigns a new message type to the log message and inserts the log message into log index 136 .
- each message type may be represented by a message type identifier, or “cluster ID.”
- the log messages depicted in FIG. 2 may have the following sketches and corresponding cluster IDs (the sketches are shown as tuples of the most interesting words rather than the fingerprint values for clarity of illustration):
- log analytics module 132 may provide the ability to search the received log messages based on a given cluster ID.
- Log analytics module 132 may use the cluster ID as a search criterion for log messages that are similar to a particular log message (i.e., find log messages “like this”) by querying for log messages having a particular cluster ID.
- Log analytics module 132 may use the cluster ID as a criterion for aggregation to generate statistics, such as the Top-5 message types per hour.
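Aggregation by cluster ID, such as a top-k list of message types per hour, might look like the sketch below. The record layout of (hour, cluster ID) pairs and the name `top_types_per_hour` are assumptions for illustration; the cluster IDs are sample values.

```python
from collections import Counter, defaultdict


def top_types_per_hour(records, k=5):
    """Map each hour to its k most frequent cluster IDs with their counts."""
    per_hour = defaultdict(Counter)
    for hour, cluster_id in records:
        per_hour[hour][cluster_id] += 1
    return {hour: counts.most_common(k) for hour, counts in per_hour.items()}


records = [(13, 22280), (13, 22280), (13, 22281), (14, 22281)]
top = top_types_per_hour(records)
print(top)
```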
- cluster ID may be content-based and enable calculation of message type classifications to be distributed.
- log index 136 may include one or more hash tables that map fingerprint values to sketches for a given log message.
- log index 136 may include N hash tables for mapping fingerprint values to sketches that contain N fingerprint values.
- each fingerprint value in the sketch (i.e., each column in the tuple <1965, 1034, 886, 1034>)
- a candidate sketch must match in M different columns to be considered a match, where M is less than N.
- the incoming log message belongs to that cluster and is assigned the same message type, and the sketch is not inserted into the log index. If no candidates are found, a new cluster is generated having a new message type, and the sketch is inserted into log index 136.
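The lookup described above, with one hash table per sketch column and a match requiring agreement in M of N columns, can be sketched as follows. The class name `SketchIndex`, the sequential integer cluster IDs, and the tie-breaking rule are assumptions of this illustration; the text describes cluster IDs that may be content-based.

```python
from collections import defaultdict


class SketchIndex:
    """N hash tables, one per sketch column, mapping fingerprint -> cluster IDs."""

    def __init__(self, n_columns, min_matches):
        self.min_matches = min_matches  # M, expected to be less than N
        self.tables = [defaultdict(set) for _ in range(n_columns)]
        self.next_id = 0  # sequential IDs are an assumption of this sketch

    def classify(self, sketch):
        """Return the cluster ID of a matching sketch, or register a new cluster."""
        votes = defaultdict(int)
        for table, value in zip(self.tables, sketch):
            for cluster_id in table.get(value, ()):
                votes[cluster_id] += 1
        candidates = [cid for cid, n in votes.items() if n >= self.min_matches]
        if candidates:
            # Existing cluster: the incoming sketch is not inserted.
            return max(candidates, key=lambda cid: (votes[cid], -cid))
        cluster_id = self.next_id
        self.next_id += 1
        for table, value in zip(self.tables, sketch):
            table[value].add(cluster_id)
        return cluster_id


index = SketchIndex(n_columns=4, min_matches=3)
first = index.classify((753, 1034, 886, 1034))
second = index.classify((1965, 1034, 886, 1034))
third = index.classify((1, 2, 3, 4))
print(first, second, third)
```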
- log analytics module 132 may store a representation of each message type within log index 136 by storing a copy of a full log message.
- Log analytics module 132 may use a textual differential algorithm (e.g., longest substring match) or other additional textual analysis to verify similarity of the incoming log message to a representative of the message type and override message type classification based on poor sketches.
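A verification step of this kind could be approximated with Python's standard `difflib` module, comparing the incoming message to the stored representative word by word. The 0.5 cutoff, the word-level tokenization, and the function names are assumptions of this sketch; the text does not specify how the longest match is scored.

```python
from difflib import SequenceMatcher


def longest_common_run(a_words, b_words):
    """Length of the longest contiguous run of words shared by both messages."""
    matcher = SequenceMatcher(None, a_words, b_words, autojunk=False)
    return matcher.find_longest_match(0, len(a_words), 0, len(b_words)).size


def verify_same_type(representative, incoming, min_fraction=0.5):
    """Accept the sketch-based match only if enough consecutive words agree."""
    a, b = representative.split(), incoming.split()
    if not a or not b:
        return False
    return longest_common_run(a, b) / max(len(a), len(b)) >= min_fraction


rep = "host1 sending 7182 files using SFTP protocol"
print(verify_same_type(rep, "host4 sending 208 files using SFTP protocol"))
print(verify_same_type(rep, "host1 warning: 38 files pending"))
```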
- the stored representation of each message type may be used to provide an example log message that is displayed to a user (e.g., system administrator) when presenting the statistics or graphical charts for the message type.
- Log analytics module 132 divides one or more log messages into log events based on burst analysis. It has been determined that log messages corresponding to events within computing system 100 may be created in bursts and close in time. For example, a burst of log messages may be recorded by applications and guest operating systems whenever a virtual machine shuts down or restarts. In one embodiment, log analytics module 132 processes time stamps of log messages 202 and tracks time between log messages. In some embodiments, log analytics module 132 may determine and maintain an average time interval associated with an event duration. For example, log messages occurring within a 10-second duration may be candidates for being grouped together as a single log event.
- Log analytics module 132 may associate one or more log messages occurring within the event duration to a log event 206 .
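A minimal sketch of burst grouping follows, assuming a fixed event duration measured from the first message of each event. The 10-second default comes from the example above; the fixed window (rather than a maintained average interval) and the name `group_into_events` are simplifications made for illustration.

```python
from datetime import datetime, timedelta


def group_into_events(messages, duration=timedelta(seconds=10)):
    """Group (timestamp, text) pairs into events spanning at most `duration` each."""
    events = []
    for stamp, text in sorted(messages):
        # events[-1][0][0] is the time stamp of the first message in the current event.
        if events and stamp - events[-1][0][0] <= duration:
            events[-1].append((stamp, text))
        else:
            events.append([(stamp, text)])
    return events


t0 = datetime(2013, 9, 23, 13, 30, 0)
messages = [
    (t0, "host1 sending 5738 files"),
    (t0 + timedelta(seconds=3), "host2 received 5700 files"),
    (t0 + timedelta(seconds=8), "host1 warning: 38 files pending"),
    (t0 + timedelta(seconds=60), "host4 sending 382 files"),
    (t0 + timedelta(seconds=62), "host5 received 382 files"),
]
events = group_into_events(messages)
print([len(e) for e in events])
```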
- Log analytics module 132 may represent each log event as a composition of different message types, such as a list of tuples of a message type and corresponding frequency of occurrence.
- one log event may be comprised of log messages having an occurrence of a “sending files” message type, two occurrences of a “received files” message type, and one occurrence of a “warning files pending” message type, and may be represented by a list of pairs having cluster ID and frequency: (22280, 1), (22281, 2), (22282, 1).
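Building the list of (cluster ID, frequency) pairs for one log event is a counting step, sketched below. The function name `composition` and the choice to sort the pairs by cluster ID are assumptions for illustration.

```python
from collections import Counter


def composition(cluster_ids):
    """Return the event as a sorted tuple of (cluster ID, frequency) pairs."""
    return tuple(sorted(Counter(cluster_ids).items()))


# One "sending", two "received", and one "warning" message, in arrival order.
event = composition([22280, 22281, 22282, 22281])
print(event)
```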
- Log analytics module 132 may then cluster together similar log events, applying a technique similar to the technique applied above for clustering similar log messages.
- log analytics module 132 queries log index 136 to determine whether a log event is similar to other log events based on the composition of message types that comprise the log event, and if so, assigns a same event type as the previously determined log events, at step 418 . Otherwise, at step 416 , log analytics module 132 assigns a new event type to the log event, and may insert the composition of the new event type into log index 136 .
- log index 136 may further include additional hash tables that map cluster IDs to compositions of event types for a given log event.
- each cluster ID may be used as a hash table lookup for candidate compositions that have some or all matching cluster IDs.
- the event type of a log event is determined by performing lookups in the hash tables according to each pair of message type identifier and a corresponding frequency of occurrence. If at least one candidate event type is found, the detected log event may be determined similar to the corresponding log event and may be assigned the same event type. If no candidate is found, a new event cluster is generated having a new event type, and the representative composition of message types is inserted into log index 136 .
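The hash-table lookup described above can be sketched, under stated assumptions, as a small in-memory index. The class name `EventIndex`, the use of Jaccard similarity over (message type, frequency) pairs, and the 0.5 cutoff are hypothetical choices for this sketch; the disclosure only requires that candidate event types be found by looking up each pair.

```python
from collections import defaultdict
from typing import Dict, FrozenSet, Iterable, Set, Tuple

Pair = Tuple[int, int]  # (message type / cluster ID, frequency)


class EventIndex:
    """Toy stand-in for the event-type portion of a log index."""

    def __init__(self, min_similarity: float = 0.5) -> None:
        self._by_pair: Dict[Pair, Set[int]] = defaultdict(set)
        self._compositions: Dict[int, FrozenSet[Pair]] = {}
        self._min_similarity = min_similarity  # assumed cutoff
        self._next_event_type = 1

    def classify(self, composition: Iterable[Pair]) -> int:
        pairs = frozenset(composition)
        # Each pair is a hash-table key that yields candidate event types.
        hits: Dict[int, int] = defaultdict(int)
        for pair in pairs:
            for event_type in self._by_pair.get(pair, ()):
                hits[event_type] += 1
        best_type, best_score = None, 0.0
        for event_type in sorted(hits):
            union = len(pairs | self._compositions[event_type])
            score = hits[event_type] / union  # Jaccard similarity
            if score > best_score:
                best_type, best_score = event_type, score
        if best_type is not None and best_score >= self._min_similarity:
            return best_type
        # No sufficiently similar candidate: register a new event type.
        new_type = self._next_event_type
        self._next_event_type += 1
        self._compositions[new_type] = pairs
        for pair in pairs:
            self._by_pair[pair].add(new_type)
        return new_type


index = EventIndex()
print(index.classify([(22280, 1), (22281, 2), (22282, 1)]))  # new event type
print(index.classify([(22280, 1), (22281, 2), (22282, 1)]))  # same event type
```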
- log analytics module 132 analyzes event clusters and detects an anomaly within event clusters based on the classification of log events. In some embodiments, log analytics module 132 may determine an occurrence of an “incomplete” event or a gross deviation from an expected event.
- an expected log event may be a composition of message types (22280, 2), (22281, 2), (22282, 3), (22283, 1), (22284, 1)
- an incomplete log event may be detected upon determining an occurrence of a log event only having (22280, 2), (22281, 2), (22282, 3), (22283, 1)
- a deviation from a known log event may be detected upon determining an occurrence of a log event having (22280, 2), (22281, 2), (22282, 3), (22283, 1), (22284, 1), (34921, 292), (34927, 395).
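The three compositions above can be compared with a short sketch. It assumes that an event is "incomplete" when its pairs are a proper subset of the expected composition and a "deviation" otherwise; the function name and the returned labels are illustrative only.

```python
from typing import Iterable, Set, Tuple

Pair = Tuple[int, int]  # (message type ID, frequency)


def compare_to_expected(observed: Iterable[Pair],
                        expected: Iterable[Pair]) -> Tuple[str, Set[Pair], Set[Pair]]:
    """Label an observed composition relative to an expected one.

    Returns (label, missing_pairs, extra_pairs).
    """
    obs, exp = set(observed), set(expected)
    missing = exp - obs
    extra = obs - exp
    if not missing and not extra:
        label = "expected"
    elif not extra:
        label = "incomplete"   # only a subset of the expected pairs occurred
    else:
        label = "deviation"    # unexpected pairs are present
    return label, missing, extra


expected = [(22280, 2), (22281, 2), (22282, 3), (22283, 1), (22284, 1)]
print(compare_to_expected(expected[:4], expected))
print(compare_to_expected(expected + [(34921, 292), (34927, 395)], expected))
```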
- log analytics module 132 may determine an anomaly in event volume based on one or more threshold values. As described earlier in conjunction with FIG. 3A , log analytics module 132 may detect when a number of events occurring per unit of time exceeds or falls below a threshold value. For example, log analytics module 132 may determine an occurrence of an anomaly in event volume when the number of events occurring per hour exceeds 500 events per hour (suggesting over-activity), or falls below 5 events per hour (suggesting inactivity). In some embodiments, a threshold value may be associated with a particular event type, such that occurrences of that particular event type that exceed the threshold value may be flagged as an anomaly. The threshold values may be pre-determined, as well as configurable by a user.
- the threshold values may be dynamically determined based on the performance history of the computing system, for example, using a weighted moving average, or other suitable heuristics.
- the threshold values may be specified in a variety of manners, including absolute numerical values (e.g., 500 events/hr), and relative values, such as percentages (e.g., 200% change).
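As an illustrative sketch of the threshold checks above, the following code applies absolute bounds (500 and 5 events per hour, taken from the example) and a relative bound (200% change) against a baseline. The linearly weighted moving average, in which the most recent hour receives the largest weight, is one assumed heuristic among many, and all function names are hypothetical.

```python
from typing import Optional, Sequence


def weighted_moving_average(history: Sequence[float]) -> float:
    """Linearly weighted average; the most recent value weighs the most."""
    if not history:
        raise ValueError("history must not be empty")
    weights = range(1, len(history) + 1)
    return sum(w * x for w, x in zip(weights, history)) / sum(weights)


def absolute_anomaly(events_per_hour: float,
                     high: float = 500, low: float = 5) -> Optional[str]:
    """Flag counts that exceed the high bound or fall below the low bound."""
    if events_per_hour > high:
        return "over-activity"
    if events_per_hour < low:
        return "inactivity"
    return None


def relative_anomaly(events_per_hour: float, history: Sequence[float],
                     max_change_pct: float = 200.0) -> bool:
    """Flag counts that differ from the baseline by more than max_change_pct."""
    baseline = weighted_moving_average(history)
    if baseline == 0:
        return events_per_hour > 0
    change_pct = abs(events_per_hour - baseline) / baseline * 100.0
    return change_pct > max_change_pct


print(absolute_anomaly(501), absolute_anomaly(4), absolute_anomaly(20))
print(relative_anomaly(200, [20, 20, 20]), relative_anomaly(30, [20, 20, 20]))
```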
- log analytics module 132 may present the detected anomaly, as well as the classified message types and event types, to a user via a graphical user interface.
- the graphical user interface may provide charts, graphics, and statistical displays to illustrate a most frequent event over a past week, or an anomalous event occurring in a last 1-hour period.
- log analytics module 132 may use frequency of log events and anomaly detection to generate an alert for an operator (e.g., system administrator) that the frequency of a particular log message type has increased or decreased in an anomalous way.
- embodiments of the present disclosure provide a technique for processing log data that enables real-time analysis that is scalable for the multitude of log data generated by many software and infrastructure components of a computer system 100 .
- embodiments described herein advantageously reduce the need for multiple passes over the same dataset or the need for active intervention in the form of feedback and training to properly analyze data.
- Embodiments of the present disclosure provide a system for unsupervised, approximate clustering of log data that provides volume- and pattern-based anomaly detection.
- the various embodiments described herein may employ various computer-implemented operations involving data stored in computer systems. For example, these operations may require physical manipulation of physical quantities which usually, though not necessarily, take the form of electrical or magnetic signals where they, or representations of them, are capable of being stored, transferred, combined, compared, or otherwise manipulated. Further, such manipulations are often referred to in terms, such as producing, identifying, determining, or comparing. Any operations described herein that form part of one or more embodiments of the disclosure may be useful machine operations.
- one or more embodiments of the disclosure also relate to a device or an apparatus for performing these operations.
- the apparatus may be specially constructed for specific required purposes, or it may be a general purpose computer selectively activated or configured by a computer program stored in the computer.
- various general purpose machines may be used with computer programs written in accordance with the description provided herein, or it may be more convenient to construct a more specialized apparatus to perform the required operations.
- One or more embodiments of the present disclosure may be implemented as one or more computer programs or as one or more computer program modules embodied in one or more computer readable media.
- the term computer readable medium refers to any data storage device that can store data which can thereafter be input to a computer system; computer readable media may be based on any existing or subsequently developed technology for embodying computer programs in a manner that enables them to be read by a computer.
- Examples of a computer readable medium include a hard drive, network attached storage (NAS), read-only memory, random-access memory (e.g., a flash memory device), a CD-ROM (Compact Disc-ROM), a CD-R, or a CD-RW, a DVD (Digital Versatile Disc), a magnetic tape, and other optical and non-optical data storage devices.
- the computer readable medium can also be distributed over a network coupled computer system so that the computer readable code is stored and executed in a distributed fashion.
Abstract
Description
- System administrators provide virtualized computing infrastructure, which typically includes a plurality of virtual machines executing on a shared set of physical hardware components, to offer highly available, fault-tolerant distributed systems. However, a large-scale virtualized infrastructure may have many (e.g., thousands of) virtual machines running on many physical machines. High availability requirements provide system administrators with little time to diagnose or bring down parts of infrastructure for maintenance. Fault-tolerant features ensure the virtualized computing infrastructure continues to operate when problems arise, but generate many intermediate states that have to be reconciled and addressed. As such, identifying, debugging, and resolving failures and performance issues for virtualized computing environments have become increasingly challenging.
- Many software and hardware components generate log data to facilitate technical support and troubleshooting. However, over an entire virtualized computing infrastructure, massive amounts of unstructured log data can be generated continuously by every component of the virtualized computing infrastructure. As such, finding information within the log data that identifies problems of virtualized computing infrastructure is difficult, due to the overwhelming volume of unstructured log data to be analyzed.
- One or more embodiments disclosed herein provide a method for providing real-time analysis of log messages for a computer infrastructure. The method includes receiving a plurality of log messages including a first log message, and generating a sketch associated with the first log message. The sketch may be generated based on words contained in the first log message. The method further includes determining a message type for the first log message based on a comparison of the generated sketch to a plurality of sketches stored in an index. Log messages of a same message type have similar sketches. The method includes determining a first log event associated with one or more log messages occurring within a first time interval, wherein the first log event comprises a first composition of message types corresponding to the associated log messages. The method further includes determining an event type for the first log event based on a comparison of the first composition of message types to a plurality of compositions of message types stored in the index, and determining an anomalous log event within the plurality of log messages based on the classification for the first log event.
- Further embodiments of the present disclosure include a non-transitory computer-readable storage medium that includes instructions that enable a processing unit to implement one or more of the methods set forth above or the functions of the computer system set forth above.
- So that the manner in which the above recited aspects are attained and can be understood in detail, a more particular description of embodiments of the disclosure, briefly summarized above, may be had by reference to the appended drawings.
-
FIG. 1A depicts a block diagram that illustrates a computing system with which one or more embodiments of the present disclosure may be utilized. -
FIG. 1B is a block diagram that illustrates a virtualized computing system with which one or more embodiments of the present disclosure may be utilized. -
FIG. 2 is a block diagram that illustrates a workflow for analyzing log data of the computing system, according to one embodiment of the present disclosure. -
FIGS. 3A-3B are block diagrams that depict examples of event pattern and event volume anomalies, according to one embodiment of the present disclosure. -
FIG. 4 is a flow diagram that illustrates steps for a method for analyzing log data of a computing system, according to an embodiment of the present disclosure. - One or more embodiments disclosed herein provide methods, systems, and computer programs for analyzing log data for a computing infrastructure in real-time. In one embodiment, log data, sometimes referred to as runtime logs, error logs, or debugging logs, is reduced in both volume and level of detail by first classifying messages into types by content similarity. The log data is then reduced further by grouping bursts of messages into log events. Patterns in log events, such as the collection and number of different message types that comprise each log event, can be used to identify anomalous events within the log data. For example, patterns in the log events may be used to detect when log events occur that differ in message type composition, or when log events occur that differ in frequency of occurrence over time.
-
FIG. 1A is a block diagram that illustrates a computing system 100 with which one or more embodiments of the present invention may be utilized. As illustrated, computing system 100 includes a plurality of server systems, identified as server system 102-1, 102-2, 102-3, and referred to collectively as servers 102. Each server 102 includes CPU 104, memory 106, networking interface 110, storage interface 114, and other conventional components of a computing device. Each server 102 further includes an operating system 120 configured to manage execution of one or more applications 122 using the computing resources (e.g., CPU 104, memory 106, networking interface 110, storage interface 114). - As mentioned earlier, software and infrastructure components of
computing system 100, including servers 102, operating systems 120, and applications 122 running on top of operating system 120, may generate log data during operation. Log data may indicate the state, and state transitions, that occur during operation, and may record occurrences of failures, as well as unexpected and undesirable events. In one embodiment, log data may be unstructured text comprised of a plurality of log messages, including status updates, error messages, stack traces, and debugging messages. With thousands to millions of different processes running in a complex computing environment, an overwhelmingly large volume of heterogeneous log data, having varying syntax, structure, and even language, may be generated. As such, finding log messages relevant to the context of a particular issue, as well as proactively identifying emerging issues from log data, can be challenging. - Accordingly, embodiments of the present disclosure provide a
log analytics module 132 configured to store and analyze in real-time log data 134 from software and infrastructure components of computing system 100. Log analytics module 132 may include a log index 136 configured to cache (and later query) results of the analysis of log data. Log analytics module 132 reduces the volume and level of detail of the log data to enable a user (e.g., system administrator) to diagnose and troubleshoot issues within computing system 100. - In one embodiment,
log analytics module 132 is configured to parse a stream of log messages within log data and identify groups of log messages as logical events, referred to interchangeably as “log events” or “events”. To do so, log analytics module 132 is configured to classify log messages within a stream of log data as log message types that cluster together similar log messages. Log analytics module 132 is further configured to perform event detection on log messages within log data to group together log messages based on their occurrence close in time in a sequence. As described later, in one embodiment, events may be defined as a collection of log message types, and an occurrence of an event corresponds to a group of log messages having the requisite log message types appearing in log data. Log analytics module 132 may further identify anomalies within log data based on the message-type classifications and detected events, such as event volume anomalies and event pattern anomalies. - Through analysis techniques described herein,
log analytics module 132 indicates to a user what one event (as reported by log messages) means in relation to other events in the log data and highlights events occurring within computing system 100 in context. In some embodiments, log analytics module 132 may highlight certain events in the context of being nearby in time to other events, such that if the certain events usually occur in a sequence, then events occurring out of that sequence may be notable. In some embodiments, log analytics module 132 may highlight certain events in the context of being similar to other events, such that similar events may be clustered and analyzed together rather than be considered separately. In some embodiments, log analytics module 132 may highlight certain events in the context of the hierarchical infrastructure of computing system 100, such as being from the same thread, process, application, virtual machine, host, host group, data center, etc. The operations of log analytics module 132 are illustrated in greater detail in conjunction with FIG. 2 . - While embodiments of the present invention are described in conjunction with a computing environment having physical components, it should be recognized that
log data 134 may be generated by components of other alternative computing architectures, including a virtualized computing system as shown in FIG. 1B . FIG. 1B is a block diagram that illustrates a computing system 150 with which one or more embodiments of the present disclosure may be utilized. As illustrated, computing system 150 includes a host group 124 of host computers, identified as hosts 108-1, 108-2, 108-3, and 108-4, and referred to collectively as hosts 108. Each host 108 is configured to provide a virtualization layer that abstracts computing resources of a hardware platform 118 into multiple virtual machines (VMs) 112 that run concurrently on the same host 108. Hardware platform 118 of each host 108 may include conventional components of a computing device, such as a memory, processor, local storage, disk interface, and network interface. The VMs 112 run on top of a software interface layer, referred to herein as a hypervisor 116, that enables sharing of the hardware resources of host 108 by the virtual machines. One example of hypervisor 116 that may be used in an embodiment described herein is a VMware ESXi hypervisor provided as part of the VMware vSphere solution made commercially available from VMware, Inc. Hypervisor 116 may run on top of the operating system of host 108 or directly on hardware components of host 108. Each VM 112 includes a guest operating system (e.g., Microsoft Windows, Linux) and one or more guest applications and processes running on top of the guest operating system. - In the embodiment shown in
FIG. 1B , computing system 150 includes virtualization management software 130 that may communicate with the plurality of hosts 108 via network 140. Virtualization management software 130 is configured to carry out administrative tasks for the computing system 100, including managing hosts 108, managing VMs running within each host 108, provisioning VMs, migrating VMs from one host to another host, and load balancing between hosts 108 of host group 124. In one embodiment, virtualization management software 130 is a computer program that resides and executes in a central server, which may reside in computing system 100, or alternatively, running as a VM in one of hosts 108. One example of a virtualization management software is the vCenter® Server product made available from VMware, Inc. Similar to the software and infrastructure components of computing system 100, the software and infrastructure components of computing system 150, including host group(s) 124, hosts 108, VMs 112 running on hosts 108, guest operating systems, applications, and processes running within VMs, may generate large amounts of log data in real-time during operation. - While
log analytics module 132 is depicted in FIG. 1B as a separate component that resides and executes on a separate server or virtual machine, it is appreciated that log analytics module 132 may alternatively reside in any one of the computing devices of the virtualized computing system 150, for example, such as the same central server where the virtualization management software 130 resides. In one embodiment, log analytics module 132 may be embodied as a plug-in component configured to extend functionality of virtualization management software 130. Access to the log analytics module 132 can be achieved via a client application (not shown). For example, each analysis task, such as searching for log messages, filtering for log messages, analyzing log messages over a period of time, can be accomplished through the client application. One embodiment provides a stand-alone application version of the client application. In another embodiment, the client application is implemented as a web browser application that provides management access from any networked device. -
FIG. 2 is a block diagram that illustrates a workflow for analyzing log data 134 of a computing infrastructure, according to one embodiment of the present disclosure. It should be recognized that, even though the workflow is described in conjunction with the system of FIG. 1A , any system configured to perform the illustrated technique is within the scope of embodiments of the disclosure. In the embodiment shown, log data 134 may include a plurality of individual log messages 202-1 to 202-5 (collectively referred to as log messages 202) generated over a period of time. In some embodiments, a log message may include a time stamp (e.g., “Sep 23 13:30”) indicating a date and time corresponding to the creation of the log message and a text description (e.g., “host1 sending 5738 files”). While each log message 202 is depicted as a separate line of text for sake of illustration, it should be recognized that log messages 202 may be arranged in a variety of formats, including log messages that span several lines. - In one embodiment,
log analytics module 132 may classify each log message 202 as a message type based on content similarity of the log messages. In some embodiments, the content similarity is performed on the text description portion of the log message 202. In the example shown in FIG. 2 , log analytics module 132 processes log message 202-1 (i.e., “Sep 23 13:30 host1 sending 5738 files”) and assigns log message 202-1 a first message type 204-1. Log analytics module 132 then processes a second log message 202-2 (i.e., “Sep 23 13:31 host2 received 5700 files”) and determines the contents of second log message 202-2 are not sufficiently similar to first log message 202-1 and assigns a different, second message type 204-2. Similarly, log analytics module 132 processes a third log message 202-3 (i.e., “Sep 23 13:32 host1 warning: 38 files pending”) and assigns a third message type 204-3 upon determining no content similarity with the other already processed log messages. For sake of illustration, log messages having different message types are depicted in FIG. 2 as shapes having different patterns. In one embodiment, log analytics module 132 may determine content similarity of log messages according to a “sketching” algorithm that determines if log messages contain a number of words in common in the same relative position. Determination of content similarity and the sketching algorithm are described in greater detail below. - Continuing the example shown in
FIG. 2 , log analytics module 132 processes a fourth log message 202-4 (i.e., “Sep 23 14:00 host4 sending 382 files”) and determines content similarity with log message 202-1. As such, log analytics module 132 assigns log message 202-4 the same first message type 204-1 as log message 202-1, as depicted in FIG. 2 by identical patterned highlights or colors. Similarly, log analytics module 132 processes a fifth log message 202-5 (i.e., “Sep 23 14:01 host5 received 382 files”) and assigns the second message type 204-2 based on a determination of content similarity with log message 202-2. - In one embodiment,
log analytics module 132 is configured to identify one or more log events 206 based on the timing of the log messages. In some embodiments, log analytics module 132 may group one or more log messages 202 into log events 206 according to a burst analysis algorithm. For example, log analytics module 132 identifies a first log event 206-1 that includes log messages 202-1, 202-2, 202-3, which all occur at approximately the same time, around September 23 13:30, and a second log event 206-2 that includes log messages 202-4 and 202-5, which both occur around September 23 14:00. In one embodiment, log analytics module 132 is configured to represent each identified log event 206 as a composition of message types of log messages. In some embodiments, an event type for a log event may be defined as a composition of tuples of message type and frequency. In the example shown in FIG. 2 , a first event 206-1 may be characterized as a composition of one occurrence of message type 204-1 (e.g., “Sending . . . files”), one occurrence of message type 204-2 (e.g., “Received . . . files”), and one occurrence of message type 204-3 (e.g., “Warning . . . files pending”); and second event 206-2 may be characterized as a composition of one occurrence of message type 204-1 (e.g., “Sending . . . files”) and one occurrence of message type 204-2 (e.g., “Received . . . files”). - According to one embodiment,
log analytics module 132 may identify anomalous events based on patterns of events from log data 134, as shown in FIGS. 3A and 3B . FIG. 3A is a chart 300 depicting an example of an event volume anomaly based on frequency of occurrence of events over time. Log analytics module 132 may determine the number of events occurring per hour in a given time period, e.g., from 6:00 PM to 9:00 PM. Chart 300 further illustrates a breakdown of event types for each hour, depicting occurrences of events similar to log events 206-1 and 206-2. As an example, it may be normal within the computing system for approximately 20 events per hour to occur. But, a sudden increase of events to 200 events per hour (e.g., at 19:00) and then to 500 events per hour (e.g., at 20:00), thereby exceeding some threshold value 302, can trigger log analytics module 132 to flag this as an anomalous occurrence of event volume. -
FIG. 3B depicts an example of an event pattern anomaly based on events that are different in message type composition. As shown, events 304 occurring at a given time are usually an event type similar to event 206-2 (i.e., events comprised of “Sending . . . files” log messages and “Received . . . files” log messages). However, an unexpected or atypical event may occur, such as event 306, which is an event comprised of “Sending . . . files” log messages, “Received . . . files” log messages, and “Warning . . . files pending” log messages, which is different from the usual events. In this case, log analytics module 132 may determine an anomalous occurrence of a log event 306 (i.e., composed of message types 204-1, 204-2, and 204-3), that is different in composition from other log events (i.e., composed of message types 204-1 and 204-2). -
FIG. 4 is a flow diagram that illustrates steps for a method 400 for providing real-time analysis of log messages for a computer infrastructure, according to an embodiment of the present disclosure. It should be recognized that, even though the method 400 is described in conjunction with the system of FIG. 1 , any system configured to perform the method steps is within the scope of embodiments of the disclosure. - The method 400 begins at
step 402, where log analytics module 132 receives a stream of log data 134 generated by software and infrastructure components of computing system 100. As described above, log data 134 may include a plurality of log messages. In some embodiments, log analytics module 132 may be configured to retrieve log data (e.g., log files) from software and infrastructure components of computing system 100, including applications 122, operating systems 120, and in the case of virtualized computing system 150, components such as hypervisors 116, guest applications and operating systems running within VMs 112. In other embodiments, software and infrastructure components of computing system 100 may be configured to write log files to a common destination, such as an external storage, from which log analytics module 132 may periodically retrieve log data. In some embodiments, log data 134 may be transferred over network 140 directly to log analytics module 132. - At
step 404, log analytics module 132 generates a compact integer representation, or “sketch,” of text content for a log message in the received log data. In one embodiment, a sketch associated with a log message is generated based on words of the log message. As mentioned above, two log messages may be considered similar if the log messages contain a number of words in common in the same relative positions. As such, sketches of log messages are computed such that similar log messages should have identical or substantially similar sketches. In one embodiment, a sketch of a log message may be an ordered list, or tuple, of fingerprint values corresponding to a subset of the words of the log message. - In some embodiments, a sketch of a log message is a tuple of fingerprints of “interesting” words of the log message. Each interesting word of the log message (e.g., “host1”) can be given a value (e.g., 753) using a fingerprint function, such as a hash function. For example, a sketch generated for a log message “host1 sending 5738 files” may be a tuple of fingerprint values (753, 1034, 886) that corresponds to interesting words (host1, Sending, files). In another example, a sketch for the log message “
host4 Sending 382 files” can be computed as the tuple (1965, 1034, 886) that corresponds to interesting features (host4, Sending, files). As such, because the sketches (753, 1034, 886) and (1965, 1034, 886) have identical values “1034” and “886” in same relative positions, the two log messages may be deemed similar. - In one implementation, sketches of log messages may be generated according to a sketching algorithm that uses N independent scoring functions to pick N “interesting” words of a log message, where “interesting” is determined according to each scoring function. In some embodiments, a scoring function is a hash function that computes a 32-bit integer given a word. In such a scheme, a sketch may be composed of 32-bit fingerprints of the most interesting words in a log message, where “most interesting” is determined by N scoring functions (e.g., N=8):
-
- The parameters $M_N$ and $A_N$ for each scoring function may be selected such that the scoring functions are linearly independent (i.e., $\sum_{i=0}^{N} C_i \cdot \mathrm{Score}_i(\text{word}) = 0$ only if all $C_i$ are zero) and the different scores for a particular word are uncorrelated.
Log analytics module 132, for each scoring function, scores each word in the log message and selects the word having the highest score (i.e., “most interesting”), according to that scoring function. As each scoring function selects one word in the log message, N scoring functions result in N words being selected. The fingerprints of these N words are then combined to form a sketch of the log message. - For example, the log message “host1 sending 7182 files using SFTP protocol” may be scored in the following manner by N=4 scoring functions, where the most interesting word for each scoring function is emphasized:
-
- Score1: [host1] sending 7182 files using SFTP protocol
- Score2: host1 [sending] 7182 files using SFTP protocol
- Score3: host1 sending 7182 [files] using SFTP protocol
- Score4: host1 [sending] 7182 files using SFTP protocol
In this example, the four scoring functions determined that the most interesting words were “host1”, “sending,” “files,” and “sending” (again). The word “host1” had the highest score of the 6 words in the log message according to the first scoring function Score1. The word “sending” had the highest score of the 6 words according to both the second and fourth scoring functions, and the word “files” was the highest scoring word of the words in the log message according to the third scoring function. As such, the resulting sketch would be a 4-tuple of the fingerprints of these words as follows. For clarity, simple numerical values (e.g., 753) are shown for the fingerprint values, but it should be recognized that fingerprint values may be 32-bit values (e.g., 0x459c8cbb). - Fingerprint(“host1”)=753
- Fingerprint(“sending”)=1034
- Fingerprint(“files”)=886
- Fingerprint(“sending”)=1034
- Sketch1=(753, 1034, 886, 1034)
- Continuing this example, if a similar but slightly different log message (i.e., “host4 sending 208 files using SFTP protocol”) is received and processed, the four scoring functions may score the slightly different log message similarly:
- Score1: **host4** sending 208 files using SFTP protocol
- Score2: host4 **sending** 208 files using SFTP protocol
- Score3: host4 sending 208 **files** using SFTP protocol
- Score4: host4 **sending** 208 files using SFTP protocol
As shown, a change in the score of the first word “host4” did not affect the selection of the highest-scoring word for three out of the four scoring functions. It has been determined that if a majority of N independent scoring functions select the same words in two different log messages, the log messages are very likely to be similar overall. For example, in this case, the resulting sketch would be a 4-tuple of the fingerprints of these words:
- Fingerprint(“host4”)=1965
- Fingerprint(“sending”)=1034
- Fingerprint(“files”)=886
- Fingerprint(“sending”)=1034
- Sketch2=(1965, 1034, 886, 1034)
Comparing the sketches for the two log messages:
- Sketch1 ~ Sketch2
- (753, 1034, 886, 1034) ~ (1965, 1034, 886, 1034)
reveals three out of four fingerprint values in common (i.e., “1034”, “886”, and “1034”). As such, a majority of the scoring functions have selected the same words “sending”, “files”, and “sending” in both log messages, and therefore the two log messages may be deemed similar.
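The column-wise comparison of two sketches may be illustrated with a minimal Python sketch. The function names `matching_columns` and `is_similar` are hypothetical and do not appear in the disclosure, and the strict-majority rule is only one possible reading of “a majority of the scoring functions.”

```python
def matching_columns(sketch_a, sketch_b):
    """Count the positions at which two equal-length sketches hold the same fingerprint."""
    if len(sketch_a) != len(sketch_b):
        raise ValueError("sketches must have the same number of fingerprints")
    return sum(1 for a, b in zip(sketch_a, sketch_b) if a == b)


def is_similar(sketch_a, sketch_b):
    """Deem two sketches similar when a strict majority of their columns match (assumed rule)."""
    return matching_columns(sketch_a, sketch_b) * 2 > len(sketch_a)


# Fingerprint values taken from the example above.
sketch1 = (753, 1034, 886, 1034)
sketch2 = (1965, 1034, 886, 1034)

print(matching_columns(sketch1, sketch2))  # 3
print(is_similar(sketch1, sketch2))        # True
```

The comparison is positional: the fingerprint chosen by a given scoring function in one message is compared only against the fingerprint chosen by that same scoring function in the other message.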
- While other approaches for selecting words in a log message may be used, such as choosing the first few words of a log message or selecting even-numbered words, or other content-insensitive schemes, the sketching algorithm as described herein is advantageously more robust to relative insertions or deletions of text. It has been determined that the insertion or deletion of an additional word relative to the original text is unlikely to change all or even a majority of the words selected by each scoring function. In one embodiment, a linear congruential generator (LCG) may be used as a scoring function, though it should be recognized that other types of scoring functions can be used, including functions that are deterministic and produce uncorrelated results.
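One way the sketching procedure might be realized is shown below in Python. This is a sketch under stated assumptions, not the implementation of the disclosure: the use of CRC-32 as the fingerprint, the single-step LCG scoring, and the specific multiplier and increment constants are all illustrative choices, and the names `fingerprint`, `score`, `sketch`, and `LCG_PARAMS` are hypothetical.

```python
import zlib

MASK32 = 0xFFFFFFFF

# Hypothetical (multiplier, increment) pairs, one per scoring function.
# These are commonly cited LCG constants, used here only for illustration.
LCG_PARAMS = [
    (1664525, 1013904223),
    (22695477, 1),
    (1103515245, 12345),
    (134775813, 1),
]


def fingerprint(word):
    """32-bit fingerprint of a word (CRC-32 is an assumption, not the source's choice)."""
    return zlib.crc32(word.encode("utf-8")) & MASK32


def score(word, params):
    """Score a word by applying one LCG step, modulo 2**32, to its fingerprint."""
    multiplier, increment = params
    return (multiplier * fingerprint(word) + increment) & MASK32


def sketch(message, params_list=LCG_PARAMS):
    """Return, for each scoring function, the fingerprint of its highest-scoring word."""
    words = message.split()
    if not words:
        raise ValueError("empty log message")
    return tuple(
        fingerprint(max(words, key=lambda w: score(w, params)))
        for params in params_list
    )


print(sketch("host1 sending 7182 files using SFTP protocol"))
print(sketch("host4 sending 208 files using SFTP protocol"))
```

Because each score depends only on the word itself and not on its position, inserting or deleting an unrelated word changes a selected word only when the new word happens to outscore the previous winner for that scoring function.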
-
Log analytics module 132 then determines a message type classification for the log message based on the corresponding sketch for the log message. Log analytics module 132 classifies log messages having similar sketches as having the same message type. Such clustering helps reduce the number of log messages that need to be examined by grouping the messages into a small number of message types that can then be highlighted. Accordingly, message type classification enables log analytics module 132 to cluster together similar log messages to more effectively process and analyze a large volume of log data.
- At step 406, log analytics module 132 queries log index 136 to determine whether the log message is similar to a previously processed log message based on the corresponding sketches, and if so, assigns the log message a same message type as the previously processed log message, at step 408. For example, log analytics module 132 queries log index 136 using the sketch (1965, 1034, 886, 1034) corresponding to the log message “host4 sending 208 files using SFTP protocol” and determines the log message is similar to the previously processed log message “host1 sending 7182 files using SFTP protocol” based on the similarity with its corresponding sketch (753, 1034, 886, 1034). As discussed earlier, in some embodiments, two log messages may be deemed similar and assigned a same message type if a majority of the scoring functions have selected the same words in both log messages. Otherwise, at step 410, log analytics module 132 assigns a new message type to the log message and inserts the log message into log index 136.
- In one embodiment, each message type may be represented by a message type identifier, or “cluster ID.” For example, the log messages depicted in FIG. 2 may have the following sketches and corresponding cluster IDs (the sketches are shown as tuples of the most interesting words rather than the fingerprint values for clarity of illustration):
-
- In this example, the sketch (host1, sending, -Files, sending) is given the same cluster ID 22280 as the sketch (host4, sending, -Files, sending), because 3 out of 4 fingerprint values match. In some embodiments, log analytics module 132 may provide the ability to search the received log messages based on a given cluster ID. In some embodiments, log analytics module 132 may use cluster ID as a search criterion for log messages that are similar to a particular log message (i.e., find log messages “like this”) by querying for log messages having a particular cluster ID. In some embodiments, log analytics module 132 may use the cluster ID as a criterion for aggregation to generate statistics, such as the Top-5 message types per hour. In some embodiments, cluster ID may be content-based and enable calculation of message type classifications to be distributed.
- According to one implementation, log index 136 may include one or more hash tables that map fingerprint values to sketches for a given log message. In some embodiments, log index 136 may include N hash tables for mapping fingerprint values to sketches that contain N fingerprint values. To determine whether a log message is similar to other log messages, each fingerprint value in the sketch (i.e., each column in the tuple <1965, 1034, 886, 1034>) may be used to search for candidate sketches. In one particular embodiment, a candidate sketch must match in M different columns to be considered a match, where M is less than N. As an example, where N=8, each of the 8 fingerprints in a sketch is looked up in its corresponding hash table to find candidate sketches with at least 6 matching fingerprints (M=6). If at least one candidate is found, the incoming log message belongs to that cluster and is assigned a same message type, and the sketch is not inserted into the log index. If no candidates are found, a new cluster is generated having a new message type, and the sketch is inserted into log index 136.
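The N-hash-table lookup described above can be sketched in Python as follows. The class name `SketchIndex`, its method names, and the sequential assignment of cluster IDs are assumptions made for illustration; the disclosure does not specify how cluster IDs are generated.

```python
from collections import Counter, defaultdict
from itertools import count


class SketchIndex:
    """Toy log index: one hash table per sketch column, with M-of-N matching."""

    def __init__(self, n, m, first_cluster_id=1):
        if not 0 < m < n:
            raise ValueError("M must be positive and less than N")
        self.n = n
        self.m = m
        # tables[i] maps a fingerprint seen in column i to the cluster IDs
        # whose stored sketch holds that fingerprint in column i.
        self.tables = [defaultdict(list) for _ in range(n)]
        self._ids = count(first_cluster_id)  # hypothetical ID scheme

    def classify(self, sketch):
        """Return the cluster ID for a sketch, creating a new cluster if none matches."""
        if len(sketch) != self.n:
            raise ValueError("sketch must contain exactly N fingerprints")
        hits = Counter()
        for column, fp in enumerate(sketch):
            for cluster_id in self.tables[column].get(fp, ()):
                hits[cluster_id] += 1
        if hits:
            best, matched = hits.most_common(1)[0]
            if matched >= self.m:
                return best  # similar sketch found; nothing is inserted
        new_id = next(self._ids)
        for column, fp in enumerate(sketch):
            self.tables[column][fp].append(new_id)
        return new_id


index = SketchIndex(n=4, m=3, first_cluster_id=22280)
print(index.classify((753, 1034, 886, 1034)))   # 22280 (new cluster)
print(index.classify((1965, 1034, 886, 1034)))  # 22280 (3 of 4 columns match)
print(index.classify((5, 6, 7, 8)))             # 22281 (no candidate found)
```

Because only the first sketch of each cluster is stored, each lookup costs N hash probes plus a count over the candidates returned, independent of the number of log messages already processed.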
- In some embodiments, log analytics module 132 may store a representation of each message type within log index 136 by storing a copy of a full log message. Log analytics module 132 may use a textual differential algorithm (e.g., longest substring match) or other additional textual analysis to verify similarity of the incoming log message to a representative of the message type and override a message type classification based on poor sketches. In some embodiments, the stored representation of each message type may be used to provide an example log message that is displayed to a user (e.g., system administrator) when presenting the statistics or graphical charts for the message type.
- At step 412, log analytics module 132 divides one or more log messages into log events based on burst analysis. It has been determined that log messages corresponding to events within computer system 100 may be created in bursts and close in time. For example, a burst of log messages may be recorded by applications and guest operating systems whenever a virtual machine shuts down or restarts. In one embodiment, log analytics module 132 processes time stamps of log messages 202 and tracks time between log messages. In some embodiments, log analytics module 132 may determine and maintain an average time interval associated with an event duration. For example, log messages occurring within a 10-second duration may be candidates for being grouped together as a single log event. Log analytics module 132 may associate one or more log messages occurring within the event duration to a log event 206. Log analytics module 132 may represent each log event as a composition of different message types, such as a list of tuples of a message type and corresponding frequency of occurrence. For example, one log event may comprise log messages having one occurrence of a “sending files” message type, two occurrences of a “received files” message type, and one occurrence of a “warning files pending” message type, and may be represented by a list of pairs having cluster ID and frequency: (22280, 1), (22281, 2), (22282, 1).
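The burst-based grouping may be sketched in Python as shown below. The function name `split_into_events` is hypothetical, and measuring the event duration from the first message of each burst is only one possible reading of the text; a gap-based rule between consecutive messages would be an equally plausible alternative.

```python
from collections import Counter


def split_into_events(messages, event_duration=10.0):
    """Group time-ordered (timestamp_seconds, cluster_id) pairs into log events.

    A new event starts when a message arrives more than `event_duration`
    seconds after the first message of the current event (assumed rule).
    Each event is returned as a sorted list of (cluster_id, frequency) pairs.
    """
    events = []
    current = Counter()
    window_start = None
    for timestamp, cluster_id in messages:
        if window_start is not None and timestamp - window_start > event_duration:
            events.append(sorted(current.items()))
            current = Counter()
            window_start = None
        if window_start is None:
            window_start = timestamp
        current[cluster_id] += 1
    if current:
        events.append(sorted(current.items()))
    return events


messages = [
    (0.0, 22280), (1.5, 22281), (2.0, 22281), (3.2, 22282),  # first burst
    (60.0, 22280), (61.0, 22281),                            # second burst
]
for composition in split_into_events(messages):
    print(composition)
# [(22280, 1), (22281, 2), (22282, 1)]
# [(22280, 1), (22281, 1)]
```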
Log analytics module 132 may then cluster together similar log events, applying a technique similar to the technique applied above for clustering similar log messages. At step 414, log analytics module 132 queries log index 136 to determine whether a log event is similar to other log events based on the composition of message types that make up the log event, and if so, assigns a same event type as the previously determined log events, at step 418. Otherwise, at step 416, log analytics module 132 assigns a new event type to the log event, and may insert the composition of the new event type into log index 136.
- In one implementation, log index 136 may further include additional hash tables that map cluster IDs to compositions of event types for a given log event. As such, to determine whether a log event is similar to other log events, each cluster ID may be used as a hash table lookup for candidate compositions that have some or all matching cluster IDs. In some embodiments, the event type of a log event is determined by performing lookups in the hash tables according to each pair of message type identifier and a corresponding frequency of occurrence. If at least one candidate event type is found, the detected log event may be determined similar to the corresponding log event and may be assigned the same event type. If no candidate is found, a new event cluster is generated having a new event type, and the representative composition of message types is inserted into log index 136.
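Event-type classification by composition may be sketched in Python in a similar way. The class name `EventTypeIndex`, the `min_overlap` parameter, and the rule that a candidate must share at least that fraction of (cluster ID, frequency) pairs are assumptions for illustration; the disclosure states only that candidates have “some or all” matching entries.

```python
from collections import Counter, defaultdict


class EventTypeIndex:
    """Toy index mapping (cluster_id, frequency) pairs to known event types."""

    def __init__(self, min_overlap=0.75):
        self.min_overlap = min_overlap      # hypothetical similarity threshold
        self.by_pair = defaultdict(set)     # (cluster_id, freq) -> event type IDs
        self.compositions = {}              # event type ID -> frozenset of pairs
        self._next_id = 1

    def classify(self, composition):
        """Return the event type for a composition, creating a new type if needed."""
        pairs = frozenset(composition)
        if not pairs:
            raise ValueError("a log event must contain at least one message type")
        hits = Counter()
        for pair in pairs:
            for event_type in self.by_pair.get(pair, ()):
                hits[event_type] += 1
        for event_type, shared in hits.most_common():
            larger = max(len(pairs), len(self.compositions[event_type]))
            if shared / larger >= self.min_overlap:
                return event_type
        event_type = self._next_id
        self._next_id += 1
        self.compositions[event_type] = pairs
        for pair in pairs:
            self.by_pair[pair].add(event_type)
        return event_type


events = EventTypeIndex()
print(events.classify([(22280, 1), (22281, 2), (22282, 1)]))  # 1 (new event type)
print(events.classify([(22280, 1), (22281, 2), (22282, 1)]))  # 1 (same composition)
print(events.classify([(30000, 5)]))                          # 2 (no candidate)
```

Keying the hash tables on the full (cluster ID, frequency) pair means that two events containing the same message types at very different frequencies are not treated as matching on those entries.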
- At step 420, log analytics module 132 analyzes event clusters and detects an anomaly within event clusters based on the classification of log events. In some embodiments, log analytics module 132 may determine an occurrence of an “incomplete” event or a gross deviation from an expected event. For example, where an expected log event may be a composition of message types (22280, 2), (22281, 2), (22282, 3), (22283, 1), (22284, 1), an incomplete log event may be detected upon determining an occurrence of a log event only having (22280, 2), (22281, 2), (22282, 3), (22283, 1). In another example, a deviation from a known log event may be detected upon determining an occurrence of a log event having (22280, 2), (22281, 2), (22282, 3), (22283, 1), (22284, 1), (34921, 292), (34927, 395).
- In some embodiments, log analytics module 132 may determine an anomaly in event volume based on one or more threshold values. As described earlier in conjunction with FIG. 3A, log analytics module 132 may detect when a number of events occurring per unit of time exceeds or falls below a threshold value. For example, log analytics module 132 may determine an occurrence of an anomaly in event volume when the number of events occurring per hour exceeds 500 events per hour (suggesting over-activity), or falls below 5 events per hour (suggesting inactivity). In some embodiments, a threshold value may be associated with a particular event type, such that occurrences of that particular event type that exceed the threshold value may be flagged as an anomaly. The threshold values may be pre-determined, as well as configurable by a user. In some embodiments, the threshold values may be dynamically determined based on the performance history of the computing system, for example, using a weighted moving average, or other suitable heuristics. The threshold values may be specified in a variety of manners, including absolute numerical values (e.g., 500 events/hr), and relative values, such as percentages (e.g., 200% change). In some embodiments, log analytics module 132 may present the detected anomaly, as well as the classified message types and event types, to a user via a graphical user interface. For example, the graphical user interface may provide charts, graphics, and statistical displays to illustrate a most frequent event over a past week, or an anomalous event occurring in a last 1-hour period. In one embodiment, log analytics module 132 may use frequency of log events and anomaly detection to generate an alert for an operator (e.g., system administrator) that the frequency of a particular log message type has increased or decreased in an anomalous way.
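The two kinds of anomaly described above, pattern-based and volume-based, may be sketched in Python as follows. The function names and the returned labels are hypothetical, and the set-based comparison is a simplification that treats any composition that is neither identical to nor a strict subset of the expected composition as a deviation.

```python
def compare_to_expected(expected, observed):
    """Classify an observed composition against an expected one.

    Returns "expected" for identical compositions, "incomplete" when the
    observed pairs are a strict subset of the expected pairs, and
    "deviation" otherwise (assumed labels).
    """
    expected_pairs, observed_pairs = set(expected), set(observed)
    if observed_pairs == expected_pairs:
        return "expected"
    if observed_pairs < expected_pairs:
        return "incomplete"
    return "deviation"


def volume_anomaly(events_per_hour, low=5, high=500):
    """Flag event volume outside the [low, high] range; thresholds follow the example above."""
    if events_per_hour > high:
        return "over-activity"
    if events_per_hour < low:
        return "inactivity"
    return None


expected = [(22280, 2), (22281, 2), (22282, 3), (22283, 1), (22284, 1)]
incomplete = expected[:4]
deviating = expected + [(34921, 292), (34927, 395)]

print(compare_to_expected(expected, incomplete))  # incomplete
print(compare_to_expected(expected, deviating))   # deviation
print(volume_anomaly(750))                        # over-activity
print(volume_anomaly(2))                          # inactivity
```

A per-event-type threshold, as mentioned above, could be obtained by keeping a separate `low` and `high` pair for each event type and passing them to `volume_anomaly`.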
- Accordingly, embodiments of the present disclosure provide a technique for processing log data that enables real-time analysis that is scalable for the multitude of log data generated by many software and infrastructure components of a computer system 100. In contrast to conventional approaches, embodiments described herein advantageously reduce the need for multiple passes over the same dataset or the need for active intervention in the form of feedback and training to properly analyze data. Embodiments of the present disclosure provide a system for unsupervised, approximate clustering of log data that provides volume- and pattern-based anomaly detection.
- Although one or more embodiments of the present disclosure have been described in some detail for clarity of understanding, it will be apparent that certain changes and modifications may be made within the scope of the claims. Accordingly, the described embodiments are to be considered as illustrative and not restrictive, and the scope of the claims is not to be limited to details given herein, but may be modified within the scope and equivalents of the claims. In the claims, elements and/or steps do not imply any particular order of operation, unless explicitly stated in the claims.
- The various embodiments described herein may employ various computer-implemented operations involving data stored in computer systems. For example, these operations may require physical manipulation of physical quantities which usually, though not necessarily, take the form of electrical or magnetic signals where they, or representations of them, are capable of being stored, transferred, combined, compared, or otherwise manipulated. Further, such manipulations are often referred to in terms, such as producing, identifying, determining, or comparing. Any operations described herein that form part of one or more embodiments of the disclosure may be useful machine operations. In addition, one or more embodiments of the disclosure also relate to a device or an apparatus for performing these operations. The apparatus may be specially constructed for specific required purposes, or it may be a general purpose computer selectively activated or configured by a computer program stored in the computer. In particular, various general purpose machines may be used with computer programs written in accordance with the description provided herein, or it may be more convenient to construct a more specialized apparatus to perform the required operations.
- The various embodiments described herein may be practiced with other computer system configurations including hand-held devices, microprocessor systems, microprocessor-based or programmable consumer electronics, minicomputers, mainframe computers, and the like. One or more embodiments of the present disclosure may be implemented as one or more computer programs or as one or more computer program modules embodied in one or more computer readable media. The term computer readable medium refers to any data storage device that can store data which can thereafter be input to a computer system; computer readable media may be based on any existing or subsequently developed technology for embodying computer programs in a manner that enables them to be read by a computer. Examples of a computer readable medium include a hard drive, network attached storage (NAS), read-only memory, random-access memory (e.g., a flash memory device), a CD-ROM (Compact Disc-ROM), a CD-R, or a CD-RW, a DVD (Digital Versatile Disc), a magnetic tape, and other optical and non-optical data storage devices. The computer readable medium can also be distributed over a network coupled computer system so that the computer readable code is stored and executed in a distributed fashion.
- Plural instances may be provided for components, operations or structures described herein as a single instance. Finally, boundaries between various components, operations and data stores are somewhat arbitrary, and particular operations are illustrated in the context of specific illustrative configurations. Other allocations of functionality are envisioned and may fall within the scope of the disclosure(s). In general, structures and functionality presented as separate components in exemplary configurations may be implemented as a combined structure or component. Similarly, structures and functionality presented as a single component may be implemented as separate components. These and other variations, modifications, additions, and improvements may fall within the scope of the appended claim(s).
Claims (20)
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
US13/897,994 US9244755B2 (en) | 2013-05-20 | 2013-05-20 | Scalable log analytics |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
US13/897,994 US9244755B2 (en) | 2013-05-20 | 2013-05-20 | Scalable log analytics |
Publications (2)
Publication Number | Publication Date |
---|---|
US20140344622A1 true US20140344622A1 (en) | 2014-11-20 |
US9244755B2 US9244755B2 (en) | 2016-01-26 |
Family
ID=51896800
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
US13/897,994 Active 2033-10-28 US9244755B2 (en) | 2013-05-20 | 2013-05-20 | Scalable log analytics |
Country Status (1)
Country | Link |
---|---|
US (1) | US9244755B2 (en) |
Cited By (61)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US20150094959A1 (en) * | 2013-10-02 | 2015-04-02 | Nec Laboratories America, Inc. | Heterogeneous log analysis |
US9104573B1 (en) * | 2013-09-16 | 2015-08-11 | Amazon Technologies, Inc. | Providing relevant diagnostic information using ontology rules |
US20150227598A1 (en) * | 2014-02-13 | 2015-08-13 | Amazon Technologies, Inc. | Log data service in a virtual environment |
US20150370799A1 (en) * | 2014-06-24 | 2015-12-24 | Vmware, Inc. | Method and system for clustering and prioritizing event messages |
US20150370885A1 (en) * | 2014-06-24 | 2015-12-24 | Vmware, Inc. | Method and system for clustering event messages and managing event-message clusters |
US20160098485A1 (en) * | 2014-10-05 | 2016-04-07 | Splunk Inc. | Field Value Search Drill Down |
US20160132566A1 (en) * | 2014-11-10 | 2016-05-12 | Red Hat, Inc. | Native federation view suggestion |
CN105824744A (en) * | 2016-03-21 | 2016-08-03 | 焦点科技股份有限公司 | Real-time log collection and analysis method on basis of B2B (Business to Business) platform |
US20160224531A1 (en) | 2015-01-30 | 2016-08-04 | Splunk Inc. | Suggested Field Extraction |
US20160259869A1 (en) * | 2015-03-02 | 2016-09-08 | Ca, Inc. | Self-learning simulation environments |
US20160357960A1 (en) * | 2015-06-03 | 2016-12-08 | Fujitsu Limited | Computer-readable storage medium, abnormality detection device, and abnormality detection method |
US9524397B1 (en) | 2015-07-06 | 2016-12-20 | Bank Of America Corporation | Inter-system data forensics |
US20170004188A1 (en) * | 2015-06-30 | 2017-01-05 | Ca, Inc. | Apparatus and Method for Graphically Displaying Transaction Logs |
US20170063762A1 (en) * | 2015-09-01 | 2017-03-02 | Sap Portals Israel Ltd | Event log analyzer |
US20170068709A1 (en) * | 2015-09-09 | 2017-03-09 | International Business Machines Corporation | Scalable and accurate mining of control flow from execution logs across distributed systems |
US9633106B1 (en) | 2011-06-30 | 2017-04-25 | Sumo Logic | Log data analysis |
US9639443B2 (en) * | 2015-03-02 | 2017-05-02 | Ca, Inc. | Multi-component and mixed-reality simulation environments |
US20170134408A1 (en) * | 2015-11-10 | 2017-05-11 | Sap Se | Standard metadata model for analyzing events with fraud, attack, or any other malicious background |
US20170139766A1 (en) * | 2015-11-16 | 2017-05-18 | International Business Machines Corporation | Management of computing machines with troubleshooting prioritization |
WO2017087437A1 (en) * | 2015-11-17 | 2017-05-26 | Nec Laboratories America, Inc. | Fast pattern discovery for log analytics |
US9740755B2 (en) | 2014-09-30 | 2017-08-22 | Splunk, Inc. | Event limited field picker |
US9842160B2 (en) | 2015-01-30 | 2017-12-12 | Splunk, Inc. | Defining fields from particular occurences of field labels in events |
US9916346B2 (en) | 2015-01-30 | 2018-03-13 | Splunk Inc. | Interactive command entry list |
US9922084B2 (en) | 2015-01-30 | 2018-03-20 | Splunk Inc. | Events sets in a visually distinct display format |
US9977803B2 (en) | 2015-01-30 | 2018-05-22 | Splunk Inc. | Column-based table manipulation of event data |
US20180144041A1 (en) * | 2016-11-21 | 2018-05-24 | International Business Machines Corporation | Transaction discovery in a log sequence |
WO2018118379A1 (en) * | 2016-12-21 | 2018-06-28 | Mastercard International Incorporated | Systems and methods for real time computer fault evaluation |
US10013454B2 (en) | 2015-01-30 | 2018-07-03 | Splunk Inc. | Text-based table manipulation of event data |
US20180203757A1 (en) * | 2017-01-16 | 2018-07-19 | Hitachi, Ltd. | Log message grouping apparatus, log message grouping system, and log message grouping method |
US10061824B2 (en) | 2015-01-30 | 2018-08-28 | Splunk Inc. | Cell-based table manipulation of event data |
KR101909957B1 (en) * | 2018-04-03 | 2018-12-19 | 큐비트시큐리티 주식회사 | Web traffic logging system and method for detecting web hacking in real time |
US10185740B2 (en) | 2014-09-30 | 2019-01-22 | Splunk Inc. | Event selector to generate alternate views |
US10237295B2 (en) * | 2016-03-22 | 2019-03-19 | Nec Corporation | Automated event ID field analysis on heterogeneous logs |
WO2019066295A1 (en) * | 2017-09-28 | 2019-04-04 | 큐비트시큐리티 주식회사 | Web traffic logging system and method for detecting web hacking in real time |
US10311171B2 (en) | 2015-03-02 | 2019-06-04 | Ca, Inc. | Multi-component and mixed-reality simulation environments |
CN109918349A (en) * | 2019-02-25 | 2019-06-21 | 网易(杭州)网络有限公司 | Log processing method, device, storage medium and electronic device |
US10333805B2 (en) | 2017-04-21 | 2019-06-25 | Nec Corporation | Ultra-fast pattern generation algorithm for the heterogeneous logs |
US10394868B2 (en) | 2015-10-23 | 2019-08-27 | International Business Machines Corporation | Generating important values from a variety of server log files |
US10402428B2 (en) * | 2013-04-29 | 2019-09-03 | Moogsoft Inc. | Event clustering system |
US10423597B2 (en) * | 2016-03-27 | 2019-09-24 | International Business Machines Corporation | Data set visualizer for tree based file systems |
US10445311B1 (en) * | 2013-09-11 | 2019-10-15 | Sumo Logic | Anomaly detection |
US10462170B1 (en) * | 2016-11-21 | 2019-10-29 | Alert Logic, Inc. | Systems and methods for log and snort synchronized threat detection |
US10567409B2 (en) | 2017-03-20 | 2020-02-18 | Nec Corporation | Automatic and scalable log pattern learning in security log analysis |
US10664535B1 (en) | 2015-02-02 | 2020-05-26 | Amazon Technologies, Inc. | Retrieving log data from metric data |
US10678669B2 (en) | 2017-04-21 | 2020-06-09 | Nec Corporation | Field content based pattern generation for heterogeneous logs |
CN111427737A (en) * | 2019-01-09 | 2020-07-17 | 阿里巴巴集团控股有限公司 | Method and device for modifying exception log and electronic equipment |
US10726037B2 (en) | 2015-01-30 | 2020-07-28 | Splunk Inc. | Automatic field extraction from filed values |
US10733002B1 (en) * | 2016-06-28 | 2020-08-04 | Amazon Technologies, Inc. | Virtual machine instance data aggregation |
US10740212B2 (en) | 2017-06-01 | 2020-08-11 | Nec Corporation | Content-level anomaly detector for systems with limited memory |
US10896175B2 (en) | 2015-01-30 | 2021-01-19 | Splunk Inc. | Extending data processing pipelines using dependent queries |
US10929765B2 (en) | 2016-12-15 | 2021-02-23 | Nec Corporation | Content-level anomaly detection for heterogeneous logs |
US20210064500A1 (en) * | 2019-08-30 | 2021-03-04 | Dell Products, Lp | System and Method for Detecting Anomalies by Discovering Sequences in Log Entries |
WO2021067858A1 (en) * | 2019-10-03 | 2021-04-08 | Oracle International Corporation | Enhanced anomaly detection in computing environments |
US11231840B1 (en) * | 2014-10-05 | 2022-01-25 | Splunk Inc. | Statistics chart row mode drill down |
US11329860B2 (en) * | 2015-01-27 | 2022-05-10 | Moogsoft Inc. | System for decomposing events that includes user interface |
KR20220077184A (en) * | 2020-11-30 | 2022-06-09 | 가천대학교 산학협력단 | System and method for log anomaly detection using bayesian probability and closed pattern mining method and computer program for the same |
US11442924B2 (en) | 2015-01-30 | 2022-09-13 | Splunk Inc. | Selective filtered summary graph |
US11544248B2 (en) | 2015-01-30 | 2023-01-03 | Splunk Inc. | Selective query loading across query interfaces |
US11604715B2 (en) * | 2017-01-26 | 2023-03-14 | International Business Machines Corporation | Generation of end-user sessions from end-user events identified from computer system logs |
US11615073B2 (en) | 2015-01-30 | 2023-03-28 | Splunk Inc. | Supplementing events displayed in a table format |
US11847480B2 (en) * | 2016-06-21 | 2023-12-19 | Amazon Technologies, Inc. | System for detecting impairment issues of distributed hosts |
Families Citing this family (15)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
GB2513885B (en) * | 2013-05-08 | 2021-04-07 | Xyratex Tech Limited | Methods of clustering computational event logs |
US10055276B2 (en) | 2016-11-09 | 2018-08-21 | International Business Machines Corporation | Probabilistic detect identification |
US10642677B2 (en) | 2017-11-02 | 2020-05-05 | International Business Machines Corporation | Log-based diagnosis for declarative-deployed applications |
US11120213B2 (en) * | 2018-01-25 | 2021-09-14 | Vmware, Inc. | Intelligent verification of presentation of a user interface |
US11195115B2 (en) | 2018-02-21 | 2021-12-07 | Red Hat Israel, Ltd. | File format prediction based on relative frequency of a character in the file |
CN109343985B (en) * | 2018-08-03 | 2021-10-22 | 联想(北京)有限公司 | Data processing method, device and storage medium |
US11403207B2 (en) * | 2020-02-28 | 2022-08-02 | Microsoft Technology Licensing, Llc. | Detection of runtime errors using machine learning |
US11314510B2 (en) | 2020-08-14 | 2022-04-26 | International Business Machines Corporation | Tracking load and store instructions and addresses in an out-of-order processor |
US11321165B2 (en) | 2020-09-22 | 2022-05-03 | International Business Machines Corporation | Data selection and sampling system for log parsing and anomaly detection in cloud microservices |
US11243835B1 (en) | 2020-12-03 | 2022-02-08 | International Business Machines Corporation | Message-based problem diagnosis and root cause analysis |
US11513930B2 (en) | 2020-12-03 | 2022-11-29 | International Business Machines Corporation | Log-based status modeling and problem diagnosis for distributed applications |
US11474892B2 (en) | 2020-12-03 | 2022-10-18 | International Business Machines Corporation | Graph-based log sequence anomaly detection and problem diagnosis |
US11797538B2 (en) | 2020-12-03 | 2023-10-24 | International Business Machines Corporation | Message correlation extraction for mainframe operation |
US11599404B2 (en) | 2020-12-03 | 2023-03-07 | International Business Machines Corporation | Correlation-based multi-source problem diagnosis |
US11403326B2 (en) | 2020-12-03 | 2022-08-02 | International Business Machines Corporation | Message-based event grouping for a computing operation |
Citations (9)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US20050223027A1 (en) * | 2004-03-31 | 2005-10-06 | Lawrence Stephen R | Methods and systems for structuring event data in a database for location and retrieval |
US20090113246A1 (en) * | 2007-10-24 | 2009-04-30 | Sivan Sabato | Apparatus for and Method of Implementing system Log Message Ranking via System Behavior Analysis |
US7778419B2 (en) * | 2005-05-10 | 2010-08-17 | Research In Motion Limited | Key masking for cryptographic processes |
US7925678B2 (en) * | 2007-01-12 | 2011-04-12 | Loglogic, Inc. | Customized reporting and mining of event data |
US20110119219A1 (en) * | 2009-11-17 | 2011-05-19 | Naifeh Gregory P | Method and apparatus for analyzing system events |
US20110131453A1 (en) * | 2009-12-02 | 2011-06-02 | International Business Machines Corporation | Automatic analysis of log entries through use of clustering |
US20110185234A1 (en) * | 2010-01-28 | 2011-07-28 | Ira Cohen | System event logs |
US20110296244A1 (en) * | 2010-05-25 | 2011-12-01 | Microsoft Corporation | Log message anomaly detection |
US8073806B2 (en) * | 2007-06-22 | 2011-12-06 | Avaya Inc. | Message log analysis for system behavior evaluation |
Patent Citations (9)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US20050223027A1 (en) * | 2004-03-31 | 2005-10-06 | Lawrence Stephen R | Methods and systems for structuring event data in a database for location and retrieval |
US7778419B2 (en) * | 2005-05-10 | 2010-08-17 | Research In Motion Limited | Key masking for cryptographic processes |
US7925678B2 (en) * | 2007-01-12 | 2011-04-12 | Loglogic, Inc. | Customized reporting and mining of event data |
US8073806B2 (en) * | 2007-06-22 | 2011-12-06 | Avaya Inc. | Message log analysis for system behavior evaluation |
US20090113246A1 (en) * | 2007-10-24 | 2009-04-30 | Sivan Sabato | Apparatus for and Method of Implementing system Log Message Ranking via System Behavior Analysis |
US20110119219A1 (en) * | 2009-11-17 | 2011-05-19 | Naifeh Gregory P | Method and apparatus for analyzing system events |
US20110131453A1 (en) * | 2009-12-02 | 2011-06-02 | International Business Machines Corporation | Automatic analysis of log entries through use of clustering |
US20110185234A1 (en) * | 2010-01-28 | 2011-07-28 | Ira Cohen | System event logs |
US20110296244A1 (en) * | 2010-05-25 | 2011-12-01 | Microsoft Corporation | Log message anomaly detection |
Non-Patent Citations (1)
Title |
---|
Yang et al.; "Near-Duplicate Detection by Instance-level Constrained Clustering;" SIGIR '06; August 2006; pp. 421-428. * |
Cited By (119)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US9633106B1 (en) | 2011-06-30 | 2017-04-25 | Sumo Logic | Log data analysis |
US10402428B2 (en) * | 2013-04-29 | 2019-09-03 | Moogsoft Inc. | Event clustering system |
US11853290B2 (en) * | 2013-09-11 | 2023-12-26 | Sumo Logic, Inc. | Anomaly detection |
US11314723B1 (en) * | 2013-09-11 | 2022-04-26 | Sumo Logic, Inc. | Anomaly detection |
US20220207020A1 (en) * | 2013-09-11 | 2022-06-30 | Sumo Logic, Inc. | Anomaly detection |
US10445311B1 (en) * | 2013-09-11 | 2019-10-15 | Sumo Logic | Anomaly detection |
US9104573B1 (en) * | 2013-09-16 | 2015-08-11 | Amazon Technologies, Inc. | Providing relevant diagnostic information using ontology rules |
US10114148B2 (en) * | 2013-10-02 | 2018-10-30 | Nec Corporation | Heterogeneous log analysis |
US20150094959A1 (en) * | 2013-10-02 | 2015-04-02 | Nec Laboratories America, Inc. | Heterogeneous log analysis |
US20150227598A1 (en) * | 2014-02-13 | 2015-08-13 | Amazon Technologies, Inc. | Log data service in a virtual environment |
US10133741B2 (en) * | 2014-02-13 | 2018-11-20 | Amazon Technologies, Inc. | Log data service in a virtual environment |
US10120928B2 (en) * | 2014-06-24 | 2018-11-06 | Vmware, Inc. | Method and system for clustering event messages and managing event-message clusters |
US20150370799A1 (en) * | 2014-06-24 | 2015-12-24 | Vmware, Inc. | Method and system for clustering and prioritizing event messages |
US20150370885A1 (en) * | 2014-06-24 | 2015-12-24 | Vmware, Inc. | Method and system for clustering event messages and managing event-message clusters |
US9922099B2 (en) | 2014-09-30 | 2018-03-20 | Splunk Inc. | Event limited field picker |
US9740755B2 (en) | 2014-09-30 | 2017-08-22 | Splunk, Inc. | Event limited field picker |
US10185740B2 (en) | 2014-09-30 | 2019-01-22 | Splunk Inc. | Event selector to generate alternate views |
US10303344B2 (en) * | 2014-10-05 | 2019-05-28 | Splunk Inc. | Field value search drill down |
US20220155943A1 (en) * | 2014-10-05 | 2022-05-19 | Splunk Inc. | Statistics chart row mode drill down |
US10444956B2 (en) | 2014-10-05 | 2019-10-15 | Splunk Inc. | Row drill down of an event statistics time chart |
US11614856B2 (en) | 2014-10-05 | 2023-03-28 | Splunk Inc. | Row-based event subset display based on field metrics |
US10795555B2 (en) | 2014-10-05 | 2020-10-06 | Splunk Inc. | Statistics value chart interface row mode drill down |
US20160098485A1 (en) * | 2014-10-05 | 2016-04-07 | Splunk Inc. | Field Value Search Drill Down |
US11003337B2 (en) | 2014-10-05 | 2021-05-11 | Splunk Inc. | Executing search commands based on selection on field values displayed in a statistics table |
US11816316B2 (en) | 2014-10-05 | 2023-11-14 | Splunk Inc. | Event identification based on cells associated with aggregated metrics |
US11231840B1 (en) * | 2014-10-05 | 2022-01-25 | Splunk Inc. | Statistics chart row mode drill down |
US9921730B2 (en) | 2014-10-05 | 2018-03-20 | Splunk Inc. | Statistics time chart interface row mode drill down |
US11455087B2 (en) * | 2014-10-05 | 2022-09-27 | Splunk Inc. | Generating search commands based on field-value pair selections |
US10261673B2 (en) | 2014-10-05 | 2019-04-16 | Splunk Inc. | Statistics value chart interface cell mode drill down |
US10599308B2 (en) | 2014-10-05 | 2020-03-24 | Splunk Inc. | Executing search commands based on selections of time increments and field-value pairs |
US11687219B2 (en) * | 2014-10-05 | 2023-06-27 | Splunk Inc. | Statistics chart row mode drill down |
US11868158B1 (en) * | 2014-10-05 | 2024-01-09 | Splunk Inc. | Generating search commands based on selected search options |
US10139997B2 (en) | 2014-10-05 | 2018-11-27 | Splunk Inc. | Statistics time chart interface cell mode drill down |
US20160132566A1 (en) * | 2014-11-10 | 2016-05-12 | Red Hat, Inc. | Native federation view suggestion |
US9864786B2 (en) * | 2014-11-10 | 2018-01-09 | Red Hat, Inc. | Native federation view suggestion |
US11329860B2 (en) * | 2015-01-27 | 2022-05-10 | Moogsoft Inc. | System for decomposing events that includes user interface |
US11868364B1 (en) | 2015-01-30 | 2024-01-09 | Splunk Inc. | Graphical user interface for extracting from extracted fields |
US10877963B2 (en) | 2015-01-30 | 2020-12-29 | Splunk Inc. | Command entry list for modifying a search query |
US10061824B2 (en) | 2015-01-30 | 2018-08-28 | Splunk Inc. | Cell-based table manipulation of event data |
US11544248B2 (en) | 2015-01-30 | 2023-01-03 | Splunk Inc. | Selective query loading across query interfaces |
US10013454B2 (en) | 2015-01-30 | 2018-07-03 | Splunk Inc. | Text-based table manipulation of event data |
US11907271B2 (en) | 2015-01-30 | 2024-02-20 | Splunk Inc. | Distinguishing between fields in field value extraction |
US10915583B2 (en) | 2015-01-30 | 2021-02-09 | Splunk Inc. | Suggested field extraction |
US11442924B2 (en) | 2015-01-30 | 2022-09-13 | Splunk Inc. | Selective filtered summary graph |
US9977803B2 (en) | 2015-01-30 | 2018-05-22 | Splunk Inc. | Column-based table manipulation of event data |
US9922084B2 (en) | 2015-01-30 | 2018-03-20 | Splunk Inc. | Events sets in a visually distinct display format |
US10896175B2 (en) | 2015-01-30 | 2021-01-19 | Splunk Inc. | Extending data processing pipelines using dependent queries |
US11341129B2 (en) | 2015-01-30 | 2022-05-24 | Splunk Inc. | Summary report overlay |
US11841908B1 (en) | 2015-01-30 | 2023-12-12 | Splunk Inc. | Extraction rule determination based on user-selected text |
US9916346B2 (en) | 2015-01-30 | 2018-03-13 | Splunk Inc. | Interactive command entry list |
US11741086B2 (en) | 2015-01-30 | 2023-08-29 | Splunk Inc. | Queries based on selected subsets of textual representations of events |
US10949419B2 (en) | 2015-01-30 | 2021-03-16 | Splunk Inc. | Generation of search commands via text-based selections |
US9842160B2 (en) | 2015-01-30 | 2017-12-12 | Splunk, Inc. | Defining fields from particular occurences of field labels in events |
US11531713B2 (en) | 2015-01-30 | 2022-12-20 | Splunk Inc. | Suggested field extraction |
US11409758B2 (en) | 2015-01-30 | 2022-08-09 | Splunk Inc. | Field value and label extraction from a field value |
US11030192B2 (en) | 2015-01-30 | 2021-06-08 | Splunk Inc. | Updates to access permissions of sub-queries at run time |
US10846316B2 (en) | 2015-01-30 | 2020-11-24 | Splunk Inc. | Distinct field name assignment in automatic field extraction |
US11068452B2 (en) | 2015-01-30 | 2021-07-20 | Splunk Inc. | Column-based table manipulation of event data to add commands to a search query |
US11354308B2 (en) | 2015-01-30 | 2022-06-07 | Splunk Inc. | Visually distinct display format for data portions from events |
US11615073B2 (en) | 2015-01-30 | 2023-03-28 | Splunk Inc. | Supplementing events displayed in a table format |
US11222014B2 (en) | 2015-01-30 | 2022-01-11 | Splunk Inc. | Interactive table-based query construction using interface templates |
US20160224531A1 (en) | 2015-01-30 | 2016-08-04 | Splunk Inc. | Suggested Field Extraction |
US10726037B2 (en) | 2015-01-30 | 2020-07-28 | Splunk Inc. | Automatic field extraction from filed values |
US11573959B2 (en) | 2015-01-30 | 2023-02-07 | Splunk Inc. | Generating search commands based on cell selection within data tables |
US11544257B2 (en) | 2015-01-30 | 2023-01-03 | Splunk Inc. | Interactive table-based query construction using contextual forms |
US10664535B1 (en) | 2015-02-02 | 2020-05-26 | Amazon Technologies, Inc. | Retrieving log data from metric data |
US10311171B2 (en) | 2015-03-02 | 2019-06-04 | Ca, Inc. | Multi-component and mixed-reality simulation environments |
US20160259869A1 (en) * | 2015-03-02 | 2016-09-08 | Ca, Inc. | Self-learning simulation environments |
US9639443B2 (en) * | 2015-03-02 | 2017-05-02 | Ca, Inc. | Multi-component and mixed-reality simulation environments |
US20160357960A1 (en) * | 2015-06-03 | 2016-12-08 | Fujitsu Limited | Computer-readable storage medium, abnormality detection device, and abnormality detection method |
US20170004188A1 (en) * | 2015-06-30 | 2017-01-05 | Ca, Inc. | Apparatus and Method for Graphically Displaying Transaction Logs |
US9524397B1 (en) | 2015-07-06 | 2016-12-20 | Bank Of America Corporation | Inter-system data forensics |
US10587555B2 (en) * | 2015-09-01 | 2020-03-10 | Sap Portals Israel Ltd. | Event log analyzer |
US20170063762A1 (en) * | 2015-09-01 | 2017-03-02 | Sap Portals Israel Ltd | Event log analyzer |
US10140287B2 (en) * | 2015-09-09 | 2018-11-27 | International Business Machines Corporation | Scalable and accurate mining of control flow from execution logs across distributed systems |
US20170068709A1 (en) * | 2015-09-09 | 2017-03-09 | International Business Machines Corporation | Scalable and accurate mining of control flow from execution logs across distributed systems |
US10394868B2 (en) | 2015-10-23 | 2019-08-27 | International Business Machines Corporation | Generating important values from a variety of server log files |
US20170134408A1 (en) * | 2015-11-10 | 2017-05-11 | Sap Se | Standard metadata model for analyzing events with fraud, attack, or any other malicious background |
US9876809B2 (en) * | 2015-11-10 | 2018-01-23 | Sap Se | Standard metadata model for analyzing events with fraud, attack, or any other malicious background |
US10078542B2 (en) * | 2015-11-16 | 2018-09-18 | International Business Machines Corporation | Management of computing machines with troubleshooting prioritization |
US10831584B2 (en) | 2015-11-16 | 2020-11-10 | International Business Machines Corporation | Management of computing machines with troubleshooting prioritization |
US20170139766A1 (en) * | 2015-11-16 | 2017-05-18 | International Business Machines Corporation | Management of computing machines with troubleshooting prioritization |
WO2017087437A1 (en) * | 2015-11-17 | 2017-05-26 | Nec Laboratories America, Inc. | Fast pattern discovery for log analytics |
CN105824744A (en) * | 2016-03-21 | 2016-08-03 | 焦点科技股份有限公司 | Real-time log collection and analysis method on basis of B2B (Business to Business) platform |
US10237295B2 (en) * | 2016-03-22 | 2019-03-19 | Nec Corporation | Automated event ID field analysis on heterogeneous logs |
US10423597B2 (en) * | 2016-03-27 | 2019-09-24 | International Business Machines Corporation | Data set visualizer for tree based file systems |
US10929368B2 (en) | 2016-03-27 | 2021-02-23 | International Business Machines Corporation | Data set visualizer for tree based file systems |
US11847480B2 (en) * | 2016-06-21 | 2023-12-19 | Amazon Technologies, Inc. | System for detecting impairment issues of distributed hosts |
US10733002B1 (en) * | 2016-06-28 | 2020-08-04 | Amazon Technologies, Inc. | Virtual machine instance data aggregation |
US10462170B1 (en) * | 2016-11-21 | 2019-10-29 | Alert Logic, Inc. | Systems and methods for log and snort synchronized threat detection |
US20180144041A1 (en) * | 2016-11-21 | 2018-05-24 | International Business Machines Corporation | Transaction discovery in a log sequence |
US10740360B2 (en) * | 2016-11-21 | 2020-08-11 | International Business Machines Corporation | Transaction discovery in a log sequence |
US10929765B2 (en) | 2016-12-15 | 2021-02-23 | Nec Corporation | Content-level anomaly detection for heterogeneous logs |
US11157343B2 (en) | 2016-12-21 | 2021-10-26 | Mastercard International Incorporated | Systems and methods for real time computer fault evaluation |
WO2018118379A1 (en) * | 2016-12-21 | 2018-06-28 | Mastercard International Incorporated | Systems and methods for real time computer fault evaluation |
US10331507B2 (en) | 2016-12-21 | 2019-06-25 | Mastercard International Incorporated | Systems and methods for real time computer fault evaluation |
US20180203757A1 (en) * | 2017-01-16 | 2018-07-19 | Hitachi, Ltd. | Log message grouping apparatus, log message grouping system, and log message grouping method |
US10579461B2 (en) * | 2017-01-16 | 2020-03-03 | Hitachi, Ltd. | Log message grouping apparatus, log message grouping system, and log message grouping method |
US11604715B2 (en) * | 2017-01-26 | 2023-03-14 | International Business Machines Corporation | Generation of end-user sessions from end-user events identified from computer system logs |
US10855707B2 (en) * | 2017-03-20 | 2020-12-01 | Nec Corporation | Security system using automatic and scalable log pattern learning in security log analysis |
US11196758B2 (en) | 2017-03-20 | 2021-12-07 | Nec Corporation | Method and system for enabling automated log analysis with controllable resource requirements |
US10567409B2 (en) | 2017-03-20 | 2020-02-18 | Nec Corporation | Automatic and scalable log pattern learning in security log analysis |
US10678669B2 (en) | 2017-04-21 | 2020-06-09 | Nec Corporation | Field content based pattern generation for heterogeneous logs |
US10333805B2 (en) | 2017-04-21 | 2019-06-25 | Nec Corporation | Ultra-fast pattern generation algorithm for the heterogeneous logs |
US10740212B2 (en) | 2017-06-01 | 2020-08-11 | Nec Corporation | Content-level anomaly detector for systems with limited memory |
WO2019066295A1 (en) * | 2017-09-28 | 2019-04-04 | 큐비트시큐리티 주식회사 | Web traffic logging system and method for detecting web hacking in real time |
CN109845228A (en) * | 2017-09-28 | 2019-06-04 | 量子位安全有限公司 | Network traffic recording system and method for the attack of real-time detection network hacker |
EP3691217A4 (en) * | 2017-09-28 | 2021-05-12 | Qubit Security Inc. | Web traffic logging system and method for detecting web hacking in real time |
JP2019533841A (en) * | 2017-09-28 | 2019-11-21 | キュービット セキュリティ インコーポレーテッド (Qubit Security Inc.) | Web traffic logging system and method for real-time detection of web hacking |
US11140181B2 (en) * | 2017-09-28 | 2021-10-05 | Qubit Security Inc. | Web traffic logging system and method for detecting web hacking in real time |
KR101909957B1 (en) * | 2018-04-03 | 2018-12-19 | 큐비트시큐리티 주식회사 | Web traffic logging system and method for detecting web hacking in real time |
CN111427737A (en) * | 2019-01-09 | 2020-07-17 | 阿里巴巴集团控股有限公司 | Method and device for modifying exception log and electronic equipment |
CN109918349A (en) * | 2019-02-25 | 2019-06-21 | 网易(杭州)网络有限公司 | Log processing method, device, storage medium and electronic device |
US11513935B2 (en) * | 2019-08-30 | 2022-11-29 | Dell Products L.P. | System and method for detecting anomalies by discovering sequences in log entries |
US20210064500A1 (en) * | 2019-08-30 | 2021-03-04 | Dell Products, Lp | System and Method for Detecting Anomalies by Discovering Sequences in Log Entries |
WO2021067858A1 (en) * | 2019-10-03 | 2021-04-08 | Oracle International Corporation | Enhanced anomaly detection in computing environments |
EP4250116A3 (en) * | 2019-10-03 | 2024-04-10 | Oracle International Corporation | Enhanced anomaly detection in computing environments |
KR102425525B1 (en) | 2020-11-30 | 2022-07-26 | 가천대학교 산학협력단 | System and method for log anomaly detection using bayesian probability and closed pattern mining method and computer program for the same |
KR20220077184A (en) * | 2020-11-30 | 2022-06-09 | 가천대학교 산학협력단 | System and method for log anomaly detection using bayesian probability and closed pattern mining method and computer program for the same |
Also Published As
Publication number | Publication date |
---|---|
US9244755B2 (en) | 2016-01-26 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
US9244755B2 (en) | | Scalable log analytics |
US10761687B2 (en) | | User interface that facilitates node pinning for monitoring and analysis of performance in a computing environment |
US10205643B2 (en) | | Systems and methods for monitoring and analyzing performance in a computer system with severity-state sorting |
US10515469B2 (en) | | Proactive monitoring tree providing pinned performance information associated with a selected node |
US10243818B2 (en) | | User interface that provides a proactive monitoring tree with state distribution ring |
US10042834B2 (en) | | Dynamic field extraction of data |
US9319288B2 (en) | | Graphical user interface for displaying information related to a virtual machine network |
US10095731B2 (en) | | Dynamically converting search-time fields to ingest-time fields |
US11762893B2 (en) | | Creation of a summary for a plurality of texts |
US20170357710A1 (en) | | Clustering log messages using probabilistic data structures |
US9607029B1 (en) | | Optimized mapping of documents to candidate duplicate documents in a document corpus |
US20240020405A1 (en) | | Extracted field generation to filter log messages |
US11757736B2 (en) | | Prescriptive analytics for network services |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
 | AS | Assignment | Owner name: VMWARE, INC., CALIFORNIA Free format text: ASSIGNMENT OF ASSIGNORS INTEREST;ASSIGNORS:HUANG, MARK;LIN, JUNYUAN;SIGNING DATES FROM 20130528 TO 20130724;REEL/FRAME:030874/0299 |
 | STCF | Information on status: patent grant | Free format text: PATENTED CASE |
 | MAFP | Maintenance fee payment | Free format text: PAYMENT OF MAINTENANCE FEE, 4TH YEAR, LARGE ENTITY (ORIGINAL EVENT CODE: M1551); ENTITY STATUS OF PATENT OWNER: LARGE ENTITY Year of fee payment: 4 |
 | MAFP | Maintenance fee payment | Free format text: PAYMENT OF MAINTENANCE FEE, 8TH YEAR, LARGE ENTITY (ORIGINAL EVENT CODE: M1552); ENTITY STATUS OF PATENT OWNER: LARGE ENTITY Year of fee payment: 8 |