US20140344622A1 - Scalable Log Analytics - Google Patents

Scalable Log Analytics

Info

Publication number
US20140344622A1
Authority
US
United States
Prior art keywords
log
event
message
determining
type
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
US13/897,994
Other versions
US9244755B2 (en)
Inventor
Mark Huang
Junyuan LIN
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
VMware LLC
Original Assignee
VMware LLC
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by VMware LLC
Priority to US13/897,994 (US9244755B2)
Assigned to VMWARE, INC. Assignment of assignors interest (see document for details). Assignors: LIN, JUNYUAN; HUANG, MARK
Publication of US20140344622A1
Application granted
Publication of US9244755B2
Assigned to VMware LLC. Change of name (see document for details). Assignors: VMWARE, INC.
Legal status: Active
Adjusted expiration

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F11/00 Error detection; Error correction; Monitoring
    • G06F11/07 Responding to the occurrence of a fault, e.g. fault tolerance
    • G06F11/0703 Error or fault processing not based on redundancy, i.e. by taking additional measures to deal with the error or fault not making use of redundancy in operation, in hardware, or in data representation
    • G06F11/0706 Error or fault processing not based on redundancy, the processing taking place on a specific hardware platform or in a specific software environment
    • G06F11/0709 Error or fault processing not based on redundancy, the processing taking place in a distributed system consisting of a plurality of standalone computer nodes, e.g. clusters, client-server systems
    • G06F11/0766 Error or fault reporting or storing
    • G06F11/0769 Readable error formats, e.g. cross-platform generic formats, human understandable formats
    • G06F11/0775 Content or structure details of the error report, e.g. specific table structure, specific error fields
    • G06F11/079 Root cause analysis, i.e. error or fault diagnosis

Definitions

  • System administrators provide virtualized computing infrastructure, which typically includes a plurality of virtual machines executing on a shared set of physical hardware components, to offer highly available, fault-tolerant distributed systems.
  • A large-scale virtualized infrastructure may have many (e.g., thousands of) virtual machines running on many physical machines.
  • High availability requirements provide system administrators with little time to diagnose or bring down parts of infrastructure for maintenance.
  • Fault-tolerant features ensure the virtualized computing infrastructure continues to operate when problems arise, but generate many intermediate states that have to be reconciled and addressed. As such, identifying, debugging, and resolving failures and performance issues for virtualized computing environments have become increasingly challenging.
  • One or more embodiments disclosed herein provide a method for providing real-time analysis of log messages for a computer infrastructure.
  • the method includes receiving a plurality of log messages including a first log message, and generating a sketch associated with the first log message.
  • the sketch may be generated based on words contained in the first log message.
  • the method further includes determining a message type for the first log message based on a comparison of the generated sketch to a plurality of sketches stored in an index.
  • Log messages of a same message type have similar sketches.
  • The method includes determining a first log event associated with one or more log messages occurring within a first time interval, wherein the first log event comprises a first composition of message types corresponding to the associated log messages.
  • the method further includes determining an event type for the first log event based on a comparison of the first composition of message types to a plurality of compositions of message types stored in the index, and determining an anomalous log event within the plurality of log messages based on the classification for the first log event.
  • FIG. 1A depicts a block diagram that illustrates a computing system with which one or more embodiments of the present disclosure may be utilized.
  • FIG. 1B is a block diagram that illustrates a virtualized computing system with which one or more embodiments of the present disclosure may be utilized.
  • FIG. 2 is a block diagram that illustrates a workflow for analyzing log data of the computing system, according to one embodiment of the present disclosure.
  • FIGS. 3A-3B are block diagrams that depict examples of event pattern and event volume anomalies, according to one embodiment of the present disclosure.
  • FIG. 4 is a flow diagram that illustrates steps for a method for analyzing log data of a computing system, according to an embodiment of the present disclosure.
  • log data sometimes referred to as runtime logs, error logs, debugging logs
  • log data is reduced in both volume and level of detail by first classifying messages into types by content similarity. The log data is then reduced further by grouping bursts of messages into log events. Patterns in log events, such as the collection and number of different messages types that comprise each log event, can be used to identify anomalous events within the log data. For example, patterns in the log events may be used to detect when log events occur that differ in message type composition, or when log events occur that differ in frequency of occurrence over time.
  • FIG. 1A is a block diagram that illustrates a computing system 100 with which one or more embodiments of the present invention may be utilized.
  • computing system 100 includes a plurality of server systems, identified as server system 102 - 1 , 102 - 2 , 102 - 3 , and referred to collectively as servers 102 .
  • Each server 102 includes CPU 104 , memory 106 , networking interface 110 , storage interface 114 , and other conventional components of a computing device.
  • Each server 102 further includes an operating system 120 configured to manage execution of one or more applications 122 using the computing resources (e.g., CPU 104 , memory 106 , networking interface 110 , storage interface 114 ).
  • log data may indicate the state, and state transitions, that occur during operation, and may record occurrences of failures, as well as unexpected and undesirable events.
  • log data may be unstructured text comprised of a plurality of log messages, including status updates, error messages, stack traces, and debugging messages.
  • log analytics module 132 configured to store and analyze in real-time log data 134 from software and infrastructure components of computing system 100 .
  • Log analytics module 132 may include a log index 136 configured to cache (and later query) results of the analysis of log data.
  • Log analytics module 132 reduces the volume and level of details of the log data to enable a user (e.g., system administrator) to diagnose and troubleshoot issues within computing system 100 .
  • log analytics module 132 is configured to parse a stream of log messages within log data and identify groups of log messages as logical events, referred to interchangeably as “log events” or “events”. To do so, log analytics module 132 is configured to classify log messages within a stream of log data as log message types that cluster together similar log messages. Log analytics module 132 is further configured to perform event detection on log messages within log data to group together log messages based on their occurrence close in time in a sequence. As described later, in one embodiment, events may be defined as a collection of log message types, and an occurrence of an event corresponds to a group of log messages having the requisite log message types appearing in log data. Log analytics module 132 may further identify anomalies within log data based on the message-type classifications and detected events, such as event volume anomalies and event pattern anomalies.
  • log analytics module 132 indicates to a user what one event (as reported by log messages) means in relation to other events in the log data and highlights events occurring within computing system 100 in context.
  • log analytics module 132 may highlight certain events in the context of being nearby in time to other events, such that if the certain events usually occur in a sequence, then events occurring out of that sequence may be notable.
  • log analytics module 132 may highlight certain events in the context of being similar to other events, such that similar events may be clustered and analyzed together rather than be considered separately.
  • log analytics module 132 may highlight certain events in the context of the hierarchical infrastructure of computing system 100 , such as being from the same thread, process, application, virtual machine, host, host group, data center, etc. The operations of log analytics module 132 are illustrated in greater detail in conjunction with FIG. 2 .
  • FIG. 1B is a block diagram that illustrates a computing system 150 with which one or more embodiments of the present disclosure may be utilized.
  • computing system 150 includes a host group 124 of host computers, identified as hosts 108 - 1 , 108 - 2 , 108 - 3 , and 108 - 4 , and referred to collectively as hosts 108 .
  • Each host 108 is configured to provide a virtualization layer that abstracts computing resources of a hardware platform 118 into multiple virtual machines (VMs) 112 that run concurrently on the same host 108 .
  • Hardware platform 118 of each host 108 may include conventional components of a computing device, such as a memory, processor, local storage, disk interface, and network interface.
  • the VMs 112 run on top of a software interface layer, referred to herein as a hypervisor 116 , that enables sharing of the hardware resources of host 108 by the virtual machines.
  • One example of hypervisor 116 that may be used in an embodiment described herein is a VMware ESXi hypervisor provided as part of the VMware vSphere solution made commercially available from VMware, Inc.
  • Hypervisor 116 may run on top of the operating system of host 108 or directly on hardware components of host 108 .
  • Each VM 112 includes a guest operating system (e.g., Microsoft Windows, Linux) and one or more guest applications and processes running on top of the guest operating system.
  • computing system 150 includes virtualization management software 130 that may communicate with the plurality of hosts 108 via network 140 .
  • Virtualization management software 130 is configured to carry out administrative tasks for the computing system 100 , including managing hosts 108 , managing VMs running within each host 108 , provisioning VMs, migrating VMs from one host to another host, and load balancing between hosts 108 of host group 124 .
  • virtualization management software 130 is a computer program that resides and executes in a central server, which may reside in computing system 100 , or alternatively, running as a VM in one of hosts 108 .
  • One example of virtualization management software is the vCenter® Server product made available from VMware, Inc.
  • The software and infrastructure components of computing system 100 may generate a large amount of log data in real time during operation.
  • While log analytics module 132 is depicted in FIG. 1B as a separate component that resides and executes on a separate server or virtual machine, it is appreciated that log analytics module 132 may alternatively reside in any one of the computing devices of the virtualized computing system 150 , such as the same central server where the virtualization management software 130 resides.
  • log analytics module 132 may be embodied as a plug-in component configured to extend functionality of virtualization management software 130 .
  • Access to the log analytics module 132 can be achieved via a client application (not shown). For example, each analysis task, such as searching for log messages, filtering for log messages, analyzing log messages over a period of time, can be accomplished through the client application.
  • the client application may be provided as a stand-alone application.
  • the client application is implemented as a web browser application that provides management access from any networked device.
  • FIG. 2 is a block diagram that illustrates a workflow for analyzing log data 134 of a computing infrastructure, according to one embodiment of the present disclosure. It should be recognized that, even though the workflow is described in conjunction with the system of FIG. 1A , any system configured to perform the illustrated technique is within the scope of embodiments of the disclosure.
  • log data 134 may include a plurality of individual log messages 202 - 1 to 202 - 5 (collectively referred to as log messages 202 ) generated over a period of time.
  • a log message may include a time stamp (e.g., “Sep 23 13:30”) indicating a date and time corresponding to the creation of the log message and a text description (e.g., “host1 sending 5738 files”). While each log message 202 is depicted as a separate line of text for sake of illustration, it should be recognized that log messages 202 may be arranged in a variety of formats, including log messages that span several lines.
  • log analytics module 132 may classify each log message 202 as a message type based on content similarity of the log messages. In some embodiments, the content similarity is performed on the text description portion of the log message 202 . In the example shown in FIG. 2 , log analytics module 132 processes log message 202 - 1 (i.e., “Sep 23 13:30 host1 sending 5738 files”) and assigns log message 202 - 1 a first message type 204 - 1 .
  • Log analytics module 132 then processes a second log message 202 - 2 (i.e., “Sep 23 13:31 host2 received 5700 files”) and determines the contents of second log message 202 - 2 are not sufficiently similar to first log message 202 - 1 and assigns a different, second message type 204 - 2 .
  • log analytics module 132 processes a third log message 202 - 3 (i.e., “Sep 23 13:32 host1 warning: 38 files pending”) and assigns a third message type 204 - 3 upon determining no content similarity with the other already processed log messages.
  • log messages having different message types are depicted in FIG. 2 as shapes having different patterns.
  • log analytics module 132 may determine content similarity of log messages according to a “sketching” algorithm that determines if log messages contain a number of words in common in the same relative position. Determination of content similarity and the sketching algorithm are described in greater detail below.
  • log analytics module 132 processes a fourth log message 202 - 4 (i.e., “Sep 23 14:00 host4 sending 382 files”) and determines content similarity with log message 202 - 1 . As such, log analytics module 132 assigns log message 202 - 4 the same first message type 204 - 1 as log message 202 - 1 , as depicted in FIG. 2 by identical patterned highlights or colors. Similarly, log analytics module 132 processes a fifth log message 202 - 5 (i.e., “Sep 23 14:01 host5 received 382 files”) and assigns the second message type 204 - 2 based on a determination of content similarity with log message 202 - 2 .
  • log analytics module 132 is configured to identify one or more log events 206 based on the timing of the log messages. In some embodiments, log analytics module 132 may group one or more log messages 202 into log events 206 according to a burst analysis algorithm. For example, log analytics module 132 identifies a first log event 206 - 1 that includes log messages 202 - 1 , 202 - 2 , 202 - 3 , which all occur at approximately the same time, around September 23, 13:30, and a second log event 206 - 2 that includes log messages 202 - 4 , 202 - 5 , which both occur around September 23, 14:00. In one embodiment, log analytics module 132 is configured to represent each identified log event 206 as a composition of message types of log messages.
  • an event type for a log event may be defined as a composition of tuples of message type and frequency.
  • a first event 206 - 1 may be characterized as a composition of one occurrence of message type 204 - 1 (e.g., “Sending . . . files”), one occurrence of message type 204 - 2 (e.g., “Received . . . files”), and one occurrence of message type 204 - 3 (e.g., “Warning . . . files pending”); and second event 206 - 2 may be characterized as a composition of one occurrence of message type 204 - 1 (e.g., “Sending . . . files”) and one occurrence of message type 204 - 2 (e.g., “Received . . . files”).
  • log analytics module 132 may identify anomalous events based on patterns of events from log data 134 , as shown in FIGS. 3A and 3B .
  • FIG. 3A is a chart 300 depicting an example of an event volume anomaly based on frequency of occurrence of events over time.
  • Log analytics module 132 may determine the number of events occurring per hour in a given time period, e.g., from 6:00 PM to 9:00 PM.
  • Chart 300 further illustrates a breakdown of event types for each hour, depicting occurrences of events similar to log events 206 - 1 and 206 - 2 . As an example, it may be normal within the computing system for approximately 20 events per hour to occur.
  • a sudden increase of events to 200 events per hour (e.g., at 19:00) and then to 500 events per hour (e.g., at 20:00), thereby exceeding some threshold value 302 can trigger log analytics module 132 to flag this as an anomalous occurrence of event volume.
  • FIG. 3B depicts an example of an event pattern anomaly based on events that are different in message type composition.
  • events 304 occurring at a given time are usually an event type similar to event 206 - 2 (i.e., events comprised of “Sending . . . files” log messages and “Received . . . files” log messages).
  • an unexpected or atypical event 306 may occur: an event comprised of “Sending . . . files” log messages, “Received . . . files” log messages, and “Warning . . . files pending” log messages, which differs from the usual events.
  • log analytics module 132 may determine an anomalous occurrence of a log event 306 (i.e., composed of message types 204 - 1 , 204 - 2 , and 204 - 3 ) that is different in composition from other log events (i.e., composed of message types 204 - 1 and 204 - 2 ).
  • FIG. 4 is a flow diagram that illustrates steps for a method 400 for providing real-time analysis of log messages for a computer infrastructure, according to an embodiment of the present disclosure. It should be recognized that, even though the method 400 is described in conjunction with the system of FIG. 1 , any system configured to perform the method steps is within the scope of embodiments of the disclosure.
  • log analytics module 132 receives a stream of log data 134 generated by software and infrastructure components of computing system 100 .
  • log data 134 may include a plurality of log messages.
  • log analytics module 132 may be configured to retrieve log data (e.g., log files) from software and infrastructure components of computing system 100 , including applications 122 , operating systems 120 , and in the case of virtualized computing system 150 , components such as hypervisors 116 and guest applications and operating systems running within VMs 112 .
  • software and infrastructure components of computing system 100 may be configured to write log files to a common destination, such as an external storage, from which log analytics module 132 may periodically retrieve log data.
  • log data 134 may be transferred over network 140 directly to log analytics module 132 .
  • log analytics module 132 generates a compact integer representation, or “sketch,” of text content for a log message in the received log data.
  • a sketch associated with a log message is generated based on words of the log message.
  • two log messages may be considered similar if the log messages contain a number of words in common in the same relative positions.
  • sketches of log messages are computed such that similar log messages should have identical or substantially similar sketches.
  • a sketch of a log message may be an ordered list, or tuple, of fingerprint values corresponding to a subset of the words of the log message.
  • a sketch of a log message is a tuple of fingerprints of “interesting” words of the log message.
  • Each interesting word of the log message (e.g., “host1”) is mapped to a fingerprint value by a fingerprint function, such as a hash function.
  • a sketch generated for a log message “host1 sending 5738 files” may be a tuple of fingerprint values (753, 1034, 886) that corresponds to interesting words (host1, Sending, files).
  • a sketch for the log message “host4 Sending 382 files” can be computed as the tuple (1965, 1034, 886) that corresponds to interesting features (host4, Sending, files).
  • Because the sketches (753, 1034, 886) and (1965, 1034, 886) have identical values “1034” and “886” in the same relative positions, the two log messages may be deemed similar.
  • sketches of log messages may be generated according to a sketching algorithm that uses N independent scoring functions to pick N “interesting” words of a log message, where “interesting” is determined according to each scoring function.
  • a scoring function is a hash function that computes a 32-bit integer given a word.
  • $\mathrm{Score}_1(\mathrm{Word}) = (M_1 \cdot \mathrm{Fingerprint}(\mathrm{Word}) + A_1) \bmod 2^{32}$
  • $\mathrm{Score}_2(\mathrm{Word}) = (M_2 \cdot \mathrm{Fingerprint}(\mathrm{Word}) + A_2) \bmod 2^{32}$
  • $\mathrm{Score}_N(\mathrm{Word}) = (M_N \cdot \mathrm{Fingerprint}(\mathrm{Word}) + A_N) \bmod 2^{32}$
  • For each scoring function, log analytics module 132 scores each word in the log message and selects the word having the highest score (i.e., the “most interesting” word) according to that scoring function. As each scoring function selects one word in the log message, N scoring functions result in N words being selected. The fingerprints of these N words are then combined to form a sketch of the log message.
  • the four scoring functions may score the slightly different log message similarly:
  • the sketching algorithm as described herein is advantageously more robust to relative insertions or deletions of text. It has been determined that the insertion or deletion of an additional word relative to the original text is unlikely to change all or even a majority of the words selected by each scoring function.
  • a linear congruential generator may be used as a scoring function, though it should be recognized that other types of scoring functions can be used, including functions that are deterministic and produce uncorrelated results.
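The sketching steps described above can be illustrated with a minimal Python sketch. It is an assumption-laden illustration, not the patented implementation: the fingerprint function (CRC-32 via `zlib.crc32`), the four (M, A) constant pairs in `SCORING_PARAMS`, and whitespace tokenization are all hypothetical choices that the text does not specify.

```python
import zlib

MASK32 = 0xFFFFFFFF  # keeps arithmetic modulo 2**32

# Hypothetical (M_i, A_i) constants, one pair per scoring function.
# The text does not give concrete values; any set of distinct odd
# multipliers would serve the same illustrative purpose.
SCORING_PARAMS = [
    (1664525, 1013904223),
    (22695477, 1),
    (1103515245, 12345),
    (134775813, 1),
]


def fingerprint(word):
    """Return a 32-bit fingerprint of a word (CRC-32 is an assumption)."""
    return zlib.crc32(word.encode("utf-8")) & MASK32


def score(word, m, a):
    """Score_i(Word) = (M_i * Fingerprint(Word) + A_i) mod 2**32."""
    return (m * fingerprint(word) + a) & MASK32


def sketch(message):
    """Return a tuple with one fingerprint per scoring function.

    Each scoring function selects the highest-scoring word of the
    message; the fingerprints of the selected words form the sketch.
    """
    words = message.split()  # assumed tokenization: split on whitespace
    if not words:
        return ()
    return tuple(
        fingerprint(max(words, key=lambda w: score(w, m, a)))
        for m, a in SCORING_PARAMS
    )
```

Because each scoring function ranks words independently, inserting or deleting one word changes a given column of the sketch only if that word outranks (or was) the previously selected word for that function.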
  • Log analytics module 132 determines a message type classification for the log message based on the corresponding sketch for the log message.
  • Log analytics module 132 classifies log messages having similar sketches as having the same message type. Such clustering helps reduce the number of log messages that need to be examined by grouping the messages into a small number of message types that can then be highlighted. Accordingly, message type classification enables log analytics module 132 to cluster together similar log messages to more effectively process and analyze a large volume of log data.
  • log analytics module 132 queries log index 136 to determine whether the log message is similar to a previously processed log message based on the corresponding sketches, and if so, assigns the log message a same message type as the previously processed log message, at step 408 .
  • log analytics module 132 queries log index 136 using the sketch (1965, 1034, 886, 1034) corresponding to the log message “host4 sending 208 files using SFTP protocol” and determines the log message is similar to the previously processed log message “host1 sending 7182 files using SFTP protocol” based on the similarity with its corresponding sketch (753, 1034, 886, 1034).
  • Otherwise, log analytics module 132 assigns a new message type to the log message and inserts the log message into log index 136 .
  • each message type may be represented by a message type identifier, or “cluster ID.”
  • the log messages depicted in FIG. 2 may have the following sketches and corresponding cluster IDs (the sketches are shown as tuples of the most interesting words rather than the fingerprint values for clarity of illustration):
  • log analytics module 132 may provide the ability to search the received log messages based on a given cluster ID.
  • log analytics module 132 may use cluster ID as a search criterion for log messages that are similar to a particular log message (i.e., find log messages “like this”) by querying for log messages having a particular cluster ID.
  • log analytics module 132 may use the cluster ID as a criterion for aggregation to generate statistics, such as the Top-5 message types per hour.
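As an illustration of aggregating by cluster ID, the following Python sketch computes the most frequent message types per hour. The function name `top_message_types_per_hour` and the input shape (pairs of timestamp and cluster ID) are assumptions made for this example only.

```python
from collections import Counter, defaultdict
from datetime import datetime
from typing import Dict, Iterable, List, Tuple


def top_message_types_per_hour(
    records: Iterable[Tuple[datetime, int]], k: int = 5
) -> Dict[datetime, List[Tuple[int, int]]]:
    """Return, for each hour, the k most frequent (cluster_id, count) pairs.

    `records` is an iterable of (timestamp, cluster_id) pairs, one per
    already-classified log message.
    """
    per_hour = defaultdict(Counter)
    for ts, cluster_id in records:
        # Truncate the timestamp to the start of its hour.
        hour = ts.replace(minute=0, second=0, microsecond=0)
        per_hour[hour][cluster_id] += 1
    return {hour: counts.most_common(k) for hour, counts in per_hour.items()}
```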
  • cluster ID may be content-based and enable calculation of message type classifications to be distributed.
  • log index 136 may include one or more hash tables that map fingerprint values to sketches for a given log message.
  • log index 136 may include N hash tables for mapping fingerprint values to sketches that contain N fingerprint values.
  • each fingerprint value in the sketch (i.e., each column in the tuple <1965, 1034, 886, 1034>)
  • a candidate sketch must match in M different columns to be considered a match, where M is less than N.
  • If a matching candidate is found, the incoming log message belongs to that cluster and is assigned the same message type, and the sketch is not inserted into the log index. If no candidates are found, a new cluster is generated having a new message type, and the sketch is inserted into log index 136 .
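The index lookup described above (N hash tables, one per sketch column, with a match requiring agreement in at least M columns) might be sketched as follows. The class name `SketchIndex`, the sequential cluster IDs starting at 0, and the tie-breaking rule are hypothetical; the text only specifies the per-column tables and the M-of-N criterion.

```python
from collections import defaultdict


class SketchIndex:
    """Toy index: one hash table per sketch column (N tables in total)."""

    def __init__(self, n_columns, min_matches):
        if not 0 < min_matches <= n_columns:
            raise ValueError("min_matches must be between 1 and n_columns")
        self.n = n_columns
        self.m = min_matches
        # tables[col][fingerprint] -> set of cluster IDs with that value
        self.tables = [defaultdict(set) for _ in range(n_columns)]
        self.next_id = 0  # assumed ID scheme: sequential integers

    def classify(self, sketch):
        """Return the cluster ID for a sketch, creating one if needed."""
        if len(sketch) != self.n:
            raise ValueError("sketch must have exactly n_columns values")
        # Count, per candidate cluster, how many columns agree.
        votes = defaultdict(int)
        for col, fp in enumerate(sketch):
            for cluster_id in self.tables[col].get(fp, ()):
                votes[cluster_id] += 1
        matches = [cid for cid, v in votes.items() if v >= self.m]
        if matches:
            # Existing cluster: the sketch is not inserted into the index.
            # Ties go to the lowest cluster ID (an arbitrary choice).
            return max(matches, key=lambda cid: (votes[cid], -cid))
        # No candidate: create a new cluster and index its sketch.
        cluster_id = self.next_id
        self.next_id += 1
        for col, fp in enumerate(sketch):
            self.tables[col][fp].add(cluster_id)
        return cluster_id
```

With N = 4 and M = 3, the sketches (753, 1034, 886, 1034) and (1965, 1034, 886, 1034) from the example above agree in three columns and therefore fall into the same cluster.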
  • log analytics module 132 may store a representation of each message type within log index 136 by storing a copy of a full log message.
  • Log analytics module 132 may use a textual differential algorithm (e.g., longest substring match) or other additional textual analysis to verify similarity of the incoming log message to a representative of the message type and override message type classification based on poor sketches.
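One possible form of the verification step is sketched below using Python's standard `difflib` module. This is an assumption: the text names "longest substring match" only as an example, and the word-level comparison, the helper name `verify_similarity`, and the 0.5 threshold are illustrative choices.

```python
from difflib import SequenceMatcher


def verify_similarity(incoming, representative, min_fraction=0.5):
    """Check a sketch-based match against the stored representative.

    Finds the longest run of consecutive words shared by both messages
    and accepts the match if that run covers at least `min_fraction`
    of the longer message (threshold is a hypothetical default).
    """
    a, b = incoming.split(), representative.split()
    if not a or not b:
        return False
    matcher = SequenceMatcher(None, a, b, autojunk=False)
    match = matcher.find_longest_match(0, len(a), 0, len(b))
    return match.size / max(len(a), len(b)) >= min_fraction
```

A classification that passes the sketch lookup but fails this check could then be overridden, for example by assigning the message a new message type.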
  • the stored representation of each message type may be used to provide an example log message that is displayed to a user (e.g., system administrator) when presenting the statistics or graphical charts for the message type.
  • log analytics module 132 divides one or more log messages into log events based on burst analysis. It has been determined that log messages corresponding to events within computer system 100 may be created in bursts and close-in-time. For example, a burst of log messages may be recorded by applications and guest operating system whenever a virtual machine shuts down or restarts. In one embodiment, log analytics module 132 processes time stamps of log messages 202 and tracks time between log messages. In some embodiments, log analytics module 132 may determine and maintain an average time interval associated with an event duration. For example, log messages occurring within a 10-second duration may be candidates for being grouped together as a single log event.
  • Log analytics module 132 may associate one or more log messages occurring within the event duration to a log event 206 .
  • Log analytics module 132 may represent each log event as a composition of different message types, such as a list of tuples of a message type and corresponding frequency of occurrence.
  • one log event may be comprised of log messages having an occurrence of a “sending files” message type, two occurrences of a “received files” message type, and one occurrence of a “warning files pending” message type, and may be represented by a list of pairs having cluster ID and frequency: (22280, 1), (22281, 2), (22282, 1).
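The grouping of messages into events and the composition representation can be sketched as follows. The fixed 10-second window measured from the first message of each event is an assumption (the text also mentions maintaining an average interval), and the function names are hypothetical.

```python
from collections import Counter
from datetime import datetime, timedelta


def group_into_events(messages, window=timedelta(seconds=10)):
    """Group (timestamp, cluster_id) pairs, sorted by time, into events.

    A message joins the current event if it falls within `window` of the
    event's first message; otherwise it starts a new event.
    """
    events = []
    current = []
    start = None
    for ts, cluster_id in messages:
        if start is None or ts - start > window:
            if current:
                events.append(current)
            current = []
            start = ts
        current.append(cluster_id)
    if current:
        events.append(current)
    return events


def composition(cluster_ids):
    """Represent an event as sorted (cluster_id, frequency) pairs."""
    return tuple(sorted(Counter(cluster_ids).items()))
```

For an event with one "sending files" message (22280), two "received files" messages (22281), and one "warning files pending" message (22282), `composition` yields the pairs (22280, 1), (22281, 2), (22282, 1).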
  • Log analytics module 132 may then cluster together similar log events, applying a technique similar to the technique applied above for clustering similar log messages.
  • log analytics module 132 queries log index 136 to determine whether a log event is similar to other log events based on the composition of message types that comprise the log event, and if so, assigns a same event type as the previously determined log events, at step 418 . Otherwise, at step 416 , log analytics module 132 assigns a new event type to the log event, and may insert the composition of the new event type into log index 136 .
  • log index 136 may further include additional hash tables that map cluster IDs to compositions of event types for a given log event.
  • each cluster ID may be used as a hash table lookup for candidate compositions that have some or all matching cluster IDs.
  • the event type of a log event is determined by performing lookups in the hash tables according to each pair of message type identifier and a corresponding frequency of occurrence. If at least one candidate event type is found, the detected log event may be determined similar to the corresponding log event and may be assigned the same event type. If no candidate is found, a new event cluster is generated having a new event type, and the representative composition of message types is inserted into log index 136 .
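The lookup described above might be sketched as follows. The text does not say how a candidate is scored, so this sketch assumes a candidate is accepted when the share of matching (message type, frequency) pairs reaches a cutoff `min_overlap`; the class name `EventIndex`, its methods, and the cutoff value are all hypothetical.

```python
from collections import defaultdict


class EventIndex:
    """Toy stand-in for the event-type part of a log index (hypothetical API)."""

    def __init__(self, min_overlap=0.5):
        self.min_overlap = min_overlap    # assumed similarity cutoff
        self.by_pair = defaultdict(set)   # (cluster ID, frequency) -> event type ids
        self.compositions = {}            # event type id -> frozenset of pairs
        self.next_type = 0

    def classify(self, composition):
        pairs = frozenset(composition)
        # One hash lookup per (message type, frequency) pair collects candidates.
        votes = defaultdict(int)
        for pair in pairs:
            for event_type in self.by_pair.get(pair, ()):
                votes[event_type] += 1
        best, best_score = None, 0.0
        for event_type, hits in sorted(votes.items()):
            size = max(len(pairs), len(self.compositions[event_type]))
            if hits / size > best_score:
                best, best_score = event_type, hits / size
        if best is not None and best_score >= self.min_overlap:
            return best
        # No sufficiently similar candidate: register a new event type.
        new_type = self.next_type
        self.next_type += 1
        self.compositions[new_type] = pairs
        for pair in pairs:
            self.by_pair[pair].add(new_type)
        return new_type


index = EventIndex()
a = index.classify([(22280, 1), (22281, 2), (22282, 1)])
b = index.classify([(22280, 1), (22281, 2), (22282, 1)])
c = index.classify([(34921, 4), (34927, 1)])
print(a, b, c)  # 0 0 1
```

Keying the hash table on the full pair keeps each lookup constant-time; keying on the cluster ID alone, as also mentioned above, would return looser candidates.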
  • log analytics module 132 analyzes event clusters and detects an anomaly within event clusters based on the classification of log events. In some embodiments, log analytics module 132 may determine an occurrence of an “incomplete” event or a gross deviation from an expected event.
  • an expected log event may be a composition of message types (22280, 2), (22281, 2), (22282, 3), (22283, 1), (22284, 1)
  • an incomplete log event may be detected upon determining an occurrence of a log event only having (22280, 2), (22281, 2), (22282, 3), (22283, 1)
  • a deviation from a known log event may be detected upon determining an occurrence of a log event having (22280, 2), (22281, 2), (22282, 3), (22283, 1), (22284, 1), (34921, 292), (34927, 395).
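The three example compositions above can be compared with a short sketch. The labels "incomplete", "deviation", and "match" and the function name `compare_to_expected` are hypothetical, and the rule used here (missing pairs only means incomplete, any surplus means deviation) is an assumed reading of the examples rather than a stated algorithm.

```python
def compare_to_expected(observed, expected):
    """Classify an observed composition against an expected one."""
    obs, exp = dict(observed), dict(expected)
    # Message types seen less often than expected (or not at all).
    missing = {k for k, v in exp.items() if obs.get(k, 0) < v}
    # Message types seen more often than expected (or not expected at all).
    extra = {k for k, v in obs.items() if v > exp.get(k, 0)}
    if extra:
        return "deviation"
    if missing:
        return "incomplete"
    return "match"


expected = [(22280, 2), (22281, 2), (22282, 3), (22283, 1), (22284, 1)]
incomplete = [(22280, 2), (22281, 2), (22282, 3), (22283, 1)]
deviating = expected + [(34921, 292), (34927, 395)]

print(compare_to_expected(incomplete, expected))  # incomplete
print(compare_to_expected(deviating, expected))   # deviation
print(compare_to_expected(expected, expected))    # match
```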
  • log analytics module 132 may determine an anomaly in event volume based on one or more threshold values. As described earlier in conjunction with FIG. 3A , log analytics module 132 may detect when a number of events occurring per unit of time exceeds or falls below a threshold value. For example, log analytics module 132 may determine an occurrence of an anomaly in event volume when the number of events occurring per hour exceeds 500 events per hour (suggesting over-activity), or falls below 5 events per hour (suggesting inactivity). In some embodiments, a threshold value may be associated with a particular event type, such that occurrences of that particular event type that exceed the threshold value may be flagged as an anomaly. The threshold values may be pre-determined, as well as configurable by a user.
  • the threshold values may be dynamically determined based on the performance history of the computing system, for example, using a weighted moving average, or other suitable heuristics.
  • the threshold values may be specified in a variety of manners, including absolute numerical values (e.g., 500 events/hr), and relative values, such as percentages (e.g., 200% change).
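The absolute and relative thresholds described above can be illustrated with a brief sketch. The function names are hypothetical, the linearly weighted moving average is only one possible choice of heuristic, and reading "200% change" as a percentage difference from the moving-average baseline is an assumption.

```python
def volume_anomaly(count, high=500, low=5):
    """Absolute thresholds, using the 500/hr and 5/hr figures from the text."""
    if count > high:
        return "over-activity"
    if count < low:
        return "inactivity"
    return None


def weighted_moving_average(history):
    """Linearly weighted average: the most recent value gets the largest weight."""
    if not history:
        raise ValueError("history must not be empty")
    weights = range(1, len(history) + 1)
    return sum(w * x for w, x in zip(weights, history)) / sum(weights)


def relative_anomaly(count, history, max_change_pct=200.0):
    """Flag a count whose change from the baseline exceeds max_change_pct percent."""
    baseline = weighted_moving_average(history)
    if baseline == 0:
        return count > 0
    change_pct = abs(count - baseline) / baseline * 100.0
    return change_pct > max_change_pct


print(volume_anomaly(600))                     # over-activity
print(volume_anomaly(3))                       # inactivity
print(relative_anomaly(200, [20, 20, 20]))     # True (900% above baseline)
print(relative_anomaly(30, [20, 20, 20]))      # False (50% above baseline)
```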
  • log analytics module 132 may present the detected anomaly, as well as the classified message types and event types, to a user via a graphical user interface.
  • the graphical user interface may provide charts, graphics, and statistical displays to illustrate a most frequent event over a past week, or an anomalous event occurring in a last 1-hour period.
  • log analytics module 132 may use frequency of log events and anomaly detection to generate an alert for an operator (e.g., system administrator) that the frequency of a particular log message type has increased or decreased in an anomalous way.
  • embodiments of the present disclosure provide a technique for processing log data that enables real-time analysis that is scalable for the multitude of log data generated by many software and infrastructure components of a computer system 100 .
  • embodiments described herein advantageously reduce the need for multiple passes over the same dataset or the need for active intervention in the form of feedback and training to properly analyze data.
  • Embodiments of the present disclosure provide a system for unsupervised, approximate clustering of log data that provides volume- and pattern-based anomaly detection.
  • the various embodiments described herein may employ various computer-implemented operations involving data stored in computer systems. For example, these operations may require physical manipulation of physical quantities which usually, though not necessarily, take the form of electrical or magnetic signals where they, or representations of them, are capable of being stored, transferred, combined, compared, or otherwise manipulated. Further, such manipulations are often referred to in terms, such as producing, identifying, determining, or comparing. Any operations described herein that form part of one or more embodiments of the disclosure may be useful machine operations.
  • one or more embodiments of the disclosure also relate to a device or an apparatus for performing these operations.
  • the apparatus may be specially constructed for specific required purposes, or it may be a general purpose computer selectively activated or configured by a computer program stored in the computer.
  • various general purpose machines may be used with computer programs written in accordance with the description provided herein, or it may be more convenient to construct a more specialized apparatus to perform the required operations.
  • One or more embodiments of the present disclosure may be implemented as one or more computer programs or as one or more computer program modules embodied in one or more computer readable media.
  • the term computer readable medium refers to any data storage device that can store data which can thereafter be input to a computer system; computer readable media may be based on any existing or subsequently developed technology for embodying computer programs in a manner that enables them to be read by a computer.
  • Examples of a computer readable medium include a hard drive, network attached storage (NAS), read-only memory, random-access memory (e.g., a flash memory device), a CD-ROM (Compact Disc-ROM), a CD-R, or a CD-RW, a DVD (Digital Versatile Disc), a magnetic tape, and other optical and non-optical data storage devices.
  • the computer readable medium can also be distributed over a network coupled computer system so that the computer readable code is stored and executed in a distributed fashion.

Abstract

Large amounts of unstructured log data generated by software and infrastructure components of a computing system are processed and analyzed in real time to identify anomalies and potential problems within the computing system. A log analytics module reduces both the volume and level of detail of log data by first classifying log messages into message types based on their content similarity. The log analytics module may then further reduce data by grouping bursts of log messages into log events. Patterns within these log events, such as the collection and number of different message types that comprise the event, can be used to identify anomalous events.

Description

    BACKGROUND
  • System administrators provide virtualized computing infrastructure, which typically includes a plurality of virtual machines executing on a shared set of physical hardware components, to offer highly available, fault-tolerant distributed systems. However, a large-scale virtualized infrastructure may have many (e.g., thousands of) virtual machines running on many physical machines. High availability requirements provide system administrators with little time to diagnose problems or bring down parts of the infrastructure for maintenance. Fault-tolerant features ensure the virtualized computing infrastructure continues to operate when problems arise, but generate many intermediate states that have to be reconciled and addressed. As such, identifying, debugging, and resolving failures and performance issues for virtualized computing environments have become increasingly challenging.
  • Many software and hardware components generate log data to facilitate technical support and troubleshooting. However, over an entire virtualized computing infrastructure, massive amounts of unstructured log data can be generated continuously by every component of the virtualized computing infrastructure. As such, finding information within the log data that identifies problems of virtualized computing infrastructure is difficult, due to the overwhelming volume of unstructured log data to be analyzed.
  • SUMMARY
  • One or more embodiments disclosed herein provide a method for providing real-time analysis of log messages for a computer infrastructure. The method includes receiving a plurality of log messages including a first log message, and generating a sketch associated with the first log message. The sketch may be generated based on words contained in the first log message. The method further includes determining a message type for the first log message based on a comparison of the generated sketch to a plurality of sketches stored in an index. Log messages of a same message type have similar sketches. The method includes determining a first log event associated with one or more log messages occurring within a first time interval, wherein the first log event comprises a first composition of message types corresponding to the associated log messages. The method further includes determining an event type for the first log event based on a comparison of the first composition of message types to a plurality of compositions of message types stored in the index, and determining an anomalous log event within the plurality of log messages based on the classification for the first log event.
  • Further embodiments of the present disclosure include a non-transitory computer-readable storage medium that includes instructions that enable a processing unit to implement one or more of the methods set forth above or the functions of the computer system set forth above.
  • BRIEF DESCRIPTION OF THE DRAWINGS
  • So that the manner in which the above recited aspects are attained and can be understood in detail, a more particular description of embodiments of the disclosure, briefly summarized above, may be had by reference to the appended drawings.
  • FIG. 1A depicts a block diagram that illustrates a computing system with which one or more embodiments of the present disclosure may be utilized.
  • FIG. 1B is a block diagram that illustrates a virtualized computing system with which one or more embodiments of the present disclosure may be utilized.
  • FIG. 2 is a block diagram that illustrates a workflow for analyzing log data of the computing system, according to one embodiment of the present disclosure.
  • FIGS. 3A-3B are block diagrams that depict examples of event pattern and event volume anomalies, according to one embodiment of the present disclosure.
  • FIG. 4 is a flow diagram that illustrates steps for a method for analyzing log data of a computing system, according to an embodiment of the present disclosure.
  • DETAILED DESCRIPTION
  • One or more embodiments disclosed herein provide methods, systems, and computer programs for analyzing log data for a computing infrastructure in real-time. In one embodiment, log data, sometimes referred to as runtime logs, error logs, or debugging logs, is reduced in both volume and level of detail by first classifying messages into types by content similarity. The log data is then reduced further by grouping bursts of messages into log events. Patterns in log events, such as the collection and number of different message types that comprise each log event, can be used to identify anomalous events within the log data. For example, patterns in the log events may be used to detect when log events occur that differ in message type composition, or when log events occur that differ in frequency of occurrence over time.
  • FIG. 1A is a block diagram that illustrates a computing system 100 with which one or more embodiments of the present invention may be utilized. As illustrated, computing system 100 includes a plurality of server systems, identified as server system 102-1, 102-2, 102-3, and referred to collectively as servers 102. Each server 102 includes CPU 104, memory 106, networking interface 110, storage interface 114, and other conventional components of a computing device. Each server 102 further includes an operating system 120 configured to manage execution of one or more applications 122 using the computing resources (e.g., CPU 104, memory 106, networking interface 110, storage interface 114).
  • As mentioned earlier, software and infrastructure components of computing system 100, including servers 102, operating systems 120, and applications 122 running on top of operating system 120, may generate log data during operation. Log data may indicate the state, and state transitions, that occur during operation, and may record occurrences of failures, as well as unexpected and undesirable events. In one embodiment, log data may be unstructured text comprised of a plurality of log messages, including status updates, error messages, stack traces, and debugging messages. With thousands to millions of different processes running in a complex computing environment, an overwhelmingly large volume of heterogeneous log data, having varying syntax, structure, and even language, may be generated. As such, finding log messages relevant to the context of a particular issue, as well as proactively identifying emerging issues from log data, can be challenging.
  • Accordingly, embodiments of the present disclosure provide a log analytics module 132 configured to store and analyze in real-time log data 134 from software and infrastructure components of computing system 100. Log analytics module 132 may include a log index 136 configured to cache (and later query) results of the analysis of log data. Log analytics module 132 reduces the volume and level of details of the log data to enable a user (e.g., system administrator) to diagnose and troubleshoot issues within computing system 100.
  • In one embodiment, log analytics module 132 is configured to parse a stream of log messages within log data and identify groups of log messages as logical events, referred to interchangeably as “log events” or “events”. To do so, log analytics module 132 is configured to classify log messages within a stream of log data as log message types that cluster together similar log messages. Log analytics module 132 is further configured to perform event detection on log messages within log data to group together log messages based on their occurrence close in time in a sequence. As described later, in one embodiment, events may be defined as a collection of log message types, and an occurrence of an event corresponds to a group of log messages having the requisite log message types appearing in log data. Log analytics module 132 may further identify anomalies within log data based on the message-type classifications and detected events, such as event volume anomalies and event pattern anomalies.
  • Through analysis techniques described herein, log analytics module 132 indicates to a user what one event (as reported by log messages) means in relation to other events in the log data and highlights events occurring within computing system 100 in context. In some embodiments, log analytics module 132 may highlight certain events in the context of being nearby in time to other events, such that if the certain events usually occur in a sequence, then events occurring out of that sequence may be notable. In some embodiments, log analytics module 132 may highlight certain events in the context of being similar to other events, such that similar events may be clustered and analyzed together rather than be considered separately. In some embodiments, log analytics module 132 may highlight certain events in the context of the hierarchical infrastructure of computing system 100, such as being from the same thread, process, application, virtual machine, host, host group, data center, etc. The operations of log analytics module 132 are illustrated in greater detail in conjunction with FIG. 2.
  • While embodiments of the present invention are described in conjunction with a computing environment having physical components, it should be recognized that log data 134 may be generated by components of other alternative computing architectures, including a virtualized computing system as shown in FIG. 1B. FIG. 1B is a block diagram that illustrates a computing system 150 with which one or more embodiments of the present disclosure may be utilized. As illustrated, computing system 150 includes a host group 124 of host computers, identified as hosts 108-1, 108-2, 108-3, and 108-4, and referred to collectively as hosts 108. Each host 108 is configured to provide a virtualization layer that abstracts computing resources of a hardware platform 118 into multiple virtual machines (VMs) 112 that run concurrently on the same host 108. Hardware platform 118 of each host 108 may include conventional components of a computing device, such as a memory, processor, local storage, disk interface, and network interface. The VMs 112 run on top of a software interface layer, referred to herein as a hypervisor 116, that enables sharing of the hardware resources of host 108 by the virtual machines. One example of hypervisor 116 that may be used in an embodiment described herein is a VMware ESXi hypervisor provided as part of the VMware vSphere solution made commercially available from VMware, Inc. Hypervisor 116 may run on top of the operating system of host 108 or directly on hardware components of host 108. Each VM 112 includes a guest operating system (e.g., Microsoft Windows, Linux) and one or more guest applications and processes running on top of the guest operating system.
  • In the embodiment shown in FIG. 1B, computing system 150 includes virtualization management software 130 that may communicate with the plurality of hosts 108 via network 140. Virtualization management software 130 is configured to carry out administrative tasks for the computing system 150, including managing hosts 108, managing VMs running within each host 108, provisioning VMs, migrating VMs from one host to another host, and load balancing between hosts 108 of host group 124. In one embodiment, virtualization management software 130 is a computer program that resides and executes in a central server, which may reside in computing system 150, or alternatively, may run as a VM in one of hosts 108. One example of a virtualization management software is the vCenter® Server product made available from VMware, Inc. Similar to the software and infrastructure components of computing system 100, the software and infrastructure components of computing system 150, including host group(s) 124, hosts 108, VMs 112 running on hosts 108, and guest operating systems, applications, and processes running within VMs, may generate large amounts of log data in real-time during operation.
  • While log analytics module 132 is depicted in FIG. 1B as a separate component that resides and executes on a separate server or virtual machine, it is appreciated that log analytics module 132 may alternatively reside in any one of the computing devices of the virtualized computing system 150, for example, such as the same central server where the virtualization management software 130 resides. In one embodiment, log analytics module 132 may be embodied as a plug-in component configured to extend functionality of virtualization management software 130. Access to the log analytics module 132 can be achieved via a client application (not shown). For example, each analysis task, such as searching for log messages, filtering for log messages, analyzing log messages over a period of time, can be accomplished through the client application. One embodiment provides a stand-alone application version of the client application. In another embodiment, the client application is implemented as a web browser application that provides management access from any networked device.
  • FIG. 2 is a block diagram that illustrates a workflow for analyzing log data 134 of a computing infrastructure, according to one embodiment of the present disclosure. It should be recognized that, even though the workflow is described in conjunction with the system of FIG. 1A, any system configured to perform the illustrated technique is within the scope of embodiments of the disclosure. In the embodiment shown, log data 134 may include a plurality of individual log messages 202-1 to 202-5 (collectively referred to as log messages 202) generated over a period of time. In some embodiments, a log message may include a time stamp (e.g., “Sep 23 13:30”) indicating a date and time corresponding to the creation of the log message and a text description (e.g., “host1 sending 5738 files”). While each log message 202 is depicted as a separate line of text for sake of illustration, it should be recognized that log messages 202 may be arranged in a variety of formats, including log messages that span several lines.
  • In one embodiment, log analytics module 132 may classify each log message 202 as a message type based on content similarity of the log messages. In some embodiments, the content similarity is performed on the text description portion of the log message 202. In the example shown in FIG. 2, log analytics module 132 processes log message 202-1 (i.e., “Sep 23 13:30 host1 sending 5738 files”) and assigns log message 202-1 a first message type 204-1. Log analytics module 132 then processes a second log message 202-2 (i.e., “Sep 23 13:31 host2 received 5700 files”) and determines the contents of second log message 202-2 are not sufficiently similar to first log message 202-1 and assigns a different, second message type 204-2. Similarly, log analytics module 132 processes a third log message 202-3 (i.e., “Sep 23 13:32 host1 warning: 38 files pending”) and assigns a third message type 204-3 upon determining no content similarity with the other already processed log messages. For sake of illustration, log messages having different message types are depicted in FIG. 2 as shapes having different patterns. In one embodiment, log analytics module 132 may determine content similarity of log messages according to a “sketching” algorithm that determines if log messages contain a number of words in common in the same relative position. Determination of content similarity and the sketching algorithm are described in greater detail below.
  • Continuing the example shown in FIG. 2, log analytics module 132 processes a fourth log message 202-4 (i.e., “Sep 23 14:00 host4 sending 382 files”) and determines content similarity with log message 202-1. As such, log analytics module 132 assigns log message 202-4 the same first message type 204-1 as log message 202-1, as depicted in FIG. 2 by identical patterned highlights or colors. Similarly, log analytics module 132 processes a fifth log message 202-5 (i.e., “Sep 23 14:01 host5 received 382 files”) and assigns the second message type 204-2 based on a determination of content similarity with log message 202-2.
  • In one embodiment, log analytics module 132 is configured to identify one or more log events 206 based on the timing of the log messages. In some embodiments, log analytics module 132 may group one or more log messages 202 into log events 206 according to a burst analysis algorithm. For example, log analytics module 132 identifies a first log event 206-1 that includes log messages 202-1, 202-2, 202-3, which all occur at approximately the same time, around September 23, 13:30, and a second log event 206-2 that includes log messages 202-4, 202-5, which both occur around September 23, 14:00. In one embodiment, log analytics module 132 is configured to represent each identified log event 206 as a composition of message types of log messages. In some embodiments, an event type for a log event may be defined as a composition of tuples of message type and frequency. In the example shown in FIG. 2, a first event 206-1 may be characterized as a composition of one occurrence of message type 204-1 (e.g., “Sending . . . files”), one occurrence of message type 204-2 (e.g., “Received . . . files”), and one occurrence of message type 204-3 (e.g., “Warning . . . files pending”); and second event 206-2 may be characterized as a composition of one occurrence of message type 204-1 (e.g., “Sending . . . files”) and one occurrence of message type 204-2 (e.g., “Received . . . files”).
  • According to one embodiment, log analytics module 132 may identify anomalous events based on patterns of events from log data 134, as shown in FIGS. 3A and 3B. FIG. 3A is a chart 300 depicting an example of an event volume anomaly based on frequency of occurrence of events over time. Log analytics module 132 may determine the number of events occurring per hour in a given time period, e.g., from 6:00 PM to 9:00 PM. Chart 300 further illustrates a breakdown of event types for each hour, depicting occurrences of events similar to log events 206-1 and 206-2. As an example, it may be normal within the computing system for approximately 20 events per hour to occur. But, a sudden increase of events to 200 events per hour (e.g., at 19:00) and then to 500 events per hour (e.g., at 20:00), thereby exceeding some threshold value 302, can trigger log analytics module 132 to flag this as an anomalous occurrence of event volume.
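The hourly counting behind chart 300 can be sketched as follows. The event counts (20, 200, and 500 per hour) come from the example above; the year, the evenly spread time stamps, the helper names, and the threshold of 400 standing in for threshold value 302 are all assumptions made for illustration.

```python
from collections import Counter
from datetime import datetime, timedelta


def events_per_hour(event_times):
    """Count events per hour by truncating each time stamp to the hour."""
    return Counter(t.replace(minute=0, second=0, microsecond=0) for t in event_times)


def hours_over_threshold(counts, threshold):
    """Return the hours whose event count exceeds the threshold, in order."""
    return sorted(hour for hour, n in counts.items() if n > threshold)


def spread(hour_start, n):
    """Generate n hypothetical event times spread evenly inside one hour."""
    step = 3600 // n
    return [hour_start + timedelta(seconds=i * step) for i in range(n)]


day = datetime(2013, 9, 23)
times = (
    spread(day.replace(hour=18), 20)
    + spread(day.replace(hour=19), 200)
    + spread(day.replace(hour=20), 500)
)
counts = events_per_hour(times)
flagged = hours_over_threshold(counts, threshold=400)
print([h.strftime("%H:%M") for h in flagged])  # ['20:00']
```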
  • FIG. 3B depicts an example of an event pattern anomaly based on events that are different in message type composition. As shown, events 304 occurring at a given time are usually of an event type similar to event 206-2 (i.e., events comprised of “Sending . . . files” log messages and “Received . . . files” log messages). However, an unexpected or atypical event 306 may occur, which is an event comprised of “Sending . . . files” log messages, “Received . . . files” log messages, and “Warning . . . files pending” log messages, and is therefore different from the usual events. In this case, log analytics module 132 may determine an anomalous occurrence of a log event 306 (i.e., composed of message types 204-1, 204-2, and 204-3) that is different in composition from other log events (i.e., composed of message types 204-1 and 204-2).
  • FIG. 4 is a flow diagram that illustrates steps for a method 400 for providing real-time analysis of log messages for a computer infrastructure, according to an embodiment of the present disclosure. It should be recognized that, even though the method 400 is described in conjunction with the system of FIG. 1, any system configured to perform the method steps is within the scope of embodiments of the disclosure.
  • The method 400 begins at step 402, where log analytics module 132 receives a stream of log data 134 generated by software and infrastructure components of computing system 100. As described above, log data 134 may include a plurality of log messages. In some embodiments, log analytics module 132 may be configured to retrieve log data (e.g., log files) from software and infrastructure components of computing system 100, including applications 122, operating systems 120, and in the case of virtualized computing system 150, components such as hypervisors 116 and guest applications and operating systems running within VMs 112. In other embodiments, software and infrastructure components of computing system 100 may be configured to write log files to a common destination, such as an external storage, from which log analytics module 132 may periodically retrieve log data. In some embodiments, log data 134 may be transferred over network 140 directly to log analytics module 132.
  • At step 404, log analytics module 132 generates a compact integer representation, or “sketch,” of text content for a log message in the received log data. In one embodiment, a sketch associated with a log message is generated based on words of the log message. As mentioned above, two log messages may be considered similar if the log messages contain a number of words in common in the same relative positions. As such, sketches of log messages are computed such that similar log messages should have identical or substantially similar sketches. In one embodiment, a sketch of a log message may be an ordered list, or tuple, of fingerprint values corresponding to a subset of the words of the log message.
  • In some embodiments, a sketch of a log message is a tuple of fingerprints of “interesting” words of the log message. Each interesting word of the log message (e.g., “host1”) can be given a value (e.g., 753) using a fingerprint function, such as a hash function. For example, a sketch generated for a log message “host1 sending 5738 files” may be a tuple of fingerprint values (753, 1034, 886) that corresponds to interesting words (host1, sending, files). In another example, a sketch for the log message “host4 sending 382 files” can be computed as the tuple (1965, 1034, 886) that corresponds to interesting words (host4, sending, files). As such, because the sketches (753, 1034, 886) and (1965, 1034, 886) have identical values “1034” and “886” in the same relative positions, the two log messages may be deemed similar.
  • In one implementation, sketches of log messages may be generated according to a sketching algorithm that uses N independent scoring functions to pick N “interesting” words of a log message, where “interesting” is determined according to each scoring function. In some embodiments, a scoring function is a hash function that computes a 32-bit integer given a word. In such a scheme, a sketch may be composed of 32-bit fingerprints of the most interesting words in a log message, where “most interesting” is determined by N scoring functions (e.g., N=8):
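  • $$\begin{aligned}
\mathrm{Score}_1(\mathrm{Word}) &= (M_1 \cdot \mathrm{Fingerprint}(\mathrm{Word}) + A_1) \bmod 2^{32} \\
\mathrm{Score}_2(\mathrm{Word}) &= (M_2 \cdot \mathrm{Fingerprint}(\mathrm{Word}) + A_2) \bmod 2^{32} \\
&\;\;\vdots \\
\mathrm{Score}_N(\mathrm{Word}) &= (M_N \cdot \mathrm{Fingerprint}(\mathrm{Word}) + A_N) \bmod 2^{32}
\end{aligned}$$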
  • The parameters $M_i$ and $A_i$ for each scoring function may be selected such that the scoring functions are linearly independent (i.e., $\sum_{i=1}^{N} C_i \cdot \mathrm{Score}_i(\mathrm{Word}) = 0$ only if all $C_i$ are zero) and the different scores for a particular word are uncorrelated. Log analytics module 132, for each scoring function, scores each word in the log message and selects the word having the highest score (i.e., the “most interesting” word) according to that scoring function. As each scoring function selects one word in the log message, N scoring functions result in N words being selected. The fingerprints of these N words are then combined to form a sketch of the log message.
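The selection procedure described above can be sketched in a few lines. This is an illustration under stated assumptions: CRC-32 stands in for the unspecified fingerprint function, the (M, A) pairs are commonly cited linear congruential generator constants chosen for illustration (they have not been checked for the independence property described above), and the names `fingerprint`, `score`, and `sketch` are hypothetical.

```python
import zlib

# Illustrative (M, A) parameter pairs, one per scoring function (N = 4 here).
PARAMS = [
    (1664525, 1013904223),
    (22695477, 1),
    (1103515245, 12345),
    (134775813, 1),
]


def fingerprint(word):
    """32-bit fingerprint of a word (CRC-32 is an assumed stand-in)."""
    return zlib.crc32(word.encode("utf-8")) & 0xFFFFFFFF


def score(word, m, a):
    """Score_i(Word) = (M_i * Fingerprint(Word) + A_i) mod 2^32."""
    return (m * fingerprint(word) + a) % 2**32


def sketch(message, params=PARAMS):
    """Tuple of fingerprints of the highest-scoring word per scoring function."""
    words = message.split()
    if not words:
        return ()
    return tuple(
        fingerprint(max(words, key=lambda w: score(w, m, a))) for m, a in params
    )


s = sketch("host1 sending 7182 files using SFTP protocol")
print(len(s), all(0 <= v < 2**32 for v in s))
```

Because every scoring function is deterministic, the same message always yields the same sketch, which is what allows sketches to be used as lookup keys.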
  • For example, the log message “host1 sending 7182 files using SFTP protocol” may be scored in the following manner by N=4 scoring functions, where the most interesting word for each scoring function is emphasized in brackets:
      • Score1: [host1] sending 7182 files using SFTP protocol
      • Score2: host1 [sending] 7182 files using SFTP protocol
      • Score3: host1 sending 7182 [files] using SFTP protocol
      • Score4: host1 [sending] 7182 files using SFTP protocol
        In this example, the four scoring functions determined that the most interesting words were “host1”, “sending,” “files,” and “sending” (again). The word “host1” had the highest score of the words in the log message according to the first scoring function Score1. The word “sending” had the highest score according to both the second and fourth scoring functions, and the word “files” was the highest scoring word in the log message according to the third scoring function. As such, the resulting sketch would be a 4-tuple of the fingerprints of these words as follows. For clarity, simple numerical values (e.g., 753) are shown for the fingerprint values, but it should be recognized that fingerprint values may be 32-bit values (e.g., 0x459c8cbb).
      • Fingerprint(“host1”)=753
      • Fingerprint(“sending”)=1034
      • Fingerprint(“files”)=886
      • Fingerprint(“sending”)=1034
      • Sketch1=(753, 1034, 886, 1034)
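  • The sketch generation described above can be illustrated with a minimal Python sketch. It is not the patented implementation: the CRC-32 fingerprint, the whitespace tokenization, and the four (M, A) parameter pairs in `PARAMS` are assumptions chosen only for illustration, and no claim is made that these particular parameters satisfy the independence conditions described above.

```python
import zlib

MASK32 = 0xFFFFFFFF  # keeps every result in the range [0, 2**32)

# Hypothetical (M_i, A_i) pairs; the source does not give concrete values.
PARAMS = [
    (1664525, 1013904223),
    (22695477, 1),
    (1103515245, 12345),
    (134775813, 1),
]


def fingerprint(word):
    """Assumed 32-bit fingerprint: CRC-32 of the UTF-8 bytes of the word."""
    return zlib.crc32(word.encode("utf-8")) & MASK32


def score(word, m, a):
    """Score_i(Word) = (M_i * Fingerprint(Word) + A_i) mod 2**32."""
    return (m * fingerprint(word) + a) & MASK32


def sketch(message, params=PARAMS):
    """Return an N-tuple holding the fingerprint of the top-scoring word
    under each of the N scoring functions."""
    words = message.split()  # assumption: words are whitespace-separated
    if not words:
        return ()
    selected = []
    for m, a in params:
        best = max(words, key=lambda w: score(w, m, a))
        selected.append(fingerprint(best))
    return tuple(selected)


if __name__ == "__main__":
    print(sketch("host1 sending 7182 files using SFTP protocol"))
```

  • Because the fingerprint and parameters here are stand-ins, the values printed will differ from the simplified numbers (753, 1034, 886) used in the example above.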
  • Continuing this example, if a similar but slightly different log message (i.e., “host4 sending 208 files using SFTP protocol”) is received and processed, the four scoring functions may score the slightly different log message similarly, with the selected word marked in square brackets:
      • Score1: [host4] sending 208 files using SFTP protocol
      • Score2: host4 [sending] 208 files using SFTP protocol
      • Score3: host4 sending 208 [files] using SFTP protocol
      • Score4: host4 [sending] 208 files using SFTP protocol
        As shown, a change in the score of the first word “host4” did not affect the selection of the highest scoring word for three out of the four scoring functions. It has been determined that if a majority of N independent scoring functions select the same words in two different log messages, the log messages are very likely to be similar overall. For example, in this case, the resulting sketch would be a 4-tuple of the fingerprints of these words:
      • Fingerprint(“host4”)=1965
      • Fingerprint(“sending”)=1034
      • Fingerprint(“files”)=886
      • Fingerprint(“sending”)=1034
      • Sketch2=(1965, 1034, 886, 1034)
        Comparing the sketches for the two log messages:
      • Sketch1 ~ Sketch2
      • (753, 1034, 886, 1034) ~ (1965, 1034, 886, 1034)
        reveals three out of four fingerprint values in common (i.e., “1034”, “886”, and “1034”). As such, a majority of the scoring functions have selected the same words “sending”, “files”, and “sending” in both log messages, and therefore the two log messages may be deemed similar.
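  • The comparison above amounts to counting the positions at which two sketches hold the same fingerprint. The following is a minimal illustration; the function names and the strict-majority rule (more than half of the positions) are assumptions based on the “majority” wording above.

```python
def matching_positions(sketch_a, sketch_b):
    """Count the columns in which both sketches hold the same fingerprint."""
    return sum(1 for a, b in zip(sketch_a, sketch_b) if a == b)


def is_similar(sketch_a, sketch_b):
    """Assumed rule: similar if a strict majority of columns match."""
    if len(sketch_a) != len(sketch_b) or not sketch_a:
        return False
    return 2 * matching_positions(sketch_a, sketch_b) > len(sketch_a)


if __name__ == "__main__":
    sketch1 = (753, 1034, 886, 1034)
    sketch2 = (1965, 1034, 886, 1034)
    print(matching_positions(sketch1, sketch2), is_similar(sketch1, sketch2))
```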
  • While other approaches for selecting words in a log message may be used, such as choosing the first few words of a log message or selecting even-numbered words, or other content-insensitive schemes, the sketching algorithm as described herein is advantageously more robust to relative insertions or deletions of text. It has been determined that the insertion or deletion of an additional word relative to the original text is unlikely to change all or even a majority of the words selected by each scoring function. In one embodiment, a linear congruential generator (LCG) may be used as a scoring function, though it should be recognized that other types of scoring functions can be used, including functions that are deterministic and produce uncorrelated results.
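  • The robustness claim can be illustrated by comparing a positional scheme (first few words) with score-based selection when a word is inserted at the front of a message. This is a sketch under assumptions: the CRC-32 fingerprint, the LCG-style parameters in `PARAMS`, and the helper names are hypothetical. With maximum-score selection, an inserted word can only change a selection if the inserted word itself becomes the top-scoring word.

```python
import zlib

MASK32 = 0xFFFFFFFF

# Hypothetical (M_i, A_i) pairs, for illustration only.
PARAMS = [(1664525, 1013904223), (22695477, 1), (1103515245, 12345), (134775813, 1)]


def lcg_score(word, m, a):
    """LCG-style score over an assumed CRC-32 fingerprint."""
    fp = zlib.crc32(word.encode("utf-8")) & MASK32
    return (m * fp + a) & MASK32


def selected_words(message, params=PARAMS):
    """Content-sensitive selection: top-scoring word per scoring function."""
    words = message.split()
    return [max(words, key=lambda w: lcg_score(w, m, a)) for m, a in params]


def first_k_words(message, k=4):
    """Content-insensitive selection: simply the first k words."""
    return message.split()[:k]


if __name__ == "__main__":
    original = "host1 sending 7182 files using SFTP protocol"
    edited = "INFO " + original  # one word inserted at the front
    print(first_k_words(original), first_k_words(edited))
    print(selected_words(original), selected_words(edited))
```

  • In this example the positional scheme changes in every position after the insertion, whereas each score-based selection either stays the same or becomes the inserted word.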
  • Log analytics module 132 then determines a message type classification for the log message based on the corresponding sketch for the log message. Log analytics module 132 classifies log messages having similar sketches as having the same message type. Such clustering helps reduce the number of log messages that need to be examined by grouping the messages into a small number of message types that can then be highlighted. Accordingly, message type classification enables log analytics module 132 to cluster together similar log messages to more effectively process and analyze a large volume of log data.
  • At step 406, log analytics module 132 queries log index 136 to determine whether the log message is similar to a previously processed log message based on the corresponding sketches, and if so, assigns the log message a same message type as the previously processed log message, at step 408. For example, log analytics module 132 queries log index 136 using the sketch (1965, 1034, 886, 1034) corresponding to the log message “host4 sending 208 files using SFTP protocol” and determines the log message is similar to the previously processed log message “host1 sending 7182 files using SFTP protocol” based on the similarity with its corresponding sketch (753, 1034, 886, 1034). As discussed earlier, in some embodiments, two log messages may be deemed similar and assigned a same message type if a majority of the scoring functions have selected the same words in both log messages. Otherwise, at step 410, log analytics module 132 assigns a new message type to the log message and inserts the log message into log index 136.
  • In one embodiment, each message type may be represented by a message type identifier, or “cluster ID.” For example, the log messages depicted in FIG. 2 may have the following sketches and corresponding cluster IDs (the sketches are shown as tuples of the most interesting words rather than the fingerprint values for clarity of illustration):
      • (host1, sending, files, sending) = 22280
      • (host2, received, files, files) = 22281
      • (host1, warning, files, pending) = 22282
      • (host4, sending, files, sending) = 22280
  • In this example, the sketch (host1, sending, files, sending) is given the same cluster ID 22280 as the sketch (host4, sending, files, sending), because 3 out of 4 fingerprint values match. In some embodiments, log analytics module 132 may provide the ability to search the received log messages based on a given cluster ID. In some embodiments, log analytics module 132 may use cluster ID as a search criterion for log messages that are similar to a particular log message (i.e., find log messages “like this”) by querying for log messages having a particular cluster ID. In some embodiments, log analytics module 132 may use the cluster ID as a criterion for aggregation to generate statistics, such as the Top-5 message types per hour. In some embodiments, cluster ID may be content-based and enable calculation of message type classifications to be distributed.
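  • Searching and aggregating by cluster ID can be sketched as follows. The record layout (a timestamp in seconds paired with a cluster ID) and both function names are assumptions made for this illustration.

```python
from collections import Counter, defaultdict


def find_like(records, cluster_id):
    """Return all records that carry the given cluster ID."""
    return [rec for rec in records if rec[1] == cluster_id]


def top_message_types_per_hour(records, k=5):
    """Map each hour bucket to its k most frequent cluster IDs.

    records: iterable of (timestamp_seconds, cluster_id) pairs.
    """
    per_hour = defaultdict(Counter)
    for timestamp, cluster_id in records:
        per_hour[int(timestamp // 3600)][cluster_id] += 1
    return {hour: counts.most_common(k) for hour, counts in per_hour.items()}


if __name__ == "__main__":
    records = [(10, 22280), (20, 22280), (30, 22281), (3700, 22282)]
    print(find_like(records, 22280))
    print(top_message_types_per_hour(records))
```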
  • According to one implementation, log index 136 may include one or more hash tables that map fingerprint values to sketches for a given log message. In some embodiments, log index 136 may include N hash tables for mapping fingerprint values to sketches that contain N fingerprint values. To determine whether a log message is similar to other log messages, each fingerprint value in the sketch (i.e., each column in the tuple <1965, 1034, 886, 1034>) may be used to search for candidate sketches. In one particular embodiment, a candidate sketch must match in M different columns to be considered a match, where M is less than N. As an example, where N=8, each of the 8 fingerprints in a sketch is looked up in its corresponding hash table to find candidate sketches with at least 6 matching fingerprints (M=6). If at least one candidate is found, the incoming log message belongs to that cluster and is assigned a same message type, and the sketch is not inserted into the log index. If no candidates are found, a new cluster is generated having a new message type, and the sketch is inserted into log index 136.
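  • A minimal in-memory sketch of such an index is shown below. The class name `SketchIndex`, the sequential cluster IDs, and the choice to map each fingerprint to cluster IDs (with the sketch stored per cluster) are assumptions for illustration, not the structure of log index 136 itself.

```python
from collections import Counter, defaultdict


class SketchIndex:
    """N hash tables, one per sketch column, mapping fingerprint -> cluster IDs."""

    def __init__(self, n, m):
        if not 0 < m <= n:
            raise ValueError("m must satisfy 0 < m <= n")
        self.n = n
        self.m = m
        self.tables = [defaultdict(set) for _ in range(n)]
        self.sketches = {}  # cluster ID -> representative sketch
        self.next_id = 0    # hypothetical sequential cluster IDs

    def classify(self, sketch):
        """Return the cluster ID for a sketch, creating a cluster if needed."""
        if len(sketch) != self.n:
            raise ValueError("sketch must contain exactly n fingerprints")
        votes = Counter()
        for column, fp in enumerate(sketch):
            for cluster_id in self.tables[column].get(fp, ()):
                votes[cluster_id] += 1
        if votes:
            cluster_id, hits = votes.most_common(1)[0]
            if hits >= self.m:
                return cluster_id  # match found: sketch is not inserted
        cluster_id = self.next_id
        self.next_id += 1
        self.sketches[cluster_id] = tuple(sketch)
        for column, fp in enumerate(sketch):
            self.tables[column][fp].add(cluster_id)
        return cluster_id


if __name__ == "__main__":
    index = SketchIndex(n=4, m=3)
    print(index.classify((753, 1034, 886, 1034)))
    print(index.classify((1965, 1034, 886, 1034)))
```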
  • In some embodiments, log analytics module 132 may store a representation of each message type within log index 136 by storing a copy of a full log message. Log analytics module 132 may use a textual differential algorithm (e.g., longest substring match) or other additional textual analysis to verify similarity of the incoming log message to a representative of the message type and override message type classification based on poor sketches. In some embodiments, the stored representation of each message type may be used to provide an example log message that is displayed to a user (e.g., system administrator) when presenting the statistics or graphical charts for the message type.
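  • One way such a verification step might look is sketched below using Python's standard `difflib` module to find the longest run of consecutive words shared by the incoming message and the stored representative. The word-level comparison, the function names, and the 0.5 acceptance fraction are assumptions for illustration.

```python
from difflib import SequenceMatcher


def longest_common_run(message_a, message_b):
    """Return the longest run of consecutive words shared by both messages."""
    words_a, words_b = message_a.split(), message_b.split()
    matcher = SequenceMatcher(None, words_a, words_b, autojunk=False)
    match = matcher.find_longest_match(0, len(words_a), 0, len(words_b))
    return words_a[match.a:match.a + match.size]


def verify_similarity(incoming, representative, min_fraction=0.5):
    """Assumed rule: accept if the shared run covers enough of the longer message."""
    longest = max(len(incoming.split()), len(representative.split()))
    if longest == 0:
        return False
    return len(longest_common_run(incoming, representative)) / longest >= min_fraction


if __name__ == "__main__":
    representative = "host1 sending 7182 files using SFTP protocol"
    print(longest_common_run("host4 sending 208 files using SFTP protocol", representative))
    print(verify_similarity("disk full on volume", representative))
```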
  • At step 412, log analytics module 132 divides one or more log messages into log events based on burst analysis. It has been determined that log messages corresponding to events within computer system 100 may be created in bursts and close in time. For example, a burst of log messages may be recorded by applications and guest operating systems whenever a virtual machine shuts down or restarts. In one embodiment, log analytics module 132 processes time stamps of log messages 202 and tracks time between log messages. In some embodiments, log analytics module 132 may determine and maintain an average time interval associated with an event duration. For example, log messages occurring within a 10-second duration may be candidates for being grouped together as a single log event. Log analytics module 132 may associate one or more log messages occurring within the event duration to a log event 206. Log analytics module 132 may represent each log event as a composition of different message types, such as a list of tuples of a message type and corresponding frequency of occurrence. For example, one log event may be comprised of log messages having one occurrence of a “sending files” message type, two occurrences of a “received files” message type, and one occurrence of a “warning files pending” message type, and may be represented by a list of pairs having cluster ID and frequency: (22280, 1), (22281, 2), (22282, 1).
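  • The burst grouping and the composition representation can be sketched as follows. The fixed 10-second window measured from the first message of an event, the input layout, and the function names are assumptions; an implementation could instead track the gap between consecutive messages or a maintained average interval.

```python
from collections import Counter


def split_into_events(messages, max_duration=10.0):
    """Group (timestamp_seconds, cluster_id) pairs, sorted by time, into events.

    Assumption: a message joins the current event if it arrives within
    max_duration seconds of that event's first message.
    """
    events, current, start = [], [], None
    for timestamp, cluster_id in messages:
        if current and timestamp - start > max_duration:
            events.append(current)
            current = []
        if not current:
            start = timestamp
        current.append(cluster_id)
    if current:
        events.append(current)
    return events


def composition(cluster_ids):
    """Represent an event as a sorted list of (cluster ID, frequency) pairs."""
    return sorted(Counter(cluster_ids).items())


if __name__ == "__main__":
    messages = [(0.0, 22280), (1.5, 22281), (2.0, 22281), (3.0, 22282), (60.0, 22280)]
    for event in split_into_events(messages):
        print(composition(event))
```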
  • Log analytics module 132 may then cluster together similar log events, applying a technique similar to the technique applied above for clustering similar log messages. At step 414, log analytics module 132 queries log index 136 to determine whether a log event is similar to other log events based on the composition of message types that comprise the log event, and if so, assigns a same event type as the previously determined log events, at step 418. Otherwise, at step 416, log analytics module 132 assigns a new event type to the log event, and may insert the composition of the new event type into log index 136.
  • In one implementation, log index 136 may further include additional hash tables that map cluster IDs to compositions of event types for a given log event. As such, to determine whether a log event is similar to other log events, each cluster ID may be used as a hash table lookup for candidate compositions that have some or all matching cluster IDs. In some embodiments, the event type of a log event is determined by performing lookups in the hash tables according to each pair of message type identifier and a corresponding frequency of occurrence. If at least one candidate event type is found, the detected log event may be determined similar to the corresponding log event and may be assigned the same event type. If no candidate is found, a new event cluster is generated having a new event type, and the representative composition of message types is inserted into log index 136.
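  • The event-type lookup can be sketched in the same spirit as the message-type lookup. Here a single dictionary keyed by (cluster ID, frequency) pairs stands in for the additional hash tables; the class name `EventIndex`, the sequential event type IDs, and the 0.75 overlap threshold are assumptions for illustration.

```python
from collections import Counter, defaultdict


class EventIndex:
    """Maps (cluster ID, frequency) pairs to the event types that contain them."""

    def __init__(self, min_fraction=0.75):
        self.min_fraction = min_fraction
        self.table = defaultdict(set)  # (cluster_id, frequency) -> event type IDs
        self.compositions = {}         # event type ID -> set of pairs

    def classify(self, event_composition):
        """Return the event type for a composition, creating one if needed."""
        pairs = set(event_composition)
        votes = Counter()
        for pair in pairs:
            for event_type in self.table.get(pair, ()):
                votes[event_type] += 1
        for event_type, hits in votes.most_common():
            size = max(len(pairs), len(self.compositions[event_type]))
            if hits / size >= self.min_fraction:
                return event_type
        event_type = len(self.compositions)  # hypothetical sequential IDs
        self.compositions[event_type] = pairs
        for pair in pairs:
            self.table[pair].add(event_type)
        return event_type


if __name__ == "__main__":
    index = EventIndex()
    print(index.classify([(22280, 1), (22281, 2), (22282, 1)]))
    print(index.classify([(22282, 1), (22281, 2), (22280, 1)]))
```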
  • At step 420, log analytics module 132 analyzes event clusters and detects an anomaly within event clusters based on the classification of log events. In some embodiments, log analytics module 132 may determine an occurrence of an “incomplete” event or a gross deviation from an expected event. For example, where an expected log event may be a composition of message types (22280, 2), (22281, 2), (22282, 3), (22283, 1), (22284, 1), an incomplete log event may be detected upon determining an occurrence of a log event only having (22280, 2), (22281, 2), (22282, 3), (22283, 1). In another example, a deviation from a known log event may be detected upon determining an occurrence of a log event having (22280, 2), (22281, 2), (22282, 3), (22283, 1), (22284, 1), (34921, 292), (34927, 395).
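  • The two examples above can be reproduced with a simple comparison of an observed composition against an expected one. The function name and the three labels are assumptions for illustration; the source does not prescribe how “incomplete” and “deviation” are computed.

```python
def classify_event(observed, expected):
    """Compare two compositions given as lists of (cluster ID, frequency) pairs.

    Returns (label, missing_ids, unexpected_ids), where label is one of
    "expected", "incomplete", or "deviation" (labels are assumptions).
    """
    obs, exp = dict(observed), dict(expected)
    missing = sorted(cid for cid in exp if cid not in obs)
    unexpected = sorted(cid for cid in obs if cid not in exp)
    changed = [cid for cid in exp if cid in obs and obs[cid] != exp[cid]]
    if not missing and not unexpected and not changed:
        label = "expected"
    elif missing and not unexpected and not changed:
        label = "incomplete"
    else:
        label = "deviation"
    return label, missing, unexpected


if __name__ == "__main__":
    expected = [(22280, 2), (22281, 2), (22282, 3), (22283, 1), (22284, 1)]
    print(classify_event(expected[:4], expected))
    print(classify_event(expected + [(34921, 292), (34927, 395)], expected))
```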
  • In some embodiments, log analytics module 132 may determine an anomaly in event volume based on one or more threshold values. As described earlier in conjunction with FIG. 3A, log analytics module 132 may detect when a number of events occurring per unit of time exceeds or falls below a threshold value. For example, log analytics module 132 may determine an occurrence of an anomaly in event volume when the number of events occurring per hour exceeds 500 events per hour (suggesting over-activity), or falls below 5 events per hour (suggesting inactivity). In some embodiments, a threshold value may be associated with a particular event type, such that occurrences of that particular event type that exceed the threshold value may be flagged as an anomaly. The threshold values may be pre-determined, as well as configurable by a user. In some embodiments, the threshold values may be dynamically determined based on the performance history of the computing system, for example, using a weighted moving average, or other suitable heuristics. The threshold values may be specified in a variety of manners, including absolute numerical values (e.g., 500 events/hr), and relative values, such as percentages (e.g., 200% change). In some embodiments, log analytics module 132 may present the detected anomaly, as well as the classified message types and event types, to a user via a graphical user interface. For example, the graphical user interface may provide charts, graphics, and statistical displays to illustrate a most frequent event over a past week, or an anomalous event occurring in a last 1-hour period. In one embodiment, log analytics module 132 may use frequency of log events and anomaly detection to generate an alert for an operator (e.g., system administrator) that the frequency of a particular log message type has increased or decreased in an anomalous way.
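  • Both the absolute and the relative threshold checks can be sketched briefly. The function names, the linearly increasing weights used for the weighted moving average, and the interpretation of “200% change” as a relative difference of 2.0 from the baseline are assumptions for illustration.

```python
def volume_anomaly(events_per_hour, low=5, high=500):
    """Absolute thresholds: flag over-activity or inactivity, else None."""
    if events_per_hour > high:
        return "over-activity"
    if events_per_hour < low:
        return "inactivity"
    return None


def weighted_moving_average(history):
    """Weighted average with linearly increasing weights (newest counts most)."""
    if not history:
        raise ValueError("history must not be empty")
    weights = range(1, len(history) + 1)
    return sum(w * x for w, x in zip(weights, history)) / sum(weights)


def relative_anomaly(events_per_hour, history, max_change=2.0):
    """Relative threshold: flag a change of at least max_change (2.0 = 200%)."""
    baseline = weighted_moving_average(history)
    if baseline == 0:
        return events_per_hour > 0
    return abs(events_per_hour - baseline) / baseline >= max_change


if __name__ == "__main__":
    print(volume_anomaly(501), volume_anomaly(4), volume_anomaly(100))
    print(relative_anomaly(350, [100, 100, 100, 100]))
```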
  • Accordingly, embodiments of the present disclosure provide a technique for processing log data that enables real-time analysis that is scalable for the multitude of log data generated by many software and infrastructure components of a computer system 100. In contrast to conventional approaches, embodiments described herein advantageously reduce the need for multiple passes over the same dataset or the need for active intervention in the form of feedback and training to properly analyze data. Embodiments of the present disclosure provide a system for unsupervised, approximate clustering of log data that provides volume- and pattern-based anomaly detection.
  • Although one or more embodiments of the present disclosure have been described in some detail for clarity of understanding, it will be apparent that certain changes and modifications may be made within the scope of the claims. Accordingly, the described embodiments are to be considered as illustrative and not restrictive, and the scope of the claims is not to be limited to details given herein, but may be modified within the scope and equivalents of the claims. In the claims, elements and/or steps do not imply any particular order of operation, unless explicitly stated in the claims.
  • The various embodiments described herein may employ various computer-implemented operations involving data stored in computer systems. For example, these operations may require physical manipulation of physical quantities which usually, though not necessarily, take the form of electrical or magnetic signals where they, or representations of them, are capable of being stored, transferred, combined, compared, or otherwise manipulated. Further, such manipulations are often referred to in terms, such as producing, identifying, determining, or comparing. Any operations described herein that form part of one or more embodiments of the disclosure may be useful machine operations. In addition, one or more embodiments of the disclosure also relate to a device or an apparatus for performing these operations. The apparatus may be specially constructed for specific required purposes, or it may be a general purpose computer selectively activated or configured by a computer program stored in the computer. In particular, various general purpose machines may be used with computer programs written in accordance with the description provided herein, or it may be more convenient to construct a more specialized apparatus to perform the required operations.
  • The various embodiments described herein may be practiced with other computer system configurations including hand-held devices, microprocessor systems, microprocessor-based or programmable consumer electronics, minicomputers, mainframe computers, and the like. One or more embodiments of the present disclosure may be implemented as one or more computer programs or as one or more computer program modules embodied in one or more computer readable media. The term computer readable medium refers to any data storage device that can store data which can thereafter be input to a computer system; computer readable media may be based on any existing or subsequently developed technology for embodying computer programs in a manner that enables them to be read by a computer. Examples of a computer readable medium include a hard drive, network attached storage (NAS), read-only memory, random-access memory (e.g., a flash memory device), a CD-ROM (Compact Disc-ROM), a CD-R, or a CD-RW, a DVD (Digital Versatile Disc), a magnetic tape, and other optical and non-optical data storage devices. The computer readable medium can also be distributed over a network coupled computer system so that the computer readable code is stored and executed in a distributed fashion.
  • Plural instances may be provided for components, operations or structures described herein as a single instance. Finally, boundaries between various components, operations and data stores are somewhat arbitrary, and particular operations are illustrated in the context of specific illustrative configurations. Other allocations of functionality are envisioned and may fall within the scope of the disclosure(s). In general, structures and functionality presented as separate components in exemplary configurations may be implemented as a combined structure or component. Similarly, structures and functionality presented as a single component may be implemented as separate components. These and other variations, modifications, additions, and improvements may fall within the scope of the appended claim(s).

Claims (20)

What is claimed is:
1. A method for providing real-time analysis of log messages for a computer infrastructure, the method comprising:
receiving a plurality of log messages including a first log message;
generating a sketch associated with the first log message, wherein the sketch is generated based on words of the first log message;
determining a message type for the first log message based on a comparison of the generated sketch to a plurality of sketches stored in an index, wherein log messages of a same message type have similar sketches;
determining a first log event associated with one or more of the plurality of log messages occurring with a time interval, wherein the first log event comprises a first composition of message types corresponding to the one or more of the plurality of log messages associated with the first log event;
determining an event type for the first log event based on a comparison of the first composition of message types to a plurality of compositions of message types stored in the index; and
determining an anomalous log event within the plurality of log messages based on the event type for the first log event.
2. The method of claim 1, wherein the sketch comprises a tuple of fingerprint values corresponding to a subset of the words of the first log message.
3. The method of claim 1, wherein log events of a same event type have similar compositions of message types.
4. The method of claim 1, wherein the determining the event type for the first log event comprises:
upon determining the first composition of message types is similar to at least one of the plurality of compositions of message types, assigning the event type to be a same event type corresponding to the similar composition of message types.
5. The method of claim 1, wherein the determining the event type of the first log event comprises:
upon determining the first composition of message types is not similar to any of the plurality of compositions of message types, assigning a new event type for the first log event and inserting the first composition of message types into the index.
6. The method of claim 1, wherein the first composition of message types for the first log event comprises a list of pairs of a message type identifier and corresponding frequency of occurrence within the first log event.
7. The method of claim 6, wherein the index comprises a plurality of hash tables configured to map compositions of message types to event types; and
wherein the determining the event type of the first log event comprises performing lookups in the plurality of hash tables according to each pair of message type identifier and corresponding frequency of occurrence.
8. The method of claim 1, wherein the determining the anomalous log event within the plurality of log messages further comprises:
determining the anomalous log event that differs in composition of message types based on the event type for the first log event.
9. The method of claim 1, wherein the determining the anomalous log event within the plurality of log messages further comprises:
determining the anomalous log event based on the event type for the first log event and further based on frequency of occurrence over time.
10. A non-transitory computer-readable storage medium comprising instructions that, when executed in a computing device, providing real-time analysis of log messages for a computer infrastructure, by performing the steps of:
receiving a plurality of log messages including a first log message;
generating a sketch associated with the first log message, wherein the sketch is generated based on words of the first log message;
determining a message type for the first log message based on a comparison of the generated sketch to a plurality of sketches stored in an index, wherein log messages of a same message type have similar sketches;
determining a first log event associated with one or more of the plurality of log messages occurring with a first time interval, wherein the first log event comprises a first composition of message types corresponding to the one or more of the plurality of log messages associated with the first log event;
determining an event type for the first log event based on a comparison of the first composition of message types to a plurality of compositions of message types stored in the index; and
determining an anomalous log event within the plurality of log messages based on the event type for the first log event.
11. The non-transitory computer-readable storage medium of claim 10, wherein the sketch comprises a tuple of fingerprint values corresponding to a subset of the words of the first log message.
12. The non-transitory computer-readable storage medium of claim 10, wherein log events of a same event type have similar compositions of message types.
13. The non-transitory computer-readable storage medium of claim 10, wherein the determining the event type for the first log event comprises:
upon determining the first composition of message types is similar to at least one of the plurality of compositions of message types, assigning the event type to be a same event type corresponding to the similar composition of message types.
14. The non-transitory computer-readable storage medium of claim 10, wherein the determining the event type of the first log event comprises:
upon determining the first composition of message types is not similar to any of the plurality of compositions of message types, assigning a new event type for the first log event and inserting the first composition of message types into the index.
15. The non-transitory computer-readable storage medium of claim 10, wherein the first composition of message types for the first log event comprises a list of pairs of a message type identifier and corresponding frequency of occurrence within the first log event.
16. The non-transitory computer-readable storage medium of claim 15, wherein the index comprises a plurality of hash tables configured to map compositions of message types to event types; and
wherein the determining the event type of the first log event comprises performing lookups in the plurality of hash tables according to each pair of message type identifier and corresponding frequency of occurrence.
17. The non-transitory computer-readable storage medium of claim 10, wherein the determining the anomalous log event within the plurality of log messages further comprises:
determining the anomalous log event that differs in composition of message types based on the event type for the first log event.
18. The non-transitory computer-readable storage medium of claim 10, wherein the determining the anomalous log event within the plurality of log messages further comprises:
determining the anomalous log event based on the event type for the first log event and further based on frequency of occurrence over time.
19. A computer system for providing real-time analysis of log messages for a computer infrastructure, the computer system comprising:
a system memory;
a storage device having (i) a plurality of log messages including a first log message and (ii) an index having a plurality of sketches and compositions of message types; and
a processor programmed to carry out the steps of:
generating a sketch associated with the first log message, wherein the sketch is generated based on words of the first log message;
determining a message type for the first log message based on a comparison of the generated sketch to a plurality of sketches stored in the index, wherein log messages of a same message type have similar sketches;
determining a first log event associated with one or more of the plurality of log messages occurring with a first time interval, wherein the first log event comprises a first composition of message types corresponding to the one or more of the plurality of log messages associated with the first log event;
determining an event type for the first log event based on a comparison of the first composition of message types to a plurality of compositions of message types stored in the index; and
determining an anomalous log event within the plurality of log messages based on the event type for the first log event.
20. The computer system of claim 19, wherein the processor programmed to carry out the step of determining the event type for the first log event is further programmed to carry out the steps of:
upon determining the first composition of message types is similar to at least one of the plurality of compositions of message types, assigning the event type to be a same event type corresponding to the similar composition of message types.
US13/897,994 2013-05-20 2013-05-20 Scalable log analytics Active 2033-10-28 US9244755B2 (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
US13/897,994 US9244755B2 (en) 2013-05-20 2013-05-20 Scalable log analytics

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
US13/897,994 US9244755B2 (en) 2013-05-20 2013-05-20 Scalable log analytics

Publications (2)

Publication Number Publication Date
US20140344622A1 true US20140344622A1 (en) 2014-11-20
US9244755B2 US9244755B2 (en) 2016-01-26

Family

ID=51896800

Family Applications (1)

Application Number Title Priority Date Filing Date
US13/897,994 Active 2033-10-28 US9244755B2 (en) 2013-05-20 2013-05-20 Scalable log analytics

Country Status (1)

Country Link
US (1) US9244755B2 (en)

Cited By (61)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20150094959A1 (en) * 2013-10-02 2015-04-02 Nec Laboratories America, Inc. Heterogeneous log analysis
US9104573B1 (en) * 2013-09-16 2015-08-11 Amazon Technologies, Inc. Providing relevant diagnostic information using ontology rules
US20150227598A1 (en) * 2014-02-13 2015-08-13 Amazon Technologies, Inc. Log data service in a virtual environment
US20150370799A1 (en) * 2014-06-24 2015-12-24 Vmware, Inc. Method and system for clustering and prioritizing event messages
US20150370885A1 (en) * 2014-06-24 2015-12-24 Vmware, Inc. Method and system for clustering event messages and managing event-message clusters
US20160098485A1 (en) * 2014-10-05 2016-04-07 Splunk Inc. Field Value Search Drill Down
US20160132566A1 (en) * 2014-11-10 2016-05-12 Red Hat, Inc. Native federation view suggestion
CN105824744A (en) * 2016-03-21 2016-08-03 焦点科技股份有限公司 Real-time log collection and analysis method on basis of B2B (Business to Business) platform
US20160224531A1 (en) 2015-01-30 2016-08-04 Splunk Inc. Suggested Field Extraction
US20160259869A1 (en) * 2015-03-02 2016-09-08 Ca, Inc. Self-learning simulation environments
US20160357960A1 (en) * 2015-06-03 2016-12-08 Fujitsu Limited Computer-readable storage medium, abnormality detection device, and abnormality detection method
US9524397B1 (en) 2015-07-06 2016-12-20 Bank Of America Corporation Inter-system data forensics
US20170004188A1 (en) * 2015-06-30 2017-01-05 Ca, Inc. Apparatus and Method for Graphically Displaying Transaction Logs
US20170063762A1 (en) * 2015-09-01 2017-03-02 Sap Portals Israel Ltd Event log analyzer
US20170068709A1 (en) * 2015-09-09 2017-03-09 International Business Machines Corporation Scalable and accurate mining of control flow from execution logs across distributed systems
US9633106B1 (en) 2011-06-30 2017-04-25 Sumo Logic Log data analysis
US9639443B2 (en) * 2015-03-02 2017-05-02 Ca, Inc. Multi-component and mixed-reality simulation environments
US20170134408A1 (en) * 2015-11-10 2017-05-11 Sap Se Standard metadata model for analyzing events with fraud, attack, or any other malicious background
US20170139766A1 (en) * 2015-11-16 2017-05-18 International Business Machines Corporation Management of computing machines with troubleshooting prioritization
WO2017087437A1 (en) * 2015-11-17 2017-05-26 Nec Laboratories America, Inc. Fast pattern discovery for log analytics
US9740755B2 (en) 2014-09-30 2017-08-22 Splunk, Inc. Event limited field picker
US9842160B2 (en) 2015-01-30 2017-12-12 Splunk, Inc. Defining fields from particular occurences of field labels in events
US9916346B2 (en) 2015-01-30 2018-03-13 Splunk Inc. Interactive command entry list
US9922084B2 (en) 2015-01-30 2018-03-20 Splunk Inc. Events sets in a visually distinct display format
US9977803B2 (en) 2015-01-30 2018-05-22 Splunk Inc. Column-based table manipulation of event data
US20180144041A1 (en) * 2016-11-21 2018-05-24 International Business Machines Corporation Transaction discovery in a log sequence
WO2018118379A1 (en) * 2016-12-21 2018-06-28 Mastercard International Incorporated Systems and methods for real time computer fault evaluation
US10013454B2 (en) 2015-01-30 2018-07-03 Splunk Inc. Text-based table manipulation of event data
US20180203757A1 (en) * 2017-01-16 2018-07-19 Hitachi, Ltd. Log message grouping apparatus, log message grouping system, and log message grouping method
US10061824B2 (en) 2015-01-30 2018-08-28 Splunk Inc. Cell-based table manipulation of event data
KR101909957B1 (en) * 2018-04-03 2018-12-19 큐비트시큐리티 주식회사 Web traffic logging system and method for detecting web hacking in real time
US10185740B2 (en) 2014-09-30 2019-01-22 Splunk Inc. Event selector to generate alternate views
US10237295B2 (en) * 2016-03-22 2019-03-19 Nec Corporation Automated event ID field analysis on heterogeneous logs
WO2019066295A1 (en) * 2017-09-28 2019-04-04 큐비트시큐리티 주식회사 Web traffic logging system and method for detecting web hacking in real time
US10311171B2 (en) 2015-03-02 2019-06-04 Ca, Inc. Multi-component and mixed-reality simulation environments
CN109918349A (en) * 2019-02-25 2019-06-21 网易(杭州)网络有限公司 Log processing method, device, storage medium and electronic device
US10333805B2 (en) 2017-04-21 2019-06-25 Nec Corporation Ultra-fast pattern generation algorithm for the heterogeneous logs
US10394868B2 (en) 2015-10-23 2019-08-27 International Business Machines Corporation Generating important values from a variety of server log files
US10402428B2 (en) * 2013-04-29 2019-09-03 Moogsoft Inc. Event clustering system
US10423597B2 (en) * 2016-03-27 2019-09-24 International Business Machines Corporation Data set visualizer for tree based file systems
US10445311B1 (en) * 2013-09-11 2019-10-15 Sumo Logic Anomaly detection
US10462170B1 (en) * 2016-11-21 2019-10-29 Alert Logic, Inc. Systems and methods for log and snort synchronized threat detection
US10567409B2 (en) 2017-03-20 2020-02-18 Nec Corporation Automatic and scalable log pattern learning in security log analysis
US10664535B1 (en) 2015-02-02 2020-05-26 Amazon Technologies, Inc. Retrieving log data from metric data
US10678669B2 (en) 2017-04-21 2020-06-09 Nec Corporation Field content based pattern generation for heterogeneous logs
CN111427737A (en) * 2019-01-09 2020-07-17 阿里巴巴集团控股有限公司 Method and device for modifying exception log and electronic equipment
US10726037B2 (en) 2015-01-30 2020-07-28 Splunk Inc. Automatic field extraction from filed values
US10733002B1 (en) * 2016-06-28 2020-08-04 Amazon Technologies, Inc. Virtual machine instance data aggregation
US10740212B2 (en) 2017-06-01 2020-08-11 Nec Corporation Content-level anomaly detector for systems with limited memory
US10896175B2 (en) 2015-01-30 2021-01-19 Splunk Inc. Extending data processing pipelines using dependent queries
US10929765B2 (en) 2016-12-15 2021-02-23 Nec Corporation Content-level anomaly detection for heterogeneous logs
US20210064500A1 (en) * 2019-08-30 2021-03-04 Dell Products, Lp System and Method for Detecting Anomalies by Discovering Sequences in Log Entries
WO2021067858A1 (en) * 2019-10-03 2021-04-08 Oracle International Corporation Enhanced anomaly detection in computing environments
US11231840B1 (en) * 2014-10-05 2022-01-25 Splunk Inc. Statistics chart row mode drill down
US11329860B2 (en) * 2015-01-27 2022-05-10 Moogsoft Inc. System for decomposing events that includes user interface
KR20220077184A (en) * 2020-11-30 2022-06-09 가천대학교 산학협력단 System and method for log anomaly detection using bayesian probability and closed pattern mining method and computer program for the same
US11442924B2 (en) 2015-01-30 2022-09-13 Splunk Inc. Selective filtered summary graph
US11544248B2 (en) 2015-01-30 2023-01-03 Splunk Inc. Selective query loading across query interfaces
US11604715B2 (en) * 2017-01-26 2023-03-14 International Business Machines Corporation Generation of end-user sessions from end-user events identified from computer system logs
US11615073B2 (en) 2015-01-30 2023-03-28 Splunk Inc. Supplementing events displayed in a table format
US11847480B2 (en) * 2016-06-21 2023-12-19 Amazon Technologies, Inc. System for detecting impairment issues of distributed hosts

Families Citing this family (15)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
GB2513885B (en) * 2013-05-08 2021-04-07 Xyratex Tech Limited Methods of clustering computational event logs
US10055276B2 (en) 2016-11-09 2018-08-21 International Business Machines Corporation Probabilistic detect identification
US10642677B2 (en) 2017-11-02 2020-05-05 International Business Machines Corporation Log-based diagnosis for declarative-deployed applications
US11120213B2 (en) * 2018-01-25 2021-09-14 Vmware, Inc. Intelligent verification of presentation of a user interface
US11195115B2 (en) 2018-02-21 2021-12-07 Red Hat Israel, Ltd. File format prediction based on relative frequency of a character in the file
CN109343985B (en) * 2018-08-03 2021-10-22 联想(北京)有限公司 Data processing method, device and storage medium
US11403207B2 (en) * 2020-02-28 2022-08-02 Microsoft Technology Licensing, Llc. Detection of runtime errors using machine learning
US11314510B2 (en) 2020-08-14 2022-04-26 International Business Machines Corporation Tracking load and store instructions and addresses in an out-of-order processor
US11321165B2 (en) 2020-09-22 2022-05-03 International Business Machines Corporation Data selection and sampling system for log parsing and anomaly detection in cloud microservices
US11243835B1 (en) 2020-12-03 2022-02-08 International Business Machines Corporation Message-based problem diagnosis and root cause analysis
US11513930B2 (en) 2020-12-03 2022-11-29 International Business Machines Corporation Log-based status modeling and problem diagnosis for distributed applications
US11474892B2 (en) 2020-12-03 2022-10-18 International Business Machines Corporation Graph-based log sequence anomaly detection and problem diagnosis
US11797538B2 (en) 2020-12-03 2023-10-24 International Business Machines Corporation Message correlation extraction for mainframe operation
US11599404B2 (en) 2020-12-03 2023-03-07 International Business Machines Corporation Correlation-based multi-source problem diagnosis
US11403326B2 (en) 2020-12-03 2022-08-02 International Business Machines Corporation Message-based event grouping for a computing operation

Citations (9)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20050223027A1 (en) * 2004-03-31 2005-10-06 Lawrence Stephen R Methods and systems for structuring event data in a database for location and retrieval
US20090113246A1 (en) * 2007-10-24 2009-04-30 Sivan Sabato Apparatus for and Method of Implementing system Log Message Ranking via System Behavior Analysis
US7778419B2 (en) * 2005-05-10 2010-08-17 Research In Motion Limited Key masking for cryptographic processes
US7925678B2 (en) * 2007-01-12 2011-04-12 Loglogic, Inc. Customized reporting and mining of event data
US20110119219A1 (en) * 2009-11-17 2011-05-19 Naifeh Gregory P Method and apparatus for analyzing system events
US20110131453A1 (en) * 2009-12-02 2011-06-02 International Business Machines Corporation Automatic analysis of log entries through use of clustering
US20110185234A1 (en) * 2010-01-28 2011-07-28 Ira Cohen System event logs
US20110296244A1 (en) * 2010-05-25 2011-12-01 Microsoft Corporation Log message anomaly detection
US8073806B2 (en) * 2007-06-22 2011-12-06 Avaya Inc. Message log analysis for system behavior evaluation

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
Yang et al.; "Near-Duplicate Detection by Instance-level Constrained Clustering;" SIGIR '06; August 2006; pp. 421-428. *

Cited By (119)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US9633106B1 (en) 2011-06-30 2017-04-25 Sumo Logic Log data analysis
US10402428B2 (en) * 2013-04-29 2019-09-03 Moogsoft Inc. Event clustering system
US11853290B2 (en) * 2013-09-11 2023-12-26 Sumo Logic, Inc. Anomaly detection
US11314723B1 (en) * 2013-09-11 2022-04-26 Sumo Logic, Inc. Anomaly detection
US20220207020A1 (en) * 2013-09-11 2022-06-30 Sumo Logic, Inc. Anomaly detection
US10445311B1 (en) * 2013-09-11 2019-10-15 Sumo Logic Anomaly detection
US9104573B1 (en) * 2013-09-16 2015-08-11 Amazon Technologies, Inc. Providing relevant diagnostic information using ontology rules
US10114148B2 (en) * 2013-10-02 2018-10-30 Nec Corporation Heterogeneous log analysis
US20150094959A1 (en) * 2013-10-02 2015-04-02 Nec Laboratories America, Inc. Heterogeneous log analysis
US20150227598A1 (en) * 2014-02-13 2015-08-13 Amazon Technologies, Inc. Log data service in a virtual environment
US10133741B2 (en) * 2014-02-13 2018-11-20 Amazon Technologies, Inc. Log data service in a virtual environment
US10120928B2 (en) * 2014-06-24 2018-11-06 Vmware, Inc. Method and system for clustering event messages and managing event-message clusters
US20150370799A1 (en) * 2014-06-24 2015-12-24 Vmware, Inc. Method and system for clustering and prioritizing event messages
US20150370885A1 (en) * 2014-06-24 2015-12-24 Vmware, Inc. Method and system for clustering event messages and managing event-message clusters
US9922099B2 (en) 2014-09-30 2018-03-20 Splunk Inc. Event limited field picker
US9740755B2 (en) 2014-09-30 2017-08-22 Splunk, Inc. Event limited field picker
US10185740B2 (en) 2014-09-30 2019-01-22 Splunk Inc. Event selector to generate alternate views
US10303344B2 (en) * 2014-10-05 2019-05-28 Splunk Inc. Field value search drill down
US20220155943A1 (en) * 2014-10-05 2022-05-19 Splunk Inc. Statistics chart row mode drill down
US10444956B2 (en) 2014-10-05 2019-10-15 Splunk Inc. Row drill down of an event statistics time chart
US11614856B2 (en) 2014-10-05 2023-03-28 Splunk Inc. Row-based event subset display based on field metrics
US10795555B2 (en) 2014-10-05 2020-10-06 Splunk Inc. Statistics value chart interface row mode drill down
US20160098485A1 (en) * 2014-10-05 2016-04-07 Splunk Inc. Field Value Search Drill Down
US11003337B2 (en) 2014-10-05 2021-05-11 Splunk Inc. Executing search commands based on selection on field values displayed in a statistics table
US11816316B2 (en) 2014-10-05 2023-11-14 Splunk Inc. Event identification based on cells associated with aggregated metrics
US11231840B1 (en) * 2014-10-05 2022-01-25 Splunk Inc. Statistics chart row mode drill down
US9921730B2 (en) 2014-10-05 2018-03-20 Splunk Inc. Statistics time chart interface row mode drill down
US11455087B2 (en) * 2014-10-05 2022-09-27 Splunk Inc. Generating search commands based on field-value pair selections
US10261673B2 (en) 2014-10-05 2019-04-16 Splunk Inc. Statistics value chart interface cell mode drill down
US10599308B2 (en) 2014-10-05 2020-03-24 Splunk Inc. Executing search commands based on selections of time increments and field-value pairs
US11687219B2 (en) * 2014-10-05 2023-06-27 Splunk Inc. Statistics chart row mode drill down
US11868158B1 (en) * 2014-10-05 2024-01-09 Splunk Inc. Generating search commands based on selected search options
US10139997B2 (en) 2014-10-05 2018-11-27 Splunk Inc. Statistics time chart interface cell mode drill down
US20160132566A1 (en) * 2014-11-10 2016-05-12 Red Hat, Inc. Native federation view suggestion
US9864786B2 (en) * 2014-11-10 2018-01-09 Red Hat, Inc. Native federation view suggestion
US11329860B2 (en) * 2015-01-27 2022-05-10 Moogsoft Inc. System for decomposing events that includes user interface
US11868364B1 (en) 2015-01-30 2024-01-09 Splunk Inc. Graphical user interface for extracting from extracted fields
US10877963B2 (en) 2015-01-30 2020-12-29 Splunk Inc. Command entry list for modifying a search query
US10061824B2 (en) 2015-01-30 2018-08-28 Splunk Inc. Cell-based table manipulation of event data
US11544248B2 (en) 2015-01-30 2023-01-03 Splunk Inc. Selective query loading across query interfaces
US10013454B2 (en) 2015-01-30 2018-07-03 Splunk Inc. Text-based table manipulation of event data
US11907271B2 (en) 2015-01-30 2024-02-20 Splunk Inc. Distinguishing between fields in field value extraction
US10915583B2 (en) 2015-01-30 2021-02-09 Splunk Inc. Suggested field extraction
US11442924B2 (en) 2015-01-30 2022-09-13 Splunk Inc. Selective filtered summary graph
US9977803B2 (en) 2015-01-30 2018-05-22 Splunk Inc. Column-based table manipulation of event data
US9922084B2 (en) 2015-01-30 2018-03-20 Splunk Inc. Events sets in a visually distinct display format
US10896175B2 (en) 2015-01-30 2021-01-19 Splunk Inc. Extending data processing pipelines using dependent queries
US11341129B2 (en) 2015-01-30 2022-05-24 Splunk Inc. Summary report overlay
US11841908B1 (en) 2015-01-30 2023-12-12 Splunk Inc. Extraction rule determination based on user-selected text
US9916346B2 (en) 2015-01-30 2018-03-13 Splunk Inc. Interactive command entry list
US11741086B2 (en) 2015-01-30 2023-08-29 Splunk Inc. Queries based on selected subsets of textual representations of events
US10949419B2 (en) 2015-01-30 2021-03-16 Splunk Inc. Generation of search commands via text-based selections
US9842160B2 (en) 2015-01-30 2017-12-12 Splunk, Inc. Defining fields from particular occurences of field labels in events
US11531713B2 (en) 2015-01-30 2022-12-20 Splunk Inc. Suggested field extraction
US11409758B2 (en) 2015-01-30 2022-08-09 Splunk Inc. Field value and label extraction from a field value
US11030192B2 (en) 2015-01-30 2021-06-08 Splunk Inc. Updates to access permissions of sub-queries at run time
US10846316B2 (en) 2015-01-30 2020-11-24 Splunk Inc. Distinct field name assignment in automatic field extraction
US11068452B2 (en) 2015-01-30 2021-07-20 Splunk Inc. Column-based table manipulation of event data to add commands to a search query
US11354308B2 (en) 2015-01-30 2022-06-07 Splunk Inc. Visually distinct display format for data portions from events
US11615073B2 (en) 2015-01-30 2023-03-28 Splunk Inc. Supplementing events displayed in a table format
US11222014B2 (en) 2015-01-30 2022-01-11 Splunk Inc. Interactive table-based query construction using interface templates
US20160224531A1 (en) 2015-01-30 2016-08-04 Splunk Inc. Suggested Field Extraction
US10726037B2 (en) 2015-01-30 2020-07-28 Splunk Inc. Automatic field extraction from filed values
US11573959B2 (en) 2015-01-30 2023-02-07 Splunk Inc. Generating search commands based on cell selection within data tables
US11544257B2 (en) 2015-01-30 2023-01-03 Splunk Inc. Interactive table-based query construction using contextual forms
US10664535B1 (en) 2015-02-02 2020-05-26 Amazon Technologies, Inc. Retrieving log data from metric data
US10311171B2 (en) 2015-03-02 2019-06-04 Ca, Inc. Multi-component and mixed-reality simulation environments
US20160259869A1 (en) * 2015-03-02 2016-09-08 Ca, Inc. Self-learning simulation environments
US9639443B2 (en) * 2015-03-02 2017-05-02 Ca, Inc. Multi-component and mixed-reality simulation environments
US20160357960A1 (en) * 2015-06-03 2016-12-08 Fujitsu Limited Computer-readable storage medium, abnormality detection device, and abnormality detection method
US20170004188A1 (en) * 2015-06-30 2017-01-05 Ca, Inc. Apparatus and Method for Graphically Displaying Transaction Logs
US9524397B1 (en) 2015-07-06 2016-12-20 Bank Of America Corporation Inter-system data forensics
US10587555B2 (en) * 2015-09-01 2020-03-10 Sap Portals Israel Ltd. Event log analyzer
US20170063762A1 (en) * 2015-09-01 2017-03-02 Sap Portals Israel Ltd Event log analyzer
US10140287B2 (en) * 2015-09-09 2018-11-27 International Business Machines Corporation Scalable and accurate mining of control flow from execution logs across distributed systems
US20170068709A1 (en) * 2015-09-09 2017-03-09 International Business Machines Corporation Scalable and accurate mining of control flow from execution logs across distributed systems
US10394868B2 (en) 2015-10-23 2019-08-27 International Business Machines Corporation Generating important values from a variety of server log files
US20170134408A1 (en) * 2015-11-10 2017-05-11 Sap Se Standard metadata model for analyzing events with fraud, attack, or any other malicious background
US9876809B2 (en) * 2015-11-10 2018-01-23 Sap Se Standard metadata model for analyzing events with fraud, attack, or any other malicious background
US10078542B2 (en) * 2015-11-16 2018-09-18 International Business Machines Corporation Management of computing machines with troubleshooting prioritization
US10831584B2 (en) 2015-11-16 2020-11-10 International Business Machines Corporation Management of computing machines with troubleshooting prioritization
US20170139766A1 (en) * 2015-11-16 2017-05-18 International Business Machines Corporation Management of computing machines with troubleshooting prioritization
WO2017087437A1 (en) * 2015-11-17 2017-05-26 Nec Laboratories America, Inc. Fast pattern discovery for log analytics
CN105824744A (en) * 2016-03-21 2016-08-03 焦点科技股份有限公司 Real-time log collection and analysis method on basis of B2B (Business to Business) platform
US10237295B2 (en) * 2016-03-22 2019-03-19 Nec Corporation Automated event ID field analysis on heterogeneous logs
US10423597B2 (en) * 2016-03-27 2019-09-24 International Business Machines Corporation Data set visualizer for tree based file systems
US10929368B2 (en) 2016-03-27 2021-02-23 International Business Machines Corporation Data set visualizer for tree based file systems
US11847480B2 (en) * 2016-06-21 2023-12-19 Amazon Technologies, Inc. System for detecting impairment issues of distributed hosts
US10733002B1 (en) * 2016-06-28 2020-08-04 Amazon Technologies, Inc. Virtual machine instance data aggregation
US10462170B1 (en) * 2016-11-21 2019-10-29 Alert Logic, Inc. Systems and methods for log and snort synchronized threat detection
US20180144041A1 (en) * 2016-11-21 2018-05-24 International Business Machines Corporation Transaction discovery in a log sequence
US10740360B2 (en) * 2016-11-21 2020-08-11 International Business Machines Corporation Transaction discovery in a log sequence
US10929765B2 (en) 2016-12-15 2021-02-23 Nec Corporation Content-level anomaly detection for heterogeneous logs
US11157343B2 (en) 2016-12-21 2021-10-26 Mastercard International Incorporated Systems and methods for real time computer fault evaluation
WO2018118379A1 (en) * 2016-12-21 2018-06-28 Mastercard International Incorporated Systems and methods for real time computer fault evaluation
US10331507B2 (en) 2016-12-21 2019-06-25 Mastercard International Incorporated Systems and methods for real time computer fault evaluation
US20180203757A1 (en) * 2017-01-16 2018-07-19 Hitachi, Ltd. Log message grouping apparatus, log message grouping system, and log message grouping method
US10579461B2 (en) * 2017-01-16 2020-03-03 Hitachi, Ltd. Log message grouping apparatus, log message grouping system, and log message grouping method
US11604715B2 (en) * 2017-01-26 2023-03-14 International Business Machines Corporation Generation of end-user sessions from end-user events identified from computer system logs
US10855707B2 (en) * 2017-03-20 2020-12-01 Nec Corporation Security system using automatic and scalable log pattern learning in security log analysis
US11196758B2 (en) 2017-03-20 2021-12-07 Nec Corporation Method and system for enabling automated log analysis with controllable resource requirements
US10567409B2 (en) 2017-03-20 2020-02-18 Nec Corporation Automatic and scalable log pattern learning in security log analysis
US10678669B2 (en) 2017-04-21 2020-06-09 Nec Corporation Field content based pattern generation for heterogeneous logs
US10333805B2 (en) 2017-04-21 2019-06-25 Nec Corporation Ultra-fast pattern generation algorithm for the heterogeneous logs
US10740212B2 (en) 2017-06-01 2020-08-11 Nec Corporation Content-level anomaly detector for systems with limited memory
WO2019066295A1 (en) * 2017-09-28 2019-04-04 큐비트시큐리티 주식회사 Web traffic logging system and method for detecting web hacking in real time
CN109845228A (en) * 2017-09-28 2019-06-04 量子位安全有限公司 Network traffic recording system and method for the attack of real-time detection network hacker
EP3691217A4 (en) * 2017-09-28 2021-05-12 Qubit Security Inc. Web traffic logging system and method for detecting web hacking in real time
JP2019533841A (en) * 2017-09-28 2019-11-21 キュービット セキュリティ インコーポレーテッドQubit Security Inc. Web traffic logging system and method for real-time detection of web hacking
US11140181B2 (en) * 2017-09-28 2021-10-05 Qubit Security Inc. Web traffic logging system and method for detecting web hacking in real time
KR101909957B1 (en) * 2018-04-03 2018-12-19 큐비트시큐리티 주식회사 Web traffic logging system and method for detecting web hacking in real time
CN111427737A (en) * 2019-01-09 2020-07-17 阿里巴巴集团控股有限公司 Method and device for modifying exception log and electronic equipment
CN109918349A (en) * 2019-02-25 2019-06-21 网易(杭州)网络有限公司 Log processing method, device, storage medium and electronic device
US11513935B2 (en) * 2019-08-30 2022-11-29 Dell Products L.P. System and method for detecting anomalies by discovering sequences in log entries
US20210064500A1 (en) * 2019-08-30 2021-03-04 Dell Products, Lp System and Method for Detecting Anomalies by Discovering Sequences in Log Entries
WO2021067858A1 (en) * 2019-10-03 2021-04-08 Oracle International Corporation Enhanced anomaly detection in computing environments
EP4250116A3 (en) * 2019-10-03 2024-04-10 Oracle International Corporation Enhanced anomaly detection in computing environments
KR102425525B1 (en) 2020-11-30 2022-07-26 가천대학교 산학협력단 System and method for log anomaly detection using bayesian probability and closed pattern mining method and computer program for the same
KR20220077184A (en) * 2020-11-30 2022-06-09 가천대학교 산학협력단 System and method for log anomaly detection using bayesian probability and closed pattern mining method and computer program for the same

Also Published As

Publication number Publication date
US9244755B2 (en) 2016-01-26

Similar Documents

Publication Publication Date Title
US9244755B2 (en) Scalable log analytics
US10761687B2 (en) User interface that facilitates node pinning for monitoring and analysis of performance in a computing environment
US10205643B2 (en) Systems and methods for monitoring and analyzing performance in a computer system with severity-state sorting
US10515469B2 (en) Proactive monitoring tree providing pinned performance information associated with a selected node
US10243818B2 (en) User interface that provides a proactive monitoring tree with state distribution ring
US10042834B2 (en) Dynamic field extraction of data
US9319288B2 (en) Graphical user interface for displaying information related to a virtual machine network
US10095731B2 (en) Dynamically converting search-time fields to ingest-time fields
US11762893B2 (en) Creation of a summary for a plurality of texts
US20170357710A1 (en) Clustering log messages using probabilistic data structures
US9607029B1 (en) Optimized mapping of documents to candidate duplicate documents in a document corpus
US20240020405A1 (en) Extracted field generation to filter log messages
US11757736B2 (en) Prescriptive analytics for network services

Legal Events

Date Code Title Description
AS Assignment

Owner name: VMWARE, INC., CALIFORNIA

Free format text: ASSIGNMENT OF ASSIGNORS INTEREST;ASSIGNORS:HUANG, MARK;LIN, JUNYUAN;SIGNING DATES FROM 20130528 TO 20130724;REEL/FRAME:030874/0299

STCF Information on status: patent grant

Free format text: PATENTED CASE

MAFP Maintenance fee payment

Free format text: PAYMENT OF MAINTENANCE FEE, 4TH YEAR, LARGE ENTITY (ORIGINAL EVENT CODE: M1551); ENTITY STATUS OF PATENT OWNER: LARGE ENTITY

Year of fee payment: 4

MAFP Maintenance fee payment

Free format text: PAYMENT OF MAINTENANCE FEE, 8TH YEAR, LARGE ENTITY (ORIGINAL EVENT CODE: M1552); ENTITY STATUS OF PATENT OWNER: LARGE ENTITY

Year of fee payment: 8