WO2016161381A1 - Method and system for implementing a log parser in a log analytics system


Info

Publication number
WO2016161381A1
Authority
WO
WIPO (PCT)
Application number
PCT/US2016/025739
Other languages
English (en)
Inventor
Gregory Michael FERRAR
Original Assignee
Oracle International Corporation
Priority date
Filing date
Publication date
Application filed by Oracle International Corporation
Priority to EP16716414.4A (EP3278243A1)
Priority to CN201680029404.0A (CN107660283B)
Priority to CN202111494485.0A (CN114168418A)
Priority claimed from US15/089,049 (US9767171B2)
Priority claimed from US15/089,226 (US11226975B2)
Publication of WO2016161381A1


Classifications

    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06F - ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 - Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20 - Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/21 - Design, administration or maintenance of databases

Definitions

  • A log parser is a tool that reads raw log content and breaks it up into structured fields that can be analyzed.
  • Conventionally, a log parser must be manually constructed by a person who is both knowledgeable about the exact format of the log file to be analyzed and skilled in the specific programming infrastructure that would be used to implement the parser.
  • Some embodiments of the invention solve the above-described problems by providing an approach to automatically construct a log parser. Instead of requiring a person to manually create the contents of the log parser, the log contents themselves are used to construct the parser.
  • a method, system, or computer readable medium constructs a log parser by identifying a log to analyze, creating a mapping structure that maps contents of the log to identified element types for one or more data portions within the log, selecting a data portion from the log, analyzing the data portion relative to the mapping structure to identify variable parts and non-variable parts, for at least one of the variable parts, assigning the at least one variable part to a least restrictive data type that encompasses a variability of values detected in the at least one variable part, and automatically generating a regular expression for the log parser.
  • the regular expression may in some embodiments include non-variable parts with placeholders for the variable parts to implement a log parser, where at least two different placeholders are associated with different data types.
  • the inventive approach to identify the variable parts and the non-variable parts may be performed by identifying a line from the log to compare against the mapping structure, starting from the beginning of the line and moving forward until a mismatch is identified, finding the next common character, marking the intervening range as variable, and looping through until the end of the line is reached.
  • the element type may comprise at least one of a string type, an integer type, an alphabetic character type, or a field rule type, wherein the field rule type corresponds to a sequence of elements defined by a rule.
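  • As a rough illustration of the approach described above, the following Python sketch compares two sample log lines token by token, keeps the common tokens as literal (non-variable) parts, and replaces the differing tokens with typed capture groups chosen as the least restrictive type that covers the observed values. The helper names, the token-based comparison, and the type patterns are illustrative assumptions, not the patent's actual implementation.

      import re

      def least_restrictive_type(values):
          """Pick the least restrictive pattern that covers all observed values."""
          if all(re.fullmatch(r"\d+", v) for v in values):
              return r"\d+"            # integer type
          if all(re.fullmatch(r"[A-Za-z]+", v) for v in values):
              return r"[A-Za-z]+"      # alphabetic character type
          return r"\S+"                # generic string type

      def build_parser_regex(line_a, line_b):
          """Compare two sample lines and emit a regular expression with
          placeholders (capture groups) for the parts that vary."""
          pattern_parts = []
          for tok_a, tok_b in zip(line_a.split(), line_b.split()):
              if tok_a == tok_b:
                  # Non-variable part: keep it as a literal.
                  pattern_parts.append(re.escape(tok_a))
              else:
                  # Variable part: capture it with the loosest type needed.
                  pattern_parts.append("(" + least_restrictive_type([tok_a, tok_b]) + ")")
          return r"\s+".join(pattern_parts)

      regex = build_parser_regex(
          "2016-04-01 12:00:01 ERROR disk usage at 91 percent",
          "2016-04-01 12:05:44 ERROR disk usage at 87 percent",
      )
      print(re.match(regex, "2016-04-01 12:10:02 ERROR disk usage at 95 percent").groups())
      # -> ('12:10:02', '95')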
  • Multiple lines are grouped together as a single entry for analysis against the mapping structure. In the alternative, contents of the multiple lines can be manipulated into a single line.
  • a delimiter for an analysis range within the log can be identified by identifying common elements within two lines of the log, scoring the common elements, the common element scored by consideration of positions for the common elements in combination with one or more weighting factors, and selecting a common element as the delimiter based upon scoring results.
  • a weighting factor may comprise a rule that corresponds to a
  • a sum or an average can be calculated for the positions for the common elements.
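  • A minimal sketch of the delimiter-scoring idea, assuming an illustrative candidate set and weight values (the patent does not prescribe these specific numbers or this exact scoring formula): each candidate character common to both lines is scored from its positions and a weighting factor, and the highest-scoring candidate is selected as the delimiter.

      CANDIDATE_WEIGHTS = {"|": 3.0, "\t": 3.0, ",": 2.0, ";": 2.0, " ": 1.0}

      def score_delimiters(line_a, line_b):
          scores = {}
          for cand, weight in CANDIDATE_WEIGHTS.items():
              # Only elements common to both lines can be the delimiter.
              if cand not in line_a or cand not in line_b:
                  continue
              pos_a = [i for i, c in enumerate(line_a) if c == cand]
              pos_b = [i for i, c in enumerate(line_b) if c == cand]
              # Favor candidates that occur a consistent number of times,
              # occur often, and occur early; combine with the weighting factor.
              consistency = 1.0 if len(pos_a) == len(pos_b) else 0.5
              avg_pos = (sum(pos_a) / len(pos_a) + sum(pos_b) / len(pos_b)) / 2.0
              scores[cand] = weight * consistency * (len(pos_a) + len(pos_b)) / (1.0 + avg_pos)
          return max(scores, key=scores.get) if scores else None

      print(score_delimiters("a|b|c|d", "10|20|30|40"))   # -> '|'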
  • a key field and a value field can be extracted from the log by identifying a range for evaluation of one or more key value pairs from identification of a first key value divider and iteratively identifying key value pair dividers within a line, and iteratively walking through the line to extract the key field from a left side of an instance of the key value divider and to extract the value field from a right side of the instance of the key value divider.
  • Pre-processing may be applied to the log to classify field and value portions of the log.
  • post-processing can be applied to correct problematic assignments of content to the key field or the value field.
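  • A simplified sketch of the key/value walk described above, assuming "=" as the key value divider; the pre-processing and post-processing steps are omitted, and values are assumed to run up to the start of the next key.

      def extract_key_values(line, divider="="):
          """Walk each divider instance: the token to its left becomes the key
          and the text up to the next key (or the end of the line) becomes the value."""
          pairs = {}
          positions = [i for i, c in enumerate(line) if c == divider]
          for idx, pos in enumerate(positions):
              left = line[:pos]
              key = left.rsplit(None, 1)[-1] if left.strip() else ""
              end = len(line)
              if idx + 1 < len(positions):
                  # The next key starts after the last space before the next divider.
                  next_key_start = line[:positions[idx + 1]].rstrip().rfind(" ")
                  end = next_key_start if next_key_start > pos else positions[idx + 1]
              value = line[pos + 1:end].strip()
              if key:
                  pairs[key] = value
          return pairs

      print(extract_key_values("user=scott status=FAILED msg=disk full"))
      # -> {'user': 'scott', 'status': 'FAILED', 'msg': 'disk full'}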
  • the log parser is employed in a log analytics system that is embodied as a cloud-based and/or SaaS-based (software as a service) architecture.
  • the raw log data processed by the log analytics system may originate from any log-producing source, such as a database management system (DBMS), database application (DB App), middleware, operating system, hardware components, or any other log-producing application, component, or system.
  • Log monitoring is configurable using a configuration mechanism comprising UI controls operable by a user to select and configure log collection configuration and target representations for the log collection configuration.
  • the log collection configuration comprises the set of information (e.g., log rules, log source information, and log type information) that identifies what data to collect, the location of the data to collect, how to access the data, and/or when to collect the data.
  • target representations identify "targets", which are individual components that contain and/or produce logs. These targets are associated with specific components/hosts in the customer environment. The ability of the current embodiment to configure log collection by associating targets with log rules and log sources is described in more detail below.
  • FIG. 1A illustrates an example system which may be employed in some embodiments of the invention.
  • Fig. 1B illustrates a flowchart of a method which may be employed in some embodiments of the invention.
  • Fig. 2 illustrates a reporting UI.
  • Figs. 3A-C provide more detailed illustrations of the internal structure of the log analytics system and the components within the customer environment that interact with the log analytics system.
  • Figs. 4A-C illustrate approaches to implement the log collection configuration.
  • Fig. 5 shows a flowchart of an approach to implement a log collection configuration by associating a log rule with a target.
  • FIG. 6 shows a flowchart of an approach to implement a log collection configuration by associating a log source with a target.
  • Fig. 7 shows a flowchart of an approach to implement target-based configuration for log monitoring.
  • Fig. 8 shows a more detailed flowchart of an approach to implement target-based configuration for log monitoring according to some embodiments of the invention.
  • Fig. 9 illustrates example XML configuration content according to some embodiments of the invention.
  • Fig. 10 illustrates server-side information to be included in the configuration file to facilitate the log parsing.
  • FIG. 11 shows a flowchart of one possible approach to implement this aspect of some embodiments of the invention.
  • Fig. 12 illustrates an architecture for implementing some embodiments of the inventive approach to associate log analysis rules to variable locations.
  • Fig. 13 illustrates extraction of additional data that is not consistent across all log entries.
  • Fig. 14 shows some example field definitions.
  • Fig. 15 shows a high level flowchart of an approach to implement a log parser according to some embodiments of the invention.
  • Fig. 16 shows a more detailed flowchart of an approach to implement a log parser according to some embodiments.
  • FIGs. 17-1 through 17-21 provide an illustration of the process to construct a log parser.
  • Fig. 18 shows the process flow of an embodiment to address non-standard line formats.
  • Fig. 19 illustrates manipulation or categorization of line content.
  • Fig. 20 shows a flowchart of an approach for efficiently identifying the correct delimiter elements within a set of log content according to some embodiments of the invention.
  • Figs. 21-1 through 21-5 illustrate the delimiter identification process.
  • Fig. 22 shows some example weights that may be applied to common elements in some applications of the invention.
  • Fig. 23 illustrates a flowchart of an example approach to perform key value extraction.
  • Figs. 24-1 through 24-12 illustrate the key value extraction process.
  • Figs. 25-1 through 25-2 and 26-1 through 26-2 illustrate example line configurations.
  • Fig. 27 shows an architecture of an example computing system with which the invention may be implemented.
  • Some embodiments of the invention provide an approach to automatically construct a log parser. Instead of requiring a person to manually create the contents of the log parser, the log contents themselves are used to construct the parser. Other additional objects, features, and advantages of the invention are described in the detailed description, figures, and claims.
  • This portion of the disclosure provides a description of a method and system for implementing high volume log collection and analytics, which can be used in conjunction with log parsers constructed as described below.
  • Fig. 1A illustrates an example system 100 for configuring, collecting, and analyzing log data according to some embodiments of the invention.
  • System 100 includes a log analytics system 101 that in some embodiments is embodied as a cloud-based and/or SaaS-based (software as a service) architecture.
  • log analytics system 101 is capable of servicing log analytics functionality as a service on a hosted platform, such that each customer that needs the service does not need to individually install and configure the service components on the customer's own network.
  • the log analytics system 101 is capable of providing the log analytics service to multiple separate customers, and can be scaled to service any number of customers.
  • Each customer network 104 may include any number of hosts 109.
  • the hosts 109 are the computing platforms within the customer network 104 that generate log data as one or more log files.
  • the raw log data produced within hosts 109 may originate from any log- producing source.
  • the raw log data may originate from a database management system (DBMS), database application (DB App), middleware, operating system, hardware components, or any other log-producing application, component, or system.
  • gateways 108 are provided in each customer network to communicate with the log analytics system 101.
  • the system 100 may include one or more users at one or more user stations 103 that use the system 100 to operate and interact with the log analytics system 101.
  • the user station 103 comprises any type of computing station that may be used to operate or interface with the log analytics system 101 in the system 100. Examples of such user stations include, for example, workstations, personal computers, mobile devices, or remote computing terminals.
  • the user station comprises a display device, such as a display monitor, for displaying a user interface to users at the user station.
  • the user station also comprises one or more input devices for the user to provide operational control over the activities of the system 100, such as a mouse or keyboard to manipulate a pointing object in a graphical user interface to generate user inputs.
  • the user stations 103 may be (although not required to be) located within the customer network 104.
  • the log analytics system 101 comprises functionality that is accessible to users at the user stations 103, e.g., where log analytics system 101 is implemented as a set of engines, mechanisms, and/or modules (whether hardware, software, or a mixture of hardware and software) to perform configuration, collection, and analysis of log data.
  • a user interface (UI) mechanism generates the UI to display the classification and analysis results, and to allow the user to interact with the log analytics system.
  • Fig. 1B shows a flowchart of an approach to use system 100 to configure, collect, and analyze log data. This discussion of Fig. 1B will refer to components illustrated for the system 100 in Fig. 1A.
  • log monitoring is configured within the system. This may occur, for example, by having a user/customer configure the type of log monitoring/data gathering desired by the user/customer.
  • a configuration mechanism 129 comprising UI controls is operable by the user to select and configure log collection configuration 111 and target representations 113 for the log collection configuration.
  • the log collection configuration 111 comprises the set of information (e.g., log rules, log source information, and log type information) that identifies what data to collect (e.g., which log files), the location of the data to collect (e.g., directory locations), how to access the data (e.g., the format of the log and/or specific fields within the log to acquire), and/or when to collect the data (e.g., on a periodic basis).
  • the log collection configuration 111 may include out-of-the-box rules that are included by a service provider.
  • the log collection configuration 111 may also include customer-defined/customer-customized rules.
  • the target representations 113 identify "targets", which are individual components within the customer environment that contain and/or produce logs. These targets are associated with specific components/hosts in the customer environment.
  • An example target may be a specific database application, which is associated with one or more logs on one or more hosts.
  • the log analytics user can be insulated from the specifics of the exact hosts/components that pertain to the logs for a given target.
  • This information can be encapsulated in underlying metadata that is maintained by administrators of the system that understand the correspondence between the applications, hosts, and components in the system.
  • the next action at 122 is to capture the log data according to the user configurations.
  • the association between the log rules 111 and the target representations is sent to the customer network 104 for processing.
  • An agent of the log analytics system is present on each of the hosts 109 to collect data from the appropriate logs on the hosts 109.
  • data masking may be performed upon the captured data.
  • the masking is performed at collection time, which protects the customer data before it leaves the customer network.
  • various types of information in the collected log data, such as user names and other personal information, may be sensitive.
  • Patterns are identified for such data, which can be removed and/or changed to proxy data before it is collected for the server. This allows the data to still be used for analysis purposes, while hiding the sensitive data.
  • Some embodiments permanently remove the sensitive data (e.g., change all such data to "***" symbols), while others change it to mapped data so that the original data can be recovered.
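  • A minimal sketch of collection-time masking, assuming illustrative patterns and proxy-token naming (real patterns would be configured per deployment): matched substrings are either blanked out irreversibly or replaced with mapped proxy tokens so the original data can be recovered later.

      import re

      MASK_PATTERNS = [
          re.compile(r"user=\S+"),                 # e.g., user names
          re.compile(r"\b\d{3}-\d{2}-\d{4}\b"),    # e.g., SSN-like numbers
      ]

      class Masker:
          def __init__(self, reversible=False):
              self.reversible = reversible
              self.mapping = {}      # proxy token -> original value

          def mask(self, line):
              def _replace(match):
                  if not self.reversible:
                      return "***"
                  token = f"MASKED_{len(self.mapping)}"
                  self.mapping[token] = match.group(0)
                  return token
              for pattern in MASK_PATTERNS:
                  line = pattern.sub(_replace, line)
              return line

      print(Masker().mask("login failed for user=scott from 10.0.0.5"))
      # -> login failed for *** from 10.0.0.5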
  • the collected log data is delivered from the customer network 104 to the log analytics system 101.
  • the multiple hosts 109 in the customer network 104 provide the collected data to a smaller number of one or more gateways 108, which then sends the log data to edge services 106 at the log analytics system 101.
  • the edge services 106 receives the collected data from one or more customer networks and places the data into an inbound data store for further processing by a log processing pipeline 107.
  • the log processing pipeline 107 performs a series of data processing and analytical operations upon the collected log data, which is described in more detail below.
  • the processed data is then stored into a data storage device 110.
  • the computer readable storage device 110 comprises any combination of hardware and software that allows for ready access to the data that is located at the computer readable storage device 110.
  • the computer readable storage device 110 could be implemented as computer memory operatively managed by an operating system.
  • the data in the computer readable storage device 110 could also be implemented as database objects, cloud objects, and/or files in a file system.
  • the processed data is stored within both a text/indexed data store 110a (e.g., as a SOLR cluster) and a raw/historical data store 110b (e.g., as a HDFS cluster).
  • reporting may be performed on the processed data using a reporting mechanism/UI 115.
  • the reporting UI 200 may include a log search facility 202, one or more dashboards 204, and/or any suitable applications 206 for analyzing and viewing the processed log data.
  • incident management may be performed upon the processed data.
  • One or more alert conditions can be configured within log analytics system such that upon the detection of the alert condition, an incident management mechanism 117 provides a notification to a designated set of users of the incident/alert.
  • a Corrective Action Engine 119 may perform any necessary actions to be taken within the customer network 104. For example, a log entry may be received that a database system is down. When such a log entry is identified, a possible automated corrective action is to attempt to bring the database system back up. The customer may create a corrective action script to address this situation. A trigger may be performed to run the script to perform the corrective action (e.g., the trigger causes an instruction to be sent to the agent on the customer network to run the script). In an alternative embodiment, the appropriate script for the situation is pushed down from the server to the customer network to be executed. In addition, at 136, any other additional functions and/or actions may be taken as appropriate based at least in part upon the processed data.
  • Fig. 3A provides a more detailed illustration of the internal structure of the log analytics system at a host environment 340 and the components within the customer environment 342 that interact with the log analytics system.
  • This architecture 300 is configured to provide a flow for log monitoring that is able to handle large amounts of log data ingest.
  • the LA (log analytics) agent 333 takes the log monitoring configuration data 332 (e.g., sniffer configuration or target-side configuration materials), and calls a log file sniffer 336 (also referred to herein as the "log collector") to gather log data from one or more log files 338.
  • a daemon manager 334 can be employed to interface with the log file sniffer 336.
  • the log file sniffer 336 reads from one or more log files 338 on the host machine 344.
  • the daemon manager 334 takes the log content and packages it up so that it can be handed back to the LA agent 333.
  • the system may include any number of different kinds of sniffers, and a log sniffer 336 is merely an example of a single type of sniffer that can be used in the system.
  • Other types of sniffers may therefore be employed within various embodiments of the invention, e.g., sniffers to monitor registries, databases, windows event logs, etc.
  • the log sniffer in some embodiments is configured to handle collective/compressed files, e.g., a Zip file.
  • the LA agent 333 sends the gathered log data to the gateway agent 330.
  • the gateway agent 330 packages up the log data that is collected from multiple customer hosts/servers, essentially acting as an aggregator to aggregate the log content from multiple hosts.
  • the packaged content is then sent from the gateway agent 330 to the edge services 306.
  • the edge services 306 receive a large amount of data from multiple gateway agents 330 from any number of different customer environments 342.
  • the data is immediately stored into an inbound data storage device 304 (the "platform inbound store").
  • This acts as a queue for the log processing pipeline 308.
  • a data structure is provided to manage the items to be processed within the inbound data store.
  • a messaging platform 302 (e.g., implemented using the Kafka product) can be used to manage this queue of items to be processed.
  • a queue consumer 310 identifies the next item within the queue to be processed, which is then retrieved from the platform inbound store.
  • the queue consumer 310 comprises any entity that is capable of processing work within the system off the queue, such as a process, thread, node, or task.
  • the retrieved log data undergoes a "parse” stage 312, where the log entries are parsed and broken up into specific fields.
  • the "log type” configured for the log specifies how to break up the log entry into the desired fields.
  • the identified fields are normalized.
  • a "time” field may be represented in any number of different ways in different logs. This time field can be normalized into a single recognizable format (e.g., UTC format).
  • the word “error” may be represented in different ways on different systems (e.g., all upper case “ERROR”, all lower case “error”, first letter capitalized “Error”, or abbreviation "err”). This situation may require the different word forms/types to be normalized into a single format (e.g., all lower case un-abbreviated term "error").
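  • A small sketch of the normalization described above, assuming an illustrative list of timestamp layouts: times are re-emitted in a single UTC format and the various forms of "error" are collapsed into the single lower-case un-abbreviated term.

      import re
      from datetime import datetime, timezone

      TIME_FORMATS = ["%d/%b/%Y:%H:%M:%S %z", "%Y-%m-%d %H:%M:%S", "%b %d %H:%M:%S"]

      def normalize_time(raw):
          """Try each known layout and re-emit the timestamp in one UTC format."""
          for fmt in TIME_FORMATS:
              try:
                  dt = datetime.strptime(raw, fmt)
              except ValueError:
                  continue
              if dt.tzinfo is not None:
                  dt = dt.astimezone(timezone.utc)
              return dt.strftime("%Y-%m-%dT%H:%M:%SZ")
          return raw   # leave unrecognized values untouched

      def normalize_severity(raw):
          """Collapse ERROR / Error / err / error into the single form 'error'."""
          return "error" if re.fullmatch(r"err(or)?", raw, re.IGNORECASE) else raw.lower()

      print(normalize_time("01/Apr/2016:12:00:01 +0000"))   # -> 2016-04-01T12:00:01Z
      print(normalize_severity("ERR"))                      # -> error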
  • the "transform" stage 316 can be used to synthesize new content from the log data.
  • tags can be added to the log data to provide additional information about the log entries.
  • field extraction can be performed to extract additional fields from the existing log entry fields.
  • a "condition evaluation" stage 318 is used to evaluate for specified conditions upon the log data. This stage can be performed to identify patterns within the log data, and to create/identify alerts conditions within the logs. Any type of notifications may be performed at this stage, including for example, emails/text messages/call sent to administrators/customers or alert to another system or mechanism.
  • a log writer 320 then writes the processed log data to one or more data stores 324.
  • the processed data is stored within both a text/indexed data store (e.g., as a SOLR cluster) and a raw and/or historical data store (e.g., as a HDFS cluster).
  • the log writer can also send the log data to another processing stage 322 and/or downstream processing engine.
  • some embodiments provide a side loading mechanism 350 to collect log data without having to proceed through an agent 333 on the client side.
  • the user logs into the server to select one or more files on a local system.
  • the system will load that file at the server, and will sniff through that file (e.g., by having the user provide the log type, attempting likely log types, rolling through different log types, or by making an educated "guess" of the log type).
  • the sniffing results are then passed to the Edge Services and processed as previously described.
  • the side loading mechanism 350 exists to gather the log files - where the agent/sniffer entities are either not installed and/or not needed on the client server 344.
  • Figs. 4A-B illustrate approaches to implement the log collection configuration. This approach allows for very large scale configuration of how to monitor log files having one or more log entries.
  • a log entry corresponds to a single logical row from a log file. In the actual log file, a single entry could take multiple lines due to carriage returns being part of the log entry content. This entire content is considered a single "entry". Each entry starts with "####<date" and could occupy a single physical line in the file or multiple lines separated by carriage returns.
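  • A brief sketch of grouping the physical lines of such a log into logical entries, using the "####<" entry-start prefix from the example above (other log types would configure their own start pattern):

      import re

      def group_entries(lines, entry_start=re.compile(r"^####<")):
          """Start a new logical entry at each line matching the entry-start
          pattern; continuation lines are appended to the current entry."""
          entries, current = [], []
          for line in lines:
              if entry_start.match(line) and current:
                  entries.append("\n".join(current))
                  current = []
              current.append(line.rstrip("\n"))
          if current:
              entries.append("\n".join(current))
          return entries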
  • the "Log Type" 406 defines how the system reads the log file, as well as how to decompose the log file into its parts.
  • a log file contains several base fields.
  • the base fields that exist may vary for different types of logs.
  • a "base parser” can be used to breaks a log entry into the specified fields.
  • the base parser may also perform transformations. For instance, a Date field can be converted to a normalized format and time adjusted to be in UTC so data from many locations can be mixed together.
  • the "Log Source” 404 defines where log files are located and how to read them.
  • the log source is a named definition that contains a list of log files described using patterns, along with the parser that is needed to parse that file.
  • one source could be "SSH Log files”. This source may list each log file related to SSH separately, or could describe the log files using a wildcard (e.g., "/var/log/ssh*").
  • a base parser can be chosen (e.g., by a user) to parse the base fields from the file. This approach can be used to ensure that for a single pattern that all files conform to the same base parse structure.
  • one source can choose from among multiple log types, and give a priority to those possible types. For example, types A, B, and C can be identified, where the analysis works through each of these in order to determine whether the source matches one of these identified types.
  • the user can choose multiple base parsers.
  • the same source may match against and be analyzed using multiple types.
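  • The priority-ordered matching described above might look like the following sketch, where each candidate base parser is tried against a few sample lines and the first one that matches most of them wins (the 80% threshold and the candidate patterns are illustrative assumptions):

      import re

      def pick_base_parser(sample_lines, parsers_by_priority):
          for name, pattern in parsers_by_priority:
              regex = re.compile(pattern)
              matched = sum(1 for line in sample_lines if regex.match(line))
              if sample_lines and matched / len(sample_lines) >= 0.8:
                  return name
          return None

      # Types A, B, and C tried in priority order, as in the example above.
      candidates = [
          ("type_A", r"^\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2} "),
          ("type_B", r"^####<"),
          ("type_C", r"^\w{3} +\d+ \d{2}:\d{2}:\d{2} "),
      ]
      print(pick_base_parser(["####<Apr 1, 2016> <Error> <...>"], candidates))   # -> type_B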
  • the "Log Rule” 402 defines a set of sources along with conditions and actions to be triggered during continuous monitoring.
  • the "Targets” 408 identify individual components in an IT environment that contain logs. Associating a rule to a target starts the monitoring process in some embodiments.
  • one or more log rules are associated with one or more targets.
  • one or more log sources can be associated with one or more targets to create an instance of a target.
  • log rules are not even provided as an approach to create the associations - where only log source to target associations are provided to create target instances.
  • Fig. 5 shows a flowchart of an approach to implement a log collection configuration by associating a log rule with a target.
  • one or more log rules are created.
  • the rules are processed by a rules engine within the log processing system to implement rule-based handling of a given target. Therefore, the rule will include specific logic for handling a given target that it is associated with.
  • the rule can be used to specify a target type, which identifies the type of the target that the rule is intended to address.
  • a rule can be specified for a single target type or multiple target types.
  • when monitoring a log file for a database instance, the target type can be set to Database Instance so that reporting of activities in the log goes against the proper target type;
  • the target type can still be any managed target type, such as a database.
  • the rule may specify a source type, which identifies the type of log file that the rule is intended to address.
  • the rule may specify that the log file types will be: (i) File: OS level log file; (ii) Database Table: a table that stores log content in a database; (iii) Windows Event Log: read events from windows event as log content.
  • a target property filter may be specified in the rule to filter for targets to specify conditions under which the rule is applicable, such as for example, a particular operating system (OS), target version, and/or target platform.
  • the target version can also be specified, e.g., a particular version of Linux OEL5 on X86_64 hardware.
  • the rule may also include: (a) the name of the rule; (b) a severity level indicating how important the outcome of this rule is if this rule leads to an event being generated; (c) a description of the rule; and/or (d) a textual rationale of why this monitoring is occurring.
  • one or more conditions can be established for which the rule will "trigger". Multiple conditions may be specified, where each condition can be combined with others using a Boolean operator. For example, a set of conditions that is ORed with others means that if any of these conditions match an entry in a log file under evaluation, then that entry triggers this rule. When the conditions are ANDed together, all clauses of the condition must be met for the condition to trigger an entry in a log file. The specified actions will then be taken as a response to this entry that is matched.
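  • As a sketch of the condition evaluation just described (the operator names and condition schema here are assumptions for illustration): each condition is evaluated against the parsed fields of a log entry and the results are combined with the rule's Boolean operator, so that ORed conditions trigger on any match and ANDed conditions require every clause to match.

      OPERATORS = {
          "equals": lambda field, val: field == val,
          "contains": lambda field, val: val in field,
      }

      def rule_triggers(entry_fields, conditions, combine="OR"):
          results = []
          for cond in conditions:
              field_value = entry_fields.get(cond["field"], "")
              results.append(OPERATORS[cond["operator"]](field_value, cond["value"]))
          return any(results) if combine == "OR" else all(results)

      # Trigger when severity equals "error" OR the message mentions "ORA-".
      conds = [
          {"field": "severity", "operator": "equals", "value": "error"},
          {"field": "message", "operator": "contains", "value": "ORA-"},
      ]
      print(rule_triggers({"severity": "warn", "message": "ORA-00600 internal error"}, conds))
      # -> True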
  • the "operator” in the condition is how the comparison is to be performed.
  • Actions may be specified to identify what to do when a match is found on the selected sources for a given condition. For example, one possible action is to capture a complete log entry as an observation when matching conditions of the rule. This approach lets the system/user, when monitoring a log from any source and when a single entry is seen that matches the conditions of this rule, to save that complete entry and store it in the repository as an observation.
  • Observations are stored for later viewing through the log observations UI or other reporting features.
  • Another possible action is to create an event entry for each matching condition. When a log entry is seen as matching the specified conditions, this approach raises an event. In some embodiments, the event will be created directly at the agent. The source definition will define any special fields that may be needed for capturing events if there are any. An additional option for this action is to have repeat log entries bundled at the agent and to report the event at most once for the time range the user specified. The matching conditions can be used to help identify the existence of a repeat entry.
  • Another example action is to create a metric for the rule to capture each occurrence of a matching condition. In this approach, a new metric is created for this rule using a metric subsystem.
  • the fields can be selected to include, for example, information such as "key” fields like target, time, source, etc.
  • targets are identified in the system.
  • the targets are individual components within the customer environment that contain logs. These targets are associated with specific components/hosts in the customer environment.
  • Example targets include hosts, database applications, middleware applications, and/or other software applications, which are associated with one or more logs on one or more hosts. More details regarding an approach to specify targets are described below.
  • an association is made between a target and a rule.
  • Metadata may be maintained in the system to track the associations between a given target and a given rule.
  • a user interface may be provided that allows a user to see what targets a selected rule is associated with and/or to add more associations, where the associations are the way the rule becomes active by associating the rule against a real target.
  • log collection and processing are performed based at least in part upon the association between the rule and the target.
  • target- based configuration may involve various types of configuration data that is created at both the server-side and the target-side to implement the log collection as well as log processing.
  • the log analytics user can be insulated from the specifics of the exact hosts/components that pertain to the logs for a given target.
  • This information can be encapsulated in underlying metadata that is maintained by administrators of the system that understand the correspondence between the applications, hosts, and components in the system.
  • log processing can also be configured by associating a log source to a target.
  • Fig. 6 shows a flowchart of an approach to implement a log collection configuration by associating a log source with a target.
  • the log source defines where log files are located and how to read them.
  • the log source may define a source type that indicates how the source content is gathered.
  • the following are example source types: (a) File - identifies a readable file from the OS level that can be accessed using regular OS-level file operations; (b) Database Table - a table that stores log entries (e.g.: database audit table); (c) Windows Event System - an API that provides access to event records.
  • One or more source names may be defined for the log source.
  • the log source may be associated with a description of the source. It is noted that log sources can also be used when creating log monitoring rules (as described above).
  • the log source may also be associated with a file pattern and/or pathname expression.
  • "/var /log/messages*" is an example of a file pattern (that may actually pertain to a number of multiple files).
  • Turning to file patterns, one reason for their use in the present log analytics system is that the exact location of the logs to monitor may vary. Some of the time, a system will expect logs to be in a particular place, e.g., in a specific directory. When the system is dealing with a large number of streaming logs, it may not be clear which directory the logs are expected to be in. This prevents a system that relies upon static log file locations from operating correctly. Therefore, the file pattern is useful to address these possibly varying log locations.
  • a log source is created by specifying a source name and description for the log source.
  • the definition of the log source may comprise included file name patterns and excluded file name patterns.
  • the file name patterns are patterns that correspond to files (or directories) to include for the log source.
  • the excluded file name patterns correspond to patterns for files (or directories) to explicitly exclude from the log source, e.g., which is useful in the situation where the included file name pattern identifies a directory having numerous files, and some of those files (such as dummy files or non-log files) are excluded using the excluded file name pattern.
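  • A short sketch of resolving the included and excluded file name patterns (the example patterns are illustrative): the included patterns are expanded against the file system and anything matching an excluded pattern is dropped.

      import fnmatch
      import glob

      def resolve_log_files(include_patterns, exclude_patterns):
          candidates = set()
          for pattern in include_patterns:
              candidates.update(glob.glob(pattern))
          # Drop files (e.g., rotated/compressed or dummy files) that match an exclusion.
          return sorted(
              path for path in candidates
              if not any(fnmatch.fnmatch(path, ex) for ex in exclude_patterns)
          )

      # Example: include /var/log/messages* but exclude the compressed rotations.
      print(resolve_log_files(["/var/log/messages*"], ["/var/log/*.gz"]))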
  • the system captures the pattern string, the description, and the base parser (log type) that will be used to parse the file.
  • the base parser may define the basic structure of the file, e.g., how to parse the data, hostname, and message from the file.
  • the definition of the log source may also specify whether the source contains secure log content. This is available so that a source creator can specify a special role that users must have to view any log data that may be captured. This log data may include security-related content that not every target owner is permitted to view.
  • the log rules may reference log sources, and vice versa.
  • the system metadata tracks these associations, so that a count is maintained of rules that are currently using sources. This helps with understanding the impact if a source and/or rule is changed or deleted.
  • targets are components within the environment that contain, correspond to, and/or create logs or other data to be processed, where the targets are associated with specific components/hosts in the customer environment.
  • Example targets include hosts, database applications, middleware applications, and/or other software applications, which are associated with one or more logs on one or more hosts.
  • an association is made between a target and a source.
  • Metadata may be maintained in the system to track the associations between a given target and a given source.
  • a user interface may be provided that allows a user to see what targets a selected source is associated with and/or to add more associations.
  • the association of the target to the source creates, at 608, a specific instance of the log source.
  • consider a log source that generically specifies that a given file is located at a given directory location (e.g., c:/log_directory/log_file). It may be the case that any number of servers (Server A, Server B, Server C, Server D) within a customer environment may have a copy of that file (log_file) in that directory (c:/log_directory). Associating the source with each of these targets creates a separate instance of the log source for each server.
  • log collection and processing are performed based at least in part upon the association between the target and the log source.
  • target-based configuration may involve various types of configuration data that is created at both the server-side and the target-side to implement the log collection and processing activities.
  • Associating rules/sources to targets provides knowledge that identifies where to physically enable log collections via the agents. This means that users do not need to know anything about where the targets are located. In addition, bulk association of rules/sources to targets can be facilitated. In some embodiments, rules/sources can be automatically associated to all targets based on the configuration. As noted above, out-of-the-box configurations can be provided by the service provider. In addition, users can create their own configurations, including extending the provided out-of-the-box configurations. This permits the users to customize without building their own content.
  • Fig. 7 shows a flowchart of an approach to implement target-based configuration for log monitoring. This process generates the creation, deployment, and/or updating of configuration materials.
  • configuration materials for log monitoring are embodied as configuration files that are used by the log monitoring system to manage and implement the log monitoring process.
  • target-based processing is initiated.
  • Example approaches for initiating target-based processing include, for example, installation of a log analytics agent onto a specific log collection location.
  • the target-based processing pertains to associations made between one or more targets and one or more log sources and/or rules.
  • configuration materials are generated for the target-based processing.
  • the target-based configuration file is implemented as configuration XML files, although other formats may also be used to implement the configuration materials.
  • the target-based configuration file may be created at a master site (e.g., to create a master version 704), with specific versions then passed to both the server side and the target side.
  • the target-side materials 708 may comprise those portions of the configuration details that are pertinent for log collection efforts. This includes, for example, information about log source details and target details.
  • the server-side materials 706 may comprise portions of the configuration details that are pertinent to the server-side log processing. This includes, for example, information about parser details.
  • a database at the server maintains a master version and a target version of the configuration materials.
  • the target version includes configuration details that are pertinent to log collection efforts, and is passed to the customer environment to be used by the agent in the customer environment to collect the appropriate log data from the customer environment.
  • the master version includes the full set of configuration details needed at the server, and becomes the "server-side" materials when selected and used for processing at the server. This may occur, for example, when the log data collected at the targets is passed to the server, where the transmission of the log data includes an identifier that uniquely identifies the target-side materials used to collect the log data (e.g., the configuration version or "CV" number 903 shown in the example target-side materials of Fig. 9).
  • the identifier is used to determine the corresponding master version of the materials that have the same identifier number (e.g., as shown in field 1003 in the example server-side materials of Fig. 10). That master version is then used as the server-side materials to process the received log data. Therefore, in this embodiment, the master version 704 and the server-side materials 706 are identical, but having different labels depending upon whether the material is currently in-use to process the log data. In an alternative embodiment, the master version may differ from a server version, e.g., where the materials are used on multiple servers with different configuration details.
  • the configuration materials are then distributed to the appropriate locations within the log processing system.
  • the target-side materials 708 are distributed to the customer system as the sniffer configuration files 332 shown in Fig. 3A.
  • the materials are "distributed" as the log configuration files 111 shown in Fig.1 A, where the distribution does not actually require the materials to be distributed across a network, but merely indicates that the materials are obtained from another component within the server (e.g., on an as-needed basis).
  • log collection processing is performed at the target using the target-side configuration materials.
  • server-side log processing is performed using the server-side configuration materials.
  • Fig. 8 shows a more detailed flowchart of an approach to implement target-based configuration for log monitoring according to some embodiments of the invention.
  • one or more work items for processing target associations are created in the system. For example, this type of work may be created upon installation of the log analytics agent onto a target, where recognition of this installation causes a work item to be created for the target-based processing.
  • a list of target types is identified that have at least one auto-association rule (e.g., from a database of the associations).
  • a list of targets is generated for which there is a need to be associated with auto-enabled rules.
  • One or more consumer/worker entities may wake up periodically to process the work items. For example, a worker entity (e.g., thread or process) wakes up (e.g., every 10 seconds) to check whether there are any pending association tasks. The set of one or more workers will iterate through the tasks to process the work in the queue.
  • one of the workers identifies an association task to process.
  • the association request is processed by accessing information collected for the rules, sources, parsers, fields, and/or target. This action identifies what target is being addressed, finds that target, and then looks up details of the log source and/or log rule that has been associated with the target.
  • the worker then generates configuration content for the specific association task that it is handling.
  • the configuration content is embodied as XML content.
  • This action creates both the target-side details and the server-side details for the configuration materials.
  • this action will create configuration data for the server to process collected log data.
  • parser details in XML format are created for the server-side materials for the log data expected to be received.
  • this action will create configuration data for log collection from the target, which may use variable pathnames (e.g., having variables instead of absolute pathnames), as described in more detail below.
  • the same configuration/XML file can be used to address multiple associations. For example, if multiple targets are on the same host, then a single configuration file may be generated for all of the targets on the host. In this case, step 808 described above appends the XML content to the same XML file for multiple iterations through the processing loop.
  • Updates may occur in a similar manner.
  • if a change occurs that requires updating of the materials, then one or more new association tasks may be placed onto a queue and addressed as described above.
  • de-associations may also occur, e.g., where the log analytics agent is de-installed. In this situation, the configuration files may be deleted.
  • when a target is deleted, a message may be broadcast by a target model service to notify all listeners about this event; this message may be consumed to delete the corresponding associations and to update the XML content.
  • Fig. 9 illustrates example XML configuration content 900 according to some embodiments of the invention.
  • This is an example of target-side content that may be placed on the host that holds the target.
  • This XML configuration content 900 defines a rule to collect Linux system message logs with file pattern "/var/log/messages*" on host XYZ.us.oracle.com.
  • Portion 902 identifies a base parser for the association being addressed.
  • Portion 903 provides an identifier for the version number ("configuration version" or "CV") of the content 900, which is used to match up against the corresponding server-side materials having the same version number.
  • Portion 904 identifies the ID of a log rule.
  • Portion 906 identifies a specific target.
  • Portion 908 identifies a target type.
  • Portion 910 identifies a source type.
  • Portion 912 identifies a parser ID for the source. The logs will be parsed based on some defined parser. Such configuration files reside on sniffers and the log collection processes collect logs based on the defined log sources.
  • turning to the server-side configuration content 1000 illustrated in Fig. 10, the FieldDef portion 1001 indicates the data type for the service.
  • the Log Source portion 1002 indicates the logs are of "os file" type.
  • the BaseParse portion 1004 defines the way to parse the log entries based on defined regular expressions in portion 1006.
  • Portion 1003 provides an identifier for the version number of the content 1000, which is used to match up against the corresponding target-side materials having the same version number.
  • target-source manual associations may also be performed.
  • a user interface may be provided to perform the manual associations. This also causes the above-described actions to be performed, but is triggered by the manual actions.
  • Re-synchronization of target-source associations may also be performed.
  • monitored targets connected through the agent can be associated with certain pre-defined log sources. Similarly, when the agent is de-installed, such associations can be deleted from the appropriate database tables.
  • when a target is added to be monitored by an agent, the target can be associated with certain predefined log sources for that target type, and when the target is deleted from an agent, such associations can be deleted from the database tables.
  • a web service is provided in some embodiments to synchronize the associations periodically.
  • only the auto-associations are synched, and not the associations customized manually by users.
  • Associations may be performed for a specific log analytics agent.
  • a delta analysis can be performed between targets in a data model data store and targets in a log analytics data store to implement this action. Processing may occur where: (a) For targets in data model data store but not in log analytics data store, add associations for these targets; (b) For targets not in data model data store but in log analytics data store, delete associations for these targets; (c) For targets in data model data store and log analytics data store, keep the same associations for these targets in case of user customization.
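  • The delta analysis in cases (a) through (c) above is essentially a set difference; a minimal sketch (the target identifiers are illustrative):

      def compute_association_delta(data_model_targets, log_analytics_targets):
          dm, la = set(data_model_targets), set(log_analytics_targets)
          return {
              "add": sorted(dm - la),      # case (a): only in the data model store
              "delete": sorted(la - dm),   # case (b): only in the log analytics store
              "keep": sorted(dm & la),     # case (c): in both, keep user customization
          }

      print(compute_association_delta({"db1", "host7"}, {"host7", "app3"}))
      # -> {'add': ['db1'], 'delete': ['app3'], 'keep': ['host7']}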
  • One potential issue for adding associations pertains to the situation where a user may have deleted all associations for a particular target, so there is no entry in the log analytics data store, but there is an entry in the data model data store. The issue is that, when applying the above approach, unwanted auto-associations could be brought back in after the synchronization operation. To avoid this, the system can record the user action to identify the potential issue.
  • associations may be synchronized for a specified tenant.
  • delta analysis can be performed between the agent for the data model data store and agent for the log analytics data store. Processing may occur by: (a) For an agent in the data model data store but not in the log analytics data store, add associations for these agents; (b) For agents not in the data model data store but in the log analytics data store, delete associations for these agents; (c) For agents in the data model data store and the log analytics data store, perform the same delta analysis and synchronization as described above.
  • Synchronization may be performed for associations for all tenants. When this action is performed, it should perform agent-level synchronization as described above for each tenant.
  • Turning now to file patterns, one reason for their use in log analytics systems is that the exact location of the logs to monitor may vary. Most of the time, a system will expect logs to be in a particular place, in a specific directory. When the system is dealing with a large number of streaming logs, it may not be clear which directory the logs are expected to be in. This prevents a system that relies upon static log file locations from operating correctly.
  • the inventive approach in some embodiments can associate log analysis rules to variable locations.
  • One approach is to use metadata that replaces variable parts that correspond to locations for the log files.
  • a path expression is used to represent the pathname for the log files, where the path expression includes a fixed portion and a varying portion, and different values are implemented for the variable part.
  • the placeholder for location is eventually replaced with the actual location in the directory path.
  • Some embodiments provide for "parameters", which are flexible fields (e.g., text fields) that users can use in either the include file name patterns or exclude file name patterns.
  • the parameters may be implemented by enclosing a parameter name in curly brackets { and }. A user-defined default value is provided in this source. A user can then provide a parameter override on a per-target basis when associating a log monitoring rule using this source to a target.
  • the overrides are particularly applicable, for example, with regard to changes from out-of-the-box content (e.g., to override rules, definitions, etc. without actually changing the OOTB content). This is implemented, for example, by implementing a mapping/annotation table that includes the user overrides and an indication of an override for the OOTB content.
  • ORACLE_HOME is a built-in parameter that does not need a default to be set by the user.
  • the system could be provided with a list of fixed parameters that are built in and therefore do not require user-defined defaults.
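  • A small sketch of the parameter substitution described above: {PARAM} placeholders in an include/exclude file pattern are replaced with a per-target override when one exists, otherwise with the source's default value. Aside from ORACLE_HOME, which the text above describes as built in, the parameter names and paths are illustrative assumptions.

      import re

      def expand_pattern(pattern, defaults, overrides=None):
          overrides = overrides or {}
          def _sub(match):
              name = match.group(1)
              return overrides.get(name, defaults.get(name, match.group(0)))
          return re.sub(r"\{(\w+)\}", _sub, pattern)

      pattern = "{ORACLE_HOME}/diag/{INSTANCE_NAME}/trace/alert*.log"
      defaults = {"ORACLE_HOME": "/u01/app/oracle", "INSTANCE_NAME": "orcl"}
      print(expand_pattern(pattern, defaults, overrides={"INSTANCE_NAME": "orcl2"}))
      # -> /u01/app/oracle/diag/orcl2/trace/alert*.log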
  • Fig. 11 shows a flowchart of one possible approach to implement this aspect of some embodiments of the invention.
  • identification is made of location content for which it is desirable to implement variable location processing. This situation may exist, for example, when the system is handling a large number of streaming logs from a possibly large number of directory locations and/or uncertain directory locations.
  • the log data may be located at target locations that are addressed using a pathname that varies for different database targets.
  • a path is specified for the target locations having a fixed portion and a varying portion.
  • the varying portion may be represented with one or more parameters.
  • the one or more parameters are replaced with values corresponding to one or more target log files, wherein a single rule for implementing log monitoring is associated with multiple different targets to be monitored.
  • configuration information from the log analytics system can be coupled to this approach to configure and setup the rules for identifying log file assignments.
  • configuration information examples include, for example, how a database is connected, how the components are connected, which datacenter is being used, etc.
  • Some embodiments specify how to map sources to targets based on their relationships. Association types may include, for example, "contains", "application_contains", "app_composite_contains", "authenticated_by", and so on.
  • the target relationship information/model can be used in other ways as well.
  • the target model can also be used to help correlate log entry findings to aid in root cause analysis.
  • the host model can be used for comparing all hosts in one system. For instance, if there are a number of databases in a first system, this feature can be used to see logs across these systems together, and in isolation from databases used for a second system.
  • Fig. 12 illustrates an architecture for implementing some embodiments of the inventive approach to associate log analysis rules to variable locations.
  • the log analytics engine 1202 operates by accessing log collection configuration files 1211.
  • The log collection configuration file 1211 is implemented to represent a path where the target location may have both a fixed portion and a varying portion.
  • the varying portion may be represented with one or more location parameters.
  • different locations may exist for logs 1201a, 1201b, and 1201c.
  • the specific location for the log of interest may be selected by the log analytics engine 1202, and processed to generate analysis results 1213.
  • the reference material 1210 may be accessed to identify the correct log location(s) for a given target.
  • Any suitable type of reference materials may be implemented.
  • a defined source Source1 can be assigned to all related targets belonging to a certain system, and/or an association type and/or rule can be used as well.
  • target relationship information/models can be employed as well as the reference material.
  • Embodiments of the invention therefore provide improved functionality to perform target-based log monitoring. Two possible use cases for this functionality include log monitoring and ad hoc log browsing.
  • Log monitoring pertains, for example, to the situation where there is continuous monitoring and capture of logs.
  • Some embodiments of log monitoring pertain to some or all of the following: (a) monitor any log for any target and capture significant entries from the logs; (b) create events based on some log entries; (c) identify existence of log entries that can affect a compliance score; (d) perform user as well as integrator defined monitoring; (e) capture log entries that are not events to enable analytics on a subset of all logs; (f) use cases such as intrusion detection, potential security risk detection, and problem detection; (g) implement long term persistent storage of log contents; (h) search for log content; (i) customizable search-based views; and (j) log anomaly detection and scoring.
  • Ad hoc log browsing pertains, for example, to the situation where there is not continuous monitoring of logs.
  • the user can browse live logs on a host without having to collect the logs and send them up to the SaaS server.
  • the model for configuring what to monitor is similar to what was described earlier.
  • the difference pertains to the fact that the user can select a rule, source, and some filters from the UI, and the search is sent down to the agent to obtain log files that match and bring them back, storing them in a temporary storage in the server.
  • the user can continue to narrow their search down on that result set. If the user adds another target, rule, or extends the time range, the system goes back to the agent to obtain only the delta content, and not the entire content again.
  • the user can therefore get the same benefits of log analytics without configuring continuous log monitoring.
  • the feature can be very low-latency since the system only needs to go back to get more data from the agent when the search is expanded. All searches that narrow down the current result set go against the data that has been cached from a previous fetch from the agent.
  • the embodiments of the invention can be used to store log data into a long-term centralized location in a raw/historical datastore.
  • target owners in the company IT department can monitor incoming issues for all responsible targets. This may include thousands of targets (hosts, databases, middleware, and applications) that are managed by the SaaS log analytics system for the company. Many log entries (e.g., hundreds of GB of entries) may be generated each day. For compliance reasons, these logs may be required to be stored for a long period of time.
  • the data center manager may wish to obtain a big-picture view of the logs over the long run, and IT administrators may wish to search through them to identify possible causes of a particular issue.
  • a very large amount of logs could be stored in a centralized storage, on top of which users can search logs and view log trends with acceptable performance.
  • the log data can be stored in an off-line repository. This can be used, for example, when data is kept online for a certain period of time and then transferred offline. This is particularly applicable when there are different pricing tiers for the different types of storage (e.g., a lower price for offline storage), and the user is given the choice of where to store the data. In this approach, the data held in offline storage may be brought back online at a later point in time.
  • the logs can be searched to analyze for possible causes of issues. For example, when a particular issue occurs on a target, the target owner can analyze logs from various sources to pinpoint the causes of the issue. In particular, time-related logs from different components of the same application, or from different but related applications, could be reported in a time-interleaved format in a consolidated view to help the target owner figure out possible causes of the issue. The target owner could perform some ad hoc searches to find the same or similar log entries over time, jump to the log entry of interest, and then drill down to the detailed message and browse other logs generated before/after the point of interest.
  • restrictions can be applied such that users have access only to logs for which access permissions are provided to those users.
  • Different classes of users may be associated with access to different sets of logs.
  • Various roles can be associated with permissions to access certain logs.
  • Some embodiments can be employed to view long-term log distribution, trends, and correlations. With many logs generated by many different targets and log sources over a long time, data center managers may wish to view the long-term log distributions and patterns.
  • Some embodiments can be employed to search logs to identify causes of an application outage.
  • an IT administrator or target owner of a web application receives some notification that some customers who used the application reported that they could not complete their online transactions and the confirmation page could not be shown after the submit button was clicked.
  • the IT administrator can search the logs generated by the application with the user name as key and within the issue reporting time range. Some application exception may be found in the log indicating that some database error occurred when the application tried to commit the transaction.
  • the IT administrator could browse the logs around the application exception time to find some database errors, which were related, for example, to a partial disk failure on the hosting server and a high volume of committing transactions.
  • a data center manager may define some tags for logs collected in the data center, such as security logs for production databases, security logs for development servers, logs for testing servers, noise logs, etc.
  • the data manager may be interested, for example, in knowing the following: log distributions by these tags over the past half year, their daily incoming rates during the last month, and whether there are any correlations between the security log entries for production databases and the changes in their compliance scores during a given time period.
  • Some embodiments permit log data to be stored as metrics.
  • the system will store several log fields as key fields.
  • the key fields will include (but may not be limited to): Time, Target, Rule, Source, and Log File.
  • the system may also create a hash or GUID to distinguish possible log entries that have the same time and all other key fields.
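  • As a purely illustrative sketch (not the claimed implementation), such a distinguishing identifier could be derived by hashing the key fields together with the raw entry content; the field names and the hashing scheme below are assumptions:

```python
import hashlib

def log_entry_guid(entry: dict) -> str:
    """Derive a stable identifier for a log entry from its key fields plus the
    raw content, so entries sharing the same key fields remain distinguishable."""
    key_fields = ["Time", "Target", "Rule", "Source", "Log File"]
    material = "|".join(str(entry.get(f, "")) for f in key_fields)
    # Including the raw message lets two entries with identical key fields differ.
    material += "|" + str(entry.get("Message", ""))
    return hashlib.sha1(material.encode("utf-8")).hexdigest()

print(log_entry_guid({"Time": "2016-04-01T10:00:00", "Target": "db1",
                      "Rule": "error-rule", "Source": "alert log",
                      "Log File": "/u01/alert.log", "Message": "ORA-00600"}))
```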
  • a metric extension is created and deployed. This metric extension will be named similarly to the rule to make it easy for the user to reference it.
  • the log monitoring rule has a possible action to create an event when a log entry matches the condition of the rule. Additionally, users will be able to indicate that this event should also trigger a compliance violation which will cause an impact on the compliance score for a compliance standard and framework.
  • one possible use case is to provide a log browser, e.g., where browsing is employed to browse live logs on a host without collecting the logs and sending them to a SaaS Server.
  • the user can select a rule, a source, and some filters from the UI; the search is sent down to the agent to obtain the log files that match and bring them back, storing them in temporary storage on the server.
  • One use case for this feature is to allow users to browse a short time period of log files across multiple targets in a system to try to discover a source of a problem, especially when there is a rich topology mapping and dependency mapping of the customer's environment. This content can be used to help find related elements and show the logs together. This allows the users to see logs for all targets related to a given system for instance and see what happened across all targets in time sequence. In many cases, when there is a target failure, it may be a dependent target that is experiencing the problem, not the target that is failing.
  • the user may choose to start a new log browsing session in the context of a target home page.
  • the target home page context is to be retained. This means that the outer shell of the page still belongs to the target home page, and just the content panel will contain the browse UI functionality.
  • multiple rows of content can be provided per entry to show additional details per row. This can be done one row at a time, or the user could decide to perform this for all rows. Sorting can be provided on the parsed fields; in addition, additional details can be shown per row (including the original log entry).
  • Search filters can be provided.
  • a search filter in the form of a date range can be provided, e.g., where the options are Most Recent and Specific Date Range. With the Most Recent option, the user can enter some time and a scale of Minutes or Hours. With the Specific Date Range option, the user can specify the start and end of the range.
  • Targets, Sources, and Filters can be specified. These allow the users to select what they want to see in this log browsing session. After the user has selected the targets, sources, and applied any filters, they can begin the browse session to initiate retrieval of the logs from various targets and ultimately have them shown on the interface.
  • Search queries can be implemented in any suitable manner.
  • natural language search processing is performed to implement search queries.
  • the search can be performed across dependency graphs using the search processing.
  • Various relationships can be queried in the data, such as "runs on", "used by", "uses", and "member of".
  • the search query is a text expression (e.g., based on Lucene query language). Users can enter search query in the search box to search logs.
  • The following are examples of what could be included in the search query: (a) Terms; (b) Fields; (c) Term modifiers; (d) Wildcard searches; (e) Fuzzy searches; (f) Proximity searches; (g) Range searches; (h) Boosting a term; (i) Boolean operators; (j) Grouping; (k) Field grouping; (l) Escaping special characters.
  • a tabular view can be provided of the search findings.
  • Some query refinement can be performed via table cells to allow users to add/remove some field-based conditions in the query text contained in the search box via UI actions. For example, when a user right-mouse clicks a field, a pop-up provides some options for him/her to add or remove a condition to filter the logs during the searches. This is convenient for users to modify the query text, and with this approach, users do not need to know the internal field names to be able to refine the query at field level.
  • a basic field shuttle is used to list all defined fields.
  • Some example fields that can be defined by the log entry metadata include: (a) Log file; (b) Entry content; (c) Rule name; (d) Source name; (e) Parser name; (f) Source type; (g) Target type; (h) Target name.
  • the values of these fields can be obtained from the agent with the log entry (although source, parser, rule, and target are all GUIDs/IDs that will need to be looked up at display time).
  • the top n fields (e.g., 10) that are suggested as making the most difference for that search will be shown.
  • a "more fields" link will lead to a popup for users to select other fields. Users can see more information of those fields on the popup than form the View menu.
  • the system could use any suitable algorithm, for example, to assign a number to each field that is influenced by how many rows in the search results have a non-null value for that field, how many different values there are across all search results for that field, etc.
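  • As an illustrative sketch of one such suitable algorithm (the scoring formula here is an assumption, not the patented method), a field could be scored by combining its non-null coverage with its value diversity:

```python
def field_influence(results, field):
    """Score a field by coverage (rows with a non-null value) and diversity
    (distinct values), as a simple stand-in for ranking the 'top n' fields."""
    values = [row.get(field) for row in results]
    non_null = [v for v in values if v is not None]
    coverage = len(non_null) / len(results) if results else 0.0
    diversity = len(set(non_null)) / len(non_null) if non_null else 0.0
    return coverage * diversity

rows = [{"Severity": "ERROR", "Host": "a"},
        {"Severity": "WARN", "Host": "a"},
        {"Severity": "ERROR"}]
print(sorted(["Severity", "Host"], key=lambda f: field_influence(rows, f), reverse=True))
# ['Severity', 'Host'] -- Severity varies more across rows, so it is suggested first
```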
  • features include clickable bar charts and table pagination. With these navigation features, plus a customizable time range, users should be able to jump to a point of interest quickly.
  • some embodiments provide for drilling up from details to higher levels so users can easily navigate to desired log entries via bar graphs. An example use case: after users drill down a few levels, they may want to drill back up to a previous level in order to go down from another bar. After users identify a log entry of interest via some searches, they will likely want to explore logs from a particular log source around that entry, or explore logs from multiple log sources around that entry in a time-interleaved pattern. Some embodiments provide an option for users to browse logs around the log entry of interest.
  • a graphical view can be provided of the search findings. This allows the user to pick fields to render the results graphically.
  • Some embodiments pertain to improved techniques to address log distributions, trends, and correlations. For search findings resulting from a particular search, distributions can be based on log counts to give users some high-level information about the logs. For each distribution type, the top n (e.g., 5 or 10) items are listed with the number of found logs.
  • Example distribution types include: (a) by target type; (b) by target, such as target owner and/or lifecycle status; and (c) by log source, among others.
  • the system can also provide options for users to switch between table view and the corresponding distribution chart view.
  • results can be filtered by selecting distribution items.
  • Users can filter the results table by selecting one or more distribution items. By default, all distribution items are selected and all log entries are listed in the results table. After selecting one or more distribution items, users can navigate the log entries via pagination. With one or more distribution items selected, when users click the search button for a new search, the selections will be reset so that all distribution items are again selected.
  • Some embodiments provide a feature to show search finding trends. Some embodiments provide a feature to show search finding correlations. Related to this feature, some embodiments provide launching links for users to navigate to search/view detailed logs when they perform correlation analysis among events, metrics, and infrastructure changes.
  • Launching links could be provided, e.g., for users to navigate to an IT analytics product to analyze/view detailed events/metrics when they wish to see a bigger picture related to the logs here.
  • Another feature in some embodiments pertains to process-time extended field definitions. Even with the same baseline log type, it is possible for individual log entries to contain inconsistent information from one log to the next. This can be handled in some embodiments by defining base fields common to the log type, and to then permit extended field definitions for the additional data in the log entries.
  • a source definition defines log files to monitor.
  • the log files are parsed into their base fields based on the log type definition.
  • the base fields that are parsed from the log entries are Month, Day, Hour, Minute, Second, Host, Service, Port (optional), and Message.
  • the goal is to extract IP address and Port out of the second log entry. This goal may not be obtainable in certain implementations as part of the log type, e.g., since not every log entry has this structure.
  • the Message field for the second entry has the following content:
  • a definition is made for an Extended Field Definition on the Message field using a format such as:
  • the processing for implementing process-time extended field definitions comprises: identifying one or more log files to monitor, wherein some of the entries in the one or more log files may include additional data that does not exist in other entries or is inconsistent with entries in the other entries, such as an additional IP address field in one entry that does not appear in another entry; identifying a source definition for one or more log files to monitor; parsing the one or more log files into a plurality of base fields using the source definition; defining one or more extended fields for the one or more log files; and extracting the one or more extended fields from the one or more log files.
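  • The following is a minimal, hypothetical sketch of this flow, assuming a syslog-style line format and an extended field pattern that pulls an IP address and port out of the Message base field (the concrete format and pattern are illustrative, not taken from the specification):

```python
import re

# Hypothetical base-field parser for the source (syslog-like format).
BASE_PARSER = re.compile(
    r"(?P<Month>\w{3}) +(?P<Day>\d+) (?P<Hour>\d+):(?P<Minute>\d+):(?P<Second>\d+) "
    r"(?P<Host>\S+) (?P<Service>\S+): (?P<Message>.*)"
)

# Extended field definition applied to the Message base field; only some entries
# carry an IP address and port, so these cannot be part of the base log type.
EXTENDED = re.compile(r"Connection from (?P<IPAddress>[0-9.]+) port (?P<Port>\d+)")

def parse_entry(line: str) -> dict:
    fields = {}
    m = BASE_PARSER.match(line)
    if not m:
        return fields
    fields.update(m.groupdict())
    ext = EXTENDED.search(fields["Message"])
    if ext:  # extended fields are extracted only when present
        fields.update(ext.groupdict())
    return fields

print(parse_entry("Apr  3 12:00:01 host1 sshd: Connection from 10.1.2.3 port 22"))
print(parse_entry("Apr  3 12:00:02 host1 sshd: session closed"))
```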
  • some embodiments permit the user to add extended field definitions. These are defined patterns that are seen within a field. A user could perform a create-like on a source and then the source and all extensions will become a new user-created source.
  • the extended field definition defines new fields to create based on the content in a given file field.
  • the extended field definitions (and tagging) can be applied retroactively. This allows past log data to be processed with after-defined field definitions and tags.
  • Fig. 14 shows some example field definitions 1302.
  • the user is specifying to look at the "Message" file field that comes from the log entry and is parsed by the file parser.
  • This Message field will have text in it, but the user has identified that they want to capture the SIGNALNAME part of the message as a new field for this specific message.
  • This new field (SIGNALNAME) can now become viewable in the captured log entries, viewable in the Log Browser, and can also be stored as part of a metric if a rule is created to do so.
  • the extended field definition uses the entire contents of the Message in this example. The user could bind either side of their expression with a wildcard pattern.
  • the definition could have been simply "*sending a {SIGNALNAME}".
  • the text that is shown is known to be static text that never changes for this log message.
  • the use of [0-9]* in the expression means that any number of numeric characters can be located here, but they will just be ignored (since there is no field name associated with this part of the expression).
  • the text that comes after the string "sending a" will get assigned to the variable SIGNALNAME.
  • the last entry is another example where the user has defined two new fields and in the first field, they have also defined the way to get this content using a regular expression.
  • Everything that matches that expression will be added to a new extended field called HOSTNAME. Anything after the first period will be put into a new extended field called DOMAINNAME.
  • the HOST field which came from the file parser will still have all of the content, but this extended field definition is telling our feature to add two NEW fields in addition to the HOST field (HOSTNAME and DOMAINNAME).
  • Some embodiments provide the ability to define regular expressions and save them with a name.
  • the regular expression for hostname used above is [a-zA-Z0-9\-]+.
  • One example of a saved regular expression may be:
  • the new fields that will be created are HOSTNAME and DOMAINNAME.
  • the referenced regular expression that was created and saved is called IP Address.
  • Extended expression definitions can be evaluated at the agent (e.g., using a Perl parsing engine) directly with minor changes to the input string from the user.
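  • As an illustrative sketch (the placeholder syntax and translation rules are assumptions), a user-entered definition such as "{HOSTNAME}.{DOMAINNAME}" or "*sending a {SIGNALNAME}" could be translated into a regular expression with one named capture group per placeholder, drawing on saved regular expressions where available:

```python
import re

# Hypothetical library of saved regular expressions (names are illustrative).
SAVED_EXPRESSIONS = {
    "HOSTNAME": r"[a-zA-Z0-9\-]+",
}

def definition_to_regex(definition: str) -> "re.Pattern":
    """Translate a definition such as '{HOSTNAME}.{DOMAINNAME}' or
    '*sending a {SIGNALNAME}' into a regex with named capture groups."""
    pattern, pos = "", 0
    for m in re.finditer(r"\{(\w+)\}", definition):
        literal = definition[pos:m.start()]
        # '*' in the user definition acts as a wildcard; everything else is literal.
        pattern += re.escape(literal).replace(r"\*", ".*")
        field = m.group(1)
        body = SAVED_EXPRESSIONS.get(field, ".*?")   # fall back to a lazy wildcard
        pattern += f"(?P<{field}>{body})"
        pos = m.end()
    pattern += re.escape(definition[pos:]).replace(r"\*", ".*")
    return re.compile(pattern + "$")

host_rx = definition_to_regex("{HOSTNAME}.{DOMAINNAME}")
print(host_rx.match("db01.example.com").groupdict())
# {'HOSTNAME': 'db01', 'DOMAINNAME': 'example.com'}
```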
  • field reference definitions can be provided. This provides a feature where users can provide a lookup table of a SQL query to transform a field which may have a not-easily-readable value into more human readable content.
  • Three example use cases highlight this need: (a) In a log entry, there may be an error code field (either a core field or an extended field) that simply has a number, where the user can provide a lookup reference so that the system adds another new field to store the textual description of what this error code means; (b) In a log entry, there may be a field (either a core file field or an extended field) that has the GUID of a target, and the system can provide a lookup using a SQL query to a target table that will create another new field that stores the display name of the target; (c) IP to hostname lookup may also be performed as a common use case, where in a log, there may be IP addresses for clients, where the IP addresses are used to look up hostnames
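  • A small sketch of such a lookup-based field reference, using an in-memory table in place of the SQL query (the error codes and descriptions here are illustrative):

```python
# Hypothetical lookup table; in practice this could back onto a SQL query.
ERROR_CODE_LOOKUP = {
    "1017": "invalid username/password; logon denied",
    "12541": "no listener",
}

def add_reference_field(entry: dict) -> dict:
    """Add a human-readable field derived from a coded field when a mapping exists."""
    code = entry.get("ErrorCode")
    if code in ERROR_CODE_LOOKUP:
        entry["ErrorDescription"] = ERROR_CODE_LOOKUP[code]
    return entry

print(add_reference_field({"ErrorCode": "1017", "Target": "db1"}))
# {'ErrorCode': '1017', 'Target': 'db1', 'ErrorDescription': 'invalid username/password; logon denied'}
```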
  • log types may also be defined to parse the log data.
  • One example log type pertains to the "Log Parser”, which is the parser that can be used to parse the core fields of the source.
  • Another example log type pertains to a "Saved Regular Expressions", which can be used when defining extended field definitions.
  • a hostname can be defined via a regular expression as "[a-zA-Z0-9\-]+". This regular expression can be saved with a name and then used at a later time when creating extended field definitions.
  • a log parser is a meta-data definition of how to read a log file and extract the content into fields. Every log file can be described by a single parser to break each log entry into its base fields.
  • the log type may correspond to a parse expression field, such as for example, a Perl regular expression for parsing a file.
  • a parse expression field such as for example, a Perl regular expression for parsing a file.
  • Some fields may be very complex, meaning that the field will actually contain additionally structured content for some log entries but not for others. These may not be handled by the log file parser in some embodiments because it is not consistent in every line. Instead, when defining a source, extended fields can be defined to break this field into more fields to handle these cases.
  • Profiles can be implemented for various constructs in the system, such as parsers, rules, and sources.
  • the profiles capture differences between different usages and/or versions of data items and products for users.
  • a source profile can be created that accounts for different versions of a user's products that are monitored, e.g., where a source profile changes the source definition between version 1 and version 2 of a database being monitored.
  • Rule profiles may be used to account for differences in rules to be applied.
  • parser profiles can be provided to adjust parsing functionality, e.g., due to difference in date formats between logs from different geographic locations. Different regular expressions can be provided for the different parser profiles.
  • log files can have content that is always known to be one row per entry (syslog), or can have content that can span multiple lines (Java Log4j format).
  • the Log Entry Delimiter input lets the user specify to always parse this log file as one row per entry, or to provide a header parse expression that tells us how to find each new entry.
  • the entry start expression will typically be the same as the first few sections of the parse expression. The system uses this expression to detect when a new entry is seen versus seeing the continuation of the previous entry.
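  • A small sketch of how such an entry start expression might be applied to group continuation lines into entries (the timestamp format and the expression itself are illustrative assumptions):

```python
import re

# Hypothetical entry start expression: a new entry begins with a timestamp.
ENTRY_START = re.compile(r"^\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}")

def group_entries(lines):
    """Yield complete log entries; a line not matching the entry start
    expression is treated as a continuation of the previous entry."""
    entry = []
    for line in lines:
        if ENTRY_START.match(line) and entry:
            yield "\n".join(entry)
            entry = []
        entry.append(line.rstrip("\n"))
    if entry:
        yield "\n".join(entry)

sample = [
    "2016-04-01 10:00:00 ERROR something failed",
    "    at com.example.Foo.bar(Foo.java:42)",   # Log4j-style continuation line
    "2016-04-01 10:00:05 INFO recovered",
]
for entry in group_entries(sample):
    print(repr(entry))
```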
  • the entry start expression may be:
  • a table is maintained corresponding to parsed fields, and which starts empty (no rows) as the parse expression is empty.
  • the fields being defined are added to this table. This can be implemented by monitoring the text entered in this field and when a ')' is added, a function is called to determine how many fields have been defined. The system can ignore some cases of ( and ), e.g., when they are escaped or when they are used with control characters.
  • a log parser is typically constructed in a manual process by a person that must be both knowledgeable about the exact format of the log file to be analyzed, as well as skilled in the specific programming infrastructure that would be used to implement the parser.
  • This highly manual process requires significant amounts of time and resources from skilled technology personnel, both upfront to create the parser, as well as on an ongoing basis to maintain the parsers in the face of possible changes to the log file formats.
  • this manual approach necessarily requires a priori knowledge of the log file formats, which may not always be available before the log files start streaming into the log analytics system.
  • the lack of a suitable parser could potentially bring the log analysis pipeline to a halt with respect to analysis of the affected log data.
  • Some embodiments of the invention solve these problems by providing an approach to automatically construct a log parser. Instead of requiring a person to manually create the contents of the log parser, the log contents themselves are used to construct the parser.
  • Fig. 15 shows a high level flowchart of an approach to implement this embodiment of the invention.
  • one or more lines of a log file are received for processing.
  • the log parser is constructed as each line of the log file is received in a streaming manner. This approach permits the parser to be constructed in real time as the contents of the log file are received.
  • a number of lines from the log file may be collected together before processing those lines. This approach may be useful to perform batch processing on the log file lines, e.g., to implement certain types of processing such as clustering or grouping analysis that may need a collected set of a minimum number of log file lines before processing.
  • the lines from the log file are analyzed.
  • the analysis is performed to identify the specific contents and differentiated sections within the log file lines.
  • as additional lines are processed and more information is obtained about the lines, a greater level of certainty can be obtained about the basic structure of the log file lines. It is noted that the number of lines that needs to be reviewed to generate an accurate parser depends upon the complexity and content of the log file lines. However, using the techniques described herein, many log files may only need 10-20 lines (or even fewer) to be analyzed to construct an acceptably accurate log parser.
  • This ability to generate a log parser based upon review of a relatively small number of lines permits the log parser generation processes to be performed in a very time-efficient manner, and therefore improves the functioning of the computing system itself since it allows the log parser generation process to be performed in real-time as the log file data is streamed into the log analytics system.
  • the parser is then constructed based upon analysis of the lines from the log files. This is performed, for example, by scanning the contents of one or more sets of logs to construct a regular expression to parse the logs.
  • the present embodiment operates by walking through a selected set of the lines to identify commonalities between the lines, and to then construct a regular expression that can be used to generally parse through logs files containing similar lines of log entries.
  • Fig. 16 shows a more detailed flowchart of an approach to implement this process according to some embodiments.
  • a master list is created from the first line of the log being analyzed.
  • the master list comprises a mapping structure that maps the contents of the log file line to identified element types within the line. Examples of these element types include, for example, number/integer types, character types, and string types.
  • the analysis is performed by moving through the line under analysis to compare against the master list. This action is performed to identify the variable and non-variable parts of the line(s) being analyzed. This can be performed by starting from the beginning of the line and moving forward until there is a mismatch. At this point, the process finds the next common character(s). One of the identified common characters is considered a delimiter, so that the intervening range is marked as variable.
  • It is noted that the intervening range may be variable in size between the two lines, and so the algorithm should be robust enough to handle this.
  • An example algorithm for identifying the common parts that should be considered the delimiter is described in more detail below. The process loops through until the end of line is reached.
  • the master line can then be updated to reflect the common portions and the variable portions.
  • the values of the variable portions can be stored if desired.
  • the updated master line is ready to be processed.
  • One example type of processing as described in more detail below is, for at least one of the variable parts, assigning the at least one variable part to a least restrictive data type that encompasses a variability of values detected in the at least one variable part.
  • commonalities can be identified between the lines to then construct a regular expression from the commonalities.
  • the regular expression can be generated for the non- variable parts with placeholders for the variable parts to implement a log parser, where at least two different placeholders are associated with different data types.
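  • A greatly simplified sketch of this flow is shown below for purposes of illustration; it assumes a token-by-token comparison of lines with equal token counts and only the string/alpha/integer types, whereas the approach described herein also handles variable-length ranges, delimiter scoring, and field rules:

```python
import re

TOKEN = re.compile(r"[A-Za-z]+|[0-9]+|.")   # alpha runs, integer runs, single symbols

def least_restrictive_type(values):
    """Pick the narrowest parse unit type that still covers every observed value."""
    if all(v.isdigit() for v in values):
        return "integer"
    if all(v.isalpha() for v in values):
        return "alpha"
    return "string"

TYPE_PATTERNS = {"integer": r"\d+", "alpha": r"[A-Za-z]+", "string": r"\S+"}

def build_parser(lines):
    """Keep the non-variable parts literally and emit a typed, named placeholder
    for every position where the lines differ."""
    columns = list(zip(*(TOKEN.findall(line) for line in lines)))   # assumes equal token counts
    pattern, n = "", 0
    for column in columns:
        if len(set(column)) == 1:          # non-variable part
            pattern += re.escape(column[0])
        else:                              # variable part: typed placeholder
            n += 1
            pattern += f"(?P<field{n}>{TYPE_PATTERNS[least_restrictive_type(column)]})"
    return re.compile(pattern)

parser = build_parser(["Name: Bob id 27", "Name: Sue id 30"])
print(parser.pattern)
print(parser.match("Name: Joe id 41").groupdict())   # {'field1': 'Joe', 'field2': '41'}
```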
  • FIGs. 17-1 through 17-21 provide an illustration of this process.
  • the first action is to select line 1 from the log file 1702 to construct a master list
  • each portion/unit of the content within Line 1 is examined to identify a unit type (also referred to herein as a "parse unit") that is associated with the portion of the line.
  • each portion of the line is identified as one of the following parse units: (a) string - this is a default parse unit type that corresponds to any type of element that may exist within a string; (b) alpha - this parse unit type corresponds to any number of contiguous alphabetic elements; (c) integer - this parse unit type corresponds to any number of contiguous integer elements; and/or (d) field rule type - this parse unit type corresponds to a type that is identified based upon a rule definition, and may correlate to complex combinations of any numbers of characters, integers, or symbols. The more restrictive the type, the more favored is the selection of that type for element(s) within the line.
  • Figs. 17-3 through 17-8 illustrate the process of constructing a master list 1704 for Line 1 of log file 1702.
  • the first character "N” is retrieved and placed into the master list in the first position.
  • the parse unit type is also identified for this character.
  • the initial parse unit type of "string” is assigned to this character, since the process does not yet have enough information to know if the variability of this element within multiple lines should cause this element to be assigned to a different parse unit type.
  • the contents of Line 2 can be analyzed on an element-by-element basis relative to the master list 1704.
  • Fig. 17-10 shows the contents of Line 2 organized on an element-by-element basis.
  • Figs. 17-11 through 17-15 illustrate this comparison analysis between line 2 and the master list 1704.
  • Fig. 17-11 shows the analysis of the first element position within line 2 against the first element position in the master list 1704.
  • the master list 1704 includes "N" in the first position, which matches the element "N” in the same position within line 2. Therefore, this shows that the master list 1704 correctly indicates that the first element of the lines has "N” as its content.
  • a comparison of the third element position indicates a difference between the content of the master list 1704 and the content of line 2.
  • the master list 1704 has "B" in the third element position, while line 2 includes "S" in the third element position. This indicates that the third element position is a variable part of the line(s).
  • The process then identifies the next common element(s) that should be considered a delimiter between common and variable portions.
  • the ".” element in the sixth element position is the next common element.
  • An approach is described in more detail below in conjunction with Fig. 20 that can be used to identify the next common element that should be considered a delimiter. It is noted that this approach of "skipping ahead" to find the next common portion permits any varying number of characters within each of the multiple lines to be compared, since it does not matter how many characters with each line are skipped to identify the next common character.
  • the variable portions include the third element position ("B” in the master list and "S” in line 2), the fourth element position ("o” in the master list and “u” in line 2), and the fifth element position ("b” in the master list and "e” in line 2).
  • the variable portion forms an analysis range whose contents can be analyzed as a collective group of elements.
  • common parse unit types may be collapsed together, e.g., for the variable portion of both the master list and line 2, this corresponds to "Bob" for the master list and "Sue” from line 2.
  • the most restrictive parse unit type that correlates to these values is the alpha parse unit. Therefore, as shown in Fig. 17-15, the individual string values for the variable portion of the master list are replaced with the alpha parse unit.
  • the parse unit definition within the master list 1704 may also be used to track the specific contents from each of the lines that have been analyzed.
  • the "Bob" and “Sue” values from both line 1 and line 2 for this element position can be included within the parse unit definition for the alpha parse unit within the master list. This results in the master list 1704 shown in Fig. 17-16.
  • One reason for tracking these values is to identify content values that can later be used to construct a regular expression. Another reason is to permit reconstruction of any of the underlying lines from the master list, e.g., where the master list essentially provides a compressed collective view of every line that was used to construct the list.
  • field rule types can be constructed that associate meaningful labels to these types of sequences.
  • the field rule type may include a rule definition that correlates to combinations of characters, integers, and/or symbols associated with a given sequence of interest.
  • Figs. 17-17 through 17-19 illustrate this process of identifying a field rule parse unit for the master list.
  • Each of the field rules correspond to a different type of sequence for which there is an interest in identifying a meaningful type for that sequence.
  • Examples of field rule types include field rules for IP addresses, timestamps, identifiers, and the like.
  • Fig. 17-18 shows an example field rule 1712 that may be applicable to the variable portions of the master list.
  • the field rule 1712 corresponds to a type (or name/label) of "Name", e.g., where the field rule may be applicable to identify a person's name.
  • the field rule 1712 may be associated with a regular expression 1714 to identify the portions of the master list that correlate to the field rule.
  • the regular expression 1714 is "[A-Z][a-z]+". This regular expression corresponds to any sequence of characters where the first character is a capital letter (from A-Z), followed by any number of subsequent non-capital letters (from a-z).
  • the recorded set of data for the alpha parse unit of the master list matches the regular expression of this "Name” field rule.
  • the master list 1704 can thereafter be updated to replace the alpha parse unit type(s) with the field rule parse unit.
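  • A small sketch of this promotion step, assuming a field rule table keyed by name with one regular expression per rule (the rules shown mirror the "Name" example above plus an IP-address rule, and are illustrative only):

```python
import re

# Field rules: meaningful labels with a regular expression per rule (illustrative).
FIELD_RULES = {
    "Name": r"[A-Z][a-z]+",
    "IPAddress": r"\d{1,3}(\.\d{1,3}){3}",
}

def promote_parse_unit(recorded_values):
    """Return the name of a field rule matched by every recorded value for a
    variable parse unit, or None if no rule applies."""
    for name, expr in FIELD_RULES.items():
        if recorded_values and all(re.fullmatch(expr, v) for v in recorded_values):
            return name
    return None

print(promote_parse_unit(["Bob", "Sue"]))                   # Name
print(promote_parse_unit(["11.22.33.44", "10.20.30.40"]))   # IPAddress
```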
  • processing of the master list to identify field rules may be performed in a post-processing action after construction of the master list, i.e., after analysis of multiple lines from the log file.
  • the field rules may be identified as the lines are individually analyzed in a streamed manner.
  • field rules may be identified both during processing of the lines, and also afterwards in a post-processing step.
  • the field rule processing is performed only for sections of the line that are identified as variable.
  • field rules may correlate to both constant and variable parts of the line.
  • the log parser 1722 may include the regular expression that was constructed from the master list, where the regular expression correlates to a line definition for the log file that can be used to read and parse each line from the file. Other items of data/metadata may be included with the log parser 1722.
  • the log parser 1722 may be associated with a parser identification number/name, and/or associated with a file type identifier.
  • One key advantage of this approach is that diverse sub-patterns can be efficiently detected: the approach separately matches high-level patterns and then uses sub-pattern detection to characterize the variable portions that were not fixed parts of the high-level patterns.
  • An example sub-pattern is key-value pairs.
  • This approach can also be used to define skeletal parts to construct a regular expression and build a parser that is capable of assigning parts of the expression to variables. This approach can also handle patterns that are below a similarity threshold by assigning variable parts to keep items in the same log consistent if possible. In some embodiments, the parser is generated for future processing rather than just categorizing.
  • logs having any level of complexity may be processed to construct a log parser.
  • log entries which are slightly more complex examples:
  • the present embodiment operates by walking through a selected set of the lines to identify commonalities between the lines, and to then construct a regular expression that can be used to generally parse through logs files containing similar lines of log entries.
  • a master list can be constructed from the first line, which is compared against the second line. The analysis identifies the variable and non-variable parts of the line(s) being analyzed.
  • variable values can also be stored if desired.
  • the updated master line can then be processed to construct a regular expression from the commonalities. For the above example lines, the following identify the common portions, the variable portions, along with the values of the variable portions.
  • Variable section 1 ⁇ 11.22.33.44 , 10.20.30.40 ⁇
  • Variable section 2 ⁇ Bob , Sue ⁇
  • Variable section 3 ⁇ 27 , 30 ⁇
  • the first portion of the line containing the IP address may correspond to the following regular expression: "[0-9]*\.[0-9]*\.[0-9]*\.[0-9]*". It is noted that a field rule (defined to include this regular expression) may also be used to correlate this portion of a line to an IP address.
  • line pre-processing may be performed ahead of time to prepare the log data for processing.
  • This is exactly the same content as exists in two lines in file 1702 of Fig. 17-1, but is spread over four lines in file 1902.
  • the problem is that a log parser generator that expects each line to be separately processed as a unitary unit would fail when faced with the log file structure shown in the log file 1902 of Fig. 19, since each log entry really encompasses two lines at a time (e.g., the first two lines as a first log entry and the third and fourth lines as a second log entry).
  • Fig. 18 shows the process flow of an embodiment to address this and other types of non-standard line format situations.
  • one or more lines are identified from the log file for pre-processing.
  • the lines are analyzed for grouping purposes.
  • One possible approach that can be taken to group lines together is to check the timestamps of the lines. For example, it is possible in some systems that multiple lines that relate to one another include a timestamp only on the first line. In this situation, lines are grouped together until another timestamp is identified. Even if each line includes its own timestamp, commonality of timestamp values permits multiple lines to be identified as parts of a unitary whole. As another alternative, clustering may be performed to cluster together groupings of lines that are supposed to link with one another. Another possibility is to perform pre-classification of lines to identify the line structures and thereby identify lines that should be grouped together.
  • the grouped lines can be considered together for log parsing purposes.
  • one possible approach is to manipulate the lines so that grouped content appears within a single line.
  • another approach is to categorize the multiple related lines into a single log entry for analysis purposes.
  • Portion 1904a of Fig. 19 illustrates the approach where grouped content is manipulated so that the content appears within a single line.
  • the intent is for the first and second lines to be grouped with one another, while the third and fourth lines form another grouping.
  • Each of these newly formed single lines are then processed for generation of a log parser.
  • a new line is not created to combine multiple related lines together. Instead, the multiple lines are merely logically grouped together as a single log entry for analysis purposes. As shown in portion 1904b, a first logical log entry 1906a is formed by the first two lines and a second logical log entry 1906b is formed by the third and fourth lines.
  • the master list described above would be constructed by walking through the elements of both lines that pertain to a common logical log entry. In this situation, the "newline" character that separates the lines within a single log entry can be considered as merely another character to be identified and processed within the master list for a given entry.
  • sequential sections of a log line can be considered as a unit when constructing a log parser. Delimiters within the line can be identified to determine the sequential sections to identify.
  • common element ".” can be identified as a delimiter that sets the variable portion preceding it (e.g., "Bob” or "Sue”) apart from other portions of the line for analysis.
  • an inventive approach is provided to identify which of one or more common elements within a line should be considered a delimiter.
  • the approach operates by walking through a line to identify common elements, where a combination of the element position and element weight are considered to determine a score for each common element. The element having the greatest score (or least score depending upon how the score is calculated) can then be identified as the delimiter.
  • Fig. 20 shows flowchart of an approach for efficiently identifying the correct delimiter elements within a set of log content according to some embodiments of the invention.
  • the line content(s) for at least two lines are walked to identify a common element that borders a variable portion of the line, e.g., from left to right within the line.
  • scoring is calculated for each of the identified common elements.
  • the position of the common element within the line is first determined for the scoring.
  • the general idea is that, all else being equal, a possible delimiter that is found earlier (e.g., closer to the left side when walking from left to right within the line) should be the first delimiter to be considered. For example, consider the following line: "Names:Bob Joe Sam". In this example line, there is a first space between "Bob" and "Joe", and a second space between "Joe" and "Sam".
  • both spaces may be delimiters, but the first space (the one to the left, between "Bob" and "Joe") should be identified first. Only afterwards, as the delimiter identification process is run again, will the second space (between "Joe" and "Sam") be identified as the next delimiter. Therefore, when choosing between the first space and the second space, the position of the first space should receive a more prominent score than the second space. This is accomplished, at 2010, by providing a score factor determined by the position of the element within the line. For example, either a sum or an average of the position of the element can be identified and associated with the element. In addition, the type of element that is found should also be factored into the delimiter score for the element.
  • Fig. 22 shows a chart 2202 having some example weights that may be applied to the identified common element in some applications of the invention. In particular, this figure is based upon the assumption that the lower the score, the more likely an element is to be the delimiter. Therefore, each type of element shown in Fig. 22 is associated with a weighting factor, where an element type more likely to be a delimiter has a smaller weighting factor, and an element type less likely to be the delimiter is associated with a greater weighting factor.
  • a space is highly likely to be considered as a delimiter; therefore, as shown in row 2206, this element type is associated with a very small weighting factor.
  • alpha-numeric characters are considered to be among the element types least likely to be delimiters; therefore, as shown in row 2210, this element type is associated with a very high weighting factor.
  • Weighting factors may also be associated with more complex rules that consider combinations of elements. For example, since a typical IP address has sequences of numbers interspersed with the ".” element, this means that a ".” set between two sequences of integers is less likely to be a delimiter and more likely to be part of an IP address field.
  • non-integer numbers (e.g., floating point numbers) may include a decimal between two numbers (e.g., the "." element between two number elements in "2.3"), which also makes the "." element unlikely to be a delimiter and more likely to be part of the numeric value in this situation.
  • this combination of elements may be associated with a rule that identifies the ".” element between two integers, where this rule is associated with a weighting factor to bias against ".” in this type of combination of elements from being a delimiter.
  • a rule for this situation can be associated with a weighting factor to bias heavily in favor of an element previously identified as a delimiter as being considered again as a delimiter.
  • a delimiter can be identified by comparing the scores of the different possible delimiters. For example, if the scoring is configured such that lower scores correspond to a greater likelihood of being a delimiter, then the element in the line(s) having the lowest calculated score would be identified as the delimiter. The process can then be repeated to identify any number of additional delimiters (if they exist) within the line.
  • Figs. 21-1 through 21-5 illustrate the delimiter identification process.
  • FIG. 21-4 illustrates the process of calculating the score for each of these elements.
  • the relative position of the element within the line is calculated for each line. For element "o", this element exists at position 1 in each line; therefore, the sum of these position values is 2. For element ".", this element exists at position 3 in each line; therefore, the sum of the position values is 6.
  • a weighting factor is identified for each element.
  • Fig. 22 shows example weighting factors that may be used in the current scoring calculations. For element "o", this is an alpha-numeric character, which is associated with a weighting factor of 100 in the chart of Fig. 22. For element ".", this is associated with a weighting factor of 1 in the chart of Fig. 22.
  • the scores are then compared to identify the lowest score, where the element having the lowest score is considered the delimiter.
  • the ".” element has a lower score than the "o” element (6 ⁇ 200). Therefore, the ".” element is identified as the delimiter.
  • Another technique that may be applied in some embodiments is to automatically perform key value extraction from the log data. This approach is particularly useful, for example, to implement the extended field definitions that were described above.
  • the current embodiment is implemented by identifying the first and last key value pair dividers in the lines, and to then process the content in-between with split functionality to extract the key value data.
  • Fig. 23 illustrates a flowchart of an example approach to perform key value extraction.
  • the process begins at 2302 by analyzing a line and identifying the first key-value divider that it sees.
  • the process exits if no appropriate key value divider is found in the line.
  • the last KV (key-value) divider in the line is then identified, which bounds the range of key-value content.
  • the process then returns back to the beginning of the identified range to extract key- value content.
  • identification is made of the key to the left of the very first key value divider.
  • the value to the right of the last KV divider is identified.
  • the identified portion of the line is then parsed to identify the key values. For example, the "split" function from Java or Perl can be used to perform this action. For the current line, this action therefore identifies the key values for each of the key value pairs in the lines. This approach therefore can be used to automatically perform key value extraction.
  • the process iterates through the rest of the key value pairs in the identified range to extract the key value data for all of the key value pairs.
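  • A simplified sketch of this key-value extraction flow, assuming "=" as the key-value divider and a space as the pair divider (both are assumptions for illustration):

```python
def extract_key_values(line, kv_divider="=", pair_divider=" "):
    """Locate the first and last key-value dividers, then split the content
    in between into key-value pairs (a simplified version of the flow above)."""
    first = line.find(kv_divider)
    last = line.rfind(kv_divider)
    if first == -1:
        return {}                                  # no key-value content in this line
    # Back up from the first divider to the start of its key, and run forward
    # from the last divider to the end of its value.
    start = line.rfind(pair_divider, 0, first) + 1
    end = line.find(pair_divider, last)
    end = len(line) if end == -1 else end
    pairs = {}
    for chunk in line[start:end].split(pair_divider):
        if kv_divider in chunk:
            key, value = chunk.split(kv_divider, 1)
            pairs[key] = value
    return pairs

print(extract_key_values("2016-04-01 log: ID=5 Name=Bob Status=OK trailing"))
# {'ID': '5', 'Name': 'Bob', 'Status': 'OK'}
```
  • Note that this simple split-based version mishandles values that themselves contain the pair divider (e.g., "Name=Bob Smith"), which is exactly the situation the pre-processing and post-processing steps described below are meant to address.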
  • Figs. 24-1 through 24-12 illustrate this process.
  • the process begins by analyzing the line 2402 and identifying the first key-value divider that is seen.
  • a key-value pair divider can be, for example, the space between different key-value pairs.
  • the process loops through this step to find additional pair dividers. For example, as shown in
  • identification is made of the key to the left of the next KV divider.
  • the KV divider is located between “ID” and "5". The key to the left of this divider is therefore "ID”.
  • the value to the right of the KV divider is identified. In the example line, the value to the right of the divider is "5".
  • pre-processing may be employed to identify the fact that a space may exist within the value field for this key value pair, and as shown in Fig. 26-2, will identify the correct portions of the line for the various key value pairs, where the space element between "Bob” and “Smith” is considered part of the value field and not a delimiter.
  • Another approach is to perform post-processing to correct any problematic assignments of content to the key or value fields. This approach can be used, for example, to check for incorrect type(s) of values within the key value fields.
  • a rule may be configured for the log analytics system that restricts the range of type elements within a field recognizable as a "Name” field, which excludes the "/" character from a valid name. In this situation, the post-processing would be able to identify the incorrect portion of the value field based upon the "/" character within the "Bob 11/12/2017" field.
  • the analytics system may then choose to either exclude the entire extracted key/value content, or may choose to correct the erroneous content.
  • the correction may be implemented by scanning either forwards and/or backwards within line content to identify the correct set of content to be assigned to the key and/or value field. In this case, the "Bob 11/12/2017" field would be corrected to only associate "Bob” to the value field for key "Name”.
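  • A small sketch of such a post-processing correction, assuming a per-key validation rule that trims a value back to its longest valid prefix (the rule for "Name" here is illustrative and stricter than merely excluding "/"):

```python
import re

# Hypothetical per-key validation rules: a value is trimmed back to its longest
# prefix that still satisfies the rule for its key (or dropped if none does).
VALUE_RULES = {
    "Name": re.compile(r"[A-Za-z ]+"),   # an illustrative rule: names contain no '/'
}

def correct_pairs(pairs):
    corrected = {}
    for key, value in pairs.items():
        rule = VALUE_RULES.get(key)
        if rule is None:
            corrected[key] = value
            continue
        m = rule.match(value)
        corrected[key] = m.group(0).strip() if m else None
    return corrected

print(correct_pairs({"Name": "Bob 11/12/2017", "ID": "5"}))
# {'Name': 'Bob', 'ID': '5'}
```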
  • Fig. 27 is a block diagram of an illustrative computing system 1400 suitable for implementing an embodiment of the present invention.
  • Computer system 1400 includes a bus 1406 or other communication mechanism for communicating information, which interconnects subsystems and devices, such as processor 1407, system memory 1408 (e.g., RAM), static storage device 1409 (e.g., ROM), disk drive 1410 (e.g., magnetic or optical), communication interface 1414 (e.g., modem or Ethernet card), display 1411 (e.g., CRT or LCD), input device 1412 (e.g., keyboard), and cursor control.
  • computer system 1400 performs specific operations by processor 1407 executing one or more sequences of one or more instructions contained in system memory 1408. Such instructions may be read into system memory 1408 from another computer readable/usable medium, such as static storage device 1409 or disk drive 1410.
  • hard-wired circuitry may be used in place of or in combination with software instructions to implement the invention.
  • embodiments of the invention are not limited to any specific combination of hardware circuitry and/or software.
  • the term "logic" shall mean any combination of software or hardware that is used to implement all or part of the invention.
  • Non-volatile media includes, for example, optical or magnetic disks, such as disk drive 1410.
  • Volatile media includes dynamic memory, such as system memory 1408.
  • Common forms of computer readable media include, for example, floppy disk, flexible disk, hard disk, magnetic tape, any other magnetic medium, CD-ROM, any other optical medium, punch cards, paper tape, any other physical medium with patterns of holes, RAM, PROM, EPROM, FLASH-EPROM, any other memory chip or cartridge, cloud-based storage, or any other medium from which a computer can read.
  • execution of the sequences of instructions to practice the invention is performed by a single computer system 1400.
  • two or more computer systems 1400 coupled by communication link 1415 may perform the sequence of instructions required to practice the invention in coordination with one another.
  • Computer system 1400 may transmit and receive messages, data, and instructions, including programs, i.e., application code, through communication link 1415 and communication interface 1414.
  • Received program code may be executed by processor 1407 as it is received, and/or stored in disk drive 1410, or other non-volatile storage for later execution.
  • Data may be accessed from a database 1432 that is maintained in a storage device 1431, which is accessed using data interface 1433.

Abstract

Disclosed are a system, method, and computer program product for implementing a log analytics method and system that can configure, collect, and analyze log records in an efficient manner. According to the invention, this improved approach makes it possible to automatically generate a log parser by analyzing the content of the lines of a log. In addition, this efficient approach makes it possible to extract key-value content from the log content.
PCT/US2016/025739 2015-04-03 2016-04-01 Procédé et système de mise en œuvre d'un analyseur de journal dans un système d'analyse de journal WO2016161381A1 (fr)

Priority Applications (3)

Application Number Priority Date Filing Date Title
EP16716414.4A EP3278243A1 (fr) 2015-04-03 2016-04-01 Procédé et système de mise en œuvre d'un analyseur de journal dans un système d'analyse de journal
CN201680029404.0A CN107660283B (zh) 2015-04-03 2016-04-01 用于在日志分析系统中实现日志解析器的方法和系统
CN202111494485.0A CN114168418A (zh) 2015-04-03 2016-04-01 用于在日志分析系统中实现日志解析器的方法和系统

Applications Claiming Priority (14)

Application Number Priority Date Filing Date Title
US201562142987P 2015-04-03 2015-04-03
US62/142,987 2015-04-03
US15/089,049 2016-04-01
US15/089,049 US9767171B2 (en) 2015-04-03 2016-04-01 Method and system for implementing an operating system hook in a log analytics system
US15/089,129 2016-04-01
US15/089,226 US11226975B2 (en) 2015-04-03 2016-04-01 Method and system for implementing machine learning classifications
US15/089,180 US10366096B2 (en) 2015-04-03 2016-04-01 Method and system for implementing a log parser in a log analytics system
US15/088,943 2016-04-01
US15/089,180 2016-04-01
US15/089,005 2016-04-01
US15/089,005 US10585908B2 (en) 2015-04-03 2016-04-01 Method and system for parameterizing log file location assignments for a log analytics system
US15/088,943 US10592521B2 (en) 2015-04-03 2016-04-01 Method and system for implementing target model configuration metadata for a log analytics system
US15/089,226 2016-04-01
US15/089,129 US10891297B2 (en) 2015-04-03 2016-04-01 Method and system for implementing collection-wise processing in a log analytics system

Publications (1)

Publication Number Publication Date
WO2016161381A1 true WO2016161381A1 (fr) 2016-10-06

Family

ID=55752787

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/US2016/025739 WO2016161381A1 (fr) 2015-04-03 2016-04-01 Procédé et système de mise en œuvre d'un analyseur de journal dans un système d'analyse de journal

Country Status (1)

Country Link
WO (1) WO2016161381A1 (fr)

Cited By (15)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2018111355A1 (fr) * 2016-12-15 2018-06-21 Nec Laboratories America, Inc. Détection d'anomalies de niveau de contenu de journaux hétérogènes
US10007571B2 (en) 2016-06-01 2018-06-26 International Business Machines Corporation Policy based dynamic data collection for problem analysis
CN109522037A (zh) * 2018-11-16 2019-03-26 北京车和家信息技术有限公司 文件处理方法、装置、服务器及计算机可读存储介质
CN109656894A (zh) * 2018-11-13 2019-04-19 平安科技(深圳)有限公司 日志规范化存储方法、装置、设备及可读存储介质
CN110019076A (zh) * 2018-08-20 2019-07-16 平安普惠企业管理有限公司 多系统日志数据的构建方法、装置、设备及可读存储介质
CN110362545A (zh) * 2019-05-27 2019-10-22 平安科技(深圳)有限公司 日志监控方法、装置、终端与计算机可读存储介质
CN110990350A (zh) * 2019-11-28 2020-04-10 泰康保险集团股份有限公司 日志的解析方法及装置
CN111046012A (zh) * 2019-12-02 2020-04-21 东软集团股份有限公司 巡检日志的抽取方法、装置、存储介质和电子设备
CN111506646A (zh) * 2020-03-16 2020-08-07 阿里巴巴集团控股有限公司 数据同步方法、装置、系统、存储介质及处理器
CN111984630A (zh) * 2020-09-01 2020-11-24 深圳壹账通智能科技有限公司 日志关联方法、装置和计算机设备
CN112286892A (zh) * 2020-07-01 2021-01-29 上海柯林布瑞信息技术有限公司 后关系型数据库的数据实时同步方法及装置、存储介质、终端
CN112685370A (zh) * 2020-12-17 2021-04-20 福建新大陆软件工程有限公司 一种日志采集方法、装置、设备和介质
CN113220907A (zh) * 2021-06-10 2021-08-06 京东科技控股股份有限公司 业务知识图谱的构建方法及装置、介质、电子设备
DE102022001118A1 (de) 2022-03-31 2022-05-25 Mercedes-Benz Group AG Verfahren zur Analyse von Ereignisprotokollen
CN117170984A (zh) * 2023-11-02 2023-12-05 麒麟软件有限公司 一种linux系统待机状态的异常检测方法及系统

Citations (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US8620928B1 (en) * 2012-07-16 2013-12-31 International Business Machines Corporation Automatically generating a log parser given a sample log

Patent Citations (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US8620928B1 (en) * 2012-07-16 2013-12-31 International Business Machines Corporation Automatically generating a log parser given a sample log

Non-Patent Citations (3)

* Cited by examiner, † Cited by third party
Title
HONGYONG YU ET AL: "Mass log data processing and mining based on Hadoop and cloud computing", COMPUTER SCIENCE&EDUCATION (ICCSE), 2012 7TH INTERNATIONAL CONFERENCE ON, IEEE, 14 July 2012 (2012-07-14), pages 197 - 202, XP032232566, ISBN: 978-1-4673-0241-8, DOI: 10.1109/ICCSE.2012.6295056 *
MEIYAPPAN NAGAPPAN ET AL: "Abstracting log lines to log event types for mining software system logs", MINING SOFTWARE REPOSITORIES (MSR), 2010 7TH IEEE WORKING CONFERENCE ON, IEEE, PISCATAWAY, NJ, USA, 2 May 2010 (2010-05-02), pages 114 - 117, XP031675571, ISBN: 978-1-4244-6802-7 *
WEI XU ET AL: "Detecting large-scale system problems by mining console logs", PROCEEDINGS OF THE ACM SIGOPS 22ND SYMPOSIUM ON OPERATING SYSTEMS PRINCIPLES, SOSP '09, 1 January 2009 (2009-01-01), New York, New York, USA, pages 117, XP055254995, ISBN: 978-1-60558-752-3, DOI: 10.1145/1629575.1629587 *

Cited By (25)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US10534659B2 (en) 2016-06-01 2020-01-14 International Business Machines Corporation Policy based dynamic data collection for problem analysis
US10007571B2 (en) 2016-06-01 2018-06-26 International Business Machines Corporation Policy based dynamic data collection for problem analysis
US10558515B2 (en) 2016-06-01 2020-02-11 International Business Machines Corporation Policy based dynamic data collection for problem analysis
WO2018111355A1 (fr) * 2016-12-15 2018-06-21 Nec Laboratories America, Inc. Content-level anomaly detection of heterogeneous logs
CN110019076B (zh) * 2018-08-20 2023-03-24 平安普惠企业管理有限公司 Multi-system log data construction method, apparatus, device, and readable storage medium
CN110019076A (zh) * 2018-08-20 2019-07-16 平安普惠企业管理有限公司 Multi-system log data construction method, apparatus, device, and readable storage medium
CN109656894A (zh) * 2018-11-13 2019-04-19 平安科技(深圳)有限公司 Log normalization storage method, apparatus, device, and readable storage medium
CN109522037A (zh) * 2018-11-16 2019-03-26 北京车和家信息技术有限公司 File processing method, apparatus, server, and computer-readable storage medium
CN110362545A (zh) * 2019-05-27 2019-10-22 平安科技(深圳)有限公司 Log monitoring method, apparatus, terminal, and computer-readable storage medium
CN110990350A (zh) * 2019-11-28 2020-04-10 泰康保险集团股份有限公司 Log parsing method and apparatus
CN110990350B (zh) * 2019-11-28 2023-06-16 泰康保险集团股份有限公司 Log parsing method and apparatus
CN111046012A (zh) * 2019-12-02 2020-04-21 东软集团股份有限公司 Inspection log extraction method, apparatus, storage medium, and electronic device
CN111046012B (zh) * 2019-12-02 2023-09-26 东软集团股份有限公司 Inspection log extraction method, apparatus, storage medium, and electronic device
CN111506646B (zh) * 2020-03-16 2023-05-02 阿里巴巴集团控股有限公司 Data synchronization method, apparatus, system, storage medium, and processor
CN111506646A (zh) * 2020-03-16 2020-08-07 阿里巴巴集团控股有限公司 Data synchronization method, apparatus, system, storage medium, and processor
CN112286892A (zh) * 2020-07-01 2021-01-29 上海柯林布瑞信息技术有限公司 Real-time data synchronization method and apparatus for a post-relational database, storage medium, and terminal
CN112286892B (zh) * 2020-07-01 2024-04-05 上海柯林布瑞信息技术有限公司 Real-time data synchronization method and apparatus for a post-relational database, storage medium, and terminal
CN111984630A (zh) * 2020-09-01 2020-11-24 深圳壹账通智能科技有限公司 Log correlation method, apparatus, and computer device
CN112685370B (zh) * 2020-12-17 2022-08-05 福建新大陆软件工程有限公司 Log collection method, apparatus, device, and medium
CN112685370A (zh) * 2020-12-17 2021-04-20 福建新大陆软件工程有限公司 Log collection method, apparatus, device, and medium
CN113220907A (zh) * 2021-06-10 2021-08-06 京东科技控股股份有限公司 Method and apparatus for constructing a business knowledge graph, medium, and electronic device
CN113220907B (zh) * 2021-06-10 2024-04-05 京东科技控股股份有限公司 Method and apparatus for constructing a business knowledge graph, medium, and electronic device
DE102022001118A1 (de) 2022-03-31 2022-05-25 Mercedes-Benz Group AG Method for analyzing event logs
CN117170984A (zh) * 2023-11-02 2023-12-05 麒麟软件有限公司 Anomaly detection method and system for the standby state of a Linux system
CN117170984B (zh) * 2023-11-02 2024-01-30 麒麟软件有限公司 Anomaly detection method and system for the standby state of a Linux system

Similar Documents

Publication Publication Date Title
US11194828B2 (en) Method and system for implementing a log parser in a log analytics system
US11971898B2 (en) Method and system for implementing machine learning classifications
US20220092062A1 (en) Method and system for implementing a log parser in a log analytics system
WO2016161381A1 (fr) Method and system for implementing a log parser in a log analytics system
US11809405B2 (en) Generating and distributing delta files associated with mutable events in a distributed system
US11803548B1 (en) Automated generation of metrics from log data
US11915156B1 (en) Identifying leading indicators for target event prediction
US20170223030A1 (en) Detection of security transactions
US11436116B1 (en) Recovering pre-indexed data from a shared storage system following a failed indexer
US11841834B2 (en) Method and apparatus for efficient synchronization of search heads in a cluster using digests
US11704285B1 (en) Metrics and log integration
US11892988B1 (en) Content pack management
US11829415B1 (en) Mapping buckets and search peers to a bucket map identifier for searching
US11977523B1 (en) Enhanced data extraction via efficient extraction rule matching

Legal Events

Date Code Title Description
121 EP: The EPO has been informed by WIPO that EP was designated in this application

Ref document number: 16716414

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE