CN112632960A - Log analysis method and system based on dynamic field template - Google Patents

Publication number
CN112632960A
Authority
CN
China
Prior art keywords
field
log
template
analyzed
matching
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202110011998.5A
Other languages
Chinese (zh)
Inventor
李陟
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Beijing Venus Information Security Technology Co Ltd
Venustech Group Inc
Original Assignee
Beijing Venus Information Security Technology Co Ltd
Venustech Group Inc
Application filed by Beijing Venus Information Security Technology Co Ltd, Venustech Group Inc

Classifications

    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06F: ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00: Handling natural language data
    • G06F40/20: Natural language analysis
    • G06F40/205: Parsing
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06F: ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00: Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/10: File systems; File servers
    • G06F16/18: File system types
    • G06F16/1805: Append-only file systems, e.g. using logs or journals to store data
    • G06F16/1815: Journaling file systems
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06F: ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00: Handling natural language data
    • G06F40/10: Text processing
    • G06F40/166: Editing, e.g. inserting or deleting
    • G06F40/186: Templates

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Computational Linguistics (AREA)
  • General Health & Medical Sciences (AREA)
  • Health & Medical Sciences (AREA)
  • Artificial Intelligence (AREA)
  • Data Mining & Analysis (AREA)
  • Databases & Information Systems (AREA)
  • Debugging And Monitoring (AREA)

Abstract

A log analysis method and system based on a dynamic field template parse logs using a template that records each log's field names and the order of their positions. The method automatically detects field changes between the format of the log to be analyzed and an existing template by comparing the length of each extracted field's content with the field content length recorded in the template, and then updates the template dynamically by activating a human-computer interaction interface through which the new fields are entered. The method thereby adapts automatically to changes in field order, field count, and field content format without any regular expression having to be rewritten, which lowers the engineering implementation threshold of SOC products for front-end engineers.

Description

Log analysis method and system based on dynamic field template
Technical Field
The invention relates to the technical field of computer information, in particular to a method and a system for analyzing a security log in a large-scale network environment.
Background
A Security Operations Center (SOC) is a system and device that manages the security events and security alarms of a monitored network by monitoring and analyzing the logs of security devices, systems, and software. The core function of the SOC is collecting, analyzing, and processing logs. Analyzing a log means splitting the original log into fields and storing the content extracted for each field in the corresponding field predefined by the SOC system.
Because the logs collected by the SOC come from different software and hardware systems provided by different manufacturers, and because these systems are continuously updated and upgraded as IT develops, the original log data collected by the SOC do not share a unified format. These logs can neither be parsed and stored in the database directly nor be analyzed directly by the analysis module.
The usual solution is to write a complex regular expression for each device model and each version of software or system log, so that logs in that format can be parsed. The specific process is as follows: first, a log identification regular expression that matches the overall format of the log is written from a log sample; second, extraction regular expressions are written according to the positions of the predefined field contents in the log format, so that the required content can be extracted from the log; finally, all regular expressions relating to the log are stored as a configuration file. This configuration is defined as a static template and is loaded when the log analysis module of the SOC system starts. During operation, the SOC first matches each new log against every pre-stored log identification regular expression to determine which pre-stored log format analysis template the log belongs to, and then extracts the content of all fields using that template's field extraction regular expressions.
These regular expressions are currently all written by hand. It is usually impossible to predict which log formats will need to be parsed in a user's real environment, so although the SOC stores every log format it has already parsed in configuration files for direct parsing, a large number of unrecognizable log formats are still encountered at deployment sites. In that case the front-end implementation engineer has to write regular expressions through the interface provided by the SOC. Limited by the engineer's technical ability and by the on-site debugging and development environment, writing regular expressions by hand on site is difficult to carry out smoothly; often the log samples must be sent back to the research and development center, where a back-end development engineer completes the log analysis. The SOC equipment therefore often sits idle for a long time before the log analysis work is finished, cannot deliver its intended effect, and the user's network environment does not receive effective security operation and maintenance management services.
The difficulty of conventional log parsing thus stems from the rigidity of the log identification mechanism based on static templates. After a regular expression locates the field content by field name, matching proceeds according to the character format of the field content, so the mechanism cannot adapt to even slight changes in log format: when the field names, field order, field content format (such as the date display format), or word segmentation of a log differ, log identification fails, the log is treated as a brand-new log, and a new set of log identification and field analysis regular expressions has to be written. The field content analysis mechanism likewise depends entirely on character-by-character regular expression matching, which achieves accurate analysis but lacks adaptability to change.
Disclosure of Invention
The invention provides a log analysis method and system based on a dynamic field template that analyze logs without relying on regular expressions, create an analysis template from nothing more than the field names entered for a log, and adapt better to changes in log format.
The present disclosure provides a log parsing method based on a dynamic field template, which includes the following steps:
searching a template which has the highest matching degree with a log to be analyzed in terms of field names and position sequences and meets the threshold requirement from an existing dynamic field template set, wherein the dynamic field template is used for recording the field names and the position sequences contained in a log file so as to represent the format of the log;
based on the matching position of the field in the searched template in the log to be analyzed, carrying out field segmentation and content extraction on the log to be analyzed;
if the length of the content of the divided field is larger than the threshold value, a new field is additionally input into the current template through human-computer interaction to update the current template, and the log to be analyzed is further analyzed by utilizing the updated template;
if the template with the matching degree meeting the threshold requirement cannot be found, fields in the log to be analyzed are input through human-computer interaction, a new dynamic field template is generated, and the log to be analyzed is analyzed by using the template.
Further, searching a template which has the highest matching degree with the log to be analyzed in terms of field name and position sequence and meets the requirement of a threshold value from the existing dynamic field template set, and the method comprises the following steps:
acquiring each template in the existing dynamic field template set one by one;
calculating the matching degree of the current template and the log to be analyzed according to the consistency degree of the field names and the position sequences contained in the current template and the log to be analyzed;
finding out a template with the highest matching degree in the set;
and judging whether the matching degree value meets the threshold requirement, if so, the template is the template with the highest matching degree and meeting the threshold requirement.
Further, calculating the matching degree of the current template and the log to be analyzed according to the degree of consistency of the field names and position sequences contained in the current template and the log to be analyzed comprises the following steps:
field name presence detection: acquiring each field name one by one from the current template, detecting whether the field name exists in the log to be analyzed, and recording the position of the field name in the log to be analyzed if the field name exists;
matching field sequences: sequencing the position sequence of the detected field names in the log to be analyzed, comparing the field name sequence in the current template, and judging whether the sequence of each field accords with the field sequence in the current template;
the matching degree is calculated based on the result of the above field name presence detection and the result of the field order matching.
Further, the matching degree is calculated based on the following formula:
$$\text{matching degree} = \frac{\sum_{i} \left( \alpha \cdot e_i + \beta \cdot o_i \right)}{N}$$
wherein $e_i$ is the field name presence detection result and $o_i$ is the field order matching result for the $i$-th field name, $N$ is the number of log fields to be parsed, and $\alpha$ and $\beta$ are configurable parameters with $\alpha \geq 0.5$ and $\alpha + \beta = 1$.
Further, before the step of calculating the matching degree according to the degree of consistency between the field names and the position sequences contained in the current template and the log to be analyzed, the method further includes: performing primary screening matching calculation on the current template and the log to be analyzed;
the step of calculating the matching degree is carried out only for templates that pass the primary screening matching.
Further, the method for primary screening matching calculation comprises the following steps:
taking the field with the highest priority and/or the field with invariable sequence as the primary screening matching identification of the log; matching calculation is carried out on the current template and the preliminary screening matching identification of the log to be analyzed;
and if the matching calculation result exceeds a certain value, the primary screening matching is considered to be successful.
Further, the method for performing field segmentation and content extraction on the log to be analyzed based on the matching position of the field in the searched template in the log to be analyzed comprises the following steps:
taking the content between the matching positions of two adjacent fields as the field content of the preceding field, and removing the leading and trailing invalid characters.
Further, the steps of additionally inputting new fields in the current template through human-computer interaction to update the current template, or generating a new dynamic field template through fields in the log to be analyzed input through human-computer interaction comprise:
displaying a log to be analyzed and a list of all field names of the log which are identified;
a user inputs the field names which are not identified in the log on an input interface;
after submission, the currently used dynamic field template is updated or a new dynamic field template is generated.
Further, the log parsing method based on the dynamic field template further includes a step of acquiring the log file to be analyzed from an external source, which comprises reading the log file from an external file or acquiring the log file from the SOC device.
The invention also provides a log analysis system based on the dynamic field template, which comprises: a dynamic field template library module, a template matching module, a log field analysis module, and an interactive field entry module, wherein:
the dynamic field template library module is used for storing the existing log dynamic field template, and the dynamic field template records the field name and the position sequence contained in the log file to indicate the format of the log;
the template matching module is used for searching a template which has the highest matching degree with the log to be analyzed in terms of field name and position sequence and meets the requirement of a threshold value in the dynamic field template library;
the log field analysis module is used for carrying out field segmentation and content extraction on the log to be analyzed based on the template found by the template matching module or the matching position of the field in the template obtained by the interactive field input module in the log to be analyzed;
and the interactive field entry module provides a human-computer interaction interface and is used for manually entering the field and generating or updating the dynamic field template when the dynamic field template with the matching degree meeting the threshold requirement cannot be found out or the field content length segmented by the log field analysis module is greater than the threshold value.
Further, the template matching module comprises:
the field name existence detection submodule is used for acquiring each field name one by one from the current template, detecting whether the field name exists in the log to be analyzed, and recording the position of the field name in the log to be analyzed if the field name exists;
the field sequence matching submodule is used for sequencing the position sequence of the detected field names in the log to be analyzed, comparing the field name sequence in the current template and judging whether the sequence of each field accords with the field sequence in the current template;
and the matching degree calculation submodule is used for calculating the matching degree of the current template and the log to be analyzed in terms of field names and position sequence, based on the field name presence detection result and the field sequence matching result.
Further, the template matching module further comprises:
the primary screening matching submodule is used for performing primary screening matching calculation on the current template and the log to be analyzed; only templates that pass the primary screening are passed on to field name presence detection and the calculations of the other submodules.
Further, the log parsing system further includes:
and the log acquisition module is used for acquiring a log file to be analyzed from the outside, and reading the log file from the external file or acquiring the log file from the SOC device.
The log analyzing method and the log analyzing system analyze the log by utilizing the dynamic field template for recording the field name and the field position sequence of the log, automatically find the field change between the log format to be analyzed and the existing template by comparing the analyzed field content with the length of the field content in the template, and further dynamically update the template by activating a human-computer interaction interface to supplement a new field, so that the analyzing method can automatically adapt to the change of the log format in the aspects of field sequence, field number, field content format change and the like.
Compared with the prior art, the beneficial effect of this disclosure is: firstly, a regular expression is not required to be compiled, and a log analysis template is created only by inputting field names, so that the engineering implementation threshold of the SOC product is reduced for a front-end engineer; secondly, by utilizing a dynamic field name analysis technology, the problem that field content cannot be well adapted to field content change when a fixed regular expression is used for reading the field content is solved, the robustness of a log analysis algorithm in a log analysis process is greatly improved, and the log analysis algorithm can have good adaptability to log format change, including log field sequence change, new field increase, field number reduction, field content format change and the like.
Drawings
The foregoing and other objects, features and advantages of the disclosure will be apparent from the following more particular descriptions of exemplary embodiments of the disclosure as illustrated in the accompanying drawings wherein like reference numbers generally represent like parts throughout the exemplary embodiments of the disclosure.
FIG. 1 shows a flowchart of an exemplary dynamic field template based log parsing method embodiment;
FIG. 2 shows an exemplary log to be parsed.
Detailed Description
Preferred embodiments of the present disclosure will be described in more detail below with reference to the accompanying drawings. While the preferred embodiments of the present disclosure are shown in the drawings, it should be understood that the present disclosure may be embodied in various forms and should not be limited to the embodiments set forth herein. Rather, these embodiments are provided so that this disclosure will be thorough and complete, and will fully convey the scope of the disclosure to those skilled in the art.
Example one
The method mainly comprises the following steps:
S001: searching a template which has the highest matching degree with a log to be analyzed in terms of field names and position sequences and meets the threshold requirement from an existing dynamic field template set, wherein the dynamic field template is used for recording the field names and the position sequences contained in a log file so as to represent the format of the log;
S002: based on the matching position of the field in the searched template in the log to be analyzed, carrying out field segmentation and content extraction on the log to be analyzed;
S003: if the length of the content of the divided field is larger than the threshold value, a new field is additionally input into the current template through human-computer interaction to update the current template, and the log to be analyzed is further analyzed by utilizing the updated template;
S004: if the template with the matching degree meeting the threshold requirement cannot be found, fields in the log to be analyzed are input through human-computer interaction, a new dynamic field template is generated, and the log to be analyzed is analyzed by using the template.
Example two
As shown in fig. 1, the second embodiment first attempts to parse the log by using the conventional method:
(1) S01: matching and identifying the overall log format by using a traditional log analysis template based on regular expressions;
(2) S02: if the matching identification succeeds, analyzing the log directly with the regular expressions of that template;
if the matching identification fails, analyzing the log with the log analysis method based on the dynamic field template, in the following specific steps:
(3) S1: searching the existing dynamic field template set for the template which has the highest matching degree with the log to be analyzed in terms of field names and position sequence and which meets the threshold requirement, wherein the dynamic field template records the field names and the position sequence contained in the log file so as to represent the format of the log.
An exemplary dynamic field template format is as follows:
Field name | Field name priority | Field order | Field order type | Maximum length of field content
The field name is a character string giving the name of the field. The field name priority is a number that specifies the order in which field names are detected: field names with a high priority are detected first. The field order is a number that indicates the position at which the field name appears within the whole log template. The field order type is 0 or 1: 0 indicates that the position of the field in the whole field sequence is fixed, and 1 indicates that it is variable. The maximum length of the field content is a number that records the maximum character length of the field content.
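The template record described above can be pictured as a small data structure. The following is a minimal sketch only: the class names `TemplateField` and `DynamicFieldTemplate`, the helper methods, and the convention that a larger priority number means "detected first" are all assumptions made for illustration and are not specified in the source.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class TemplateField:
    """One row of a dynamic field template (hypothetical representation)."""
    name: str             # field name as it appears in the log text
    priority: int         # detection priority; larger = detected first (assumed)
    order: int            # position of the field name within the template
    order_type: int       # 0 = position is fixed, 1 = position may vary
    max_content_len: int  # maximum character length of the field content


@dataclass
class DynamicFieldTemplate:
    """A named collection of template fields (hypothetical representation)."""
    name: str
    fields: List[TemplateField] = field(default_factory=list)

    def names_in_order(self) -> List[str]:
        # Field names sorted by their recorded position in the template.
        return [f.name for f in sorted(self.fields, key=lambda f: f.order)]

    def names_by_priority(self) -> List[str]:
        # Field names sorted so that high-priority names are detected first.
        return [f.name for f in sorted(self.fields, key=lambda f: -f.priority)]


# Illustrative values only; they are not taken from the patent's example log.
template = DynamicFieldTemplate("demo", [
    TemplateField("GenTime", priority=5, order=1, order_type=0, max_content_len=19),
    TemplateField("SerialNum", priority=10, order=0, order_type=0, max_content_len=16),
    TemplateField("SMAC", priority=1, order=2, order_type=1, max_content_len=17),
])
print(template.names_in_order())
```

Keeping the order and the priority as separate numbers, as the template format does, lets the matcher probe the most distinctive names first without losing the layout of the log.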
Preferably, the specific step of S1 includes:
S101: obtaining each log template in the log dynamic field template library in a loop;
S102: calculating the matching degree of the current template and the log to be analyzed according to the degree of consistency of the field names and position sequences contained in the current template and the log to be analyzed;
S103: sorting the matching degrees and finding the template with the highest matching degree in the set;
S104: judging whether that matching degree meets the set threshold requirement; if so, the template is the template with the highest matching degree that meets the threshold requirement. The matching degree threshold is a configurable parameter.
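Steps S101 to S104 amount to a best-match search with a cutoff. The sketch below illustrates that loop under stated assumptions: templates are reduced to plain lists of field names, the scoring function is passed in as a parameter, `presence_ratio` is a simplified stand-in scorer rather than the matching degree formula, and "meets the threshold" is read as greater than or equal to. All names here are hypothetical.

```python
from typing import Callable, Dict, List, Optional, Tuple


def select_template(
    log_line: str,
    templates: Dict[str, List[str]],
    score: Callable[[List[str], str], float],
    threshold: float,
) -> Optional[Tuple[str, float]]:
    """Return (template name, matching degree) of the best template, or None."""
    best_name: Optional[str] = None
    best_score = -1.0
    for name, field_names in templates.items():   # S101: visit every template
        current = score(field_names, log_line)    # S102: matching degree
        if current > best_score:                  # S103: keep the highest one
            best_name, best_score = name, current
    # S104: accept the best template only if it meets the threshold (assumed >=).
    if best_name is not None and best_score >= threshold:
        return best_name, best_score
    return None


def presence_ratio(field_names: List[str], log_line: str) -> float:
    """Stand-in scorer: fraction of template field names found in the log."""
    if not field_names:
        return 0.0
    return sum(name in log_line for name in field_names) / len(field_names)


# Illustrative templates and log line, not taken from the patent figures.
templates = {"a": ["SerialNum", "GenTime", "Msg"], "b": ["User", "Action"]}
log = "SerialNum=1 GenTime=2021-01-01 Content=x"
print(select_template(log, templates, presence_ratio, threshold=0.5))
```

Returning `None` when nothing meets the threshold gives the caller a clear signal to fall through to the human-computer interaction path.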
Wherein, step S102 may further include:
S1021, field name presence detection: acquiring each field name one by one from the current template and detecting, through full string matching, whether the complete field name exists in the log to be analyzed; for example, if the complete field name exists, the result is recorded as 1 and its position in the log to be analyzed is recorded, and otherwise the result is recorded as 0;
S1022, field order matching: sorting the detected field names by their positions in the log to be analyzed, comparing this sequence with the field name sequence in the current template, and judging whether the order of each field conforms to the field order in the current template, recording 1 if it conforms and 0 if it does not;
S1023: calculating the matching degree based on the field name presence detection results and the field order matching results. The following formula is preferably employed:
$$\text{matching degree} = \frac{\sum_{i} \left( \alpha \cdot e_i + \beta \cdot o_i \right)}{N}$$
where $e_i$ is the field name presence detection result and $o_i$ is the field order matching result for the $i$-th field name, and $N$ is the number of log fields to be parsed. The parameters $\alpha$ and $\beta$ are configurable, with $\alpha \geq 0.5$ and $\alpha + \beta = 1$, and they adjust how strongly variable field orders are tolerated: if the field order changes frequently and substantially, $\beta$ is reduced so that the field order has less influence on the matching degree; otherwise $\beta$ is increased to improve matching accuracy.
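Steps S1021 to S1023 can be sketched as a short function. This is one possible reading and not the patent's definitive algorithm. It assumes that a field counts as "in order" when its rank among the detected fields is the same in the log as in the template, and that the divisor defaults to the number of template field names unless `n_log_fields` is supplied. The function name and both parameters are hypothetical. Plain substring search is used, so a name such as `SrcIP` would also be found inside `SrcIP6`.

```python
from typing import List, Optional


def matching_degree(template_fields: List[str], log_line: str,
                    alpha: float = 0.8,
                    n_log_fields: Optional[int] = None) -> float:
    """Weighted presence/order score of a template against one log line."""
    if not 0.5 <= alpha <= 1.0:
        raise ValueError("alpha must satisfy 0.5 <= alpha <= 1")
    beta = 1.0 - alpha  # alpha + beta = 1

    # S1021: presence detection, remembering where each name was found.
    positions = {name: log_line.find(name) for name in template_fields}
    found = [name for name in template_fields if positions[name] >= 0]

    # S1022: order the detected names by log position and compare rank by rank
    # with their order in the template (assumed interpretation).
    by_log_position = sorted(found, key=lambda name: positions[name])
    in_order = {name for i, name in enumerate(found)
                if by_log_position[i] == name}

    # S1023: weighted sum divided by the field count (assumed divisor).
    n = len(template_fields) if n_log_fields is None else n_log_fields
    if n <= 0:
        return 0.0
    present = set(found)
    total = sum(alpha * (name in present) + beta * (name in in_order)
                for name in template_fields)
    return total / n


# Illustrative call with made-up field names.
print(matching_degree(["A=", "B=", "C="], "B=2 A=1 C=3"))
```

With this reading, a missing field costs the full weight α + β for that field, while a field that is present but displaced costs only β, which matches the stated purpose of lowering β when field order is volatile.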
Preferably, the embodiment further adds a preliminary screening matching step before step S102 to reduce the amount of calculation, that is:
step S112: firstly, performing primary screening matching calculation on the current template and the log to be analyzed, and only if the template is successfully matched in the primary screening, entering the step S102 to perform detailed calculation of the matching degree.
The preliminary screening matching can be generally performed according to the current template and the key field information contained in the log to be analyzed. Preferably comprising the steps of:
taking the field with the highest priority and/or the field with invariable sequence as the primary screening matching identification of the log;
matching calculation is carried out on the current template and the preliminary screening matching identification of the log to be analyzed;
and if the matching calculation result exceeds a certain value, the primary screening matching is considered to be successful.
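The preliminary screening can be sketched as a cheap check on a handful of key field names before the full matching degree is computed. The sketch assumes that the screening identification is the set of names that either carry the highest priority or have a fixed position (order type 0), that the "matching calculation" is the fraction of those names present in the log, and that a larger priority number means higher priority. The tuple layout, function names, and the default `min_ratio` are hypothetical.

```python
from typing import List, Tuple

# (field name, priority, order type); order type 0 = fixed position (assumed layout)
Field = Tuple[str, int, int]


def prescreen_keys(fields: List[Field]) -> List[str]:
    """Names used as the screening identification of a template."""
    if not fields:
        return []
    top_priority = max(priority for _, priority, _ in fields)
    return [name for name, priority, order_type in fields
            if priority == top_priority or order_type == 0]


def passes_prescreen(fields: List[Field], log_line: str,
                     min_ratio: float = 0.5) -> bool:
    """True if the share of key names found in the log exceeds min_ratio."""
    keys = prescreen_keys(fields)
    if not keys:
        return False
    hits = sum(1 for key in keys if key in log_line)
    return hits / len(keys) > min_ratio


# Illustrative template rows and log lines.
demo_fields = [("SerialNum", 10, 0), ("GenTime", 5, 0), ("SMAC", 1, 1)]
print(passes_prescreen(demo_fields, "SerialNum:1 GenTime:2021-01-01 08:00:00"))
print(passes_prescreen(demo_fields, "User=bob Action=login"))
```

Because only a few names are probed, templates that cannot possibly match are discarded without sorting positions or computing weights.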
(4) S2: and performing field segmentation and content extraction on the log to be analyzed based on the matching position of the field in the searched template in the log to be analyzed.
After the dynamic field template has been selected, the log is analyzed based on the positions, in the log to be analyzed, of the field names in the template: the content between two positionally adjacent field names is taken as the field content of the preceding field name, and invalid characters are filtered out. Any regular expression needed for this process can be generated fully automatically by the program, because the process is simply a generic extraction of a string segment between a start position and an end position.
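The segmentation rule, content between two adjacent field names belongs to the earlier name, can be sketched with plain string slicing. The function name `split_fields`, the set of characters treated as invalid, and the sample log line are assumptions for illustration; the source does not define which characters count as invalid. Overlapping names such as `SrcIP` and `SrcIPVer` would need extra care that this sketch omits.

```python
from typing import Dict, List


def split_fields(log_line: str, field_names: List[str],
                 junk: str = " \t:=;,\"'") -> Dict[str, str]:
    """Slice the log between adjacent field names and trim separator debris."""
    # Locate every field name that occurs, then order the hits by position.
    hits = sorted((log_line.find(name), name)
                  for name in field_names if name in log_line)
    result: Dict[str, str] = {}
    for i, (pos, name) in enumerate(hits):
        start = pos + len(name)  # content starts right after the field name
        # Content ends where the next field name begins, or at end of line.
        end = hits[i + 1][0] if i + 1 < len(hits) else len(log_line)
        result[name] = log_line[start:end].strip(junk)
    return result


# Made-up log line in a key:value style; not the patent's example log.
sample = "SerialNum:42;GenTime:2021-01-01 08:00:00;Content:disk full;EvtCount:3"
parsed = split_fields(sample, ["SerialNum", "GenTime", "Content", "EvtCount"])
print(parsed)
```

Only the ends of each slice are trimmed, so separators that appear inside a value, such as the colons in a timestamp, are preserved.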
(5) S3: if the length of the content of the divided field is larger than the threshold value, a new field is additionally input into the current template through human-computer interaction, and the log to be analyzed is further analyzed by utilizing the updated template.
Preferably, this step can be further embodied as:
S301: for each segmented field, judging whether the difference between the length of the extracted field content and the maximum length of the field content recorded in the template exceeds a length threshold;
if it does, a new field is considered to have possibly been added, and a transition to the human-computer interaction interface is triggered, namely:
S302: entering field names through human-computer interaction, adding support for the new fields to the current template, and updating the current template;
the log to be analyzed is then analyzed again using the updated template;
if it does not, the log field segmentation and content extraction are complete;
(6) if no template whose matching degree meets the threshold requirement can be found, for example because no template is currently available or because all templates have a low matching degree, the method likewise enters the human-computer interaction interface, that is, S302: entering the fields of the log to be analyzed through human-computer interaction to generate a new dynamic field template;
the log to be analyzed is then analyzed using that template.
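The length check in S301 can be sketched as follows. The source does not say whether the length threshold is absolute or relative; because a threshold of 0.2 appears in the worked scenario, this sketch assumes a relative excess over the recorded maximum. The function name `overlong_fields`, its parameters, and the sample values are hypothetical.

```python
from typing import Dict, List


def overlong_fields(extracted: Dict[str, str], max_len: Dict[str, int],
                    rel_threshold: float = 0.2) -> List[str]:
    """Names of fields whose content is suspiciously longer than the template expects."""
    suspects: List[str] = []
    for name, content in extracted.items():
        limit = max_len.get(name)
        if not limit:
            continue  # no recorded maximum for this field, so nothing to compare
        # Relative excess of the extracted length over the recorded maximum (assumed).
        if (len(content) - limit) / limit > rel_threshold:
            suspects.append(name)
    return suspects


# Made-up values: an unknown field has been swallowed into the SrcIP content.
extracted = {"SrcIP": "10.0.0.1 SrcIP6 fe80::1", "SMAC": "00-11-22-33-44-55"}
recorded_max = {"SrcIP": 15, "SMAC": 17}
print(overlong_fields(extracted, recorded_max))
```

A non-empty result would be the trigger for S302, where the user is asked to enter the field names that the template is missing.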
As a preferred scheme, in this embodiment, the log field is entered through human-computer interaction as follows:
displaying the whole log to be analyzed and a list of all field names identified by the log on a screen, and waiting for a user to input a new field name in the log;
a user inputs a new field name on an input interface;
and after the log is submitted, updating the current template or generating a new template, and updating and displaying the field name list identified in the log to be analyzed.
The working process of the whole log parsing in connection with a specific application scenario is further explained as follows.
An exemplary to-be-parsed log is shown in FIG. 2, hereinafter "exemplary log".
(1) Scenario one: the existing dynamic field template set contains no log template capable of parsing the example log.
The parsing process using the method of the exemplary embodiment is as follows:
Step 1: the example log is matched against the log templates in the dynamic field template set, and no match succeeds.
Step 2: the interactive field name entry interface is activated, and the user enters the field names SerialNum, GenTime, SrcIP6, SrcIPVer, DstIP6, DstIPVer, SMAC, DMAC, Content, and EvtCount.
Step 3: the log template is updated: all characters to the left of the first field, SerialNum, are taken as the Syslog header information of the log, and the position of each field in the example log is recorded.
Step 4: the log template matching process is entered again, and the example log matches the template added to the dynamic field template set in step 3.
Step 5: according to the position at which each field name of the template matches in the example log, field segmentation and content extraction are performed on the example log: the content between two adjacent field names is taken as the field content of the preceding field name, and invalid characters are filtered out. For example, the content between GenTime and SrcIP, 2016-04-2016:24:16, is taken as the field content of GenTime.
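Step 3 of this scenario, building a template from the names the user entered, can be sketched as below. The function name `build_template`, its return shape, and the sample log line are assumptions for illustration only; the sample is not the log shown in FIG. 2.

```python
from typing import List, Tuple


def build_template(log_line: str,
                   entered_names: List[str]) -> Tuple[str, List[Tuple[str, int]]]:
    """Return (syslog header, [(field name, position), ...]) for the entered names."""
    # Keep only the names that actually occur and order them by log position.
    located = sorted((log_line.find(name), name)
                     for name in entered_names if name in log_line)
    if not located:
        raise ValueError("none of the entered field names occur in the log")
    first_position = located[0][0]
    header = log_line[:first_position]  # everything left of the first field
    fields = [(name, position) for position, name in located]
    return header, fields


# Made-up syslog-style line with a header before the first field name.
line = "<134>Jan  1 host app: SerialNum:1 GenTime:2021-01-01 08:00:00"
header, fields = build_template(line, ["GenTime", "SerialNum", "Missing"])
print(repr(header), fields)
```

The names may be entered in any order, since the recorded positions, not the entry order, determine the field sequence stored in the template.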
(2) The specific scene one is as follows: available templates can be matched in the existing dynamic field template set, but there are increase, decrease and order changes compared with the field names in the example log, for example, the matched template1 includes fields SerialNum, GenTime, SMAC, DMAC, srcipip, srcipverl, DstIP, dstipverl, Content, Msg, EvtCount, compared with the example log which has no Msg field, has more fields SrcIP6 and DstIP6, and the sequence of SMAC and DMAC changes, then the parsing process of the log is as follows:
Step 1: set α = 0.8, β = 0.2, and the matching degree threshold to 0.75. When template1 is matched, the calculated matching degree is 0.87 > 0.75, so the template matching succeeds.
Step 2: parse the example log using the positions at which the fields of template1 match the example log: take the content between two adjacent field names as the field content of the preceding field name, and filter out invalid characters.
Step 3: check whether the length of each field's content is abnormal. If the lengths of both the DstIP field and the SrcIP field exceed the maximum field content length threshold (here set to 0.2), activate the interactive field name entry interface so that the user can enter the missing field names.
Step 4: update the log template: take all characters to the left of the first entered field, SerialNum, as the Syslog header information of the log, and record the position value of every entered field in the example log.
Step 5: re-enter the log parsing process and parse the log based on the template updated in step 4.
Step 6: according to the position of each field name, take the content between two adjacent field names as the field content of the preceding field name, and filter out invalid characters. The maximum length of each field is then found to conform to the template, and log parsing is complete.
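The matching degree used in step 1 combines a field name presence result and a field order result, weighted by α and β. The sketch below is one possible reading of that calculation, under several assumptions: a field counts as "in order" when it occupies the same rank in the log as in the template (considering only the detected fields), the number of fields in the log is supplied by the caller, and the function names are hypothetical. It is not claimed to reproduce the 0.87 value of the example.

```python
import re


def field_position(log, name):
    """Position of a field name in the log (whole-word match), or None."""
    m = re.search(r"(?<!\w)" + re.escape(name) + r"(?!\w)", log)
    return m.start() if m else None


def matching_degree(template_fields, log, n_log_fields, alpha=0.8, beta=0.2):
    """Sum of (alpha * presence + beta * order) over fields, divided by
    the number of fields in the log to be parsed."""
    if alpha < 0.5 or abs(alpha + beta - 1.0) > 1e-9:
        raise ValueError("expected alpha >= 0.5 and alpha + beta == 1")
    if n_log_fields <= 0:
        raise ValueError("n_log_fields must be positive")

    # Field name presence detection: record the position of each hit.
    found = []
    for name in template_fields:
        pos = field_position(log, name)
        if pos is not None:
            found.append((name, pos))

    # Field order matching: compare template order with order in the log.
    template_order = [name for name, _ in found]
    log_order = [name for name, _ in sorted(found, key=lambda item: item[1])]
    in_order = sum(1 for t, l in zip(template_order, log_order) if t == l)

    return (alpha * len(found) + beta * in_order) / n_log_fields


# Hypothetical template and log: B and C are swapped, D is missing.
score = matching_degree(["A", "B", "C", "D"], "A=1 C=2 B=3", n_log_fields=3)
matched = score >= 0.75
```

With this reading, a swapped pair of fields costs only the β share of two fields, so small reorderings lower the score without preventing a match.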
The log parsing method of this exemplary embodiment, based on dynamic field templates and interactive manual field name entry, serves as a functional extension that takes over when traditional regular-expression-based log identification fails. It makes the SOC more adaptable when identifying logs, and operators do not need to write any regular expressions by hand at any point in the process.
An exemplary embodiment of a log parsing system that uses the above parsing method comprises:
a dynamic field template library module, used to store the existing log dynamic field templates;
a template matching module, used to search the dynamic field template library for the template that has the highest matching degree with the log to be parsed in terms of field names and position order and that meets the threshold requirement;
a log field parsing module, used to perform field segmentation and content extraction on the log to be parsed, based on the positions at which the fields of the template found by the template matching module, or of the template obtained through the interactive field entry module, match the log to be parsed;
an interactive field entry module, which provides a human-computer interaction interface for manually entering fields to generate or update a dynamic field template when no dynamic field template whose matching degree meets the threshold requirement can be found, or when the length of a parsed and segmented field content exceeds the threshold.
Preferably, the system further comprises a log collection module, used to obtain the log file to be parsed from outside, including reading it from an external file or obtaining it from the SOC device.
The exemplary template matching module further comprises a field name presence detection submodule, a field order matching submodule, and a matching degree calculation submodule.
Preferably, the template matching module also comprises a pre-screening matching submodule; only templates that pass the pre-screening are handed on to field name presence detection and the calculations of the other submodules.
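How the template library, the pre-screening submodule and the matching degree calculation might fit together can be sketched as follows. This is only an illustration under stated assumptions: templates are modeled as a dict from a template name to its ordered field names, the scoring function is passed in by the caller, "meets the threshold" is taken to mean greater than or equal, and all function names and the placeholder score are hypothetical.

```python
def prescreen(template_fields, log, key_fields, min_hits=1):
    """Cheap pre-screening: at least min_hits key fields must appear in both
    the template and the log. The choice of key fields is an assumption."""
    hits = sum(1 for name in key_fields
               if name in template_fields and name in log)
    return hits >= min_hits


def find_best_template(log, templates, score_fn, threshold, prescreen_fn=None):
    """Return (template_name, score) for the best-scoring template that
    meets the threshold, or None if no template qualifies."""
    best_name, best_score = None, float("-inf")
    for name, fields in templates.items():
        # Skip the full calculation for templates that fail pre-screening.
        if prescreen_fn is not None and not prescreen_fn(fields, log):
            continue
        score = score_fn(fields, log)
        if score > best_score:
            best_name, best_score = name, score
    if best_name is None or best_score < threshold:
        return None  # caller would fall back to interactive field entry
    return best_name, best_score


def presence_ratio(fields, log):
    """Placeholder score for the demo: share of template fields in the log."""
    return sum(1 for f in fields if f in log) / len(fields)


templates = {
    "template1": ["SerialNum", "GenTime", "SrcIP"],
    "template2": ["User", "Action"],
}
log_line = "SerialNum=1 GenTime=now SrcIP=10.0.0.1"
best = find_best_template(
    log_line, templates, presence_ratio, threshold=0.75,
    prescreen_fn=lambda fields, log: prescreen(fields, log, ["SerialNum"]),
)
```

A `None` result corresponds to the case in which no template meets the threshold and the interactive field entry module is invoked.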
The foregoing is illustrative of the present invention and is to be regarded as illustrative rather than restrictive of its broad principles; various modifications and changes in form or detail will readily occur to those skilled in the art based on the teachings and principles disclosed herein.

Claims (13)

1. A log parsing method based on a dynamic field template, characterized by comprising the following steps:
searching an existing dynamic field template set for the template that has the highest matching degree with a log to be parsed in terms of field names and position order and that meets the threshold requirement, wherein a dynamic field template records the field names contained in a log file and their position order so as to represent the format of the log;
performing field segmentation and content extraction on the log to be parsed based on the positions at which the fields of the found template match the log to be parsed;
if the length of a segmented field's content is greater than the threshold, additionally entering new fields into the current template through human-computer interaction to update the current template, and further parsing the log to be parsed using the updated template;
if no template whose matching degree meets the threshold requirement can be found, entering the fields of the log to be parsed through human-computer interaction to generate a new dynamic field template, and parsing the log to be parsed using that template.
2. The log parsing method according to claim 1, wherein searching the existing dynamic field template set for the template that has the highest matching degree with the log to be parsed in terms of field names and position order and that meets the threshold requirement comprises the following steps:
obtaining the templates in the existing dynamic field template set one by one;
calculating the matching degree between the current template and the log to be parsed according to the degree of consistency between the field names and position order contained in the current template and those in the log to be parsed;
finding the template with the highest matching degree in the set;
determining whether that matching degree meets the threshold requirement and, if so, taking that template as the template that has the highest matching degree and meets the threshold requirement.
3. The log parsing method according to claim 2, wherein calculating the matching degree according to the degree of consistency between the field names and position order contained in the current template and those in the log to be parsed comprises the following steps:
field name presence detection: obtaining the field names from the current template one by one, detecting whether each field name exists in the log to be parsed, and, if it exists, recording its position in the log to be parsed;
field order matching: sorting the detected field names by their positions in the log to be parsed, comparing the result with the order of the field names in the current template, and determining whether the order of each field conforms to the field order in the current template;
calculating the matching degree based on the results of the field name presence detection and of the field order matching.
4. The log parsing method according to claim 3, wherein the matching degree is calculated based on the following formula:
matching degree = Σ (α × field name presence detection result + β × field order matching result) / number of fields in the log to be parsed,
wherein α and β are configurable parameters, α ≥ 0.5, and α + β = 1.
5. The log parsing method according to claim 2, wherein before the step of calculating the matching degree according to the degree of consistency between the field names and position order contained in the current template and those in the log to be parsed, the method further comprises: performing a pre-screening matching calculation on the current template and the log to be parsed;
the step of calculating the matching degree is performed only for templates that pass the pre-screening matching.
6. The log parsing method according to claim 5, wherein the pre-screening matching calculation comprises the following steps:
taking the field with the highest priority and/or the fields whose order does not change as the pre-screening matching identifier of a log;
performing a matching calculation between the pre-screening matching identifiers of the current template and of the log to be parsed;
if the result of the matching calculation exceeds a given value, considering the pre-screening matching successful.
7. The log parsing method according to claim 1, wherein performing field segmentation and content extraction on the log to be parsed based on the positions at which the fields of the found template match the log to be parsed comprises:
taking the content between two adjacent fields at the matching positions as the field content of the preceding field, and removing leading and trailing invalid characters.
8. The log parsing method according to claim 1, wherein additionally entering new fields into the current template through human-computer interaction to update the current template, or entering the fields of the log to be parsed through human-computer interaction to generate a new dynamic field template, comprises:
displaying the log to be parsed and a list of all field names already identified in the log;
having a user enter, on an entry interface, the field names in the log that have not been identified;
after submission, updating the currently used dynamic field template or generating a new dynamic field template.
9. The log parsing method according to claim 1, further comprising the step of obtaining the log file to be parsed from outside, including reading it from an external file or obtaining it from an SOC device.
10. A log parsing system based on a dynamic field template, applying the log parsing method of any one of claims 1-9, comprising a dynamic field template library module, a template matching module, a log field parsing module and an interactive field entry module, wherein:
the dynamic field template library module is used to store the existing log dynamic field templates, a dynamic field template recording the field names contained in a log file and their position order so as to represent the format of the log;
the template matching module is used to search the dynamic field template library for the template that has the highest matching degree with the log to be parsed in terms of field names and position order and that meets the threshold requirement;
the log field parsing module is used to perform field segmentation and content extraction on the log to be parsed, based on the positions at which the fields of the template found by the template matching module, or of the template obtained through the interactive field entry module, match the log to be parsed;
the interactive field entry module provides a human-computer interaction interface and is used for manually entering fields to generate or update a dynamic field template when no dynamic field template whose matching degree meets the threshold requirement can be found, or when the length of a field content segmented by the log field parsing module is greater than the threshold.
11. The log parsing system of claim 10, wherein the template matching module comprises:
a field name presence detection submodule, used to obtain the field names from the current template one by one, detect whether each field name exists in the log to be parsed, and, if it exists, record its position in the log to be parsed;
a field order matching submodule, used to sort the detected field names by their positions in the log to be parsed, compare the result with the order of the field names in the current template, and determine whether the order of each field conforms to the field order in the current template;
a matching degree calculation submodule, used to calculate the matching degree between the current template and the log to be parsed in terms of field names and position order, based on the field name presence detection result and the field order matching result.
12. The log parsing system of claim 11, wherein the template matching module further comprises:
a pre-screening matching submodule, used to perform a pre-screening matching calculation on the current template and the log to be parsed; only templates that pass the pre-screening matching are handed on to field name presence detection and the calculations of the other submodules.
13. The log parsing system of claim 10, further comprising:
a log collection module, used to obtain the log file to be parsed from outside, by reading it from an external file or obtaining it from an SOC device.
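The field length check that triggers interactive field entry (claim 1 and the interactive field entry module of claim 10) can be sketched as follows. The source does not state the unit of the length threshold; this sketch assumes, purely for illustration, that it is the ratio of a field's content length to the total log length, which would be consistent with an example value such as 0.2. The function name and parameters are hypothetical.

```python
def overlong_fields(fields, log_length, max_ratio=0.2):
    """Return the names of fields whose content is suspiciously long.

    Assumption: a field is abnormal when len(content) / log_length exceeds
    max_ratio. An overlong field suggests that an unknown field name was
    swallowed into the content of the preceding field.
    """
    if log_length <= 0:
        raise ValueError("log_length must be positive")
    return [name for name, content in fields.items()
            if len(content) / log_length > max_ratio]


# Hypothetical extraction result for a log of 100 characters.
extracted = {"SrcIP": "x" * 30, "GenTime": "y" * 10}
suspects = overlong_fields(extracted, log_length=100)
needs_manual_entry = bool(suspects)
```

When `needs_manual_entry` is true, the user would be asked to enter the missing field names, after which the template is updated and the log is parsed again.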

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202110011998.5A CN112632960A (en) 2021-01-06 2021-01-06 Log analysis method and system based on dynamic field template

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202110011998.5A CN112632960A (en) 2021-01-06 2021-01-06 Log analysis method and system based on dynamic field template

Publications (1)

Publication Number Publication Date
CN112632960A true CN112632960A (en) 2021-04-09

Family

ID=75290807

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202110011998.5A Pending CN112632960A (en) 2021-01-06 2021-01-06 Log analysis method and system based on dynamic field template

Country Status (1)

Country Link
CN (1) CN112632960A (en)

Cited By (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113553309A (en) * 2021-07-28 2021-10-26 恒安嘉新(北京)科技股份公司 Log template determination method and device, electronic equipment and storage medium
CN113595787A (en) * 2021-07-27 2021-11-02 招商银行股份有限公司 Real-time log automatic alarm method, program and medium based on log template
CN113760655A (en) * 2021-08-27 2021-12-07 中移(杭州)信息技术有限公司 Door lock log analysis method and device and computer readable storage medium
CN114385396A (en) * 2021-12-27 2022-04-22 华青融天(北京)软件股份有限公司 Log analysis method, device, equipment and medium
CN114785604A (en) * 2022-04-28 2022-07-22 北京安博通金安科技有限公司 Dynamic log analysis method, device, equipment and storage medium
CN116861865A (en) * 2023-06-26 2023-10-10 江苏常熟农村商业银行股份有限公司 EXCEL data processing method, device, equipment and storage medium

Cited By (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113595787A (en) * 2021-07-27 2021-11-02 招商银行股份有限公司 Real-time log automatic alarm method, program and medium based on log template
CN113595787B (en) * 2021-07-27 2024-03-29 招商银行股份有限公司 Real-time log automatic alarm method, program and medium based on log template
CN113553309A (en) * 2021-07-28 2021-10-26 恒安嘉新(北京)科技股份公司 Log template determination method and device, electronic equipment and storage medium
CN113760655A (en) * 2021-08-27 2021-12-07 中移(杭州)信息技术有限公司 Door lock log analysis method and device and computer readable storage medium
CN114385396A (en) * 2021-12-27 2022-04-22 华青融天(北京)软件股份有限公司 Log analysis method, device, equipment and medium
CN114785604A (en) * 2022-04-28 2022-07-22 北京安博通金安科技有限公司 Dynamic log analysis method, device, equipment and storage medium
CN114785604B (en) * 2022-04-28 2023-11-07 北京安博通金安科技有限公司 Dynamic log analysis method, device, equipment and storage medium
CN116861865A (en) * 2023-06-26 2023-10-10 江苏常熟农村商业银行股份有限公司 EXCEL data processing method, device, equipment and storage medium

Similar Documents

Publication Publication Date Title
CN112632960A (en) Log analysis method and system based on dynamic field template
EP3846048A1 (en) Online log analysis method, system, and electronic terminal device thereof
CN111708860A (en) Information extraction method, device, equipment and storage medium
CN110263009B (en) Method, device and equipment for generating log classification rule and readable storage medium
CN110569214B (en) Index construction method and device for log file and electronic equipment
CN112148772A (en) Alarm root cause identification method, device, equipment and storage medium
CN107797916B (en) DDL statement auditing method and device
CN112445775B (en) Fault analysis method, device, equipment and storage medium of photoetching machine
CN110765195A (en) Data analysis method and device, storage medium and electronic equipment
US20120185584A1 (en) Recording application consumption details
CN112445912A (en) Fault log classification method, system, device and medium
CN116186759A (en) Sensitive data identification and desensitization method for privacy calculation
CN111563131A (en) Database entity relation generation method and device, computer equipment and storage medium
CN113141369B (en) Artificial intelligence-based firewall policy management method and related equipment
CN111581057B (en) General log analysis method, terminal device and storage medium
CN111966339B (en) Buried point parameter input method and device, computer equipment and storage medium
CN116484084B (en) Metadata blood-margin analysis method, medium and system based on application information mining
CN113760891A (en) Data table generation method, device, equipment and storage medium
CN109299132B (en) SQL data processing method and system and electronic equipment
CN110727565B (en) Network equipment platform information collection method and system
CN112686029A (en) SQL new sentence identification method and device for database audit system
US8775528B2 (en) Computer readable recording medium storing linking keyword automatically extracting program, linking keyword automatically extracting method and apparatus
CN113342861B (en) Data management method and device in service scene
CN113343051B (en) Abnormal SQL detection model construction method and detection method
CN117636317A (en) Image processing method and device, storage medium and electronic equipment

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination