CN109726185B - Log parsing method, system and computer readable medium based on syntax tree - Google Patents

Log parsing method, system and computer readable medium based on syntax tree

Info

Publication number
CN109726185B
Authority
CN
China
Prior art keywords
matching
log
syntax tree
operator
syntax
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN201811629058.7A
Other languages
Chinese (zh)
Other versions
CN109726185A (en)
Inventor
施展
范渊
刘博�
龙文洁
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Hangzhou Dbappsecurity Technology Co Ltd
Original Assignee
Hangzhou Dbappsecurity Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Hangzhou Dbappsecurity Technology Co Ltd filed Critical Hangzhou Dbappsecurity Technology Co Ltd
Priority to CN201811629058.7A priority Critical patent/CN109726185B/en
Publication of CN109726185A publication Critical patent/CN109726185A/en
Application granted granted Critical
Publication of CN109726185B publication Critical patent/CN109726185B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Landscapes

  • Stored Programmes (AREA)
  • Machine Translation (AREA)

Abstract

The invention provides a log analysis method, a system and a computer readable medium based on a syntax tree, which relate to the technical field of log analysis and comprise the following steps: acquiring at least one original log to be analyzed; matching the original log with a lexical analyzer to obtain a first matching result; if the first matching result is matching failure, performing syntax analysis on the original log, and generating a syntax tree based on the syntax analysis result, wherein the syntax tree comprises a plurality of nodes, and each node is used for representing field information in the original log or representing operator information in the original log; and traversing the syntax tree, and acquiring field information in each node in the syntax tree by using the regular expression. According to the method provided by the invention, aiming at each original log to be analyzed, the original logs with different structures are analyzed through the same regular expression, so that the redundancy of the content of the regular expression is reduced, and the repeated labor process of a rule writer is reduced.

Description

Log parsing method, system and computer readable medium based on syntax tree
Technical Field
The present invention relates to the field of log parsing technologies, and in particular, to a method, a system, and a computer readable medium for log parsing based on a syntax tree.
Background
Under current log parsing practice, the whole log is first segmented with a regular expression, and each segment is then parsed and further segmented with additional regular expressions; for each different log, the regular expressions required by the parsing rule are written anew. However, part of the content of different logs can often be matched by regular expressions that were written previously. Writing new rules in such cases increases the redundancy of the parsing rules and the repetitive work of the people who write them.
Disclosure of Invention
In view of this, the present invention provides a log parsing method, system and computer readable medium based on syntax tree, wherein for each original log to be parsed, the same regular expression is used to parse the original logs with different structures, so as to reduce the redundancy of the regular expression content and reduce the repeated labor process of the staff writing the regular expression.
In a first aspect, an embodiment of the present invention provides a log parsing method based on a syntax tree, including: acquiring at least one original log to be analyzed; matching each original log with a lexical analyzer to obtain a plurality of first matching results, wherein the first matching results are any one of the following items: matching is successful, and matching is failed; if each first matching result is a matching failure, performing syntax analysis on the original log, and generating a syntax tree based on the syntax analysis result, wherein the syntax tree comprises a plurality of nodes, and each node is used for representing field information in the original log or representing operator information in the original log; and traversing the syntax tree, and acquiring field information in each node in the syntax tree by using a regular expression.
Further, the method further comprises: and if the first matching result is that the matching is successful, acquiring field information in the original log by using a regular expression.
Further, performing syntax analysis on the original log and generating a syntax tree based on the syntax analysis result includes: performing operator matching on the original log by using a syntax analyzer to obtain a first operator identifier; segmenting the original log based on the first operator identifier to obtain at least one log segment; taking the first operator identifier as a root node and the at least one log segment as a child node; and performing operator matching on the log segments to obtain a second matching result, and generating the syntax tree according to the second matching result.
Further, the operator matching is performed on the log segment to obtain a second matching result, and the generating of the syntax tree according to the second matching result includes: if the matching is determined to be successful based on the second matching result, obtaining a second operator identifier, segmenting the log segment based on the second operator identifier, and generating a new child node based on the segmentation result; and if the matching fails, performing element expression matching on the at least one log segment to obtain a third matching result, and generating a leaf node based on the third matching result to obtain a syntax tree.
Further, performing element expression matching on the at least one log segment to obtain a third matching result, and generating a leaf node based on the third matching result includes: matching the at least one log segment with an element expression in a parser to obtain a third matching result, wherein the third matching result is any one of the following items: matching is successful, and matching is failed; and if the third matching result is that the matching is successful, using the at least one log segment as a leaf node of the syntax tree.
Further, traversing the syntax tree comprises: classifying operator identifications corresponding to each node in the syntax tree according to a preset priority order to obtain the priority of each operator identification; taking the sequence of the priorities identified by the operators from high to low as the traversal sequence of the syntax tree; and traversing the syntax tree according to the traversal sequence.
Further, before obtaining at least one original log to be parsed, the method further includes: and constructing a lexical analyzer library and a grammar analyzer library.
In a second aspect, an embodiment of the present invention further provides a log parsing system based on a syntax tree, including: the system comprises an acquisition module, a lexical analysis module, a syntax analysis module and a field information acquisition module, wherein the acquisition module is used for acquiring at least one original log to be analyzed; the lexical analysis module is configured to match each original log with a lexical analyzer to obtain a plurality of first matching results, where the first matching results are any one of the following: matching is successful, and matching is failed; the syntax analysis module is configured to, if each first matching result is a matching failure, perform syntax analysis on the original log, and generate a syntax tree based on a syntax analysis result, where the syntax tree includes a plurality of nodes, and each node is used to represent field information in the original log or represent operator information in the original log; and the field information acquisition module is used for traversing the syntax tree and acquiring the field information in each node in the syntax tree by using a regular expression.
Further, the field information obtaining module is further configured to: and if the first matching result is that the matching is successful, acquiring field information in the original log by using a regular expression.
In a third aspect, an embodiment of the present invention further provides a computer-readable medium having a non-volatile program code executable by a processor, where the program code causes the processor to execute the method according to the first aspect.
In the embodiment of the invention, at least one original log to be parsed is obtained, and each original log is matched with a lexical analyzer to obtain a matching result. If the matching result is a matching failure, syntax analysis is performed on the original log and a syntax tree is generated based on the syntax analysis result. The syntax tree is then traversed, and the field information in each node is obtained by using a regular expression. Because original logs with different structures are parsed with the same regular expressions, the redundancy of the regular expression content is reduced, and so is the repetitive work of the staff who write the regular expressions.
Additional features and advantages of the invention will be set forth in the description which follows, and in part will be obvious from the description, or may be learned by practice of the invention. The objectives and other advantages of the invention will be realized and attained by the structure particularly pointed out in the written description and claims hereof as well as the appended drawings.
In order to make the aforementioned and other objects, features and advantages of the present invention comprehensible, preferred embodiments accompanied with figures are described in detail below.
Drawings
In order to more clearly illustrate the embodiments of the present invention or the technical solutions in the prior art, the drawings used in the description of the embodiments or the prior art will be briefly described below, and it is obvious that the drawings in the following description are some embodiments of the present invention, and other drawings can be obtained by those skilled in the art without creative efforts.
FIG. 1 is a flowchart of a log parsing method based on a syntax tree according to an embodiment of the present invention;
FIG. 2 is a flowchart of another syntax tree based log parsing method according to an embodiment of the present invention;
FIG. 3 is a diagram illustrating a structure of a first syntax tree according to an embodiment of the present invention;
FIG. 4 is a diagram illustrating a structure of a second syntax tree according to an embodiment of the present invention;
FIG. 5 is a diagram illustrating a structure of a third syntax tree according to an embodiment of the present invention;
FIG. 6 is a diagram illustrating a fourth syntax tree according to an embodiment of the present invention;
FIG. 7 is a diagram illustrating a fifth syntax tree according to an embodiment of the present invention;
FIG. 8 is a diagram illustrating a sixth syntax tree according to an embodiment of the present invention;
FIG. 9 is a schematic diagram of a log parsing system based on syntax trees according to an embodiment of the present invention.
Detailed Description
To make the objects, technical solutions and advantages of the embodiments of the present invention clearer, the technical solutions of the present invention will be clearly and completely described below with reference to the accompanying drawings, and it is apparent that the described embodiments are some, but not all embodiments of the present invention. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention.
In the description of the present invention, it should be noted that the terms "center", "upper", "lower", "left", "right", "vertical", "horizontal", "inner", "outer", etc., indicate orientations or positional relationships based on the orientations or positional relationships shown in the drawings, and are only for convenience of description and simplicity of description, but do not indicate or imply that the device or element being referred to must have a particular orientation, be constructed and operated in a particular orientation, and thus, should not be construed as limiting the present invention. Furthermore, the terms "first," "second," and "third" are used for descriptive purposes only and are not to be construed as indicating or implying relative importance.
In addition, in the description of the embodiments of the present invention, unless otherwise explicitly specified or limited, the terms "mounted," "connected," and "connected" are to be construed broadly, e.g., as meaning either a fixed connection, a removable connection, or an integral connection; can be mechanically or electrically connected; they may be connected directly or indirectly through intervening media, or they may be interconnected between two elements. The specific meanings of the above terms in the present invention can be understood in specific cases to those skilled in the art.
Embodiment 1:
According to an embodiment of the present invention, an embodiment of a syntax-tree-based log parsing method is provided. It should be noted that the steps illustrated in the flowcharts of the accompanying drawings may be performed in a computer system, for example as a set of computer-executable instructions, and that although a logical order is illustrated in the flowcharts, in some cases the steps illustrated or described may be performed in a different order.
Fig. 1 is a flowchart of a syntax tree-based log parsing method according to an embodiment of the present invention, as shown in fig. 1, the method including the steps of:
step S102, obtaining at least one original log to be analyzed;
step S104, matching each original log with a lexical analyzer to obtain a plurality of first matching results, wherein the first matching results are any one of the following items: matching is successful, and matching is failed; if the first matching result is a matching failure, executing step S106;
step S106, carrying out syntactic analysis on the original log, and generating a syntactic tree based on a syntactic analysis result, wherein the syntactic tree comprises a plurality of nodes, and each node is used for representing field information in the original log or representing operator information in the original log;
step S108, traversing the syntax tree, acquiring field information in each node in the syntax tree by using the regular expression, and storing the field information as a key-value pair, wherein the value is the content of the field information, and the key is the keyword corresponding to that content.
For example, if the field information obtained from one node in the syntax tree by using the regular expression is time information, such as 23:44:18, the stored key-value pair is time-23:44:18.
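The key-value storage in step S108 can be illustrated with a minimal Python sketch. The pattern, the key name "time" and the helper `extract_field` are assumptions made for illustration; the description itself only gives the example value 23:44:18.

```python
import re

# Hypothetical pattern for a time field; not taken from the patent.
TIME_PATTERN = re.compile(r"\d{2}:\d{2}:\d{2}")


def extract_field(key, pattern, node_text):
    """Return {key: matched text} if the pattern is found in node_text, else {}."""
    match = pattern.search(node_text)
    return {key: match.group(0)} if match else {}


# The node text here is an invented example containing the time 23:44:18.
fields = extract_field("time", TIME_PATTERN, "Sep 19 2018 23:44:18")
print(fields)
```

The value is the matched content and the key is the keyword associated with it, which mirrors the time-23:44:18 pair above.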
In the embodiment of the invention, at least one original log to be parsed is obtained, and each original log is matched with a lexical analyzer to obtain a matching result. If the matching result is a matching failure, syntax analysis is performed on the original log and a syntax tree is generated based on the syntax analysis result. The syntax tree is then traversed, and the field information in each node is obtained by using a regular expression. Because original logs with different structures are parsed with the same regular expressions, the redundancy of the regular expression content is reduced, and so is the repetitive work of the staff who write the regular expressions.
Optionally, if the first matching result in step S104 is a successful match, then:
field information in the at least one original log to be parsed is acquired by using the regular expression and stored as a key-value pair.
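The branch between the two outcomes of step S104 might be sketched as follows. This is only one possible reading: it assumes that a lexical analyzer is a regular expression with named groups that must match the whole log, and the names `LEXERS`, `parse_log` and `syntax_fallback` are hypothetical.

```python
import re

# Hypothetical lexical analyzer library: each entry is a regex with named groups.
LEXERS = [
    re.compile(r"(?P<key>\w+)=(?P<value>\w+)"),
]


def parse_log(raw_log, lexers, syntax_fallback):
    """Try each lexical analyzer first; fall back to syntax analysis on failure."""
    for lexer in lexers:
        match = lexer.fullmatch(raw_log)
        if match:  # first matching result: success, extract fields directly
            return match.groupdict()
    # Every lexical analyzer failed: hand the log to syntax analysis (step S106).
    return syntax_fallback(raw_log)


print(parse_log("cpu=0", LEXERS, lambda log: None))
```

Passing the syntax analysis step in as a callable keeps the sketch self-contained; a real implementation would build and traverse the syntax tree there.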
Optionally, as shown in fig. 2, the parsing the original log and generating the syntax tree based on the parsing result in step S106 specifically includes the following steps:
step S1061, carrying out operator matching on the original log by using a syntax analyzer to obtain a first operator identifier.
Specifically, the operator identifiers in the syntax analyzer are ranked according to a preset priority order to obtain the priority of each operator identifier. The operator identifiers are then matched against the original log in descending order of priority, and the matched operator identifier with the highest priority is taken as the first operator identifier.
Step S1062, the original log is segmented based on the first operator identifier to obtain at least one log segment. Specifically, the original log is segmented with the position of the first operator mark in the original log as a reference.
For example, if the first operator identifier is located in the middle of the original log, the original log is divided into two log segments at the first operator identifier: the portion of the original log before the first operator identifier is used as the first log segment, and the portion after it is used as the second log segment.
Step S1063, taking the first operator identifier as a root node, and taking the at least one log segment as a child node.
Step S1064, performing operator matching on the log segment to obtain a second matching result, and generating a syntax tree according to the second matching result.
Specifically, if the matching is determined to be successful based on the second matching result, a second operator identifier is obtained, the log segment is segmented based on the second operator identifier, and a new child node is generated based on the segmentation result;
and if the matching fails, performing element expression matching on at least one log segment to obtain a third matching result, and generating leaf nodes based on the third matching result to obtain the syntax tree. Wherein the third matching result comprises at least one of: matching is successful, and matching is failed.
Specifically, if the third matching result is that the matching is successful, at least one log segment is used as a leaf node of the syntax tree; and if the third matching result is that the matching fails, discarding the log segment.
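Steps S1061 to S1064 can be sketched as a recursive function. This is a minimal sketch under stated assumptions, not the patented implementation: the operator list, the element expression, the `Node` class and `build_tree` are all hypothetical, and each split produces exactly two segments.

```python
import re

# Hypothetical operator patterns, highest priority first, and a hypothetical
# element expression; the patent's own patterns are not reproduced here.
OPERATORS = [r";", r",", r"="]
ELEMENT = re.compile(r"[\w/]+")


class Node:
    """One syntax tree node: an operator with children, or a leaf field."""

    def __init__(self, text, is_operator=False):
        self.text = text
        self.is_operator = is_operator
        self.children = []


def build_tree(segment):
    """Recursively split a log segment; return None if the segment is discarded."""
    for pattern in OPERATORS:
        match = re.search(pattern, segment)
        if match:  # operator matching succeeded, so split the segment again
            node = Node(match.group(0), is_operator=True)
            for part in (segment[:match.start()], segment[match.end():]):
                child = build_tree(part)
                if child is not None:
                    node.children.append(child)
            return node
    # No operator matched: fall back to element expression matching.
    if ELEMENT.fullmatch(segment):
        return Node(segment)  # third matching result: success, keep as a leaf
    return None  # third matching result: failure, discard the segment


tree = build_tree("Slot=8/2,Vcpu=0;cpuid=2")
print(tree.text, [child.text for child in tree.children])
```

Trying the operators in priority order at every level means the highest-priority operator present in a segment always becomes the root of that segment's subtree.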
Optionally, in the method provided in the embodiment of the present invention, the method for traversing a syntax tree includes:
classifying operator identifications corresponding to each node in the syntax tree according to a preset priority order to obtain the priority of each operator identification; for example, the priority of operator identification may be divided into four levels;
taking the order of the operator identifier priorities from high to low as the traversal order of the syntax tree; for example, traversal starts from the root node, whose operator identifier has the highest priority, and then proceeds to the operator nodes whose operator identifiers have lower priority;
and traversing the syntax tree according to the traversal order.
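One way to read the priority-based traversal is sketched below. The priority table, the tuple-based tree representation and `traversal_order` are assumptions for illustration; the description only states that operator identifiers are classified into levels and visited from high to low priority.

```python
# Hypothetical priority table: a smaller number means a higher priority.
PRIORITY = {";": 2, ",": 3, " ": 3, "=": 4}


def traversal_order(tree):
    """Return operator nodes sorted from highest to lowest priority.

    An operator node is a tuple (operator, [children]); a leaf is a plain str.
    """
    found = []
    stack = [tree]
    while stack:
        node = stack.pop()
        if isinstance(node, tuple):
            found.append(node)
            stack.extend(reversed(node[1]))  # keep left-to-right order
    # sorted() is stable, so nodes of equal priority stay in discovery order.
    return sorted(found, key=lambda node: PRIORITY[node[0]])


example = (";", [(",", [("=", ["Slot", "8/2"]), ("=", ["Vcpu", "0"])]),
                 ("=", ["cpuid", "2"])])
print([node[0] for node in traversal_order(example)])
```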
The following log is taken as an example to specifically describe the log parsing process in the embodiment of the present invention:
First, at least one original log to be parsed is obtained, with the following content:
<190>Sep 19 2018 02:03:45WH-USG9560-1%%01RT/6/CM(l)[978]:Slot=8/2,Vcpu=0;cpuid=2cpu=0totalmemory=492614580curmemory=108399660
Operator matching is performed on the original log by using a syntax analyzer, and the first operator identifier "%%01RT/6/CM(l)[978]" is obtained. The first operator identifier is taken as the root node of the log, and the original log is split at the first operator identifier into a left log segment, Sep 19 2018 02:03:45WH-USG9560-1, and a right log segment, Slot=8/2,Vcpu=0;cpuid=2cpu=0totalmemory=492614580curmemory=108399660. The left and right log segments become the left and right child nodes of the root node, forming the tree shown in fig. 3.
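The root split can be reproduced on the sample log with a short Python sketch. The pattern `FIRST_LEVEL` is an assumption written to match "%%01RT/6/CM(l)[978]" and is not the patent's first-level operator; the colon after the identifier is treated as a separator, and the leading "<190>" simply stays in the left segment in this sketch.

```python
import re

# Sample log copied as printed in the description.
RAW_LOG = ("<190>Sep 19 2018 02:03:45WH-USG9560-1%%01RT/6/CM(l)[978]:"
           "Slot=8/2,Vcpu=0;cpuid=2cpu=0totalmemory=492614580curmemory=108399660")

# Assumed pattern: group 1 is the operator identifier, the trailing ':' is a separator.
FIRST_LEVEL = re.compile(r"(%%\d{1,2}RT/\d+/CM\(l\)\[\d+\]):")


def split_at_root(raw_log):
    """Return (operator identifier, left segment, right segment), or None."""
    match = FIRST_LEVEL.search(raw_log)
    if match is None:
        return None
    return match.group(1), raw_log[:match.start()], raw_log[match.end():]


root, left, right = split_at_root(RAW_LOG)
print(root)
print(left)
print(right)
```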
The left log segment is parsed by the parser, the left log segment can be further split, the split left child node can continue to generate child nodes, and the structure of the tree is shown in fig. 4.
The new child nodes generated from the left child node are matched by using a regular expression in a lexical analyzer. The matching succeeds, and the field information of the newly generated child nodes is obtained: Sep 19 2018 02:03:45 and WH-USG9560-1.
The right log segment is parsed by the syntax analyzer to obtain a second operator identifier. The right log segment is segmented according to the second operator identifier, generating two new child nodes of the right child node whose log segment contents are, respectively, Slot=8/2,Vcpu=0 and cpuid=2cpu=0totalmemory=492614580curmemory=108399660. The tree structure at this point is shown in fig. 5.
The syntax analyzer continues to parse the newly generated child nodes and finds that the third operator identifier is a space and the fourth operator identifier is ",". The newly generated child nodes are segmented according to the space and "," to generate further child nodes, and the structure of the tree at this point is shown in fig. 6.
The child nodes generated at this point can be segmented further by using "=" to reach the minimum units. Each minimum unit can be successfully matched by the element expression in the syntax analyzer and is taken as a leaf node of the tree, yielding the syntax tree whose structure is shown in fig. 7.
Alternatively, the syntax tree shown in fig. 7 may be converted into an expression form, and the converted syntax tree structure is shown in fig. 8.
The operator identifiers in the syntax analyzer are graded according to a preset priority order to obtain four operator priorities, which are respectively:
first-level operator: "% \ d {1,2} RT/\ d +/CM \ (l \ d + \)"
Second-level operators: "{? - \ S } ","; "
Three-level operator: ",", " "
Four-level operator: "="
The element expression is: \\ S {3} \ d {1,2} \ d {4} \ d {1,2} \ w {1,2} \\ \ w {3} \ d {4} - \ d {1} \\ \ w + \ d +
And traversing the syntax tree to acquire the field information of each child node in the syntax tree.
Specifically, values are first assigned in the default manner of each operator. For example, the "=" operator in the syntax tree indicates that the right leaf node of the operator is assigned as the value to the variable whose key name is the left leaf node, and the resulting key-value pair is the field information. For nodes under operators that have no default assignment, such as the time and the host name in the syntax tree above, the assignment can be specified manually in the rules, using node numbers to indicate which variable each node should be assigned to.
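The two assignment paths can be sketched as follows. The function `assign_fields`, the node numbering and the variable names "time" and "host" are hypothetical; the description does not specify how node numbers are allocated.

```python
def assign_fields(equals_nodes, numbered_leaves, manual_rules):
    """Build key-value pairs from default and manually specified assignments.

    equals_nodes: list of (left_leaf, right_leaf) pairs under "=" operators.
    numbered_leaves: dict mapping a node number to the leaf text.
    manual_rules: dict mapping a node number to the variable name to assign.
    """
    fields = {}
    for left_leaf, right_leaf in equals_nodes:
        fields[left_leaf] = right_leaf  # default rule of "=": right value, left key
    for node_number, variable in manual_rules.items():
        fields[variable] = numbered_leaves[node_number]  # rule written by hand
    return fields


fields = assign_fields(
    [("Slot", "8/2"), ("Vcpu", "0")],
    {1: "Sep 19 2018 02:03:45", 2: "WH-USG9560-1"},  # assumed node numbers
    {1: "time", 2: "host"},                          # assumed variable names
)
print(fields)
```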
Optionally, before the step S102 obtains at least one original log to be parsed, the method for parsing a log based on a syntax tree according to the embodiment of the present invention further includes the following steps: and constructing a lexical analyzer library and a grammar analyzer library.
Specifically, when a parser library is constructed, a professional is required to write a corresponding parser, wherein the parser is used for parsing the log and segmenting the log based on a parsing result.
Optionally, the parser comprises: an operator, a regular expression, a traversal order and an assignment rule. The traversal order is a preset order in which the syntax tree is traversed.
For example, one parser structure in the parser library is:
{"op":"(\S*)=(\S*)","order":"center","ass":"RTL"},
wherein op represents the operator, order represents the traversal order, ass represents the assignment rule, and (\S*) represents the regular expression.
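Applying such a parser entry to a log segment might look like the sketch below. The entry is written as a Python dict with a raw string, since the backslashes would need escaping in strict JSON. The reading of "RTL" as "the right-hand value is assigned to the left-hand key" is an assumption, as the description does not define the values of ass or order; `apply_entry` is a hypothetical helper.

```python
import re

# The parser entry from the description, as a Python dict with a raw string.
PARSER_ENTRY = {"op": r"(\S*)=(\S*)", "order": "center", "ass": "RTL"}


def apply_entry(entry, segment):
    """Match the operator pattern against a segment and apply the assignment rule."""
    match = re.fullmatch(entry["op"], segment)
    if match is None:
        return None
    left, right = match.group(1), match.group(2)
    if entry["ass"] == "RTL":  # assumed meaning: right value assigned to left key
        return {left: right}
    return {right: left}


print(apply_entry(PARSER_ENTRY, "curmemory=108399660"))
```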
Specifically, when a lexical analyzer library is constructed, a professional is required to write a corresponding lexical analyzer, wherein the lexical analyzer is used for extracting the content of the segmented log.
Optionally, the lexical analyzer includes a regular expression.
For example, one lexical analyzer structure in the lexical analyzer library is:
{"accid":"[\w\d]"}, which indicates that the structure matched by the lexical analyzer can only consist of word characters or digits.
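A lexical analyzer library of this shape could be applied to a segmented log as sketched below. The "+" quantifier, the dict layout of `LEXER_LIBRARY` and the helper `lex` are assumptions; only the key "accid" and the character class come from the description.

```python
import re

# Hypothetical library layout: field name -> regular expression.
# The '+' quantifier is an assumption added so that whole tokens can match.
LEXER_LIBRARY = {"accid": r"[\w\d]+"}


def lex(segment, library):
    """Return {name: segment} for the first pattern that matches the whole segment."""
    for name, pattern in library.items():
        if re.fullmatch(pattern, segment):
            return {name: segment}
    return None


print(lex("WHUSG9560", LEXER_LIBRARY))
```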
As can be seen from the above description, in the embodiment of the present invention, at least one original log to be parsed is obtained, and each original log is matched with a lexical analyzer to obtain a matching result. If the matching result is a matching failure, the original log is parsed and a syntax tree is generated based on the parsing result. The syntax tree is then traversed, and the field information in each node is obtained by using a regular expression. Because original logs with different structures are parsed with the same regular expressions, the redundancy of the regular expression content is reduced, and so is the repetitive work of the staff who write the rules.
Embodiment 2:
The embodiment of the present invention further provides a log parsing system based on a syntax tree. The system is mainly used to execute the syntax-tree-based log parsing method provided above, and is described in detail below.
Fig. 9 is a schematic diagram of a syntax tree-based log parsing system according to an embodiment of the present invention, and as shown in fig. 9, the system mainly includes: an acquisition module 10, a lexical analysis module 20, a syntax analysis module 30 and a field information acquisition module 40, wherein,
an obtaining module 10, configured to obtain at least one original log to be analyzed;
a lexical analysis module 20, configured to match each original log with a lexical analyzer to obtain multiple first matching results, where the first matching result is any one of the following: matching is successful, and matching is failed;
a syntax analysis module 30, configured to, if each first matching result is a matching failure, perform syntax analysis on the original log, and generate a syntax tree based on the syntax analysis result, where the syntax tree includes a plurality of nodes, and each node is used to represent field information in the original log or represent operator information in the original log;
and the field information acquisition module 40 is used for traversing the syntax tree and acquiring the field information in each node in the syntax tree by using the regular expression.
Optionally, the field information obtaining module 40 is further configured to:
and if the first matching result is that the matching is successful, acquiring field information in at least one original log to be analyzed by using the regular expression.
It is clear to those skilled in the art that, for convenience and brevity of description, the specific working processes of the system and the apparatus described above may refer to the corresponding processes in the foregoing method embodiments, and are not described herein again.
Embodiments of the present invention also provide a computer readable medium having a non-volatile program code executable by a processor, where the program code causes the processor to execute the method provided in the first embodiment.
The computer program product for performing the log parsing method based on the syntax tree provided in the embodiment of the present invention includes a computer-readable storage medium storing a nonvolatile program code executable by a processor, where instructions included in the program code may be used to execute the method described in the foregoing method embodiment, and specific implementation may refer to the method embodiment, and will not be described herein again.
The flowchart and block diagrams in the figures illustrate the architecture, functionality, and operation of possible implementations of systems, methods and computer program products according to various embodiments of the present invention. In this regard, each block in the flowchart or block diagrams may represent a module, segment, or portion of code, which comprises one or more executable instructions for implementing the specified logical function(s). It should also be noted that, in some alternative implementations, the functions noted in the block may occur out of the order noted in the figures. For example, two blocks shown in succession may, in fact, be executed substantially concurrently, or the blocks may sometimes be executed in the reverse order, depending upon the functionality involved. It will also be noted that each block of the block diagrams and/or flowchart illustration, and combinations of blocks in the block diagrams and/or flowchart illustration, can be implemented by special purpose hardware-based systems which perform the specified functions or acts, or combinations of special purpose hardware and computer instructions.
In the several embodiments provided in the present application, it should be understood that the disclosed system, apparatus and method may be implemented in other ways. The above-described embodiments of the apparatus are merely illustrative, and for example, the division of the units is only one logical division, and there may be other divisions when actually implemented, and for example, a plurality of units or components may be combined or integrated into another system, or some features may be omitted, or not executed. In addition, the shown or discussed mutual coupling or direct coupling or communication connection may be an indirect coupling or communication connection of devices or units through some communication interfaces, and may be in an electrical, mechanical or other form.
The units described as separate parts may or may not be physically separate, and parts displayed as units may or may not be physical units, may be located in one place, or may be distributed on a plurality of network units. Some or all of the units can be selected according to actual needs to achieve the purpose of the solution of the embodiment.
In addition, functional units in the embodiments of the present invention may be integrated into one processing unit, or each unit may exist alone physically, or two or more units are integrated into one unit.
The functions, if implemented in the form of software functional units and sold or used as a stand-alone product, may be stored in a non-volatile computer-readable storage medium executable by a processor. Based on such understanding, the technical solution of the present invention may be embodied in the form of a software product, which is stored in a storage medium and includes instructions for causing a computer device (which may be a personal computer, a server, or a network device) to execute all or part of the steps of the method according to the embodiments of the present invention. The aforementioned storage medium includes a USB flash drive, a removable hard disk, a Read-Only Memory (ROM), a Random Access Memory (RAM), a magnetic disk, an optical disk, and other media capable of storing program code.
Finally, it should be noted that the above embodiments are only specific embodiments of the present invention, used to illustrate its technical solutions rather than to limit them, and the protection scope of the present invention is not limited thereto. Although the present invention has been described in detail with reference to the foregoing embodiments, those skilled in the art should understand that anyone skilled in the art may, within the technical scope disclosed herein, still modify the technical solutions described in the foregoing embodiments, readily conceive of changes to them, or make equivalent substitutions for some of their technical features. Such modifications, changes or substitutions do not depart from the spirit and scope of the embodiments of the present invention, and they shall be covered by its protection scope. Therefore, the protection scope of the present invention shall be subject to the protection scope of the claims.

Claims (7)

1. A log parsing method based on a syntax tree, comprising:
acquiring at least one original log to be analyzed;
matching each original log with a lexical analyzer to obtain a plurality of first matching results, wherein each first matching result is either of the following: matching success or matching failure;
if each first matching result is a matching failure, performing syntax analysis on the original log, and generating a syntax tree based on the syntax analysis result, wherein the syntax tree comprises a plurality of nodes, and each node is used for representing field information in the original log or representing operator information in the original log;
traversing the syntax tree, and acquiring field information in each node in the syntax tree by using a regular expression;
wherein performing syntax analysis on the original log and generating a syntax tree based on the syntax analysis result comprises:
performing operator matching on the original log by using a syntax analyzer to obtain a first operator identifier;
segmenting the original log based on the first operator identifier to obtain at least one log segment;
taking the first operator identifier as a root node and the at least one log segment as child nodes;
and carrying out operator matching on the log segments to obtain a second matching result, and generating the syntax tree according to the second matching result.
2. The method of claim 1, further comprising:
and if the first matching result is that the matching is successful, acquiring field information in the original log by using a regular expression.
3. The method of claim 1, wherein performing operator matching on the log segments to obtain a second matching result and generating the syntax tree according to the second matching result comprises:
if the matching is determined to be successful based on the second matching result, obtaining a second operator identifier, segmenting the log segment based on the second operator identifier, and generating a new child node based on the segmentation result;
and if the matching fails, performing element expression matching on the at least one log segment to obtain a third matching result, and generating a leaf node based on the third matching result to obtain a syntax tree.
4. The method of claim 3, wherein performing element expression matching on the at least one log segment to obtain a third matching result and generating a leaf node based on the third matching result comprises:
matching the at least one log segment with an element expression in a syntax analyzer to obtain a third matching result, wherein the third matching result is either of the following: matching success or matching failure;
and if the third matching result is that the matching is successful, using the at least one log segment as a leaf node of the syntax tree.
5. The method of claim 3, wherein traversing the syntax tree comprises:
classifying the operator identifiers corresponding to the nodes in the syntax tree according to a preset priority order to obtain the priority of each operator identifier;
taking the order of the operator identifiers from highest priority to lowest as the traversal order of the syntax tree;
and traversing the syntax tree according to the traversal order.
6. The method of claim 1, wherein prior to acquiring the at least one original log to be analyzed, the method further comprises:
and constructing a lexical analyzer library and a grammar analyzer library.
7. A computer-readable medium having non-volatile program code executable by a processor, the program code causing the processor to perform the method of any of claims 1-6.
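
The flow recited in claims 1 to 5 can be illustrated with a minimal Python sketch. It is not the patented implementation. The lexical-analyzer pattern in `LEXERS`, the operator table `OPERATOR_PRIORITY`, the key=value element expression `ELEMENT`, and all function names are hypothetical choices made only for this example. The claims do not say how an operator is selected when several occur in a log or what happens to a segment that matches nothing, so the sketch simply tries operators in table order and drops unmatched segments.

```python
import re
from dataclasses import dataclass, field
from typing import Dict, Iterator, List, Optional

# Hypothetical stand-ins for the lexical analyzer library, the operator
# table (identifier -> preset priority) and the element expression.
LEXERS = [re.compile(r"(?P<time>\d{2}:\d{2}:\d{2}) (?P<level>[A-Z]+) (?P<msg>.+)")]
OPERATOR_PRIORITY = {"|": 2, ";": 1}
ELEMENT = re.compile(r"(?P<key>\w+)=(?P<value>\S+)")


@dataclass
class Node:
    text: str                       # operator identifier or log segment
    is_operator: bool = False
    children: List["Node"] = field(default_factory=list)


def find_operator(text: str) -> Optional[str]:
    """Operator matching: first known operator present in text (assumed rule)."""
    for op in OPERATOR_PRIORITY:
        if op in text:
            return op
    return None


def build_tree(segment: str) -> Optional[Node]:
    """Split on an operator; fall back to element matching for leaves."""
    op = find_operator(segment)
    if op is None:
        leaf = segment.strip()
        # Segments that match nothing are dropped (an assumption).
        return Node(leaf) if ELEMENT.fullmatch(leaf) else None
    node = Node(op, is_operator=True)   # the operator becomes the parent
    for part in segment.split(op):      # each log segment becomes a child
        child = build_tree(part)
        if child is not None:
            node.children.append(child)
    return node


def operator_nodes(node: Node) -> Iterator[Node]:
    """Yield operator nodes in pre-order."""
    if node.is_operator:
        yield node
        for child in node.children:
            yield from operator_nodes(child)


def extract_fields(root: Node) -> Dict[str, str]:
    """Visit operator nodes from high to low priority and read leaf fields."""
    if root.is_operator:
        ordered = sorted(operator_nodes(root),
                         key=lambda n: OPERATOR_PRIORITY[n.text], reverse=True)
        leaves = [c for n in ordered for c in n.children if not c.is_operator]
    else:
        leaves = [root]
    fields: Dict[str, str] = {}
    for leaf in leaves:
        m = ELEMENT.fullmatch(leaf.text)  # leaves always match by construction
        fields[m.group("key")] = m.group("value")
    return fields


def parse_log(raw: str) -> Dict[str, str]:
    for lexer in LEXERS:                  # lexical match succeeded (claim 2)
        m = lexer.fullmatch(raw)
        if m:
            return m.groupdict()
    root = build_tree(raw)                # every lexer failed: build the tree
    return extract_fields(root) if root else {}
```

With these assumed patterns, `parse_log("12:00:01 INFO started")` is handled by the lexical analyzer alone, while `parse_log("src=10.0.0.1;dst=10.0.0.2|user=alice;action=login")` fails lexical matching, is split first on `|` and then on `;`, and yields the four key/value fields from the leaf nodes.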

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201811629058.7A CN109726185B (en) 2018-12-28 2018-12-28 Log parsing method, system and computer readable medium based on syntax tree

Publications (2)

Publication Number Publication Date
CN109726185A CN109726185A (en) 2019-05-07
CN109726185B true CN109726185B (en) 2020-12-25

Family

ID=66297883

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201811629058.7A Active CN109726185B (en) 2018-12-28 2018-12-28 Log parsing method, system and computer readable medium based on syntax tree

Country Status (1)

Country Link
CN (1) CN109726185B (en)

Families Citing this family (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110704290B (en) * 2019-09-27 2024-02-13 百度在线网络技术(北京)有限公司 Log analysis method and device
CN110826314B (en) * 2019-11-07 2023-08-22 中金智汇科技有限责任公司 Rule analysis method and device, electronic equipment and storage medium
CN112882713B (en) * 2019-11-29 2024-03-12 北京数安鑫云信息技术有限公司 Log analysis method, device, medium and computer equipment
CN111158691B (en) * 2019-12-05 2023-10-13 杭州安恒信息技术股份有限公司 Method for realizing rule engine dynamic
CN111177595B (en) * 2019-12-20 2024-04-05 杭州九略智能科技有限公司 Method for extracting asset information by templating HTTP protocol
CN111651781A (en) * 2020-06-05 2020-09-11 腾讯科技(深圳)有限公司 Log content protection method and device, computer equipment and storage medium
CN111859929B (en) * 2020-08-05 2024-04-09 杭州安恒信息技术股份有限公司 Data visualization method and device and related equipment thereof
CN116127960B (en) * 2023-04-17 2023-06-23 广东粤港澳大湾区国家纳米科技创新研究院 Information extraction method, information extraction device, storage medium and computer equipment

Citations (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN101110812A (en) * 2007-08-29 2008-01-23 中兴通讯股份有限公司 Text command analyzing and processing method
CN102833269A (en) * 2012-09-18 2012-12-19 苏州山石网络有限公司 Detection method and device for cross site scripting and firewall with device
CN104391881A (en) * 2014-10-30 2015-03-04 杭州安恒信息技术有限公司 Word segmentation algorithm-based log parsing method and word segmentation algorithm-based log parsing system
CN104573024A (en) * 2015-01-12 2015-04-29 国家电网公司 Self-adaptive extracting method and system for heterogeneous security log information under complex network system
CN105630656A (en) * 2014-11-06 2016-06-01 阿里巴巴集团控股有限公司 Log model based system robustness analysis method and apparatus
CN106202004A (en) * 2016-07-13 2016-12-07 上海轻维软件有限公司 Combined data cutting method based on regular expressions and separator
US9558176B2 (en) * 2013-12-06 2017-01-31 Microsoft Technology Licensing, Llc Discriminating between natural language and keyword language items

Family Cites Families (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20130282739A1 (en) * 2012-04-18 2013-10-24 International Business Machines Corporation Generating a log parser by automatically identifying regular expressions matching a sample log
US11477264B2 (en) * 2016-06-29 2022-10-18 Nicira, Inc. Network workflow replay tool

Also Published As

Publication number Publication date
CN109726185A (en) 2019-05-07

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
CB02 Change of applicant information

Address after: 310052 188 Lianhui street, Xixing street, Binjiang District, Hangzhou, Zhejiang Province

Applicant after: Hangzhou Anheng Information Technology Co.,Ltd.

Address before: 310000 15th floor, Zhejiang Zhongcai Building, No. 68, Binjiang District, Hangzhou City, Zhejiang Province

Applicant before: Hangzhou Anheng Information Technology Co.,Ltd.

GR01 Patent grant