CN108563629B - Automatic log analysis rule generation method and device - Google Patents

Info

Publication number
CN108563629B
Authority
CN
China
Prior art keywords
log
parsing
rule
analysis
regular expression
Legal status
Expired - Fee Related
Application number
CN201810205205.1A
Other languages
Chinese (zh)
Other versions
CN108563629A (en)
Inventor
邸壮
Current Assignee
Beijing Renhe Credit Technology Co ltd
Original Assignee
Beijing Renhe Credit Technology Co ltd
Priority date
2018-03-13
Filing date
2018-03-13
Publication date
2022-04-19
2018-03-13 Application filed by Beijing Renhe Credit Technology Co ltd
2018-03-13 Priority to CN201810205205.1A
2018-09-21 Publication of CN108563629A
Application granted
2022-04-19 Publication of CN108563629B
Status: Expired - Fee Related

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00 Handling natural language data
    • G06F40/20 Natural language analysis
    • G06F40/253 Grammatical analysis; Style critique
    • G06F40/279 Recognition of textual entities
    • G06F40/289 Phrasal analysis, e.g. finite state techniques or chunking

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Health & Medical Sciences (AREA)
  • Artificial Intelligence (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Computational Linguistics (AREA)
  • General Health & Medical Sciences (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Debugging And Monitoring (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The invention discloses a method and a device for automatically generating log parsing rules. The method comprises: a log word segmentation step, in which a log of a newly added device is received and automatically segmented into words; a grammar analysis step, in which a grammar definition is assigned to each segmented word; a regular expression generation step, in which a parsing rule regular expression is generated according to the grammar definitions; and a field mapping step, in which the parsing rule regular expression is automatically applied to a server-side parsing engine. With the invention, a user can onboard device logs automatically without writing any code, which greatly reduces the difficulty and complexity of log parsing and improves the efficiency of developing log parsing rules.

Description

Automatic log analysis rule generation method and device
Technical Field
The invention relates to the technical field of security management, and in particular to a method and a device for automatically generating log parsing rules.
Background
In the prior art, onboarding the logs of a device newly added to a computer system requires writing code. As a result, log parsing is difficult and complex, and developing parsing rules for the logs is extremely inefficient.
Disclosure of Invention
The technical problem to be solved by the invention is that log parsing is difficult and complex and that developing log parsing rules is extremely inefficient.
To achieve this purpose, the invention adopts the following technical solution:
the invention provides an automatic log parsing rule generation method, comprising: a log word segmentation step, in which a log of a newly added device is received and automatically segmented into words; a grammar analysis step, in which a grammar definition is assigned to each segmented word; a regular expression generation step, in which a parsing rule regular expression is generated according to the grammar definitions; and a field mapping step, in which the parsing rule regular expression is automatically applied to a server-side parsing engine.
Preferably, in the log word segmentation step, a finite state automaton is constructed and analyzes the characters of the newly added device log one by one. When a stop word from the stop word dictionary is encountered, the automaton exits and outputs a lexical token; it is then re-entered to continue segmentation until every character of the log has been analyzed. The newly added device log is thereby segmented into a word list.
Preferably, grammar analysis rules are built into the computer system or defined by the user. In the grammar analysis step, the lexical tokens are received and matched against the grammar analysis rules. If a grammar analysis rule matching a lexical token exists, the corresponding word in the segmented word list is assigned the grammar definition of that rule; if no grammar analysis rule matches the lexical token, a default grammar analysis rule is assigned to it.
Preferably, in the grammar analysis step, the grammar definitions include one or more of a timestamp, an IP address, a URL address, a user agent, an integer, a floating-point number, a file, and a username.
Preferably, in the grammar analysis step, different lexical tokens are matched against the grammar analysis rules separately; each lexical token is matched against multiple grammar analysis rules, and the rule with the highest matching degree for that token is selected.
Preferably, in the regular expression generation step, the combination of grammar definitions is converted into a parsing rule regular expression, which is spliced together with the log segments that were not successfully parsed.
Preferably, in the field mapping step, the server-side parsing engine applies function operations to the fields in the parsing rule regular expression, mapping them to the final fields that the server-side parsing engine requires.
Preferably, in the field mapping step, the parsing rule regular expression is automatically uploaded to a server and displayed to the user through a visual interface; the user performs a second confirmation of the regular expression, saves it through the visual interface, and re-issues it to the server-side parsing engine.
Preferably, in the field mapping step, the parsing rule regular expression, together with the matching degree between each grammar analysis rule and its lexical token, is automatically uploaded to a server and displayed to the user through a visual interface; the user corrects the parsing rule regular expression through the visual interface and re-issues it to the server-side parsing engine.
The invention further provides an automatic log parsing rule generation device for executing the above method. The device comprises: a log word segmentation module, which receives the log of a newly added device and automatically segments it into words; a grammar analysis module, which assigns grammar definitions to the segmented words; a regular expression generation module, which generates a parsing rule regular expression according to the grammar definitions; and a field mapping module, which automatically applies the generated parsing rule regular expression to a server-side parsing engine.
Compared with the prior art, the invention has the following advantages and beneficial effects:
with the invention, the user can onboard device logs automatically without writing any code, which greatly reduces the difficulty and complexity of log parsing and improves the efficiency of developing log parsing rules.
Drawings
FIG. 1 is a flow chart of the log word segmentation step;
FIG. 2 is a flow chart of the grammar analysis step;
FIG. 3 is a block diagram of the automatic log parsing rule generation device.
Detailed Description
The present invention will now be described in further detail with reference to the attached figures, so that the invention will be more clearly and easily understood. Those of ordinary skill in the art will recognize that the described embodiments can be modified in various different ways, or combinations thereof, without departing from the spirit and scope of the present invention. Accordingly, the drawings and description are illustrative in nature and not intended to limit the scope of the claims. Furthermore, in the present description, the drawings are not to scale and like reference numerals refer to like parts.
Embodiments of the present invention are described in detail below with reference to FIGS. 1 to 3.
The automatic log parsing rule generation method comprises a log word segmentation step, a grammar analysis step, a regular expression generation step, and a field mapping step.
In the log word segmentation step, a log of a newly added device is received and automatically segmented into words.
Preferably, in the log word segmentation step, as shown in FIG. 1, a finite state machine (FSM) is constructed and analyzes the characters of the newly added device log one by one. When a stop word from the stop word dictionary is encountered, the FSM exits and outputs a lexical token; it is then re-entered to continue segmentation until all characters of the log have been analyzed, so that the newly added device log is segmented into a word list. The stop word dictionary can be updated dynamically, and different stop word dictionaries can be configured for different device types as needed.
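The segmentation loop can be sketched in Python as a two-state machine. This is a minimal sketch, assuming that stop words are single delimiter characters; the names `segment`, `State`, and `DEFAULT_STOP_WORDS`, and the delimiter set itself, are hypothetical and not taken from the patent. `.` and `:` are deliberately left out of the dictionary so that IP addresses and timestamps survive as single words.

```python
from enum import Enum, auto

# Hypothetical default stop-word dictionary: single-character delimiters.
# Per the description, a different, dynamically updated dictionary may be
# supplied for each device type.
DEFAULT_STOP_WORDS = frozenset({" ", "\t", ",", "[", "]", '"', "|"})


class State(Enum):
    BETWEEN_TOKENS = auto()  # outside any token
    IN_TOKEN = auto()        # accumulating the characters of a token


def segment(log_line: str, stop_words=DEFAULT_STOP_WORDS) -> list[str]:
    """Split a log line into a word list with a two-state finite state machine."""
    state = State.BETWEEN_TOKENS
    buffer: list[str] = []
    tokens: list[str] = []
    for ch in log_line:  # analyse the characters one by one
        if ch in stop_words:
            if state is State.IN_TOKEN:
                # Stop word reached: leave the token state and emit a lexical token.
                tokens.append("".join(buffer))
                buffer.clear()
            state = State.BETWEEN_TOKENS
        else:
            buffer.append(ch)
            state = State.IN_TOKEN
    if state is State.IN_TOKEN:  # flush the final token at end of input
        tokens.append("".join(buffer))
    return tokens
```

For example, `segment('10.0.0.1 - alice [13/Mar/2018:10:00:00] "GET /index.html"')` returns `['10.0.0.1', '-', 'alice', '13/Mar/2018:10:00:00', 'GET', '/index.html']`.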
In the grammar analysis step, a grammar definition is assigned to each segmented word.
Preferably, grammar analysis rules are built into the computer system or defined by the user. In the grammar analysis step, as shown in FIG. 2, the lexical tokens are received and matched against the grammar analysis rules. If a grammar analysis rule matching a lexical token exists, the corresponding word in the segmented word list is assigned the grammar definition of that rule. If no grammar analysis rule matches the lexical token, a default grammar analysis rule is assigned to it.
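A minimal sketch of this matching-with-fallback logic, assuming that a rule "matches" a token when its regular expression covers the whole token (`re.fullmatch`). `GrammarRule`, `BUILTIN_RULES`, `DEFAULT_RULE`, and the catch-all `string` definition are hypothetical names chosen for illustration.

```python
import re
from dataclasses import dataclass


@dataclass(frozen=True)
class GrammarRule:
    """A grammar analysis rule: a grammar definition plus its regular expression."""
    definition: str
    pattern: str


# Hypothetical built-in rules; user-defined rules could be appended to this list.
BUILTIN_RULES = [
    GrammarRule("ip", r"\d{1,3}(?:\.\d{1,3}){3}"),
    GrammarRule("integer", r"[+-]?\d+"),
]
# Fallback assigned when no rule matches a lexical token.
DEFAULT_RULE = GrammarRule("string", r"\S+")


def assign_definitions(tokens, rules=BUILTIN_RULES, default=DEFAULT_RULE):
    """Pair each lexical token with the first rule that fully matches it, else the default."""
    result = []
    for token in tokens:
        rule = next((r for r in rules if re.fullmatch(r.pattern, token)), default)
        result.append((token, rule))
    return result
```

With these rules, `assign_definitions(["10.0.0.1", "42", "alice"])` assigns `ip`, `integer`, and the default `string` definition respectively.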
Preferably, each grammar analysis rule has two parts. The first part is a grammar definition, including but not limited to a timestamp, an IP address, a URL address, a user agent (User-Agent), an integer, a floating-point number, a file, and a username; the second part is a regular expression definition, with a different regular expression formulated for each grammar definition.
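The two-part rule structure maps naturally onto a table from grammar definition to regular expression. The patterns below are illustrative assumptions (IPv4 only, two timestamp layouts, Unix-style file paths, `product/version` user agents), not the patent's actual rules. Because the first full match wins, the order of the table encodes which definition takes precedence.

```python
import re
from typing import Optional

# Hypothetical rule table: grammar definition (part one) -> regular expression (part two).
# More specific definitions are listed first, since the first full match wins.
GRAMMAR_RULES: dict[str, str] = {
    "timestamp": r"\d{4}-\d{2}-\d{2}[T ]\d{2}:\d{2}:\d{2}(?:\.\d+)?"
                 r"|\d{2}/[A-Z][a-z]{2}/\d{4}:\d{2}:\d{2}:\d{2}",
    "ip": r"(?:\d{1,3}\.){3}\d{1,3}",
    "url": r"[A-Za-z][A-Za-z0-9+.-]*://\S+",
    "user_agent": r"[A-Za-z][\w.-]*/\d[\w.]*",
    "float": r"[+-]?\d+\.\d+",
    "integer": r"[+-]?\d+",
    "file": r"(?:/[\w.-]+)+",
    "username": r"[A-Za-z_][\w.-]{0,31}",
}

# Compile once; fullmatch is used so a rule must cover the whole token.
COMPILED = {name: re.compile(p) for name, p in GRAMMAR_RULES.items()}


def definition_of(token: str) -> Optional[str]:
    """Return the first grammar definition whose regex matches the whole token."""
    for name, regex in COMPILED.items():
        if regex.fullmatch(token):
            return name
    return None
```

With this ordering, `definition_of("13/Mar/2018:10:00:00")` returns `"timestamp"`, `definition_of("3.14")` returns `"float"` rather than `"integer"`, and a bare `-` matches nothing, so it would fall back to the default rule.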
Preferably, in the grammar analysis step, different lexical tokens are matched against the grammar analysis rules in parallel using multiple threads. Each lexical token is matched against several grammar analysis rules, and the rule with the highest matching degree for that token is selected. This allows the matching results to be produced efficiently.
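The patent does not define how the matching degree is computed. The sketch below assumes one simple score, the share of a token covered by a rule's match anchored at the token's start, and uses `concurrent.futures.ThreadPoolExecutor` to fan different tokens out across threads. `RULES`, `matching_degree`, `best_rule`, and `match_all` are hypothetical.

```python
import re
from concurrent.futures import ThreadPoolExecutor

# Hypothetical rules, listed from most to least specific.
RULES = [
    ("ip", re.compile(r"(?:\d{1,3}\.){3}\d{1,3}")),
    ("float", re.compile(r"[+-]?\d+\.\d+")),
    ("integer", re.compile(r"[+-]?\d+")),
    ("username", re.compile(r"[A-Za-z_][\w.-]*")),
]


def matching_degree(regex: re.Pattern, token: str) -> float:
    """Assumed score: fraction of the token covered by the regex match at its start."""
    m = regex.match(token)
    return len(m.group()) / len(token) if m and token else 0.0


def best_rule(token: str) -> tuple[str, str, float]:
    """Score one token against every rule and keep the highest-scoring rule.

    max() returns the first maximum, so ties go to the earlier, more specific rule.
    """
    name, degree = max(
        ((rule_name, matching_degree(regex, token)) for rule_name, regex in RULES),
        key=lambda pair: pair[1],
    )
    return token, name, degree


def match_all(tokens: list[str], workers: int = 4) -> list[tuple[str, str, float]]:
    """Match different tokens on a thread pool; results keep the input order."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(best_rule, tokens))
```

Under this score, `42ms` is only partly covered by the integer rule and gets a degree of 0.5, which a later review step could flag. In CPython, regex matching generally does not run in parallel across threads because of the global interpreter lock, so a process pool may be needed for a real speed-up; the thread pool here only illustrates the per-token parallel structure.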
In the regular expression generation step, a parsing rule regular expression is generated according to the grammar definitions.
Preferably, in the regular expression generation step, the combination of grammar definitions is converted into a parsing rule regular expression, which is spliced together with the log segments that were not successfully parsed.
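One way to read "converting the combination of grammar definitions into a regular expression and splicing it with the unparsed segments" is sketched below, assuming each segment carries either a grammar definition or `None` (for separators and anything that failed to parse). Parsed segments become named capture groups, and the rest is spliced in as escaped literal text via `re.escape`. The `PATTERNS` table and `build_rule_regex` are hypothetical.

```python
import re
from collections import Counter
from typing import Optional

# Hypothetical regexes for the grammar definitions used in this sketch.
PATTERNS = {
    "ip": r"(?:\d{1,3}\.){3}\d{1,3}",
    "username": r"[A-Za-z_][\w.-]*",
    "timestamp": r"\d{2}/[A-Z][a-z]{2}/\d{4}:\d{2}:\d{2}:\d{2}",
    "integer": r"[+-]?\d+",
}


def build_rule_regex(segments: list[tuple[str, Optional[str]]]) -> str:
    """Turn (text, definition) segments into one parsing-rule regular expression.

    Segments with a definition become named capture groups; segments without one
    (not successfully parsed) are spliced in as escaped literal text.
    """
    seen: Counter = Counter()
    parts = []
    for text, definition in segments:
        if definition is None:
            parts.append(re.escape(text))
            continue
        seen[definition] += 1
        # Group names must be unique: integer, integer_2, integer_3, ...
        n = seen[definition]
        name = definition if n == 1 else f"{definition}_{n}"
        parts.append(f"(?P<{name}>{PATTERNS[definition]})")
    return "".join(parts)
```

A rule built from one sample line generalizes to other lines of the same shape: the regex generated from `10.0.0.1 - alice [13/Mar/2018:10:00:00] 200 512` also fully matches `192.168.1.5 - bob [14/Mar/2018:09:30:00] 404 0`.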
In the field mapping step, the parsing rule regular expression is automatically applied to the server-side parsing engine.
Preferably, in the field mapping step, the server-side parsing engine applies function operations to the fields in the parsing rule regular expression, mapping them to the final fields that the server-side parsing engine requires.
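A minimal sketch of the field mapping step, assuming the server-side engine is configured with a table of (final field, source capture group, conversion function). `FIELD_MAP`, `map_fields`, and target names such as `src_ip` and `event_time` are hypothetical; a real engine would define its own schema.

```python
from datetime import datetime
from typing import Any, Callable

# Hypothetical mapping table: final field -> (source field in the rule regex, function).
FIELD_MAP: dict[str, tuple[str, Callable[[str], Any]]] = {
    "src_ip": ("ip", str.strip),
    "user": ("username", str.lower),
    # Convert an Apache-style timestamp into ISO 8601.
    "event_time": ("timestamp",
                   lambda s: datetime.strptime(s, "%d/%b/%Y:%H:%M:%S").isoformat()),
    "status_code": ("integer", int),
}


def map_fields(captured: dict[str, str], field_map=FIELD_MAP) -> dict[str, Any]:
    """Apply each mapping function to its source field; skip fields not captured."""
    return {
        target: func(captured[source])
        for target, (source, func) in field_map.items()
        if captured.get(source) is not None
    }
```

For example, the groups captured from `10.0.0.1 - Alice [13/Mar/2018:10:00:00] 200` would map to `src_ip`, a lower-cased `user`, `event_time` `2018-03-13T10:00:00`, and an integer `status_code`.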
Preferably, in the field mapping step, the parsing rule regular expression is automatically uploaded to a server and displayed to the user through a visual interface; the user performs a second confirmation of the regular expression, saves it through the visual interface, and re-issues it to the server-side parsing engine.
Preferably, in the field mapping step, the parsing rule regular expression, together with the matching degree between each grammar analysis rule and its lexical token, is automatically uploaded to a server and displayed to the user through a visual interface; the user corrects the parsing rule regular expression through the visual interface and re-issues it to the server-side parsing engine, thereby optimizing the parsing rule regular expression.
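The review loop implies some payload that carries the generated regular expression and the per-field matching degrees to the server. The dataclass below is a hypothetical shape for that payload, with an assumed review threshold of 0.8 for flagging weak matches; the transport and the visual interface itself are out of scope.

```python
import json
import re
from dataclasses import asdict, dataclass, field


@dataclass
class RuleDraft:
    """Hypothetical payload shown to the user in the visual interface."""
    device_type: str
    regex: str
    # Matching degree of the selected grammar rule for each captured field.
    match_degrees: dict[str, float] = field(default_factory=dict)
    confirmed: bool = False

    def needs_review(self, threshold: float = 0.8) -> list[str]:
        """Fields whose matching degree falls below the (assumed) threshold."""
        return sorted(name for name, d in self.match_degrees.items() if d < threshold)

    def correct(self, new_regex: str) -> "RuleDraft":
        """Return a user-corrected, confirmed copy that is ready to be re-issued."""
        re.compile(new_regex)  # raises re.error if the correction is not a valid regex
        return RuleDraft(self.device_type, new_regex, dict(self.match_degrees), True)


def to_upload_json(draft: RuleDraft) -> str:
    """Serialize a draft for upload to the server."""
    return json.dumps(asdict(draft), sort_keys=True)
```

Keeping the correction as a new confirmed copy, rather than mutating the draft, leaves the machine-generated rule available for comparison in the interface.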
The present invention further includes an automatic log parsing rule generation device for executing the above method. As shown in FIG. 3, the device comprises: a log word segmentation module, which receives the newly added device log and automatically segments it into words; a grammar analysis module, which assigns grammar definitions to the segmented words; a regular expression generation module, which generates a parsing rule regular expression according to the grammar definitions; and a field mapping module, which automatically applies the generated parsing rule regular expression to the server-side parsing engine.
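To show how the four modules hand data to one another, here is a compact end-to-end sketch. Every class name, the three-entry rule table, and the `publish` callback standing in for the server-side parsing engine are assumptions. Unlike the earlier word-list sketch, the segmentation module here keeps stop words as literal tokens so that the regular expression generation module can splice them back in.

```python
import re

# Hypothetical grammar rules (definition -> regex) shared by two modules.
RULES = {
    "ip": r"(?:\d{1,3}\.){3}\d{1,3}",
    "integer": r"\d+",
    "word": r"[A-Za-z_]\w*",
}


class LogSegmentationModule:
    """Splits a log character by character; stop words are kept as literal tokens."""
    def __init__(self, stop_words):
        self.stop_words = stop_words

    def run(self, log: str) -> list[str]:
        tokens, buf = [], []
        for ch in log:
            if ch in self.stop_words:
                if buf:
                    tokens.append("".join(buf))
                    buf = []
                tokens.append(ch)
            else:
                buf.append(ch)
        if buf:
            tokens.append("".join(buf))
        return tokens


class GrammarAnalysisModule:
    """Assigns each word the first fully matching definition (None = not parsed)."""
    def __init__(self, rules: dict[str, str], stop_words):
        self.rules, self.stop_words = rules, stop_words

    def run(self, tokens):
        return [
            (t, None if t in self.stop_words else
             next((d for d, p in self.rules.items() if re.fullmatch(p, t)), None))
            for t in tokens
        ]


class RegexGenerationModule:
    """Named groups for parsed words, escaped literals for everything else."""
    def __init__(self, rules):
        self.rules = rules

    def run(self, pairs):
        parts, counts = [], {}
        for text, d in pairs:
            if d is None:
                parts.append(re.escape(text))
            else:
                counts[d] = counts.get(d, 0) + 1
                parts.append(f"(?P<{d}_{counts[d]}>{self.rules[d]})")
        return "".join(parts)


class FieldMappingModule:
    """Hands the generated regex to the parsing engine (here: any callable)."""
    def __init__(self, publish):
        self.publish = publish

    def run(self, regex):
        self.publish(regex)
        return regex


class LogRuleGenerator:
    """Chains the four modules in the order given in the device description."""
    def __init__(self, publish, rules=RULES, stop_words=frozenset({" "})):
        self.steps = [
            LogSegmentationModule(stop_words),
            GrammarAnalysisModule(rules, stop_words),
            RegexGenerationModule(rules),
            FieldMappingModule(publish),
        ]

    def generate(self, log: str) -> str:
        data = log
        for step in self.steps:
            data = step.run(data)
        return data
```

Generating a rule from `10.0.0.1 GET 200` produces a regular expression that also parses `192.168.0.7 POST 404` into the groups `ip_1`, `word_1`, and `integer_1`.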
With the invention, the user can onboard device logs automatically without writing any code, which greatly reduces the difficulty and complexity of log parsing and improves the efficiency of developing log parsing rules.
The above description is only a preferred embodiment of the present invention and is not intended to limit the present invention, and various modifications and changes may be made by those skilled in the art. Any modification, equivalent replacement, or improvement made within the spirit and principle of the present invention should be included in the protection scope of the present invention.

Claims (9)

1. An automatic log parsing rule generation method, comprising the following steps:
a log word segmentation step of receiving a log of a newly added device and automatically segmenting the log into words;
a grammar analysis step of assigning a grammar definition to each segmented word;
a regular expression generation step of generating a parsing rule regular expression according to the grammar definitions; and
a field mapping step of automatically applying the parsing rule regular expression to a server-side parsing engine,
wherein, in the log word segmentation step, a finite state automaton is constructed and analyzes the characters of the newly added device log one by one; when a stop word in a stop word dictionary is encountered, the finite state automaton is exited and a lexical token is output, and the finite state automaton is then re-entered to continue segmentation until all characters of the newly added device log have been analyzed, whereby the newly added device log is segmented into a word list.
2. The automatic log parsing rule generation method according to claim 1, wherein grammar analysis rules are built into a computer system or defined by a user, and in the grammar analysis step, the lexical tokens are received and matched against the grammar analysis rules;
if a grammar analysis rule matching a lexical token exists, each word in the segmented word list is assigned the grammar definition in the matching grammar analysis rule;
and if no grammar analysis rule matches the lexical token, a default grammar analysis rule is assigned to the lexical token.
3. The automatic log parsing rule generation method according to claim 2, wherein, in the grammar analysis step, the grammar definitions include one or more of a timestamp, an IP address, a URL address, a user agent, an integer, a floating-point number, a file, and a username.
4. The automatic log parsing rule generation method according to claim 2, wherein, in the grammar analysis step, different lexical tokens are matched against the grammar analysis rules in a multithreaded manner, each lexical token is matched against a plurality of grammar analysis rules, and the grammar analysis rule with the highest matching degree for that lexical token is selected.
5. The automatic log parsing rule generation method according to claim 2, wherein, in the regular expression generation step, the combination of grammar definitions is converted into a parsing rule regular expression, which is spliced with the log segments that were not successfully parsed.
6. The automatic log parsing rule generation method according to claim 5, wherein, in the field mapping step, the server-side parsing engine applies function operations to the fields in the parsing rule regular expression to map them to the final fields required by the server-side parsing engine.
7. The automatic log parsing rule generation method according to claim 6, wherein, in the field mapping step, the parsing rule regular expression is automatically uploaded to a server and displayed to a user through a visual interface, and the user performs a second confirmation of the parsing rule regular expression, saves it through the visual interface, and re-issues it to the server-side parsing engine.
8. The automatic log parsing rule generation method according to claim 7, wherein, in the field mapping step, the parsing rule regular expression and the matching degree between the grammar analysis rules and the lexical tokens are automatically uploaded to a server and displayed to a user through a visual interface, and the user modifies the parsing rule regular expression through the visual interface and re-issues the modified expression to the server-side parsing engine.
9. An automatic log parsing rule generation device for executing the automatic log parsing rule generation method of any one of claims 1 to 8, the device comprising:
a log word segmentation module for receiving a log of a newly added device and automatically segmenting the log into words;
a grammar analysis module for assigning a grammar definition to each segmented word;
a regular expression generation module for generating a parsing rule regular expression according to the grammar definitions; and
a field mapping module for automatically applying the generated parsing rule regular expression to a server-side parsing engine.
CN201810205205.1A 2018-03-13 2018-03-13 Automatic log analysis rule generation method and device Expired - Fee Related CN108563629B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201810205205.1A CN108563629B (en) 2018-03-13 2018-03-13 Automatic log analysis rule generation method and device

Publications (2)

Publication Number Publication Date
CN108563629A (en) 2018-09-21
CN108563629B (en) 2022-04-19

Family

ID=63531515

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201810205205.1A Expired - Fee Related CN108563629B (en) 2018-03-13 2018-03-13 Automatic log analysis rule generation method and device

Country Status (1)

Country Link
CN (1) CN108563629B (en)

Families Citing this family (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110968560B (en) * 2018-09-29 2023-05-23 北京国双科技有限公司 Configuration method, device and system of log collector
CN110134615B (en) * 2019-04-10 2022-03-01 百度在线网络技术(北京)有限公司 Method and device for acquiring log data by application program
CN110321457A (en) * 2019-04-19 2019-10-11 杭州玳数科技有限公司 Access log resolution rules generation method and device, log analytic method and system
CN111737950B (en) * 2020-08-27 2020-12-08 北京安帝科技有限公司 Method for judging abnormality of regional equipment of power plant
CN112667672B (en) * 2021-01-06 2024-05-10 北京启明星辰信息安全技术有限公司 Log analysis method and analysis device

Citations (7)

Publication number Priority date Publication date Assignee Title
CN1759354A (en) * 2003-01-09 2006-04-12 思科系统公司 Methods and apparatuses for evaluation of regular expressions of arbitrary size
CN1975725A (en) * 2006-12-12 2007-06-06 华为技术有限公司 Method and system for managing journal
CN102955914A (en) * 2011-08-19 2013-03-06 百度在线网络技术(北京)有限公司 Method and device for detecting security flaws of source files
CN104144071A (en) * 2013-05-10 2014-11-12 北京新媒传信科技有限公司 System log processing method and platform
CN104391881A (en) * 2014-10-30 2015-03-04 杭州安恒信息技术有限公司 Word segmentation algorithm-based log parsing method and word segmentation algorithm-based log parsing system
CN106294673A (en) * 2016-08-08 2017-01-04 杭州玳数科技有限公司 A kind of method and system of User Defined rule real time parsing daily record data
CN106790109A (en) * 2016-12-26 2017-05-31 东软集团股份有限公司 Data matching method and device, protocol data analysis method, device and system

Family Cites Families (2)

Publication number Priority date Publication date Assignee Title
US8776217B2 (en) * 2006-11-03 2014-07-08 Alcatel Lucent Methods and apparatus for detecting unwanted traffic in one or more packet networks utilizing string analysis
US20110083123A1 (en) * 2009-10-05 2011-04-07 Microsoft Corporation Automatically localizing root error through log analysis

Also Published As

Publication number Publication date
CN108563629A (en) 2018-09-21

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant
CF01 Termination of patent right due to non-payment of annual fee

Granted publication date: 20220419