CN117573516A - Method and device for diagnosing problems from massive logs based on custom SQL - Google Patents


Info

Publication number: CN117573516A
Application number: CN202311408074.4A
Authority: CN (China)
Other languages: Chinese (zh)
Inventor: 周朝卫
Current and original assignee: Unihub China Information Technology Co Ltd (the listed assignees may be inaccurate; Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list)
Legal status: Pending (the legal status is an assumption and is not a legal conclusion; Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed)
Application filed by Unihub China Information Technology Co Ltd
Priority to CN202311408074.4A

Classifications

    • G06F11/366: Software debugging using diagnostics
    • G06F11/3476: Data logging (performance evaluation by tracing or monitoring)
    • G06F16/2433: Query languages
    • G06F16/284: Relational databases
    • G06F16/285: Clustering or classification


Abstract

The invention discloses a method and a device for diagnosing problems from massive logs based on custom SQL. The method comprises the following steps: defining the grammar rules of the custom SQL; generating parser code using ANTLR; defining a listener or visitor to traverse the parse tree; implementing the log diagnosis logic in the listener or visitor according to the log diagnosis requirements; and processing the parse tree with the listener or visitor and performing the corresponding operations, including defining a log parsing template, defining a data source, mapping a log file into a table, and querying the log. Log diagnosis based on custom SQL has a low barrier to use and high diagnosis efficiency; it supports special grammar for log diagnosis, such as defining log parsing templates with regular expressions, defining data sources, mapping logs into tables, backtracking the lines around abnormal data, keyword classification, and natural-language classification; it supports multiple file systems, such as local files, SFTP, HDFS, and JuiceFS; and it provides cross-system log association.

Description

Method and device for diagnosing problems from massive logs based on custom SQL
Technical Field
The invention relates to the technical field of log diagnosis, in particular to a method and a device for diagnosing problems from massive logs based on custom SQL.
Background
Database systems, container platforms, big data components, and the various applications widely deployed in enterprises continuously generate enormous amounts of log information during daily operation. Efficiently using this log information for fault diagnosis and problem localization, so that the root cause of a problem can be found quickly when a fault occurs, remains a challenge.
Conventional log diagnosis relies mainly on manually inspecting log files. Diagnosis efficiency is low, the context around abnormal information is insufficient, complex functions such as conditional queries and statistical queries are not supported, and the log diagnosis needs of large-scale systems are difficult to meet.
The main problems are as follows:
(1) Query and retrieval are time-consuming and laborious. For logs on a single node such as a host or container, logs are viewed or searched with operating system commands such as vi, cat, and tail. Their search and filtering capabilities are weak, and manually searching and screening for a particular log entry is time-consuming when the log file is very large or contains many entries. When viewing a log with vi, one must scroll manually and jump to specific line numbers or key locations, which becomes very cumbersome for large log files, especially when frequent jumps and backtracking are required.
(2) Log retrieval handles the context of abnormal logs poorly. When an exception keyword is retrieved, it is often necessary to view a certain number of lines (e.g., 10 lines) before and after it in order to locate the root cause of the exception. Conventional log retrieval cannot meet this need for exception-log context.
(3) Insufficient ability to recognize abnormal problems. Identifying anomaly information relies mainly on the experience of the person viewing the log, which can lead to incorrect localization or consume a great deal of time, and no accumulated experience library can be relied on.
(4) Unconverged abnormal logs interfere with problem localization. A large number of exception logs are often triggered by the same underlying exception, producing many duplicate entries. This interferes heavily with analysis and seriously degrades the ability to localize problems. It is therefore necessary to cluster repeated logs into one class: by converging similar abnormal logs, redundant information is reduced, analysis efficiency improves, and abnormal problems are localized more accurately.
(5) Lack of cross-system log association. Cross-system log association is the ability to effectively correlate logs generated by different components or modules, so as to better understand the overall operating condition of the system and the order in which events occur. Without it, tracking and localizing problems is difficult and fault repair takes longer; it is also hard to obtain a global view of the system, so the interactions and potential problems in the system are not fully understood.
Disclosure of Invention
In order to solve the problems in the prior art, the invention provides a method and a device for diagnosing problems from massive logs based on custom SQL.
In order to achieve the above purpose, the present invention adopts the following technical scheme:
in an embodiment of the present invention, a method for diagnosing problems from massive logs based on custom SQL is provided, the method includes:
defining grammar rules of custom SQL;
generating a parser code using ANTLR;
defining a listener or visitor to traverse the parse tree;
according to the log diagnosis requirements, parsing the defined SQL statements in the listener or visitor to implement the log diagnosis logic, which includes: parsing the SQL that defines a log parsing template to generate the template; parsing the SQL that defines a data source to generate the data source; parsing the table-creation SQL to associate the log parsing template with the data source and specify the log file path, thereby mapping log files into a table; and parsing the SQL that defines log queries to query the log data;
the parse tree is processed using a listener or visitor and the corresponding operations are performed.
Further, in the SQL defining log parsing templates, tags defined by regular expressions are used.
Further, in the SQL defining log queries, a syntax is defined for tracing N lines forward and/or backward from an exception keyword.
Further, in the SQL defining log queries, a classification mode is defined to cluster repeated logs.
Further, in the SQL defining log queries, multiple tables are joined to achieve cross-system log association.
In an embodiment of the present invention, a device for diagnosing problems from massive logs based on custom SQL is further provided, where the device includes:
the log diagnosis module is used for defining the grammar rules of the custom SQL; generating the parser code using ANTLR; defining a listener or visitor to traverse the parse tree; and, according to the log diagnosis requirements, parsing the defined SQL statements in the listener or visitor to implement the log diagnosis logic, which includes: parsing the SQL that defines a log parsing template to generate the template; parsing the SQL that defines a data source to generate the data source; parsing the table-creation SQL to associate the log parsing template with the data source and specify the log file path, thereby mapping log files into a table; and parsing the SQL that defines log queries to query the log data;
and the log processing execution module is used for processing the parse tree using the listener or visitor and performing the corresponding operations.
Further, in the SQL defining log parsing templates, tags defined by regular expressions are used.
Further, in the SQL defining log queries, a syntax is defined for tracing N lines forward and/or backward from an exception keyword.
Further, in the SQL defining log queries, a classification mode is defined to cluster repeated logs.
Further, in the SQL defining log queries, multiple tables are joined to achieve cross-system log association.
In an embodiment of the present invention, a computer device is further provided, including a memory, a processor, and a computer program stored in the memory and capable of running on the processor, where the processor implements the foregoing method for diagnosing problems from a massive log based on custom SQL when executing the computer program.
In an embodiment of the present invention, a computer-readable storage medium is also provided, storing a computer program that, when executed, performs the foregoing method for diagnosing problems from massive logs based on custom SQL.
The beneficial effects are as follows:
1. The whole log diagnosis process is implemented in SQL; SQL has a low barrier to use, and diagnosis efficiency is high.
2. Based on the extension capability of tags defined by regular expressions, any log file can be parsed.
3. The invention supports clustering large numbers of duplicate abnormal log entries, reducing redundant information, improving the efficiency of abnormal-log analysis, and localizing abnormal problems more accurately.
4. Log retrieval supports backtracking the lines before and after a hit, so abnormal entries can be associated with their context.
5. The invention supports multiple file systems, such as local files, SFTP, HDFS, and JuiceFS, and provides cross-system log association.
Drawings
FIG. 1 is a flow chart of the method for diagnosing problems from massive logs based on custom SQL of the invention;
FIG. 2 is a flow chart of mapping log files into tables in accordance with the present invention;
FIG. 3 is a flow chart of a log query of the present invention;
FIG. 4 is a schematic diagram of the device for diagnosing problems from massive logs based on custom SQL of the present invention;
fig. 5 is a schematic diagram of the computer device structure of the present invention.
Detailed Description
The principles and spirit of the present invention will be described below with reference to several exemplary embodiments, with the understanding that these embodiments are merely provided to enable those skilled in the art to better understand and practice the invention and are not intended to limit the scope of the invention in any way. Rather, these embodiments are provided so that this disclosure will be thorough and complete, and will fully convey the scope of the disclosure to those skilled in the art.
Those skilled in the art will appreciate that embodiments of the invention may be implemented as a system, apparatus, device, method, or computer program product. Accordingly, the present disclosure may be embodied in the following forms, namely: complete hardware, complete software (including firmware, resident software, micro-code, etc.), or a combination of hardware and software.
According to embodiments of the invention, a method and a device for diagnosing problems from massive logs based on custom SQL are provided. A custom log diagnosis SQL is constructed: grammar rules are defined, ANTLR generates the parser code, the generated parser builds a parse tree of the custom SQL, information such as columns, tables, conditions, and output tables is obtained in visitor mode, and the log diagnosis logic (for example, parsing files, mapping tables, filtering data, and backtracking data) is defined in the visitor according to the diagnosis requirements. Log diagnosis based on custom SQL has a low barrier to use and high diagnosis efficiency; it supports special grammar for log diagnosis, such as defining log parsing templates with regular expressions, defining data sources, mapping logs into tables, backtracking the lines around abnormal data, keyword classification, and natural-language classification; it supports multiple file systems, such as local files, SFTP, HDFS, and JuiceFS; and it provides cross-system log association.
The principles and spirit of the present invention are explained in detail below with reference to several representative embodiments thereof.
FIG. 1 is a flow chart of the method for diagnosing problems from massive logs based on custom SQL of the present invention. As shown in fig. 1, the method includes:
defining grammar rules of custom SQL;
generating a parser code using ANTLR;
defining a listener or visitor to traverse the parse tree; according to the log diagnosis requirements, parsing the defined SQL statements in the listener or visitor to implement the log diagnosis logic, which includes: parsing the SQL that defines a log parsing template to generate the template; parsing the SQL that defines a data source to generate the data source; parsing the table-creation SQL to associate the log parsing template with the data source and specify the log file path, thereby mapping log files into a table; and parsing the SQL that defines log queries to query the log data;
the parse tree is processed using a listener or visitor and corresponding operations are performed including defining a log parse template, defining a data source, mapping a log file into a table, and log querying.
It should be noted that although the operations of the method of the present invention are described in a particular order in the above embodiments and the accompanying drawings, this does not require or imply that the operations must be performed in the particular order or that all of the illustrated operations be performed in order to achieve desirable results. Additionally or alternatively, certain steps may be omitted, multiple steps combined into one step to perform, and/or one step decomposed into multiple steps to perform.
In order to more clearly explain the above method for diagnosing problems from a massive log based on the custom SQL implementation, a specific embodiment is described below, but it should be noted that this embodiment is only for better illustrating the present invention and is not meant to limit the present invention unduly.
Examples:
1. Functions implemented by custom SQL
The custom SQL is dedicated to log diagnosis and implements the following functions:
1. the data queried by SQL comes from log files: the log file data is parsed and mapped into a table with multiple fields, so that it can be queried with SQL;
2. the custom SQL supports standard SQL grammar, such as selecting the fields to query, filtering on field values, and joining two tables;
3. on top of standard SQL, special grammar such as backtracking, keyword classification, natural-language classification, and temporary table registration is added to make log diagnosis more convenient and rapid;
4. log files support multiple storage types, such as local files, HDFS, JuiceFS, and SFTP;
5. files of multiple types can be registered as temporary tables, providing cross-system log association.
2. Implementation flow
1. General procedure
Defining the grammar rules of the custom SQL; generating the parser code using ANTLR; defining a listener or visitor to traverse the parse tree; according to the log diagnosis requirements, parsing the defined SQL statements in the listener or visitor to implement the log diagnosis logic, which includes: parsing the SQL that defines a log parsing template to generate the template; parsing the SQL that defines a data source to generate the data source; parsing the table-creation SQL to associate the log parsing template with the data source and specify the log file path, thereby mapping log files into a table; and parsing the SQL that defines log queries to query the log data; processing the parse tree using the listener or visitor and performing the corresponding operations.
2. Defining grammar rules
Constructing the custom log diagnosis SQL: define the grammar rules of the custom SQL and generate the parser code using ANTLR. ANTLR (ANother Tool for Language Recognition) is a powerful language recognition tool for building parsers, compilers, and other language processing tools.
For example, the SQL statement for log diagnostics is as follows:
select timestamp,level,name,message from log1 where name='aa' as output_t1;
In the SQL statement above, the select statement queries the log1 table with the specified where condition, and the trailing "as output_t1" registers the result of the query as a temporary table output_t1.
Next, taking the above SQL statement as an example, how to define the grammar rules will be described.
The method comprises the following specific steps:
(1) Defining the grammar rules of the custom SQL
The following is one example:
grammar CustomSQL;

// Start rule
start : selectStatement ;

// SELECT statement rule
selectStatement : 'SELECT' column (',' column)* 'FROM' table 'WHERE' condition 'AS' outputTable ;

// Column rule
column : ID ;

// Table rule
table : ID ;

// Condition rule
condition : column '=' STRING ;

// Output table rule
outputTable : ID ;

// Identifier rule
ID : [a-zA-Z_][a-zA-Z0-9_]* ;

// String literal rule: single-quoted text containing no quote or line-break characters
STRING : '\'' ( ~('\''|'\r'|'\n') )* '\'' ;

// Skip whitespace
WS : [ \t\r\n]+ -> skip ;
(2) Generating parser code
ANTLR is used to generate the parser code; either the ANTLR command-line tool or an integrated ANTLR plug-in may be used.
The following is an example of generating Java code using the ANTLR command line tool:
antlr4 CustomSQL.g4
javac CustomSQL*.java
When using ANTLR to generate parser code, the following files are typically generated:
(a) Lexer file: generated from the lexer rules, typically named <grammar name>Lexer.java. It contains the implementation of the lexical analyzer (Lexer), which splits the input stream into lexical units (tokens). The example above generates a CustomSQLLexer.java file.
(b) Parser file: generated from the parser rules, typically named <grammar name>Parser.java. It contains the implementation of the parser, which combines and analyzes the tokens according to the custom SQL grammar rules and produces a parse tree. The example above generates a CustomSQLParser.java file.
(c) Visitor or Listener file: depending on the selected code-generation options, a Visitor or Listener file may be generated for traversing the parse tree and performing the corresponding operations. The Visitor file is typically named <grammar name>BaseVisitor.java and the Listener file <grammar name>BaseListener.java. The example above generates CustomSQLBaseVisitor.java and CustomSQLBaseListener.java files.
(d) Token file: a file containing the token names and their corresponding integer identifiers, e.g. CustomSQL.tokens for the example above. An integer identifier is a unique integer value assigned to each token type; these identifiers are used to identify and represent the different tokens. When the lexer scans the input text, it assigns the matching rule's integer identifier as the token type. The integer identifiers act as a bridge between the lexer and the downstream parser (if any), which parses and performs semantic processing according to the token types.
In addition to the files described above, some auxiliary files and classes may be generated to support parsing and processing. The specific generation of these files depends on the content defined in the lexical and grammatical rules, as well as the configuration options of the ANTLR tool.
(3) Traversing parse tree using Listener (Listener) or Visitor (Visitor)
A Listener or Visitor is defined to traverse the parse tree and obtain information such as columns, tables, conditions, and output tables based on the visitor pattern.
For example: define a visitor class CustomSQLVisitor that inherits from the generated CustomSQLBaseVisitor class and overrides the methods corresponding to the rules to be processed.
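The role of the visitor can be illustrated outside ANTLR. The following is a minimal, hypothetical Python sketch (not the generated Java visitor) that extracts the same information, namely columns, table, condition, and output table, from the example statement using a regular expression:

```python
import re

# Stand-in for the generated visitor: pull columns, table, condition and
# output table out of the example statement. The real implementation walks
# the ANTLR parse tree in overridden visit methods instead.
SELECT_RE = re.compile(
    r"select\s+(?P<cols>[\w,\s]+?)\s+from\s+(?P<table>\w+)"
    r"\s+where\s+(?P<cond>\w+\s*=\s*'[^']*')"
    r"\s*as\s+(?P<out>\w+)\s*;?",
    re.IGNORECASE,
)

def parse_select(sql: str) -> dict:
    m = SELECT_RE.match(sql.strip())
    if not m:
        raise ValueError("not a recognized SELECT statement")
    return {
        "columns": [c.strip() for c in m.group("cols").split(",")],
        "table": m.group("table"),
        "condition": m.group("cond"),
        "output_table": m.group("out"),
    }

info = parse_select(
    "select timestamp,level,name,message from log1 where name='aa' as output_t1;"
)
print(info["columns"], info["table"], info["output_table"])
```

In the real implementation this extraction happens while walking the parse tree, so arbitrary nesting and the full grammar are handled rather than this one statement shape.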
(4) Processing the parse tree and performing the corresponding operations using the visitor in the main program (where the entry point of code execution is located)
This step is the key step in executing SQL logic.
The corresponding operations include:
(a) Query class: the logic that maps log files into tables and performs core data processing, such as querying, filtering, classifying, and joining, according to the log parsing template, the data source, and so on.
(b) Management class: creating tables, creating templates, and creating data sources.
Examples are as follows:
2.1 Parsing the log file and mapping it into a table
To achieve general-purpose log parsing, regular expressions are used to extract and map the fields of a log file. However, regular expressions are costly to use and error-prone. To reduce this cost, a regular-expression parsing library is predefined that maps each type of data, such as a timestamp or an IP address, to a tag. A tag is defined once and used in many places, so a complex regular expression does not have to be written each time.
(1) Defining tags in a configuration file to form a tag library
As shown in fig. 2, tags are defined in a configuration file to form a tag library. A tag is defined once and used many times; that is, a tag can be reused by multiple log parsing templates.
Each row of the configuration file defines one tag. Each row consists of two parts: a tag name and a tag definition.
The tag name and the tag definition are separated by a space.
A tag definition can be one of three types: a regular expression, other tags, or a combination of regular expressions and tags.
For example:
YEAR (?>\d\d){1,2}
MONTHNUM (?:0?[1-9]|1[0-2])
MONTHDAY (?:(?:0[1-9])|(?:[12][0-9])|(?:3[01])|[1-9])
TIMESTAMP_ISO8601 !<YEAR>-!<MONTHNUM>-!<MONTHDAY>[T ]!<HOUR>:?!<MINUTE>(?::?!<SECOND>)?!<ISO8601_TIMEZONE>?
The tags YEAR, MONTHNUM, MONTHDAY, and so on are defined directly by regular expressions.
TIMESTAMP_ISO8601 is defined by a combination of regular expressions and tags. A tag referenced in a tag definition begins with an exclamation mark (!) and is wrapped in angle brackets; for example, !<YEAR> is a reference to the YEAR tag.
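As an illustration of how such a tag library can be resolved, the following Python sketch expands !<NAME> references recursively into a plain regular expression. The SIMPLE_DATE tag is invented for this example, and YEAR uses an ordinary group instead of the atomic (?>...) group, which Python's re module only supports from 3.11 onward:

```python
import re

# Hypothetical sketch of the tag library: each tag name maps to a regex
# fragment, and a definition may reference other tags via !<NAME>.
TAGS = {
    "YEAR": r"(?:\d\d){1,2}",
    "MONTHNUM": r"(?:0?[1-9]|1[0-2])",
    "MONTHDAY": r"(?:(?:0[1-9])|(?:[12][0-9])|(?:3[01])|[1-9])",
    "SIMPLE_DATE": r"!<YEAR>-!<MONTHNUM>-!<MONTHDAY>",  # composed tag, invented for the example
}

REF = re.compile(r"!<(\w+)>")

def expand(definition: str) -> str:
    # Replace each !<NAME> reference with the recursively expanded
    # definition of NAME, yielding a plain regular expression.
    return REF.sub(lambda m: expand(TAGS[m.group(1)]), definition)

date_re = re.compile(expand(TAGS["SIMPLE_DATE"]))
print(bool(date_re.fullmatch("2023-09-24")))
```

Defining composed tags this way is what lets TIMESTAMP_ISO8601 reuse YEAR, MONTHNUM, and MONTHDAY without repeating their regular expressions.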
(2) Defining a log parsing template with regular-expression tags
As shown in fig. 2, a log parsing template is created by custom SQL. The SQL format is as follows: create template <template name> with expression <tag expression>.
The template name is user-defined. The tag expression is composed of several tags and is used to parse the log.
Log parsing templates can be reused; the tag expression is wrapped in three double quotation marks.
A template is defined using the following SQL statement, where k8s_tpl is the user-defined template name:
create template k8s_tpl with expression
"""
!<TIMESTAMP_ISO8601=timestamp> \[!<LOGLEVEL=loglevel>\] !<WORD=type> '!<DATA=name>' !<GREEDYDATA=message>
"""
The statement following the expression keyword is the tag expression.
The SQL statement above likewise implements the template-definition logic through ANTLR parsing.
The template example above is used to parse k8s logs.
The log of k8s is exemplified as follows:
2023-09-24 10:15:01 [INFO] Pod 'myapp-1234' scheduled on Node 'worker-1'
2023-09-24 10:15:02 [INFO] Pod 'myapp-1234' started running on Node 'worker-1'
2023-09-24 10:15:03 [INFO] Container 'myapp-container' started successfully in Pod 'myapp-1234'
2023-09-24 10:15:05 [ERROR] Container 'myapp-container' encountered an error and exited in Pod 'myapp-1234'
2023-09-24 10:15:05 [INFO] Pod 'myapp-1234' is restarting due to container failure
2023-09-24 10:15:06 [INFO] Pod 'myapp-1234' scheduled on Node 'worker-2'
2023-09-24 10:15:07 [INFO] Pod 'myapp-1234' started running on Node 'worker-2'
2023-09-24 10:15:08 [INFO] Container 'myapp-container' started successfully in Pod 'myapp-1234'
2023-09-24 10:15:12 [INFO] Pod 'myapp-1234' successfully deployed and is running on Node 'worker-2'
As the logs above show, each entry is made up of several parts. Taking the following entry as an example:
2023-09-24 10:15:05 [ERROR] Container 'myapp-container' encountered an error and exited in Pod 'myapp-1234'
Time: for example, 2023-09-24 10:15:05.
Log level: ERROR here; other entries also carry levels such as INFO.
Type: Container here; other entries also have the type Pod.
Name of the Pod or Container: for example, myapp-container here.
Detailed log message: for example, encountered an error and exited in Pod 'myapp-1234'.
In the template definition, several regular-expression-defined tags are used to match and extract the different fields. For example, !<TIMESTAMP_ISO8601=timestamp> matches and extracts the timestamp: TIMESTAMP_ISO8601 is a tag defined by a regular expression, and timestamp is the user-defined field name. Similarly, !<LOGLEVEL=loglevel> matches and extracts the log level.
The detailed logic is as follows:
!<TIMESTAMP_ISO8601=timestamp>: matches a timestamp in ISO8601 format and extracts it into a field named timestamp.
\[: matches the left bracket ([) character.
!<LOGLEVEL=loglevel>: matches the log level pattern and extracts it into a field named loglevel.
\]: matches the right bracket (]) character.
!<WORD=type>: matches a single-word pattern and extracts it into a field named type.
'!<DATA=name>': matches any non-newline characters wrapped in single quotes and extracts them into a field named name.
!<GREEDYDATA=message>: matches the remaining log message and extracts it into a field named message.
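The template mechanics can be sketched in Python. The snippet below is a hypothetical illustration with a simplified stand-in tag library rather than the patent's full one: it compiles a !<TAG=field> expression into a named-group regex and parses one of the sample k8s lines:

```python
import re

# Simplified stand-in tag library (assumption for this example; the real
# library defines these tags with fuller regular expressions).
TAGS = {
    "TIMESTAMP_ISO8601": r"\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}",
    "LOGLEVEL": r"[A-Z]+",
    "WORD": r"\w+",
    "DATA": r".*?",
    "GREEDYDATA": r".*",
}

def compile_template(template: str) -> re.Pattern:
    # Replace each !<TAG=field> with (?P<field><tag regex>); the rest of
    # the template (e.g. \[ and \]) is already regex syntax.
    pattern = re.sub(
        r"!<(\w+)=(\w+)>",
        lambda m: f"(?P<{m.group(2)}>{TAGS[m.group(1)]})",
        template,
    )
    return re.compile(pattern)

tpl = compile_template(
    r"!<TIMESTAMP_ISO8601=timestamp> \[!<LOGLEVEL=loglevel>\] !<WORD=type> "
    r"'!<DATA=name>' !<GREEDYDATA=message>"
)
line = ("2023-09-24 10:15:05 [ERROR] Container 'myapp-container' "
        "encountered an error and exited in Pod 'myapp-1234'")
row = tpl.match(line).groupdict()
print(row["loglevel"], row["name"])
```

Each parsed line becomes a field/value dict, which is exactly the row shape needed to treat the log file as a table.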
Based on the template defined above, the k8s logs can be parsed into a two-dimensional table, as shown in Table 1 below:
TABLE 1

timestamp            loglevel  type       name             message
2023-09-24 10:15:01  INFO      Pod        myapp-1234       scheduled on Node 'worker-1'
2023-09-24 10:15:05  ERROR     Container  myapp-container  encountered an error and exited in Pod 'myapp-1234'
2023-09-24 10:15:12  INFO      Pod        myapp-1234       successfully deployed and is running on Node 'worker-2'
(3) Defining a data source
As shown in FIG. 2, a data source is defined by custom SQL. The SQL format is as follows: create datasource <data source name> with <data source definition options>.
The data source name is user-defined. The available definition options differ by data source type.
Defining a data source specifies the type of the log files, the connection information, and so on. For a local file, only the file type needs to be specified; for a file system such as SFTP, information such as the SFTP address, username, and password must be specified.
The data source is defined using the following SQL statement, where ds_sftp_inst01 is the data source name, customized by the user:
create datasource ds_sftp_inst01
with
dsType="sftp"
host="192.168.11.23"
port="22"
user="root"
password="123456"
wherein:
ds_sftp_inst01: data source name, user-defined
dsType: the log data source type is specified as SFTP.
host: the hostname of the SFTP is specified.
port: the port of the SFTP is specified.
user: the user name of the SFTP is specified.
password: the password of the SFTP is specified.
The SQL statement above is likewise parsed with ANTLR to implement the data source definition logic.
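For illustration only — the patent uses an ANTLR-generated parser, which this sketch does not reproduce — a minimal regex-based stand-in can extract the data source name and the option map from the statement above:

```python
import re

def parse_create_datasource(sql: str):
    """Toy stand-in for the grammar-driven parser: pull the data source
    name and the key="value" options out of a CREATE DATASOURCE statement.
    The option keys (dsType, host, ...) are those from the example above."""
    m = re.match(r'\s*create\s+datasource\s+(\w+)\s+with\b', sql, re.I | re.S)
    name = m.group(1)
    opts = dict(re.findall(r'(\w+)\s*=\s*"([^"]*)"', sql))
    return name, opts

name, opts = parse_create_datasource(
    'create datasource ds_sftp_inst01 with '
    'dsType="sftp" host="192.168.11.23" port="22" user="root" password="123456"'
)
```

A real implementation would instead walk the ANTLR parse tree in a listener or visitor, as the surrounding text describes.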
(4) Mapping log files into tables
Mapping a log file into a table through custom SQL, in the following format:
create table &lt;custom table name&gt;
with
template='&lt;template name&gt;'
datasource='&lt;data source name&gt;'
path='&lt;log path&gt;';
In the SQL that maps a log file into a table, the log parsing template and the data source are associated, and the log file path is specified, thereby mapping the log file into a table. The log file path supports wildcard fuzzy matching.
The SQL example is as follows:
create table t_log
with
template='k8s_tpl'
datasource='ds_sftp_inst01'
path='/data/app/dame_*.log'
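The wildcard fuzzy matching in path can be illustrated with Python's `fnmatch`; the candidate file names below are invented:

```python
from fnmatch import fnmatch

# Which files on the data source fall under the table's path pattern.
candidates = [
    "/data/app/dame_20231026.log",
    "/data/app/dame_20231027.log",
    "/data/app/other.log",
]
matched = [f for f in candidates if fnmatch(f, "/data/app/dame_*.log")]
```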
2.2 Log queries
With the preceding steps having mapped the log file into a table, logs can now be queried with SQL statements. Several special SQL query syntaxes are customized, specifically as follows.
The SQL logic of log diagnosis consists mainly of:
defining a log parsing template by using labels defined by regular expressions;
defining a data source;
mapping the log file into a table;
SQL query logs.
The focus here is on the following SQL log-query logic, as shown in FIG. 3:
(1) Standard SQL query filtering
Searching by exception keyword, querying related fields, and so on. For example:
select timestamp,level,name,message from log1 where message like 'K8sTimeoutException';
The result of an SQL query can be registered as a temporary table, whose name is specified with an as clause, for use in subsequent data association and the like:
select timestamp,level,name,message from log1 where message like 'K8sTimeoutException' as output_t1;
In the example above, the execution result of the SQL is registered as the temporary table output_t1.
(2) Syntax for backtracking a number of lines before and after a keyword
When an exception keyword is retrieved, it is often necessary to view a certain number of lines (e.g., 10 lines) of data before and after it in order to locate the root cause of the exception. Three where-condition syntaxes are defined — toN+10, toN-10, and toN±10 — representing tracing 10 lines forward, 10 lines backward, and 10 lines in both directions, respectively.
Examples:
select timestamp,level,name,message from log1 where message like 'K8sTimeoutException' and toN±10 as output_t1;
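One way to read toN±N operationally — an interpretation, not the patent's code — is to collect a window of lines around every line containing the keyword:

```python
def backtrace(lines, keyword, before=10, after=10):
    """Return the lines within [hit-before, hit+after] of every line
    containing the keyword, in file order and without duplicates."""
    hits = [i for i, ln in enumerate(lines) if keyword in ln]
    keep = set()
    for i in hits:
        keep.update(range(max(0, i - before), min(len(lines), i + after + 1)))
    return [lines[i] for i in sorted(keep)]

# Invented 30-line log with one exception at line 15
log = [f"2023-10-27 10:00:{i:02d} INFO ok" for i in range(30)]
log[15] = "2023-10-27 10:00:15 ERROR K8sTimeoutException"
window = backtrace(log, "K8sTimeoutException", before=10, after=10)
```

With toN+10 only the `after` window would be kept, and with toN-10 only the `before` window.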
(3) Categorizing logs by keyword or by natural language processing techniques (prior art)
A large number of exception logs are often triggered by the same underlying exception, producing many duplicate exception entries. This interferes heavily with exception-log analysis and seriously degrades the ability to locate problems. It is therefore necessary to group duplicate logs into one class. Converging similar exception logs reduces redundant information, improves the efficiency of exception-log analysis, and helps locate the underlying problem.
The categorization mode is defined through a with clause:
with classify='&lt;categorization mode&gt;'
< other parameters >
select statement
(a) If the categorization mode is keyword, logs are grouped by keyword; the keywords must be specified, and multiple keywords are joined with a vertical bar (|).
In the following, classify='keyword' selects keyword categorization, and keyword='Exception|Error' specifies the keywords as Exception or Error.
with classify='keyword'
keyword='Exception|Error'
select statement
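A minimal sketch of the keyword mode (the grouping logic is assumed, and the log lines are invented):

```python
import re

def classify_by_keyword(lines, keywords="Exception|Error"):
    """Group log lines by the first keyword that matches, mirroring
    classify='keyword' with keyword='Exception|Error' above."""
    pat = re.compile(keywords)
    groups = {}
    for ln in lines:
        m = pat.search(ln)
        groups.setdefault(m.group(0) if m else "other", []).append(ln)
    return groups

groups = classify_by_keyword([
    "K8sTimeoutException: dial timeout",
    "K8sTimeoutException: dial timeout",
    "Error: disk full",
    "INFO startup complete",
])
```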
(b) If the categorization mode is natural language processing, a natural-language-based method is used to identify and group the large number of duplicate exceptions in the log. In SQL, the feature extraction method, similarity calculation method, clustering algorithm, and so on can be specified.
In the following, classify='nlp' selects natural language categorization, extract_method='Bag-of-Words' selects bag-of-words feature extraction, Similarity='Cosine' selects cosine similarity as the similarity calculation method, and Clustering='K-means' selects K-means as the clustering algorithm.
with classify='nlp'
extract_method='Bag-of-Words'
Similarity='Cosine'
Clustering='K-means'
select statement
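The bag-of-words plus cosine-similarity combination named above can be sketched in a few dependency-free lines; the sample lines are invented, and a full implementation would add the K-means clustering step:

```python
from collections import Counter
import math

def bow(line: str) -> Counter:
    """Bag-of-words feature vector: token counts of the lowercased line."""
    return Counter(line.lower().split())

def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[t] * b[t] for t in a)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

# Two near-duplicate timeout errors and one unrelated line
s1 = bow("ERROR K8sTimeoutException connect timeout pod nginx")
s2 = bow("ERROR K8sTimeoutException connect timeout pod redis")
s3 = bow("INFO scheduler assigned pod to node")
```

Duplicate exceptions score close to 1 against each other and near 0 against unrelated lines, which is what makes similarity-threshold clustering workable.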
(4) Cross-system log correlation
Logs are correlated across systems through SQL joins over multiple tables. Whether two log entries are related can be determined from timestamps, error codes, or other key information; for example, two logs with the same timestamp, the same error code, and similar exception types may be related.
An SQL table name may be a table mapped from a log file, or a temporary table registered from the execution result of a previous SQL statement.
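In relational terms the correlation is simply an inner join on the shared keys; a sketch with invented entries and illustrative field names:

```python
# Logs from two systems; entries correlate when timestamp and error code agree.
logs_a = [{"ts": "2023-10-27 10:00:01", "code": "E504", "msg": "gateway timeout"}]
logs_b = [
    {"ts": "2023-10-27 10:00:01", "code": "E504", "msg": "upstream K8s timeout"},
    {"ts": "2023-10-27 10:05:00", "code": "E200", "msg": "ok"},
]
related = [
    (a, b)
    for a in logs_a
    for b in logs_b
    if a["ts"] == b["ts"] and a["code"] == b["code"]
]
```

In the patent's SQL this would be expressed as a join between two mapped (or temporary) tables on the timestamp and error-code columns.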
Based on the same inventive concept, the invention further provides a device for diagnosing problems from massive logs based on custom SQL. Its implementation may refer to the implementation of the method above and is not repeated here. The term "module" as used below may be a combination of software and/or hardware that implements the intended function. Although the means described in the following embodiments are preferably implemented in software, implementation in hardware, or in a combination of software and hardware, is also possible and contemplated.
FIG. 4 is a schematic diagram of the architecture of the device for diagnosing problems from massive logs based on custom SQL of the present invention. As shown in FIG. 4, the apparatus includes:
the log diagnosis module 101, used for: defining grammar rules of the custom SQL; generating parser code using ANTLR; defining a listener or visitor to traverse the parse tree; and, according to the requirements of log diagnosis, analyzing the defined SQL statements in the listener or visitor to implement the log diagnosis logic, which includes parsing the SQL that defines a log parsing template to generate the template, parsing the SQL that defines a data source to generate the data source, parsing the table-creation SQL to associate the log parsing template with the data source and specify the log file path so as to map the log file into a table, and parsing the SQL that defines log queries to query the log data.
In the SQL that defines log parsing templates, labels defined by regular expressions are used.
In the SQL that defines log queries, a syntax is defined for backtracking N lines forward and/or backward according to exception keywords.
In the SQL that defines log queries, a categorization mode is defined to group duplicate logs.
In the SQL that defines log queries, multiple tables are joined to correlate logs across systems.
The log processing execution module 102 is used for processing the parse tree using the listener or visitor and executing the corresponding operations, including: defining a log parsing template, defining a data source, defining the mapping of log files into tables, and defining log queries.
It should be noted that while several modules of the device for diagnosing problems from massive logs based on custom SQL are mentioned in the detailed description above, this partitioning is merely exemplary and not mandatory. Indeed, according to embodiments of the present invention, the features and functions of two or more modules described above may be embodied in one module; conversely, the features and functions of one module described above may be further divided among a plurality of modules.
Based on the foregoing inventive concept, as shown in fig. 5, the present invention further proposes a computer device 200, including a memory 210, a processor 220, and a computer program 230 stored in the memory 210 and capable of running on the processor 220, where the processor 220 implements the foregoing method for diagnosing problems from a massive log based on custom SQL when executing the computer program 230.
Based on the foregoing inventive concept, the present invention further provides a computer readable storage medium, where a computer program for executing the foregoing method for diagnosing problems from massive logs based on custom SQL is stored.
The method and device for diagnosing problems from massive logs based on custom SQL provided by the invention have the following highlights:
1. The entire log diagnosis process is implemented with SQL; the barrier to using SQL is low, and diagnosis efficiency is high.
2. Based on the extensibility of labels defined by regular expressions, any log file can be parsed.
3. The invention supports grouping large numbers of duplicate exception log entries, which reduces redundant information, improves the efficiency of exception-log analysis, and helps locate problems.
4. Log retrieval supports backtracking over preceding and following lines, so results can be associated with their context.
5. The invention supports multiple file systems, such as local files, SFTP, HDFS, and JuiceFS, and provides cross-system log correlation capability.
While the spirit and principles of the present invention have been described with reference to several particular embodiments, it is to be understood that the invention is not limited to the disclosed embodiments; the division into aspects is for convenience of description only and does not imply that features in those aspects cannot be combined to advantage. The invention is intended to cover the various modifications and equivalent arrangements included within the spirit and scope of the appended claims.
It should be apparent to those skilled in the art that various modifications or variations can be made to the present invention, based on its technical solutions, without requiring inventive effort.

Claims (12)

1. A method for diagnosing problems from massive logs based on custom SQL, characterized by comprising the following steps:
defining grammar rules of the custom SQL;
generating parser code using ANTLR;
defining a listener or visitor to traverse the parse tree;
according to the requirements of log diagnosis, analyzing the defined SQL statements in the listener or visitor to implement the log diagnosis logic, which comprises parsing the SQL that defines a log parsing template to generate the template, parsing the SQL that defines a data source to generate the data source, parsing the table-creation SQL to associate the log parsing template with the data source and specify the log file path so as to map the log file into a table, and parsing the SQL that defines log queries to query the log data;
processing the parse tree using the listener or visitor and executing the corresponding operations.
2. The method for diagnosing problems from massive logs based on custom SQL of claim 1, wherein labels defined by regular expressions are used in the SQL that defines log parsing templates.
3. The method for diagnosing problems from massive logs based on custom SQL of claim 1, wherein, in the SQL that defines log queries, a syntax is defined for backtracking N lines forward and/or backward according to exception keywords.
4. The method for diagnosing problems from massive logs based on custom SQL of claim 1, wherein a categorization mode is defined in the SQL that defines log queries to group duplicate logs.
5. The method for diagnosing problems from massive logs based on custom SQL of claim 1, wherein, in the SQL that defines log queries, multiple tables are joined to correlate logs across systems.
6. A device for diagnosing problems from massive logs based on custom SQL, characterized in that the device comprises:
a log diagnosis module, used for: defining grammar rules of the custom SQL; generating parser code using ANTLR; defining a listener or visitor to traverse the parse tree; and, according to the requirements of log diagnosis, analyzing the defined SQL statements in the listener or visitor to implement the log diagnosis logic, which comprises parsing the SQL that defines a log parsing template to generate the template, parsing the SQL that defines a data source to generate the data source, parsing the table-creation SQL to associate the log parsing template with the data source and specify the log file path so as to map the log file into a table, and parsing the SQL that defines log queries to query the log data; and
a log processing execution module, used for processing the parse tree using the listener or visitor and executing the corresponding operations.
7. The device for diagnosing problems from massive logs based on custom SQL of claim 6, wherein labels defined by regular expressions are used in the SQL that defines log parsing templates.
8. The device for diagnosing problems from massive logs based on custom SQL of claim 6, wherein, in the SQL that defines log queries, a syntax is defined for backtracking N lines forward and/or backward according to exception keywords.
9. The device for diagnosing problems from massive logs based on custom SQL of claim 6, wherein a categorization mode is defined in the SQL that defines log queries to group duplicate logs.
10. The device for diagnosing problems from massive logs based on custom SQL of claim 6, wherein, in the SQL that defines log queries, multiple tables are joined to correlate logs across systems.
11. A computer device comprising a memory, a processor and a computer program stored on the memory and executable on the processor, characterized in that the processor implements the method of any of claims 1-5 when executing the computer program.
12. A computer readable storage medium, characterized in that the computer readable storage medium stores a computer program for performing the method of any one of claims 1-5.
CN202311408074.4A 2023-10-27 2023-10-27 Method and device for diagnosing problems from massive logs based on custom SQL Pending CN117573516A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202311408074.4A CN117573516A (en) 2023-10-27 2023-10-27 Method and device for diagnosing problems from massive logs based on custom SQL


Publications (1)

Publication Number Publication Date
CN117573516A true CN117573516A (en) 2024-02-20

Family

ID=89888948

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202311408074.4A Pending CN117573516A (en) 2023-10-27 2023-10-27 Method and device for diagnosing problems from massive logs based on custom SQL

Country Status (1)

Country Link
CN (1) CN117573516A (en)


Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination