CN115499303A - Log analysis tool based on Flink - Google Patents

Log analysis tool based on Flink

Info

Publication number
CN115499303A
CN115499303A (application CN202211038463.8A)
Authority
CN
China
Prior art keywords
data
flink
log analysis
analysis tool
log
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202211038463.8A
Other languages
Chinese (zh)
Inventor
吴兵 (Wu Bing)
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Inspur Software Technology Co Ltd
Original Assignee
Inspur Software Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Inspur Software Technology Co Ltd filed Critical Inspur Software Technology Co Ltd
Priority to CN202211038463.8A priority Critical patent/CN115499303A/en
Publication of CN115499303A publication Critical patent/CN115499303A/en
Pending legal-status Critical Current

Classifications

    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04L TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L41/00 Arrangements for maintenance, administration or management of data switching networks, e.g. of packet switching networks
    • H04L41/06 Management of faults, events, alarms or notifications
    • H04L41/069 Management of faults, events, alarms or notifications using logs of notifications; Post-processing of notifications
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/10 File systems; File servers
    • G06F16/18 File system types
    • G06F16/1805 Append-only file systems, e.g. using logs or journals to store data
    • G06F16/1815 Journaling file systems
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04L TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L43/00 Arrangements for monitoring or testing data switching networks
    • H04L43/04 Processing captured monitoring data, e.g. for logfile generation

Abstract

The invention relates to a Flink-based log analysis tool. In this Flink-based log analysis tool, log data are analysed during log analysis through custom data processing models (transform Operators), and the data analysis result is finally emitted through sink Operators and displayed on a front-end web page. The tool improves working efficiency and saves operation and maintenance costs: when troubleshooting problems and faults occurring in production and test environments, operation and maintenance personnel no longer need to browse log files one by one, but can locate the specific problem through keyword retrieval, which greatly shortens the time for error troubleshooting and problem location and enhances the stability of the system.

Description

Log analysis tool based on Flink
Technical Field
The invention relates to the technical field of software development and use, in particular to a Flink-based log analysis tool.
Background
During software development and operation, numerous distributed applications and microservice components interact with large amounts of data, which makes log recording and analysis particularly important. The traditional log4j approach stores logs in server files, and operation and maintenance personnel search for problems and symptoms by checking each log file one by one. However, as the system architecture expands, logs become scattered across many locations, and searching for problems in a large number of log files consumes excessive manpower and material resources.
In order to reduce the operation and maintenance cost during problem troubleshooting and ensure the stability of the system, the invention provides a log analysis tool based on Flink.
Disclosure of Invention
In order to make up for the defects of the prior art, the invention provides a simple and efficient log analysis tool based on Flink.
The invention is realized by the following technical scheme:
a Flink-based log analysis tool, comprising: in the analysis process of the log, log data analysis is carried out through a custom data processing model, transform Operators, and finally a data analysis result sink Operators is generated and displayed by a front-end web page.
The method comprises the following steps:
s1, determining a source of log analysis;
s2, processing the source data, performing sorting/sub-module processing after disordering and recombination, and returning a data stream DataStream;
s3, performing flat processing, disorganization and recombination on the data stream, performing primary data filtering, retaining the data meeting the conditions, discarding the data which do not meet the conditions, and finally performing union polymerization operation on the results of each small individual to form a final result;
s4, automatically collecting the context of various Exception handling mechanism exceptions appearing in the system running log, and recording the occurrence time;
and S5, classifying and collecting the logs of all levels in the system logs, and displaying the logs in a front-end classification mode.
In step S1, the log analysis data is read directly from a collection.
In step S1, the log analysis data is read from a file.
In step S1, the log analysis data is data transmitted through Kafka.
In step S1, the log analysis data is read from a database.
In step S2, data processing and keyword collection are customised for the usage scenario, and the data are shuffled and regrouped accordingly.
In step S3, different data processing models (transform Operators) are customised for different types of logs, and the whole data stream is divided into a number of small sub-streams for consumption.
In step S3, the data are screened with a select (filter) operator according to a specified rule; data satisfying the condition are retained and the rest are discarded.
The beneficial effects of the invention are: this Flink-based log analysis tool improves working efficiency and saves operation and maintenance costs; when troubleshooting problems and faults occurring in production and test environments, operation and maintenance personnel no longer need to browse log files one by one, but can locate the specific problem through keyword retrieval, which greatly shortens the time for error troubleshooting and problem location and enhances the stability of the system.
Drawings
In order to illustrate the embodiments of the present invention or the technical solutions in the prior art more clearly, the drawings used in the description of the embodiments or the prior art are briefly introduced below. Obviously, the drawings in the following description show only some embodiments of the present invention, and those skilled in the art can obtain other drawings from them without creative effort.
FIG. 1 is a schematic diagram of the data aggregation process of the present invention.
Detailed Description
In order to make those skilled in the art better understand the technical solutions of the present invention, the technical solutions in the embodiments of the present invention will be clearly and completely described below with reference to the embodiments of the present invention. It is to be understood that the described embodiments are merely exemplary of the invention, and not restrictive of the full scope of the invention. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention.
Flink is a distributed streaming dataflow engine whose core is written mainly in Java, with part of the code written in Scala. Put simply, Flink is a stream computing framework whose main function is to process streaming data, and it processes the data without distinguishing between bounded and unbounded input.
In Flink, the whole stream processing pipeline is called a Stream Dataflow: the operation that extracts data from a data source is called a Source Operator; the intermediate operations such as map(), aggregation and statistics are collectively called Transformation Operators; and the final output of the result data is handled by Sink Operators.
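As a minimal illustration of this Source, Transformation and Sink structure (a sketch for orientation only, not the claimed implementation; the host, port and "ERROR" keyword are assumptions), the following Java program wires a socket source, a filter transformation and a print sink together with the Flink DataStream API:

```java
import org.apache.flink.streaming.api.datastream.DataStream;
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;

public class StreamDataflowSketch {
    public static void main(String[] args) throws Exception {
        StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();

        // Source Operator: pull raw log lines into the dataflow (a socket source is used only for illustration)
        DataStream<String> source = env.socketTextStream("localhost", 9999);

        // Transformation Operators: intermediate processing such as filtering and aggregation
        DataStream<String> errors = source.filter(line -> line.contains("ERROR"));

        // Sink Operators: emit the result; a real deployment would feed the front-end web page instead
        errors.print();

        env.execute("Stream Dataflow sketch");
    }
}
```

In the tool described here, the sink would deliver the analysis result to the front-end web page rather than to standard output.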
As a framework and distributed processing engine, Flink performs stateful computations over unbounded and bounded data streams. Flink is designed to run in all common cluster environments and to perform computations at in-memory speed and at any scale.
An unbounded stream has a start but no defined end. It does not terminate and keeps delivering data as the data are generated. An unbounded stream must be processed continuously, i.e. an event must be handled promptly after it is ingested; it is not possible to wait for all input data to arrive, because the input is unbounded and will never be complete at any point in time. Processing unbounded data typically requires that events be ingested in a particular order (for example, the order in which the events occurred) so that the completeness of the result can be inferred.
A bounded stream has a defined start and end. It can be processed by ingesting all the data before performing any computation. Processing a bounded stream does not require ordered ingestion, because a bounded data set can always be sorted. The processing of bounded streams is also referred to as batch processing.
Flink is good at handling both unbounded and bounded data sets. Precise control of time and state enables the Flink runtime to run any type of application on an unbounded stream. Bounded streams are handled internally by algorithms and data structures designed specifically for fixed-size data sets, resulting in excellent performance.
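To make the bounded/unbounded distinction concrete, the sketch below (an illustration under assumed inputs, not part of the claimed tool) runs a small DataStream program over a finite in-memory source in BATCH execution mode; switching the runtime mode to STREAMING and attaching an unbounded source such as Kafka leaves the program logic unchanged.

```java
import org.apache.flink.api.common.RuntimeExecutionMode;
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;

public class BoundedVsUnboundedSketch {
    public static void main(String[] args) throws Exception {
        StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();

        // A bounded input (e.g. an archived log file) can be executed as a batch job;
        // the same pipeline handles an unbounded input (e.g. live Kafka logs) in STREAMING mode.
        env.setRuntimeMode(RuntimeExecutionMode.BATCH);

        env.fromElements("INFO started", "ERROR request failed", "INFO stopped") // finite, bounded source
           .filter(line -> line.startsWith("ERROR"))
           .print();

        env.execute("bounded vs unbounded sketch");
    }
}
```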
In the Flink-based log analysis tool, log data are analysed during log analysis through custom data processing models (transform Operators), and the data analysis result is finally emitted through sink Operators and displayed on a front-end web page.
The Flink-based log analysis tool comprises the following steps:
S1, determining the source of the log analysis;
S2, processing the source data: after shuffling and regrouping, performing sorting or per-sub-module processing, and returning a data stream (DataStream);
S3, flattening, shuffling and regrouping the data stream, performing primary data filtering so that records meeting the conditions are retained and records that do not are discarded, and finally performing a union aggregation of the results of the individual sub-streams to form the final result;
S4, automatically collecting the context of the various exceptions (Exception) that appear in the system running log and recording their time of occurrence;
and S5, classifying and collecting the logs of all levels in the system logs and displaying them by category at the front end.
In step S1, the log analysis data is read directly from a collection.
In step S1, the log analysis data is read from a file.
In step S1, the log analysis data is data transmitted through Kafka.
In step S1, the log analysis data is read from a database.
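These four source options of step S1 can be expressed roughly as follows with the Flink DataStream API. This is a sketch under assumed names (the file path, Kafka bootstrap servers and topic are placeholders), and the database source is only indicated in a comment because its wiring depends on the connector chosen.

```java
import java.util.Arrays;
import java.util.List;

import org.apache.flink.api.common.eventtime.WatermarkStrategy;
import org.apache.flink.api.common.serialization.SimpleStringSchema;
import org.apache.flink.connector.kafka.source.KafkaSource;
import org.apache.flink.connector.kafka.source.enumerator.initializer.OffsetsInitializer;
import org.apache.flink.streaming.api.datastream.DataStream;
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;

public class LogSourcesSketch {
    public static void main(String[] args) throws Exception {
        StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();

        // (a) read directly from an in-memory collection
        List<String> cached = Arrays.asList("INFO start", "ERROR NullPointerException at ...");
        DataStream<String> fromCollection = env.fromCollection(cached);

        // (b) read from a log file on disk (placeholder path)
        DataStream<String> fromFile = env.readTextFile("/var/log/app/app.log");

        // (c) consume log lines transmitted through Kafka (servers and topic are placeholders)
        KafkaSource<String> kafka = KafkaSource.<String>builder()
                .setBootstrapServers("kafka:9092")
                .setTopics("app-logs")
                .setStartingOffsets(OffsetsInitializer.latest())
                .setValueOnlyDeserializer(new SimpleStringSchema())
                .build();
        DataStream<String> fromKafka = env.fromSource(kafka, WatermarkStrategy.noWatermarks(), "kafka-logs");

        // (d) a database source would typically be wired in through the Flink JDBC connector
        //     or a custom SourceFunction; it is omitted here.

        fromCollection.union(fromFile, fromKafka).print();
        env.execute("log source sketch");
    }
}
```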
In step S2, data processing and keyword collection are customised for the usage scenario, and the data are shuffled and regrouped accordingly.
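One way such scenario-specific regrouping could be realised is to extract a keyword from each line (here the module name, under an assumed "date time LEVEL module message" layout) and key the stream by it. The sketch below is illustrative only and is not taken from the patent.

```java
import org.apache.flink.api.common.typeinfo.Types;
import org.apache.flink.api.java.tuple.Tuple2;
import org.apache.flink.streaming.api.datastream.DataStream;
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;

public class KeywordRegroupSketch {
    public static void main(String[] args) throws Exception {
        StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();
        DataStream<String> lines = env.fromElements(
                "2022-08-29 10:00:01 ERROR order-service NullPointerException: id is null",
                "2022-08-29 10:00:02 INFO user-service login ok");

        // Extract the keyword used for regrouping (the module name in the assumed layout)
        // and shuffle the records so that lines of the same module are processed together.
        DataStream<Tuple2<String, String>> byModule = lines
                .map(line -> {
                    String[] parts = line.split("\\s+");
                    String module = parts.length > 3 ? parts[3] : "unknown";
                    return Tuple2.of(module, line);
                })
                .returns(Types.TUPLE(Types.STRING, Types.STRING))
                .keyBy(t -> t.f0);

        byModule.print();
        env.execute("keyword regroup sketch");
    }
}
```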
In step S3, different data processing models (transform Operators) are customised for different types of logs, and the whole data stream is divided into a number of small sub-streams for consumption.
In step S3, the data are screened with a select (filter) operator according to a specified rule; data satisfying the condition are retained and the rest are discarded.
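Under the same assumptions, the flatten, filter and union chain of step S3 might look like the sketch below. The ERROR/WARN prefixes stand in for the specified rules, and filter() is used for the screening step; the "select operator" mentioned above maps onto the split/select pattern of earlier Flink versions, which current releases replace with filter() or side outputs.

```java
import org.apache.flink.api.common.typeinfo.Types;
import org.apache.flink.streaming.api.datastream.DataStream;
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;
import org.apache.flink.util.Collector;

public class FilterUnionSketch {
    public static void main(String[] args) throws Exception {
        StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();
        DataStream<String> entries = env.fromElements(
                "ERROR payment timeout\n\tat com.example.Pay.run(Pay.java:42)",
                "WARN retry scheduled",
                "INFO heartbeat");

        // Flatten: break multi-line log entries into individual lines.
        DataStream<String> flattened = entries
                .flatMap((String entry, Collector<String> out) -> {
                    for (String line : entry.split("\n")) {
                        out.collect(line.trim());
                    }
                })
                .returns(Types.STRING);

        // Primary filtering: keep the records that satisfy the rule, drop the rest.
        DataStream<String> errors = flattened.filter(l -> l.startsWith("ERROR"));
        DataStream<String> warnings = flattened.filter(l -> l.startsWith("WARN"));

        // Union: merge the per-type sub-streams into the final result.
        errors.union(warnings).print();

        env.execute("filter/union sketch");
    }
}
```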
The Flink-based log analysis tool imposes no strict format requirement on the structured or unstructured data it processes; the data may be ordered, unordered or partially ordered, and the tool still ensures the real-time performance and accuracy of the data analysis.
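For steps S4 and S5, the following sketch shows one way exception context and log level could be pulled out of such loosely formatted lines. It assumes a "date time LEVEL message" layout and hypothetical sample data; it illustrates the idea rather than the patented implementation.

```java
import org.apache.flink.api.common.typeinfo.Types;
import org.apache.flink.api.java.tuple.Tuple3;
import org.apache.flink.streaming.api.datastream.DataStream;
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;
import org.apache.flink.util.Collector;

public class ExceptionCollectSketch {
    public static void main(String[] args) throws Exception {
        StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();
        DataStream<String> lines = env.fromElements(
                "2022-08-29 10:00:01 ERROR java.lang.NullPointerException: id is null",
                "2022-08-29 10:00:02 WARN slow query detected",
                "2022-08-29 10:00:03 INFO request finished");

        // (level, timestamp, message) triples: the level drives the front-end classification (S5),
        // and lines mentioning "Exception" keep their timestamp and context for troubleshooting (S4).
        DataStream<Tuple3<String, String, String>> classified = lines
                .flatMap((String line, Collector<Tuple3<String, String, String>> out) -> {
                    String[] p = line.split("\\s+", 4); // assumed layout: date time LEVEL message
                    if (p.length == 4) {
                        out.collect(Tuple3.of(p[2], p[0] + " " + p[1], p[3]));
                    }
                })
                .returns(Types.TUPLE(Types.STRING, Types.STRING, Types.STRING));

        // Exceptions with their time of occurrence (S4)
        classified.filter(t -> t.f2.contains("Exception")).print("exception");

        // Records grouped by level for the front-end view (S5); a real job would aggregate
        // or window per level, and print() merely stands in for the sink.
        classified.keyBy(t -> t.f0).print("by-level");

        env.execute("exception/level sketch");
    }
}
```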
Compared with the prior art, the Flink-based log analysis tool has the following characteristics:
First, the stream processing mechanism gives users a faster and more efficient experience, and the memory footprint during operation is lower;
Secondly, various types of data sources can be processed; data processing models are defined separately for different log types, and data analysis, key content collection and the like are carried out according to the customised data processing models;
Thirdly, analysis and display are performed at the front end, and the system can run on a single machine or be expanded into a distributed architecture (see the sketch after this list);
Fourthly, log files are analysed more comprehensively and in greater detail, fundamentally improving the working efficiency and quality of operation and maintenance personnel and developers.
The Flink-based log analysis tool of the present example has been described in detail above. While the invention has been described with reference to specific examples, these are provided only to assist in understanding its core concepts; all other embodiments that can be obtained by those skilled in the art without departing from the spirit of the present invention shall fall within its scope of protection.

Claims (9)

1. A Flink-based log analysis tool, comprising: during the log analysis process, log data are analysed through custom data processing models (transform Operators), and the data analysis result is finally emitted through sink Operators and displayed on a front-end web page.
2. The Flink-based log analysis tool of claim 1, wherein: the method comprises the following steps:
s1, determining a source of log analysis;
s2, processing the source data, performing sorting/sub-module processing after disordering and recombination, and returning a data stream DataStream;
s3, performing flattening processing, disordering recombination on the data stream, performing primary data filtering, retaining data meeting conditions, discarding unsatisfied data, and finally performing union polymerization operation on the results of all small individuals to form a final result;
s4, automatically collecting contexts of various Exception handling mechanisms Exception appearing in the system running log, and recording occurrence time;
and S5, classifying and collecting the logs of all levels in the system logs, and displaying the logs in a front-end classification mode.
3. The Flink-based log analysis tool of claim 2, wherein: in step S1, the log analysis data is read directly from a collection.
4. The Flink-based log analysis tool of claim 2, wherein: in step S1, the log analysis data is read from a file.
5. The Flink-based log analysis tool of claim 2, wherein: in step S1, the log analysis data is data transmitted through Kafka.
6. The Flink-based log analysis tool of claim 2, wherein: in step S1, the log analysis data is read from a database.
7. The Flink-based log analysis tool of claim 2, wherein: in step S2, data processing and keyword collection are customised for the usage scenario, and the data are shuffled and regrouped accordingly.
8. The Flink-based log analysis tool of claim 2, wherein: in step S3, different data processing models (transform Operators) are customised for different types of logs, and the whole data stream is divided into a number of small sub-streams for consumption.
9. The Flink-based log analysis tool of claim 2, wherein: in step S3, the data are screened with a select (filter) operator according to a specified rule; data satisfying the condition are retained and the rest are discarded.
CN202211038463.8A 2022-08-29 2022-08-29 Log analysis tool based on Flink Pending CN115499303A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202211038463.8A CN115499303A (en) 2022-08-29 2022-08-29 Log analysis tool based on Flink


Publications (1)

Publication Number Publication Date
CN115499303A (en) 2022-12-20

Family

ID=84465937

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202211038463.8A Pending CN115499303A (en) 2022-08-29 2022-08-29 Log analysis tool based on Flink

Country Status (1)

Country Link
CN (1) CN115499303A (en)

Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110245158A (en) * 2019-06-10 2019-09-17 上海理想信息产业(集团)有限公司 A kind of multi-source heterogeneous generating date system and method based on Flink stream calculation technology
CN113806429A (en) * 2020-06-11 2021-12-17 深信服科技股份有限公司 Canvas type log analysis method based on large data stream processing framework
CA3150183A1 (en) * 2021-02-25 2022-08-25 10353744 Canada Ltd. Flink streaming processing engine method and device for real-time recommendation and computer equipment



Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination