CN115499303A - Log analysis tool based on Flink - Google Patents

Log analysis tool based on Flink

Info

Publication number
CN115499303A
CN115499303A (application CN202211038463.8A)
Authority
CN
China
Prior art keywords
data
flink
log analysis
analysis tool
log
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202211038463.8A
Other languages
Chinese (zh)
Inventor
吴兵 (Wu Bing)
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Inspur Software Technology Co Ltd
Original Assignee
Inspur Software Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Inspur Software Technology Co Ltd filed Critical Inspur Software Technology Co Ltd
Priority to CN202211038463.8A priority Critical patent/CN115499303A/en
Publication of CN115499303A publication Critical patent/CN115499303A/en
Pending legal-status Critical Current

Classifications

    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04L TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L41/00 Arrangements for maintenance, administration or management of data switching networks, e.g. of packet switching networks
    • H04L41/06 Management of faults, events, alarms or notifications
    • H04L41/069 Management of faults, events, alarms or notifications using logs of notifications; Post-processing of notifications
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/10 File systems; File servers
    • G06F16/18 File system types
    • G06F16/1805 Append-only file systems, e.g. using logs or journals to store data
    • G06F16/1815 Journaling file systems
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04L TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L43/00 Arrangements for monitoring or testing data switching networks
    • H04L43/04 Processing captured monitoring data, e.g. for logfile generation

Abstract

The invention relates to a Flink-based log analysis tool. In this Flink-based log analysis tool, log data are analysed during log analysis through custom data processing models (transform Operators), and the data analysis result is finally emitted through sink Operators and displayed on a front-end web page. The tool improves working efficiency and saves operation and maintenance costs: when troubleshooting problems and faults occurring in production and test environments, operation and maintenance personnel no longer need to browse log files one by one, but can locate the specific problem through keyword retrieval, which greatly shortens the time for error troubleshooting and problem location and enhances the stability of the system.

Description

Log analysis tool based on Flink
Technical Field
The invention relates to the technical field of software development and use, in particular to a Flink-based log analysis tool.
Background
During software development and operation, numerous distributed applications and microservice components interact with large amounts of data, which makes log recording and analysis particularly important. The traditional log4j approach stores logs in server files, and operation and maintenance personnel search for problems and symptoms by checking each log file one by one. However, as the system architecture expands, logs become scattered across many locations, and searching for problems in a large number of log files consumes excessive manpower and material resources.
In order to reduce the operation and maintenance cost during problem troubleshooting and ensure the stability of the system, the invention provides a log analysis tool based on Flink.
Disclosure of Invention
In order to make up for the defects of the prior art, the invention provides a simple and efficient log analysis tool based on Flink.
The invention is realized by the following technical scheme:
a Flink-based log analysis tool, comprising: in the analysis process of the log, log data analysis is carried out through a custom data processing model, transform Operators, and finally a data analysis result sink Operators is generated and displayed by a front-end web page.
The method comprises the following steps:
s1, determining a source of log analysis;
s2, processing the source data, performing sorting/sub-module processing after disordering and recombination, and returning a data stream DataStream;
s3, performing flat processing, disorganization and recombination on the data stream, performing primary data filtering, retaining the data meeting the conditions, discarding the data which do not meet the conditions, and finally performing union polymerization operation on the results of each small individual to form a final result;
s4, automatically collecting the context of various Exception handling mechanism exceptions appearing in the system running log, and recording the occurrence time;
and S5, classifying and collecting the logs of all levels in the system logs, and displaying the logs in a front-end classification mode.
In step S1, the log analysis data is read directly from a collection.
In step S1, the log analysis data is read from a file.
In step S1, the log analysis data is data transmitted through Kafka.
In step S1, the log analysis data is read from a database.
In step S2, data processing and keyword collection are customised for the usage scenario, and the data are shuffled and regrouped accordingly.
In step S3, different data processing models (transform Operators) are customised for different types of logs, and the whole data stream is divided into a number of small sub-streams for consumption.
In step S3, the data are screened with a select (filter) operator according to a specified rule; data satisfying the condition are retained and the rest are discarded.
The beneficial effects of the invention are: this Flink-based log analysis tool improves working efficiency and saves operation and maintenance costs; when troubleshooting problems and faults occurring in production and test environments, operation and maintenance personnel no longer need to browse log files one by one, but can locate the specific problem through keyword retrieval, which greatly shortens the time for error troubleshooting and problem location and enhances the stability of the system.
Drawings
In order to illustrate the embodiments of the present invention or the technical solutions in the prior art more clearly, the drawings used in the description of the embodiments or the prior art are briefly introduced below. Obviously, the drawings in the following description show only some embodiments of the present invention, and those skilled in the art can obtain other drawings from them without creative effort.
FIG. 1 is a schematic diagram of the data aggregation process of the present invention.
Detailed Description
In order to make those skilled in the art better understand the technical solutions of the present invention, the technical solutions in the embodiments of the present invention will be clearly and completely described below with reference to the embodiments of the present invention. It is to be understood that the described embodiments are merely exemplary of the invention, and not restrictive of the full scope of the invention. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention.
Flink is a distributed streaming dataflow engine whose core is written mainly in Java, with part of the code written in Scala. Put simply, Flink is a stream computing framework whose main function is to process streaming data, and it processes the data without distinguishing between bounded and unbounded input.
In Flink, the whole stream processing pipeline is called a Stream Dataflow: the operation that extracts data from a data source is called a Source Operator; the intermediate operations such as map(), aggregation and statistics are collectively called Transformation Operators; and the final output of the result data is handled by Sink Operators.
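As a minimal illustration of this Source, Transformation and Sink structure (a sketch for orientation only, not the claimed implementation; the host, port and "ERROR" keyword are assumptions), the following Java program wires a socket source, a filter transformation and a print sink together with the Flink DataStream API:

```java
import org.apache.flink.streaming.api.datastream.DataStream;
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;

public class StreamDataflowSketch {
    public static void main(String[] args) throws Exception {
        StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();

        // Source Operator: pull raw log lines into the dataflow (a socket source is used only for illustration)
        DataStream<String> source = env.socketTextStream("localhost", 9999);

        // Transformation Operators: intermediate processing such as filtering and aggregation
        DataStream<String> errors = source.filter(line -> line.contains("ERROR"));

        // Sink Operators: emit the result; a real deployment would feed the front-end web page instead
        errors.print();

        env.execute("Stream Dataflow sketch");
    }
}
```

In the tool described here, the sink would deliver the analysis result to the front-end web page rather than to standard output.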
As a framework and distributed processing engine, Flink performs stateful computations over unbounded and bounded data streams. Flink is designed to run in all common cluster environments and to perform computations at in-memory speed and at any scale.
An unbounded stream has a start but no defined end. It does not terminate and keeps delivering data as the data are generated. An unbounded stream must be processed continuously, i.e. an event must be handled promptly after it is ingested; it is not possible to wait for all input data to arrive, because the input is unbounded and will never be complete at any point in time. Processing unbounded data typically requires that events be ingested in a particular order (for example, the order in which the events occurred) so that the completeness of the result can be inferred.
A bounded stream has a defined start and end. It can be processed by ingesting all the data before performing any computation. Processing a bounded stream does not require ordered ingestion, because a bounded data set can always be sorted. The processing of bounded streams is also referred to as batch processing.
Flink is good at handling both unbounded and bounded data sets. Precise control of time and state enables the Flink runtime to run any type of application on an unbounded stream. Bounded streams are handled internally by algorithms and data structures designed specifically for fixed-size data sets, resulting in excellent performance.
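To make the bounded/unbounded distinction concrete, the sketch below (an illustration under assumed inputs, not part of the claimed tool) runs a small DataStream program over a finite in-memory source in BATCH execution mode; switching the runtime mode to STREAMING and attaching an unbounded source such as Kafka leaves the program logic unchanged.

```java
import org.apache.flink.api.common.RuntimeExecutionMode;
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;

public class BoundedVsUnboundedSketch {
    public static void main(String[] args) throws Exception {
        StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();

        // A bounded input (e.g. an archived log file) can be executed as a batch job;
        // the same pipeline handles an unbounded input (e.g. live Kafka logs) in STREAMING mode.
        env.setRuntimeMode(RuntimeExecutionMode.BATCH);

        env.fromElements("INFO started", "ERROR request failed", "INFO stopped") // finite, bounded source
           .filter(line -> line.startsWith("ERROR"))
           .print();

        env.execute("bounded vs unbounded sketch");
    }
}
```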
In the Flink-based log analysis tool, log data are analysed during log analysis through custom data processing models (transform Operators), and the data analysis result is finally emitted through sink Operators and displayed on a front-end web page.
The Flink-based log analysis tool comprises the following steps:
S1, determining the source of the log analysis;
S2, processing the source data: after shuffling and regrouping, performing sorting or per-sub-module processing, and returning a data stream (DataStream);
S3, flattening, shuffling and regrouping the data stream, performing primary data filtering so that records meeting the conditions are retained and records that do not are discarded, and finally performing a union aggregation of the results of the individual sub-streams to form the final result;
S4, automatically collecting the context of the various exceptions (Exception) that appear in the system running log and recording their time of occurrence;
and S5, classifying and collecting the logs of all levels in the system logs and displaying them by category at the front end.
In step S1, the log analysis data is read directly from a collection.
In step S1, the log analysis data is read from a file.
In step S1, the log analysis data is data transmitted through Kafka.
In step S1, the log analysis data is read from a database.
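These four source options of step S1 can be expressed roughly as follows with the Flink DataStream API. This is a sketch under assumed names (the file path, Kafka bootstrap servers and topic are placeholders), and the database source is only indicated in a comment because its wiring depends on the connector chosen.

```java
import java.util.Arrays;
import java.util.List;

import org.apache.flink.api.common.eventtime.WatermarkStrategy;
import org.apache.flink.api.common.serialization.SimpleStringSchema;
import org.apache.flink.connector.kafka.source.KafkaSource;
import org.apache.flink.connector.kafka.source.enumerator.initializer.OffsetsInitializer;
import org.apache.flink.streaming.api.datastream.DataStream;
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;

public class LogSourcesSketch {
    public static void main(String[] args) throws Exception {
        StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();

        // (a) read directly from an in-memory collection
        List<String> cached = Arrays.asList("INFO start", "ERROR NullPointerException at ...");
        DataStream<String> fromCollection = env.fromCollection(cached);

        // (b) read from a log file on disk (placeholder path)
        DataStream<String> fromFile = env.readTextFile("/var/log/app/app.log");

        // (c) consume log lines transmitted through Kafka (servers and topic are placeholders)
        KafkaSource<String> kafka = KafkaSource.<String>builder()
                .setBootstrapServers("kafka:9092")
                .setTopics("app-logs")
                .setStartingOffsets(OffsetsInitializer.latest())
                .setValueOnlyDeserializer(new SimpleStringSchema())
                .build();
        DataStream<String> fromKafka = env.fromSource(kafka, WatermarkStrategy.noWatermarks(), "kafka-logs");

        // (d) a database source would typically be wired in through the Flink JDBC connector
        //     or a custom SourceFunction; it is omitted here.

        fromCollection.union(fromFile, fromKafka).print();
        env.execute("log source sketch");
    }
}
```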
In step S2, data processing and keyword collection are customised for the usage scenario, and the data are shuffled and regrouped accordingly.
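One way such scenario-specific regrouping could be realised is to extract a keyword from each line (here the module name, under an assumed "date time LEVEL module message" layout) and key the stream by it. The sketch below is illustrative only and is not taken from the patent.

```java
import org.apache.flink.api.common.typeinfo.Types;
import org.apache.flink.api.java.tuple.Tuple2;
import org.apache.flink.streaming.api.datastream.DataStream;
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;

public class KeywordRegroupSketch {
    public static void main(String[] args) throws Exception {
        StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();
        DataStream<String> lines = env.fromElements(
                "2022-08-29 10:00:01 ERROR order-service NullPointerException: id is null",
                "2022-08-29 10:00:02 INFO user-service login ok");

        // Extract the keyword used for regrouping (the module name in the assumed layout)
        // and shuffle the records so that lines of the same module are processed together.
        DataStream<Tuple2<String, String>> byModule = lines
                .map(line -> {
                    String[] parts = line.split("\\s+");
                    String module = parts.length > 3 ? parts[3] : "unknown";
                    return Tuple2.of(module, line);
                })
                .returns(Types.TUPLE(Types.STRING, Types.STRING))
                .keyBy(t -> t.f0);

        byModule.print();
        env.execute("keyword regroup sketch");
    }
}
```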
In step S3, different data processing models (transform Operators) are customised for different types of logs, and the whole data stream is divided into a number of small sub-streams for consumption.
In step S3, the data are screened with a select (filter) operator according to a specified rule; data satisfying the condition are retained and the rest are discarded.
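Under the same assumptions, the flatten, filter and union chain of step S3 might look like the sketch below. The ERROR/WARN prefixes stand in for the specified rules, and filter() is used for the screening step; the "select operator" mentioned above maps onto the split/select pattern of earlier Flink versions, which current releases replace with filter() or side outputs.

```java
import org.apache.flink.api.common.typeinfo.Types;
import org.apache.flink.streaming.api.datastream.DataStream;
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;
import org.apache.flink.util.Collector;

public class FilterUnionSketch {
    public static void main(String[] args) throws Exception {
        StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();
        DataStream<String> entries = env.fromElements(
                "ERROR payment timeout\n\tat com.example.Pay.run(Pay.java:42)",
                "WARN retry scheduled",
                "INFO heartbeat");

        // Flatten: break multi-line log entries into individual lines.
        DataStream<String> flattened = entries
                .flatMap((String entry, Collector<String> out) -> {
                    for (String line : entry.split("\n")) {
                        out.collect(line.trim());
                    }
                })
                .returns(Types.STRING);

        // Primary filtering: keep the records that satisfy the rule, drop the rest.
        DataStream<String> errors = flattened.filter(l -> l.startsWith("ERROR"));
        DataStream<String> warnings = flattened.filter(l -> l.startsWith("WARN"));

        // Union: merge the per-type sub-streams into the final result.
        errors.union(warnings).print();

        env.execute("filter/union sketch");
    }
}
```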
The Flink-based log analysis tool imposes no strict format requirement on the structured or unstructured data it processes; the data may be ordered, unordered or partially ordered, and the tool still ensures the real-time performance and accuracy of the data analysis.
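For steps S4 and S5, the following sketch shows one way exception context and log level could be pulled out of such loosely formatted lines. It assumes a "date time LEVEL message" layout and hypothetical sample data; it illustrates the idea rather than the patented implementation.

```java
import org.apache.flink.api.common.typeinfo.Types;
import org.apache.flink.api.java.tuple.Tuple3;
import org.apache.flink.streaming.api.datastream.DataStream;
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;
import org.apache.flink.util.Collector;

public class ExceptionCollectSketch {
    public static void main(String[] args) throws Exception {
        StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();
        DataStream<String> lines = env.fromElements(
                "2022-08-29 10:00:01 ERROR java.lang.NullPointerException: id is null",
                "2022-08-29 10:00:02 WARN slow query detected",
                "2022-08-29 10:00:03 INFO request finished");

        // (level, timestamp, message) triples: the level drives the front-end classification (S5),
        // and lines mentioning "Exception" keep their timestamp and context for troubleshooting (S4).
        DataStream<Tuple3<String, String, String>> classified = lines
                .flatMap((String line, Collector<Tuple3<String, String, String>> out) -> {
                    String[] p = line.split("\\s+", 4); // assumed layout: date time LEVEL message
                    if (p.length == 4) {
                        out.collect(Tuple3.of(p[2], p[0] + " " + p[1], p[3]));
                    }
                })
                .returns(Types.TUPLE(Types.STRING, Types.STRING, Types.STRING));

        // Exceptions with their time of occurrence (S4)
        classified.filter(t -> t.f2.contains("Exception")).print("exception");

        // Records grouped by level for the front-end view (S5); a real job would aggregate
        // or window per level, and print() merely stands in for the sink.
        classified.keyBy(t -> t.f0).print("by-level");

        env.execute("exception/level sketch");
    }
}
```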
Compared with the prior art, the Flink-based log analysis tool has the following characteristics:
First, the stream processing mechanism gives users a faster and more efficient experience, and the memory footprint during operation is lower;
Secondly, various types of data sources can be processed; data processing models are defined separately for different log types, and data analysis, key content collection and the like are carried out according to the customised data processing models;
Thirdly, analysis and display are performed at the front end, and the system can run on a single machine or be expanded into a distributed architecture (see the sketch after this list);
Fourthly, log files are analysed more comprehensively and in greater detail, fundamentally improving the working efficiency and quality of operation and maintenance personnel and developers.
The Flink-based log analysis tool of the present example has been described in detail above. While the invention has been described with reference to specific examples, these are provided only to assist in understanding its core concepts; all other embodiments that can be obtained by those skilled in the art without departing from the spirit of the present invention shall fall within its scope of protection.

Claims (9)

1. A Flink-based log analysis tool, comprising: during the log analysis process, log data are analysed through custom data processing models (transform Operators), and the data analysis result is finally emitted through sink Operators and displayed on a front-end web page.
2. The Flink-based log analysis tool of claim 1, wherein: the method comprises the following steps:
s1, determining a source of log analysis;
s2, processing the source data, performing sorting/sub-module processing after disordering and recombination, and returning a data stream DataStream;
s3, performing flattening processing, disordering recombination on the data stream, performing primary data filtering, retaining data meeting conditions, discarding unsatisfied data, and finally performing union polymerization operation on the results of all small individuals to form a final result;
s4, automatically collecting contexts of various Exception handling mechanisms Exception appearing in the system running log, and recording occurrence time;
and S5, classifying and collecting the logs of all levels in the system logs, and displaying the logs in a front-end classification mode.
3. The Flink-based log analysis tool of claim 2, wherein: in step S1, the log analysis data is read directly from a collection.
4. The Flink-based log analysis tool of claim 2, wherein: in step S1, the log analysis data is read from a file.
5. The Flink-based log analysis tool of claim 2, wherein: in step S1, the log analysis data is data transmitted through Kafka.
6. The Flink-based log analysis tool of claim 2, wherein: in step S1, the log analysis data is read from a database.
7. The Flink-based log analysis tool of claim 2, wherein: in step S2, data processing and keyword collection are customised for the usage scenario, and the data are shuffled and regrouped accordingly.
8. The Flink-based log analysis tool of claim 2, wherein: in step S3, different data processing models (transform Operators) are customised for different types of logs, and the whole data stream is divided into a number of small sub-streams for consumption.
9. The Flink-based log analysis tool of claim 2, wherein: in step S3, the data are screened with a select (filter) operator according to a specified rule; data satisfying the condition are retained and the rest are discarded.
CN202211038463.8A 2022-08-29 2022-08-29 Log analysis tool based on Flink Pending CN115499303A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202211038463.8A CN115499303A (en) 2022-08-29 2022-08-29 Log analysis tool based on Flink


Publications (1)

Publication Number Publication Date
CN115499303A (en) 2022-12-20

Family

ID=84465937

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202211038463.8A Pending CN115499303A (en) 2022-08-29 2022-08-29 Log analysis tool based on Flink

Country Status (1)

Country Link
CN (1) CN115499303A (en)

Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110245158A (en) * 2019-06-10 2019-09-17 上海理想信息产业(集团)有限公司 A kind of multi-source heterogeneous generating date system and method based on Flink stream calculation technology
CN113806429A (en) * 2020-06-11 2021-12-17 深信服科技股份有限公司 Canvas type log analysis method based on large data stream processing framework
CA3150183A1 (en) * 2021-02-25 2022-08-25 10353744 Canada Ltd. Flink streaming processing engine method and device for real-time recommendation and computer equipment



Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination