CN115499303A - Log analysis tool based on Flink - Google Patents
- Publication number
- CN115499303A CN115499303A CN202211038463.8A CN202211038463A CN115499303A CN 115499303 A CN115499303 A CN 115499303A CN 202211038463 A CN202211038463 A CN 202211038463A CN 115499303 A CN115499303 A CN 115499303A
- Authority
- CN
- China
- Prior art keywords
- data
- flink
- log analysis
- analysis tool
- log
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Classifications
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04L—TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
- H04L41/00—Arrangements for maintenance, administration or management of data switching networks, e.g. of packet switching networks
- H04L41/06—Management of faults, events, alarms or notifications
- H04L41/069—Management of faults, events, alarms or notifications using logs of notifications; Post-processing of notifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/10—File systems; File servers
- G06F16/18—File system types
- G06F16/1805—Append-only file systems, e.g. using logs or journals to store data
- G06F16/1815—Journaling file systems
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04L—TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
- H04L43/00—Arrangements for monitoring or testing data switching networks
- H04L43/04—Processing captured monitoring data, e.g. for logfile generation
Abstract
The invention particularly relates to a Flink-based log analysis tool. In this tool, during log analysis, log data are analyzed by custom data-processing models (transform operators), and the data analysis result is finally emitted by sink operators and displayed on a front-end web page. The Flink-based log analysis tool improves work efficiency and saves operation and maintenance costs: when investigating problems and faults occurring in production and test environments, operation and maintenance personnel no longer need to browse log files one by one, but can locate a specific problem through keyword retrieval, which greatly shortens the time for error investigation and problem location and enhances the stability of the system.
Description
Technical Field
The invention relates to the technical field of software development and use, in particular to a Flink-based log analysis tool.
Background
In the course of software development and use, there are various distributed applications, microservice components, and a large amount of data interaction, which makes log recording and analysis particularly important. The traditional log4j approach stores logs in server files, and operation and maintenance personnel look for problems and their symptoms by checking each log file one by one. However, as the system architecture expands, logs become scattered across many locations; searching for problems in a large number of log files consumes too much manpower and material resources.
In order to reduce the operation and maintenance cost during problem troubleshooting and ensure the stability of the system, the invention provides a log analysis tool based on Flink.
Disclosure of Invention
In order to make up for the defects of the prior art, the invention provides a simple and efficient log analysis tool based on Flink.
The invention is realized by the following technical scheme:
a Flink-based log analysis tool, comprising: during log analysis, log data are analyzed by custom data-processing models (transform operators), and the data analysis result is finally emitted by sink operators and displayed on a front-end web page.
The method comprises the following steps:
s1, determining a source of log analysis;
s2, processing the source data: shuffling and recombining it, applying sorting/sub-module processing, and returning a DataStream;
s3, flattening, shuffling, and recombining the data stream, performing preliminary data filtering, retaining the data that meet the conditions and discarding the data that do not, and finally applying a union aggregation operation to the results of the individual substreams to form the final result;
s4, automatically collecting the context of the various exceptions (from the Exception handling mechanism) that appear in the system running log, and recording their occurrence time;
and S5, classifying and collecting the logs of all levels in the system logs, and displaying them by category on the front end.
In step S1, the log analysis data is directly read from the collection.
In step S1, the log analysis data is read from the file.
In step S1, the log analysis data is Kafka transmission data.
In step S1, the log analysis data is read from the database.
In step S2, the data are shuffled and recombined according to data-processing and keyword-collection rules customized for the usage scenario.
In step S3, different data-processing models (transform operators) are customized for different types of logs, and the whole data stream is divided into multiple substreams for consumption.
In step S3, the data are screened with a select operator according to a specified rule; data satisfying the condition are retained and the rest are discarded.
The beneficial effects of the invention are: the Flink-based log analysis tool improves work efficiency and saves operation and maintenance costs. When investigating problems and faults occurring in production and test environments, operation and maintenance personnel no longer need to browse log files one by one, but can locate a specific problem through keyword retrieval, which greatly shortens the time for error investigation and problem location and enhances the stability of the system.
Drawings
In order to more clearly illustrate the embodiments of the present invention or the technical solutions in the prior art, the drawings used in the description of the embodiments or the prior art will be briefly introduced below, and it is obvious that the drawings in the following description are some embodiments of the present invention, and for those skilled in the art, other drawings can be obtained according to these drawings without creative efforts.
FIG. 1 is a schematic diagram of the data aggregation process of the present invention.
Detailed Description
In order to make those skilled in the art better understand the technical solutions of the present invention, the technical solutions in the embodiments of the present invention will be clearly and completely described below with reference to the embodiments of the present invention. It is to be understood that the described embodiments are merely exemplary of the invention, and not restrictive of the full scope of the invention. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention.
Flink is a distributed stream dataflow engine whose core is written mainly in Java, with part of the code written in Scala. Put simply, Flink is a streaming computation framework whose main function is to process streaming data, and it processes data without distinguishing between bounded and unbounded inputs.
In Flink, the whole stream-processing procedure is called a Stream Dataflow. The operation of extracting data from a data source is called a source operator; intermediate operations such as map(), aggregation, and statistics are collectively called transform operators; and the final emission of result data is handled by sink operators.
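As an illustration of this three-stage dataflow, the sketch below models the source, transform, and sink stages with plain Python functions standing in for Flink's DataStream API; the sample log lines and the "ERROR" keyword are invented for the example.

```python
# Minimal sketch of a Flink-style dataflow: source -> transform -> sink.
# Plain Python generators stand in for Flink's DataStream API; the
# sample log lines and the "ERROR" keyword are invented for illustration.

def source_operator(lines):
    """Source operator: emit raw log records one by one."""
    for line in lines:
        yield line

def transform_operators(stream, keyword):
    """Transform operators: map each record, then filter by keyword."""
    mapped = (line.strip() for line in stream)           # map()
    return (line for line in mapped if keyword in line)  # filter()

def sink_operator(stream):
    """Sink operator: collect the final result for display."""
    return list(stream)

raw_logs = [
    "2022-08-29 10:00:01 INFO  service started",
    "2022-08-29 10:00:02 ERROR connection refused",
    "2022-08-29 10:00:03 ERROR timeout on request",
]
result = sink_operator(transform_operators(source_operator(raw_logs), "ERROR"))
```

In an actual Flink job the same shape appears as `env.addSource(...)`, a chain of transformations, and `stream.addSink(...)`; the sketch only mirrors that structure.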
Flink is a framework and distributed processing engine for stateful computation over unbounded and bounded data streams. Flink is designed to run in all common cluster environments, performing computations at in-memory speed and at any scale.
An unbounded stream has a start but no defined end. It does not terminate, and it provides data as the data is generated. An unbounded stream must be processed continuously, that is, an event must be processed promptly after it is ingested; it is not possible to wait for all input data to arrive, because the input is unbounded and will not be complete at any point in time. Processing unbounded data typically requires that events be ingested in a particular order (for example, the order in which they occurred) so that the completeness of results can be inferred.
A bounded stream has a defined start and end. A bounded stream can be processed by ingesting all of its data before performing any computation. Ordered ingestion is not required, because a bounded data set can always be sorted. The processing of bounded streams is also referred to as batch processing.
Flink excels at handling both unbounded and bounded data sets. Precise control of time and state enables the Flink runtime to run any type of application on unbounded streams. Bounded streams are processed internally by algorithms and data structures designed specifically for fixed-size data sets, yielding excellent performance.
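The contrast between the two modes can be sketched in plain Python: a bounded data set can be fully ingested and sorted before computing, while an unbounded stream (modelled here by an endless generator, an assumption for the example) must be handled event by event and can only ever be consumed up to a finite prefix.

```python
# Sketch contrasting bounded and unbounded processing. The endless
# generator and the doubling computation are invented for illustration.
import itertools

def process_bounded(records):
    """Batch style: ingest everything, sort, then compute."""
    return sorted(records)

def process_unbounded(stream, limit):
    """Streaming style: handle each event as it arrives; only a finite
    prefix of the (conceptually endless) stream is ever observed."""
    results = []
    for event in itertools.islice(stream, limit):
        results.append(event * 2)  # per-event computation
    return results

def endless_counter():
    n = 0
    while True:  # never terminates: an unbounded source
        yield n
        n += 1

batch_result = process_bounded([3, 1, 2])
stream_result = process_unbounded(endless_counter(), limit=3)
```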
According to the Flink-based log analysis tool, during log analysis, log data are analyzed by custom data-processing models (transform operators), and the data analysis result is finally emitted by sink operators and displayed on a front-end web page.
The Flink-based log analysis tool comprises the following steps:
s1, determining a source of log analysis;
s2, processing the source data: shuffling and recombining it, applying sorting/sub-module processing, and returning a DataStream;
s3, flattening, shuffling, and recombining the data stream, performing preliminary data filtering, retaining the data that meet the conditions and discarding the data that do not, and finally applying a union aggregation operation to the results of the individual substreams to form the final result;
s4, automatically collecting the context of the various exceptions (from the Exception handling mechanism) that appear in the system running log, and recording their occurrence time;
and S5, classifying and collecting the logs of all levels in the system logs, and displaying them by category on the front end.
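Steps S4 and S5 can be sketched with stdlib Python in place of Flink operators: capture each exception line together with its timestamp and trailing stack-trace context, and group records by level for categorized front-end display. The log format and level names below are assumptions for the example.

```python
# Sketch of steps S4 (exception context collection) and S5 (level
# classification). Log format and level names are assumed for the example.
from collections import defaultdict

LOG_LINES = [
    "2022-08-29 10:00:01 INFO  request handled",
    "2022-08-29 10:00:02 ERROR NullPointerException in OrderService",
    "    at com.example.OrderService.process(OrderService.java:42)",
    "2022-08-29 10:00:03 WARN  slow response",
]

def collect_exception_context(lines):
    """S4: capture each Exception line, its timestamp, and the
    indented stack-trace lines that follow it."""
    collected = []
    for i, line in enumerate(lines):
        if "Exception" in line:
            timestamp = " ".join(line.split()[:2])
            context = [l for l in lines[i + 1:] if l.startswith("    ")]
            collected.append({"time": timestamp, "line": line, "context": context})
    return collected

def classify_by_level(lines, levels=("INFO", "WARN", "ERROR")):
    """S5: group log records by level for categorized display."""
    groups = defaultdict(list)
    for line in lines:
        for level in levels:
            if f" {level}" in line:
                groups[level].append(line)
                break
    return dict(groups)

exceptions = collect_exception_context(LOG_LINES)
by_level = classify_by_level(LOG_LINES)
```

In the tool itself these routines would run inside transform operators over the DataStream rather than over an in-memory list.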
In step S1, the log analysis data is directly read from the collection.
In step S1, the log analysis data is read from the file.
In step S1, the log analysis data is Kafka transmission data.
In step S1, the log analysis data is read from the database.
In step S2, the data are shuffled and recombined according to data-processing and keyword-collection rules customized for the usage scenario.
In step S3, different data-processing models (transform operators) are customized for different types of logs, and the whole data stream is divided into multiple substreams for consumption.
In step S3, the data are screened with a select operator according to a specified rule; data satisfying the condition are retained and the rest are discarded.
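The split/filter/union pattern of step S3 can be sketched as follows, with plain Python standing in for Flink's operators; the record shapes and the per-type rules are assumptions for the example.

```python
# Sketch of step S3: split one stream of log records into per-type
# substreams, filter each by its own rule, then union the partial
# results. Record shapes and filtering rules are invented for the example.

records = [
    {"type": "access", "status": 500, "msg": "GET /api"},
    {"type": "access", "status": 200, "msg": "GET /"},
    {"type": "app",    "level": "ERROR", "msg": "db timeout"},
    {"type": "app",    "level": "INFO",  "msg": "started"},
]

# One "select" rule per log type: keep only problematic records.
rules = {
    "access": lambda r: r["status"] >= 500,
    "app":    lambda r: r["level"] == "ERROR",
}

def split_filter_union(stream, rules):
    substreams = {t: [] for t in rules}
    for record in stream:  # split the stream by log type
        if record["type"] in substreams:
            substreams[record["type"]].append(record)
    union = []
    for log_type, sub in substreams.items():
        # filter each substream by its rule, then union the survivors
        union.extend(r for r in sub if rules[log_type](r))
    return union

final = split_filter_union(records, rules)
```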
The Flink-based log analysis tool imposes no strict format requirement on the structured or unstructured data it processes; the data may be ordered, unordered, or partially ordered, and the real-time performance and accuracy of data analysis can still be guaranteed.
Compared with the prior art, the log analysis tool based on Flink has the following characteristics:
first, the streaming-processing working mechanism gives users a faster and more efficient experience, and the memory occupied during operation is lower;
secondly, various types of data sources can be processed: data-processing models are formulated separately for different log types, and data analysis, key-content collection, and the like are carried out according to the customized models;
thirdly, the front end performs analysis and display, and the system can be operated by a single machine and can also be expanded into a distributed architecture;
fourthly, the log file is analyzed more comprehensively and specifically, and the working efficiency and the working quality of operation and maintenance and developers are fundamentally improved.
A Flink-based log analysis tool in the present example is described in detail above. While the present invention has been described with reference to specific examples, which are provided to assist in understanding the core concepts of the present invention, it is intended that all other embodiments that can be obtained by those skilled in the art without departing from the spirit of the present invention shall fall within the scope of the present invention.
Claims (9)
1. A Flink-based log analysis tool, comprising: during log analysis, log data are analyzed by custom data-processing models (transform operators), and the data analysis result is finally emitted by sink operators and displayed on a front-end web page.
2. The Flink-based log analysis tool of claim 1, wherein: the method comprises the following steps:
s1, determining a source of log analysis;
s2, processing the source data: shuffling and recombining it, applying sorting/sub-module processing, and returning a DataStream;
s3, flattening, shuffling, and recombining the data stream, performing preliminary data filtering, retaining the data that meet the conditions and discarding the data that do not, and finally applying a union aggregation operation to the results of the individual substreams to form the final result;
s4, automatically collecting the context of the various exceptions (from the Exception handling mechanism) that appear in the system running log, and recording their occurrence time;
and S5, classifying and collecting the logs of all levels in the system logs, and displaying them by category on the front end.
3. The Flink-based log analysis tool of claim 2, wherein: in step S1, the log analysis data is directly read from the collection.
4. The Flink-based log analysis tool of claim 2, wherein: in step S1, the log analysis data is read from the file.
5. The Flink-based log analysis tool of claim 2, wherein: in step S1, the log analysis data is Kafka transmission data.
6. The Flink-based log analysis tool of claim 2, wherein: in step S1, the log analysis data is read from the database.
7. The Flink-based log analysis tool of claim 2, wherein: in step S2, the data are shuffled and recombined according to data-processing and keyword-collection rules customized for the usage scenario.
8. The Flink-based log analysis tool of claim 2, wherein: in step S3, different data-processing models (transform operators) are customized for different types of logs, and the whole data stream is divided into multiple substreams for consumption.
9. The Flink-based log analysis tool of claim 2, wherein: in step S3, the data are screened with a select operator according to a specified rule; data satisfying the condition are retained and the rest are discarded.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202211038463.8A CN115499303A (en) | 2022-08-29 | 2022-08-29 | Log analysis tool based on Flink |
Publications (1)
Publication Number | Publication Date |
---|---|
CN115499303A true CN115499303A (en) | 2022-12-20 |
Family
ID=84465937
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202211038463.8A Pending CN115499303A (en) | 2022-08-29 | 2022-08-29 | Log analysis tool based on Flink |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN115499303A (en) |
Citations (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN110245158A (2019-06-10 / 2019-09-17, Shanghai Ideal Information Industry (Group) Co., Ltd.): A multi-source heterogeneous real-time data processing system and method based on Flink stream computing technology
CN113806429A (en) * | 2020-06-11 | 2021-12-17 | 深信服科技股份有限公司 | Canvas type log analysis method based on large data stream processing framework |
CA3150183A1 (en) * | 2021-02-25 | 2022-08-25 | 10353744 Canada Ltd. | Flink streaming processing engine method and device for real-time recommendation and computer equipment |
2022-08-29: CN application CN202211038463.8A, patent CN115499303A, status pending
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||