CN110727568A - Multi-source log data processing system and method in cloud environment - Google Patents

Multi-source log data processing system and method in cloud environment

Info

Publication number
CN110727568A
Authority
CN
China
Prior art keywords
log
plug
data
processing
module
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN201910880320.3A
Other languages
Chinese (zh)
Inventor
Luo Ping (罗平)
Ji Tongkai (季统凯)
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
G Cloud Technology Co Ltd
Original Assignee
G Cloud Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by G Cloud Technology Co Ltd filed Critical G Cloud Technology Co Ltd
Priority to CN201910880320.3A priority Critical patent/CN110727568A/en
Publication of CN110727568A publication Critical patent/CN110727568A/en
Pending legal-status Critical Current

Classifications

    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06F - ELECTRIC DIGITAL DATA PROCESSING
    • G06F11/00 - Error detection; Error correction; Monitoring
    • G06F11/30 - Monitoring
    • G06F11/34 - Recording or statistical evaluation of computer activity, e.g. of down time, of input/output operation; Recording or statistical evaluation of user activity, e.g. usability assessment
    • G06F11/3466 - Performance evaluation by tracing or monitoring
    • G06F11/3476 - Data logging

Landscapes

  • Engineering & Computer Science (AREA)
  • General Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Computer Hardware Design (AREA)
  • Quality & Reliability (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Debugging And Monitoring (AREA)

Abstract

The invention relates to a multi-source log data processing system and method in a cloud environment. The log source input module of the system provides multi-source log access; the data preprocessing module provides data classification; the data processing module provides management of log processing plug-ins; the data storage module provides back-end data storage management. The invention tags each text log data stream with a two-dimensional vector <ip, path>; parses a plugin_playbook.yml file to define the log data stream processing chain and determine the order in which logs are processed by the different plug-ins; scans and loads all plug-ins in the plugins directory; and, from the parsed plugin_playbook.yml file, constructs a complete data stream analysis chain along which the original log files, identified by the chain, flow to the different plug-ins for step-by-step processing. The invention solves the problems of high coupling and insufficient extensibility in multi-source log processing, and can be used for multi-source log data processing.

Description

Multi-source log data processing system and method in cloud environment
Technical Field
The invention relates to the technical field of log data processing, in particular to a multi-source log data processing system and method in a cloud environment.
Background
With the rapid development of various distributed technologies and the maturing of rich open-source distributed frameworks, traditional large monolithic programs are gradually being decomposed and shifted toward service-oriented architecture (SOA), of which the microservice architecture is a typical representative. However, such an architecture has a significant problem: because each service component is deployed in a distributed manner, the workload of checking exception logs when the system misbehaves is very heavy. It is therefore necessary to perform secondary parsing and structured storage on multi-source heterogeneous log data in order to support later operation and maintenance work, and a unified, dedicated platform is needed to carry out this log management. Existing log platforms, however, have the following problems:
First, log platform resource constraints
Current log analysis platforms are computation-intensive, network-intensive and storage-intensive, so their demand for hardware resources is very high, which also drives up cost.
Second, all functional modules in the log platform are highly coupled
The log acquisition module, the log processing module, the storage module and the other modules are highly coupled, which hinders product upgrades and iteration.
Third, the log analysis functions lack extensibility
The high coupling among the modules of a log platform ties each platform to a specific language environment, which quietly raises the difficulty of becoming familiar with the product code. Moreover, because the log analysis functions are intermingled, fixed functions cannot easily be extended, reduced or reused; this raises the cost of upgrading the overall product framework and of product iteration, and results in poor robustness and extensibility.
Disclosure of Invention
The technical problem to be solved by the invention is to provide a multi-source log data processing system and method in a cloud environment that realize secondary processing of multi-source log data while avoiding problems such as high coupling and insufficient extensibility.
The technical scheme for solving the technical problems is as follows:
the system comprises a log source input module, a data preprocessing module, a data processing module and a data storage module; the log source input module provides a multi-source log access function; the data preprocessing module provides a data classification function; the data processing module provides a flexible management function of the log processing plug-in; the data storage module provides a back-end data storage management function.
The log source access module uniformly connects original, heterogeneous text-format logs from different sources to the log data stream processing platform.
The data preprocessing module uniformly and centrally manages logs from different sources at the log receiving end and identifies each by a characteristic tag; the tag is represented by a two-dimensional vector <ip, path>, where ip is a default network-connection parameter and path is a general log parameter.
The data processing module performs secondary processing on the log data, realized through managed log processing plug-ins; it comprises a multi-language plug-in module, specific-function plug-in modules and a plug-in management module.
The multi-language plug-in module provides a universal core plug-in library and plug-in APIs (application programming interfaces) for multiple language versions such as Java, Python, Ruby and Go, realizing a cross-language universal plug-in platform.
The plug-in management module copies plug-in code into the engineering directory plugins; by periodically scanning the plugins directory, the system realizes dynamic plug-in management, including plug-in loading, unloading, exception management and behavior management. Plug-in loading and unloading load or unload the plug-in modules with specific functions; plug-in behavior management provides a plugin_playbook.yml file that uniformly plans the whole data flow.
A plug-in is a log analysis module whose input and output are json data and whose function is completely independent; a specific combination of plug-ins is realized by parsing the plugin_playbook.yml file.
A plug-in is divided into four parts: first, a tag check that judges whether the log data source meets the requirements; second, the preceding plug-ins, indicating which plug-ins processed the data earlier; third, the core processing logic inside the plug-in, which realizes the core log-analysis service; and fourth, the follow-on plug-ins, indicating that the plug-in's output data is processed by subsequent plug-ins in the next stage.
The method comprises the following steps:
Step 1: the cloud platform builds the log data stream secondary processing system;
Step 2: access text log data from different sources;
Step 3: tag the text log data streams from different sources with the two-dimensional vector <ip, path>;
Step 4: parse the plugin_playbook.yml file, define the log data stream processing chain, and determine the order in which logs are processed by the different plug-ins;
Step 5: the plug-in management module scans and loads all plug-ins in the plugins directory;
Step 6: from the parsed plugin_playbook.yml file, construct the complete data stream analysis chain; following the chain identification, the original log files flow step by step through the different plug-ins for processing;
Step 7: store the processed logs in the back-end data storage module.
The invention provides a system and method for secondary processing of multi-source log data streams in a cloud environment. Source logs are centrally managed at the data preprocessing end, and a two-dimensional vector composed of ip and path uniquely identifies each log source, which reduces the operation and maintenance workload of the software and avoids excessively invasive operations on the log collection end. The log processing part is decoupled from the overall workflow into an independent module managed on its own; log analysis is completed by combining different, completely independent plug-ins, and loading and unloading of plug-ins is realized by dynamically scanning the plugins directory, so the whole service need not be restarted; the software thus has a dynamic plug-in loading characteristic, which improves product flexibility. By defining a unified plug-in specification and providing plug-in APIs for multiple language versions, the system gains a cross-language characteristic. The log processing part has high cohesion, low coupling and high extensibility, which favors product iteration. The log processing part also provides a data chain processing script, plugin_playbook.yml, to define the log stream processing chain, so that the log processing flow is simple and intuitive, the flow is controllable, and exception tracking is simplified.
Drawings
The invention is further described below with reference to the accompanying drawings:
FIG. 1 is a system framework diagram of the present invention;
FIG. 2 is a flow chart of the method of the present invention;
FIG. 3 is a diagram of the plug-in logic structure of the present invention.
Detailed Description
As shown in fig. 1, the system for secondary processing of multi-source log data stream of the present invention includes 4 functional modules: 1. the log source input module is used for providing a multi-source log access function; 2. the data preprocessing module is used for providing a data classification function; 3. the data processing module provides a flexible management function of the log processing plug-in; 4. and the data storage module provides a function of back-end data storage management.
1. Log source access module
The log source access module uniformly accesses the text format logs of different sources to the log data stream processing platform. At this point, the original, heterogeneous logs generated by the different service components are accessed.
2. Data preprocessing module
The data preprocessing module uniformly and centrally manages logs from different sources at the log receiving end and identifies each with a characteristic tag. In the invention, the tag is represented by a two-dimensional vector <ip, path>. Because a log source contains a large amount of heterogeneous data that cannot be processed in a uniform manner, the logs that need processing must be identified through this tagging feature.
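As a minimal sketch of this tagging step (the record layout and function name are assumptions for illustration; the patent publishes no code):

```python
# Minimal sketch of <ip, path> tagging at the log receiving end; the
# record layout and function name are assumptions, as the patent
# publishes no code.
import json

def tag_record(raw_line: str, ip: str, path: str) -> dict:
    """Wrap one raw text log line in a json record carrying the
    two-dimensional vector <ip, path>: ip is the default network-
    connection parameter, path the general log-file parameter."""
    return {"tag": {"ip": ip, "path": path}, "message": raw_line}

# Example: a line arriving from host 10.0.0.5, file /var/log/app/api.log
record = tag_record("ERROR instance spawn failed", "10.0.0.5",
                    "/var/log/app/api.log")
print(json.dumps(record))
```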
3. Data processing module
The data processing module performs secondary processing on the log data, realized through the managed log processing plug-ins; it comprises the supported multi-language plug-in module, the plug-in modules with specific functions, and the plug-in management module.
A plugin is a log analysis module whose input and output are json data and whose function is completely independent; these modules can be flexibly combined and shared, and a specific combination is realized by parsing the plugin_playbook.yml file.
The multi-language plugin module is a universal plugin platform that the system realizes by providing a universal core plugin specification and plugin APIs for multiple language versions such as Java, Python, Ruby and Go.
The plugin management module copies plugin code into the plugins directory to realize dynamic loading of plugins; the module periodically scans this directory, thereby achieving dynamic loading. Plugin management includes plugin loading, unloading, exception management and behavior management. Plugin loading and unloading load and unload the plugin modules with specific functions. A plugin_playbook.yml file (referred to herein as the log processing chain playbook) is provided to uniformly plan the processing flow of the whole data stream, and plugin behavior management is realized through this playbook file.
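A minimal sketch of such periodic scanning and dynamic loading, with an assumed one-file-per-plugin layout (the directory convention and names are illustrative, not the patent's):

```python
# Sketch: periodically scan the plugins directory so that new plug-ins
# are loaded and removed ones dropped without restarting the service.
# The one-file-per-plugin layout and names are illustrative assumptions.
import importlib.util
import pathlib
import time

PLUGINS_DIR = pathlib.Path("plugins")  # the engineering catalog

def scan_and_sync(loaded: dict) -> None:
    """Load any .py plug-in that appeared; forget any that vanished
    (a sketch-level unload; Python cannot fully unload a module)."""
    present = {p.stem: p for p in PLUGINS_DIR.glob("*.py")}
    for name in set(loaded) - set(present):          # plug-in removed
        del loaded[name]
    for name, path in present.items():               # plug-in added
        if name not in loaded:
            spec = importlib.util.spec_from_file_location(name, path)
            module = importlib.util.module_from_spec(spec)
            spec.loader.exec_module(module)
            loaded[name] = module

loaded_plugins: dict = {}
while True:            # the periodic dynamic scan described above
    scan_and_sync(loaded_plugins)
    time.sleep(30)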
4. Data storage module
The data storage module provides an open back-end data storage architecture.
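As an illustration of what an open, pluggable storage backend could look like (the interface shape is an assumption, since the patent does not name a concrete backend):

```python
# Sketch of an open back-end storage interface; the adapter shape is an
# assumption, since the patent does not name a concrete storage backend.
import json

class Store:
    def save(self, record: dict) -> None:
        raise NotImplementedError

class FileStore(Store):
    """Minimal backend: append structured json records to a local file;
    any other backend can be swapped in behind the same interface."""
    def __init__(self, path: str):
        self.path = path

    def save(self, record: dict) -> None:
        with open(self.path, "a") as f:
            f.write(json.dumps(record) + "\n")
```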
As shown in fig. 2, the secondary processing method for multi-source log data streams of the present invention comprises the following basic steps (an illustrative end-to-end sketch follows the steps):
Step 1: the cloud platform provisions a cloud server with a multi-core CPU and ample RAM to deploy the log data stream secondary processing system;
Step 2: access text log data from different sources;
Step 3: tag the text log data streams from different sources with the two-dimensional vector <ip, path>;
Step 4: parse the plugin_playbook.yml file and define the log data stream processing chain to determine the order in which logs are processed by the different plugins;
Step 5: the plugin management module scans and loads all plug-ins in the plugins directory;
Step 6: from the parsed plugin_playbook.yml file, construct the complete data stream analysis chain; following the chain identification, the original log files flow step by step through the different plug-ins for processing;
Step 7: store the logs after secondary processing in the back-end data storage module.
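The following driver illustrates steps 2 to 7 end to end; every name in it (process(), save(), the playbook's "chain" key) is an assumption of the sketch, not an interface disclosed by the patent:

```python
# Illustrative end-to-end driver for steps 2-7. Every name below
# (process(), save(), the playbook's "chain" key) is an assumption
# of this sketch, not an interface disclosed by the patent.
import yaml  # PyYAML, assumed available for parsing the playbook

def run_pipeline(playbook_path, loaded_plugins, records, store):
    """Stream tagged json log records through the plug-in chain
    defined in plugin_playbook.yml, then store the results."""
    with open(playbook_path) as f:
        playbook = yaml.safe_load(f)       # steps 4/6: build the chain
    chain = [loaded_plugins[step["plugin"]] for step in playbook["chain"]]
    for record in records:                 # step 3: tagged records arrive
        for plugin in chain:               # step 6: flow plug-in by plug-in
            record = plugin.process(record)   # json in, json out
            if record is None:             # tag check rejected the record
                break
        if record is not None:
            store.save(record)             # step 7: back-end storage
```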
The log source is identified by the two-dimensional vector <ip, path> serving as its characteristic tag, where ip in the vector is a default network-connection parameter and path is a general log parameter. The uniqueness of a log source is thus established with simple parameters, without introducing any additional parameter to represent the source. Log sources with different unique identifiers can be recognized through centralized, unified configuration at the receiving end, which simplifies the configuration steps at the log collection end and reduces the extra configuration that operation and maintenance personnel would otherwise apply manually and invasively to the collection end. Compared with the traditional mode, this is a clear technical advantage: when configuring identification, the traditional log collection mode usually has to configure extra parameters such as id and tag at the collection end to identify the log source, without which the source cannot be uniquely identified.
A conventional log processing platform couples the code of its log acquisition and processing modules very tightly and can be regarded as a customized log processing platform, which is extremely unfriendly to universality, maintainability and extensibility. In the invention, log receiving and secondary processing are decoupled into two functional modules managed separately, providing architectural support for product upgrade iteration. Secondary log processing loads and unloads specific-function plugins in plug-in fashion, which brings strong flexibility and extensibility to the module. Simply by scanning the engineering directory plugins, the system recognizes the introduction of new plugins and the removal of old ones, realizing dynamic plugin loading.
The invention uses a core plugin lib to define the plugin specification in a unified way and provides plugin APIs in multiple language versions, making it convenient for developers to implement a plugin API for a specific language while following the specification, so plug-in development is cross-language. Each plug-in is a completely independent functional module; taking a GeoIP plugin that queries geographic information for an ip as an example, a developer can implement versions in different languages such as Java and Python and uniformly package the related dependencies under the GeoIP directory.
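A Python version of such a GeoIP plug-in might look like the sketch below; the process() convention and the stand-in lookup table are assumptions, and a real plug-in would query the GeoIP database packaged under the GeoIP directory:

```python
# Sketch of a GeoIP plug-in: json record in, json record out, with a
# completely independent function. The process() convention and the
# stand-in lookup table are assumptions of this sketch.
GEO_DB = {"10.0.0.5": {"country": "CN", "city": "Dongguan"}}  # stand-in data

class GeoIPPlugin:
    name = "GeoIP"

    def process(self, record: dict) -> dict:
        """Enrich the record with geographic info for its source ip;
        a real plug-in would query the bundled GeoIP database
        instead of this in-memory table."""
        ip = record.get("tag", {}).get("ip")
        record["geoip"] = GEO_DB.get(ip, {})
        return record
```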
As shown in fig. 3, the invention provides a flexible plugin logic architecture divided into 4 parts: first, a tag check that judges whether the log data source meets the requirements; second, the preceding plug-ins, indicating which plug-ins processed the data earlier (note: effective only in debug mode); third, the core processing logic inside the plugin, which realizes the core log-analysis service; and fourth, the follow-on plug-ins, indicating that the plugin's output data is processed by subsequent plug-ins in the next stage (note: effective only in debug mode). The input and output of every plug-in are json data, which guarantees data transfer between plug-ins.
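The four parts might be expressed as in the following sketch (all names are illustrative assumptions; parts 2 and 4 are carried as declared chain links, consistent with their debug-only role):

```python
# Sketch of the four-part plug-in logic of fig. 3; all names are
# illustrative assumptions, not the patent's published interface.
class Plugin:
    def __init__(self, name, tag_filter, core, pre=None, post=None):
        self.name = name
        self.tag_filter = tag_filter  # part 1: which log sources to accept
        self.pre = pre or []          # part 2: preceding plug-ins (debug only)
        self.core = core              # part 3: core log-analysis logic
        self.post = post or []        # part 4: follow-on plug-ins (debug only)

    def process(self, record: dict):
        # Part 1: judge whether the log data source meets the requirements.
        if record.get("tag") != self.tag_filter:
            return None               # record is not for this plug-in
        # Part 3: the core service; json in, json out guarantees transfer.
        return self.core(record)
```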
The log data stream processing chain provides a flexible way to define the log processing data chain. The main points are as follows. First, the plug-in data chain script plugin_playbook.yml defines the flow of log data through the different plug-ins; for each plug-in, the plugin_playbook.yml file specifies the log source it processes and which plug-in receives the log for subsequent processing, through a from_tag (tag), a pre_plugin name (unique), and input and output fields in array form. Second, the front and rear plug-ins of a single plugin can be defined dynamically (this function is effective only in system debug mode), mainly to make plug-in development and debugging convenient for developers, realizing flexible dynamic definition of the data stream processing chain for a single plug-in.
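A plugin_playbook.yml of this kind might look like the sketch below; the field names follow the description above (from_tag, pre_plugin, input, output), while the concrete plug-in names and values are invented for illustration:

```python
# Sketch of a plugin_playbook.yml chain definition and its parsing.
# Field names follow the description (from_tag, pre_plugin, input,
# output); the plug-in names and values are invented for illustration.
import yaml  # PyYAML, assumed available

PLAYBOOK = """
chain:
  - plugin: grok_parse              # unique plug-in name
    from_tag: {ip: 10.0.0.5, path: /var/log/app/api.log}
    input: [message]                # input fields, array form
    output: [level, module, msg]    # output fields, array form
  - plugin: GeoIP
    pre_plugin: grok_parse          # receives grok_parse's output
    input: [msg]
    output: [geoip]
"""

playbook = yaml.safe_load(PLAYBOOK)
for step in playbook["chain"]:      # reconstruct the processing order
    print(step["plugin"], "<-", step.get("pre_plugin", "log source"))
```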

Claims (8)

1. A multisource log data processing system under a cloud environment is characterized in that: the system comprises a log source input module, a data preprocessing module, a data processing module and a data storage module; the log source input module provides a multi-source log access function; the data preprocessing module provides a data classification function; the data processing module provides a flexible management function of the log processing plug-in; the data storage module provides a back-end data storage management function.
2. The system of claim 1, wherein: the log source access module uniformly connects original, heterogeneous text-format logs from different sources to the log data stream processing platform.
3. The system of claim 1, wherein: the data preprocessing module uniformly and centrally manages logs from different sources at the log receiving end and identifies each by a characteristic tag; the tag is represented by a two-dimensional vector <ip, path>; ip is a default network-connection parameter, and path is a general log parameter.
4. The system of claim 2, wherein: the data preprocessing module uniformly and centrally manages logs from different sources at the log receiving end and identifies each by a characteristic tag; the tag is represented by a two-dimensional vector <ip, path>; ip is a default network-connection parameter, and path is a general log parameter.
5. The system according to any one of claims 1 to 4, wherein: the data processing module performs secondary processing on the log data, realized through managed log processing plug-ins; it comprises a multi-language plug-in module, specific-function plug-in modules and a plug-in management module;
the multi-language plug-in module provides a universal core plug-in library and plug-in APIs (application programming interfaces) for multiple language versions such as Java, Python, Ruby and Go, realizing a cross-language universal plug-in platform;
the plug-in management module copies plug-in code into the engineering directory plugins; by periodically scanning the plugins directory, the system realizes dynamic plug-in management, including plug-in loading, unloading, exception management and behavior management; plug-in loading and unloading load or unload the plug-in modules with specific functions; plug-in behavior management provides a plugin_playbook.yml file that uniformly plans the whole data flow.
6. The system of claim 5, wherein: a plug-in is a log analysis module with json data as both input and output and a completely independent function; a specific combination of plug-ins is realized by parsing the plugin_playbook.yml file.
7. The system of claim 5, wherein: a plug-in is divided into four parts: first, a tag check that judges whether the log data source meets the requirements; second, the preceding plug-ins, indicating which plug-ins processed the data earlier; third, the core processing logic inside the plug-in, which realizes the core log-analysis service; and fourth, the follow-on plug-ins, indicating that the plug-in's output data is processed by subsequent plug-ins in the next stage.
8. A multi-source log data processing method in a cloud environment, characterized by comprising the following steps:
Step 1: the cloud platform builds a log data stream secondary processing system;
Step 2: access text log data from different sources;
Step 3: tag the text log data streams from different sources with the two-dimensional vector <ip, path>;
Step 4: parse the plugin_playbook.yml file, define the log data stream processing chain, and determine the order in which logs are processed by the different plug-ins;
Step 5: the plug-in management module scans and loads all plug-ins in the plugins directory;
Step 6: from the parsed plugin_playbook.yml file, construct the complete data stream analysis chain; following the chain identification, the original log files flow step by step through the different plug-ins for processing;
Step 7: store the processed logs in the back-end data storage module.
CN201910880320.3A 2019-09-18 2019-09-18 Multi-source log data processing system and method in cloud environment Pending CN110727568A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201910880320.3A CN110727568A (en) 2019-09-18 2019-09-18 Multi-source log data processing system and method in cloud environment

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201910880320.3A CN110727568A (en) 2019-09-18 2019-09-18 Multi-source log data processing system and method in cloud environment

Publications (1)

Publication Number Publication Date
CN110727568A true CN110727568A (en) 2020-01-24

Family

ID=69219190

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201910880320.3A Pending CN110727568A (en) 2019-09-18 2019-09-18 Multi-source log data processing system and method in cloud environment

Country Status (1)

Country Link
CN (1) CN110727568A (en)

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112463772A (en) * 2021-02-02 2021-03-09 北京信安世纪科技股份有限公司 Log processing method and device, log server and storage medium
CN113064869A (en) * 2021-03-23 2021-07-02 网易(杭州)网络有限公司 Log processing method and device, sending end, receiving end equipment and storage medium


Similar Documents

Publication Publication Date Title
CN110297689B (en) Intelligent contract execution method, device, equipment and medium
CN108845940B (en) Enterprise-level information system automatic function testing method and system
US8832714B1 (en) Automated service interface optimization
US11366713B2 (en) System and method for automatically identifying and resolving computing errors
Candido et al. Test suite parallelization in open-source projects: A study on its usage and impact
CN111144839A (en) Project construction method, continuous integration system and terminal equipment
US11934287B2 (en) Method, electronic device and computer program product for processing data
CN111796855B (en) Incremental version updating method and device, storage medium and computer equipment
Zaccarelli et al. Stream2segment: An open‐source tool for downloading, processing, and visualizing massive event‐based seismic waveform datasets
CN113076253A (en) Test method and test device
KR20100002259A (en) A method and system for populating a software catalogue with related product information
CN110727568A (en) Multi-source log data processing system and method in cloud environment
CN110764760B (en) Method, apparatus, computer system, and medium for drawing program flow chart
CN113297081B (en) Execution method and device of continuous integrated pipeline
CN113419740A (en) Program data stream analysis method and device, electronic device and readable storage medium
US20110246967A1 (en) Methods and systems for automation framework extensibility
CN115291928A (en) Task automatic integration method and device of multiple technology stacks and electronic equipment
CN113821486B (en) Method and device for determining dependency relationship between pod libraries and electronic equipment
CN112835606B (en) Gray release method and device, electronic equipment and medium
CN110674024A (en) Electronic equipment integration test system and method thereof
US10958514B2 (en) Generating application-server provisioning configurations
US9720660B2 (en) Binary interface instrumentation
Behnamghader et al. A scalable and efficient approach for compiling and analyzing commit history
KR102614060B1 (en) Automatic analysis method for converting general applications into software-as-a-service applications
CN116452208B (en) Method, device, equipment and medium for determining change transaction code

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination